7 Commits

Author SHA1 Message Date
Rene Fichtmueller
fd3476f5c4 fix(scraper): FiberMall URL schema + price parser + Flexoptix EUR comma bug
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
  currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)

Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
  (2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR

Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)

CHANGELOG: added 6 entries for 2026-04-12
2026-04-12 04:26:35 +02:00
Rene Fichtmueller
5c9cf0e9b5 feat: blog engine v4 (reduction+style-lock passes) + flexoptix scraper fixes
Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
  removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock

blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'

flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)

db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
2026-04-04 07:50:01 +02:00
Rene Fichtmueller
441d9721c9 fix: flexoptix catalog scraper — 1G SFP coverage + SKU suffix + pagination
- Add 1G SFP search queries ("1G SFP", "SFP LX", "SFP SX", "SFP ZX") — were completely missing
- Strip vendor-compat suffix from SKU (S.1303.10.DG:Sx → S.1303.10.DG) to match existing records
- Remove 200-product cap, use full API pagination (page >= 50 limit only)
- Result: FLEXOPTIX 1G SFP coverage 50% → 97%, overall price coverage 62% → 88%
2026-04-04 07:26:13 +02:00
Rene Fichtmueller
edc9311d7b feat: add proxy network, image backfill, and scraper improvements
- Add TIP Proxy Network (packages/proxy-agent): SOCKS5 proxy agent
  for residential IP bypass of CloudFront WAF blocks
- Add /api/proxy/* routes: node registration, heartbeat, load balancing
- Add image extraction to Flexoptix catalog scraper (GraphQL small_image)
- Add image extraction to Optcore scraper (Playwright gallery img)
- Fix Fluxlight price scraping (BigCommerce HTML structure: data-product-price-without-tax)
- Add SmartOptics scraper (8 DWDM/coherent products, og:image extraction)
- Fix findOrCreateScrapedTransceiver to update image_url for existing records
- Add image backfill script (backfill-images.ts): 178 Flexoptix images added
- Fix DB connection pool: max 5, idleTimeoutMillis 10s (was unlimited, caused >100 connections)
- Add proxy.ts utility for scraper proxy rotation
2026-04-03 21:13:03 +02:00
Rene Fichtmueller
5a0cbed5a2 feat: dashboard v2, blog expansion, market/cable MCP tools, switch asset scrapers, scraper utilities 2026-03-30 08:07:12 +02:00
Rene Fichtmueller
c6308e93c0 feat: massive scraper expansion + hype cycle engine + lifecycle prediction
New scrapers:
- GBICS.com (BigCommerce, GBP prices, 10 categories, 78 products)
- Juniper HCT (Next.js SSR parser, 475 transceivers with specs/EOL)
- SFPcables.com (Magento store, 16 categories, 78 products)
- Fluxlight (BigCommerce, 6 pages, 118 products)
- Champion ONE (compatible vendor scraper)

Scraper fixes:
- 10Gtek: rewritten to parse HTML spec tables (152 products)
- Flexoptix: fix price extraction from Magento Hyva HTML
- Register all scrapers in CLI (--gbics, --juniper, --sfpcables, etc.)

Hype Cycle Engine enhancements:
- Data-driven enrichment from scraped vendor/price data
- Revenue lifecycle prediction (peak year, decline, revenue index)
- Regional adoption model (NA, China, APAC, Europe, RoW with lag coefficients)
- New API endpoints: /enriched, /lifecycle, /regional/:tech

DB growth: 89 → 1,168 transceivers, 0 → 416 prices, 6 vendors
Qdrant: 1,162 products embedded with nomic-embed-text

Research: Norton-Bass model, standards-to-market timelines, hype signals
2026-03-28 02:30:19 +13:00
Rene Fichtmueller
d43b98e91b feat: add Flexoptix product catalog scraper, register in CLI
Scrapes flexoptix.net product catalog across 9 categories (SFP through OSFP).
Extracts product names, prices, form factors, reach, fiber type, wavelength.
CLI: --flexoptix flag, integrated into --all.
2026-03-28 01:02:34 +13:00