Rene Fichtmueller
6febb9c88e
refactor(prolabs): replace Playwright+Firefox with fetch-based catalog scraper
...
ProLabs uses a B2B quote model: prices require a reseller account and are
not shown publicly (the schema.org markup always shows price=0.00). Fighting
the CloudFront WAF with Firefox automation is pointless.
New approach:
- Sitemap-driven: downloads all 14 sitemaps to collect product URLs
- fetch-based: curl-compatible HTTP requests bypass CloudFront TLS detection
- catalog-only: writes part numbers + specs to transceivers table
- Rate-limited: 300ms between requests (~3 req/sec)
- No proxy needed: Pi nodes no longer consumed for ProLabs
2026-04-11 02:57:13 +02:00
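A minimal sketch of the rate-limited loop described in 6febb9c88e, assuming Node 18+ (global fetch). The names crawlSequentially, FetchText and defaultFetchText are hypothetical and not taken from the repository; only the 300 ms delay comes from the commit message.

```typescript
// Hypothetical sketch: sequential, rate-limited fetching of product URLs.
type FetchText = (url: string) => Promise<string>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Default fetcher uses the global fetch available in Node 18+ (assumption).
const defaultFetchText: FetchText = async (url) => {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
};

async function crawlSequentially(
  urls: string[],
  delayMs = 300, // 300 ms between requests, as in the commit message
  fetchText: FetchText = defaultFetchText,
): Promise<Map<string, string>> {
  const pages = new Map<string, string>();
  let first = true;
  for (const url of urls) {
    if (!first) await sleep(delayMs); // wait between requests, not before the first
    first = false;
    try {
      pages.set(url, await fetchText(url));
    } catch (err) {
      // One failed page should not abort the whole catalog run.
      console.warn(`skipping ${url}: ${(err as Error).message}`);
    }
  }
  return pages;
}
```

Passing the fetcher as a parameter keeps the loop testable without network access; a real scraper would likely also want retries and a timeout.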
Rene Fichtmueller
cf75eee8ad
feat: linecard system support, Cisco 8000 accuracy, price anomaly detection
...
API/finder:
- Add modular chassis support: sibling linecards fetched when is_linecard=true
- Add chassis linecards when system_type=modular
- Extend switch response: system_type, is_linecard, chassis_model, slot_type,
flexbox_compat_mode, flexbox_notes, description, switching_capacity_tbps,
total_ports, category, lifecycle_status, features, use_cases, linecards[]
API/transceivers:
- Filter price_observations with COALESCE(is_anomalous, false) = false
(direct prices + comparable market prices)
Scraper/db:
- Add PRICE_BOUNDS map (per form-factor min/max USD sanity bounds)
- Add isPriceAnomalous() — marks DB price_observations as is_anomalous=true
- Add competitor_verified flag: set true when valid competitor price stored
- upsertPriceObservation: skip prices outside sanity bounds, set competitor_verified
Scraper/hash:
- contentHash() now accepts Record<string,unknown> | string (union type)
to support both structured objects and legacy string callers
Scrapers (skylane, tscom, wiitek):
- Fix contentHash() call signature: pass objects not JSON.stringify strings
- Fix wiitek: remove invalid 'name' param, fix t.id → transceiverId
Migrations:
- Add is_anomalous, competitor_verified, competitor_verified_at,
image_primary columns
- Recreate sync_fully_verified trigger to include competitor_verified
- Add is_linecard, chassis_model, system_type, slot_type,
flexbox_compat_mode, flexbox_notes to switches table
2026-04-09 09:06:22 +02:00
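A sketch of how the PRICE_BOUNDS map and isPriceAnomalous() from cf75eee8ad might fit together. The bound values, the DEFAULT_BOUNDS fallback and the function signature are assumptions for illustration; the commit message only states that bounds are per form factor and in USD.

```typescript
// Hypothetical sketch of per-form-factor sanity bounds. The numbers are
// placeholders for illustration, not the values used in the repository.
interface PriceBounds {
  minUsd: number;
  maxUsd: number;
}

const PRICE_BOUNDS: Record<string, PriceBounds> = {
  SFP: { minUsd: 3, maxUsd: 2000 },
  "SFP+": { minUsd: 5, maxUsd: 5000 },
  QSFP28: { minUsd: 20, maxUsd: 20000 },
};

// Fallback for form factors without an explicit entry (assumption).
const DEFAULT_BOUNDS: PriceBounds = { minUsd: 1, maxUsd: 100000 };

function isPriceAnomalous(formFactor: string, priceUsd: number): boolean {
  // Non-numeric, zero and negative prices are always treated as anomalous.
  if (!Number.isFinite(priceUsd) || priceUsd <= 0) return true;
  const bounds = PRICE_BOUNDS[formFactor] ?? DEFAULT_BOUNDS;
  return priceUsd < bounds.minUsd || priceUsd > bounds.maxUsd;
}
```

A caller such as upsertPriceObservation could then skip the insert, or store the row with is_anomalous=true, whenever this returns true.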
Rene Fichtmueller
240e7f46f2
feat(scraper): add SOCKS5 proxy rotation for fs-com, atgbics, gbics scrapers
...
Routes requests through CT130/131/132 proxy pool (192.168.178.77/76/74:1080)
when PROXY_URLS env var is set. Uses ProxyConfiguration from crawlee for
PlaywrightCrawler scrapers and socks-proxy-agent for fetch-based scrapers.
2026-04-08 08:17:49 +02:00
Rene Fichtmueller
51af249361
Merge remote-tracking branch 'github/main'
...
# Conflicts:
# packages/api/src/llm/fo-blog-pipeline.ts
# packages/api/src/routes/blog.ts
# packages/scraper/src/scheduler.ts
# packages/scraper/src/scrapers/fs-com.ts
# packages/scraper/src/scrapers/gbics.ts
2026-04-06 18:03:36 +02:00
Rene Fichtmueller
2e852e0a2f
fix(scrapers): replace bot User-Agents with Chrome UA + disable dead domain
...
- 16 commercial scrapers: replace TIP-Bot/1.0 with Chrome/120 UA
(GBICS confirmed returning 0 bytes for bot UA, Chrome UA returns 200KB)
- gbics.ts: fix User-Agent (was returning empty HTML, now returns products)
- optictransceiver.ts: disable — domain repurposed as plant shop (2026-04-06)
Alocasia Regal Shield is not a transceiver.
2026-04-06 02:17:50 +02:00
Rene Fichtmueller
dfe86fb347
fix(scraper): switch fs-com to de.fs.com for EUR prices as primary source
...
EUR prices scraped verbatim from de.fs.com — no conversion needed.
USD is now derived from EUR downstream (EUR→USD) instead of EUR being derived from USD.
Fixes price discrepancy: TIP showed USD 999×0.92=EUR 866 vs real €948 on de.fs.com.
2026-04-06 01:24:47 +02:00
Rene Fichtmueller
e6d042f827
fix: resolve merge conflict in index.ts + add untracked blog-sll, news, sql migration
2026-04-05 11:51:07 +02:00
Rene Fichtmueller
931588fffd
fix(verification): 100% Verified badge was dramatically too generous
...
CORE PROBLEMS FIXED:
1. ATGBICS part_number = URL slug instead of the real OEM number
extractOemPartNumber() strips the -r-compatible-transceiver-* suffix
+ trailing vendor names (nokia, cisco, juniper, ...)
Result: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA
2. reach_label = '' (empty) was accepted as details_verified
IS NOT NULL allows empty strings → fix: AND reach_label != ''
3. details_verified = true despite garbled part_number
New criteria: NOT ILIKE '%-compatible-transceiver%'
NOT ILIKE '%-r-compatible%'
4. data_confidence values wrong in function ('scraped_unverified' etc.)
Actual values: low/medium/high/garbage → NOT IN ('garbage','unknown')
RESULT after recompute_all_verification():
fully_verified: 3,654 → 581 (badge was overstated 6x)
details_verified: inflated → 1,075 (correct)
ATGBICS scraper:
- extractOemPartNumber() for collection and product detail pages
- detectReach() now also runs on the URL slug (120km in slug → reach_label)
Price Anomaly Detection:
- API: price_anomaly field when max/min ratio ≥ 10x
- Dashboard: ⚠ price anomaly banner with ratio + EUR range
SQL 025: Part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00
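The slug cleanup described in 931588fffd can be sketched roughly as follows. This is an illustration, not the repository's implementation: the vendor list only contains the three names mentioned in the commit message, and the exact regular expression is an assumption.

```typescript
// Hypothetical sketch; the real vendor list is longer ("nokia, cisco, juniper, ...").
const VENDOR_NAMES = ["nokia", "cisco", "juniper"];

function extractOemPartNumber(slug: string): string {
  let part = slug.trim().toLowerCase();
  // Drop "-r-compatible-transceiver" and everything after it.
  part = part.replace(/-r-compatible-transceiver.*$/, "");
  // Drop one trailing vendor name, e.g. "-nokia".
  for (const vendor of VENDOR_NAMES) {
    const suffix = `-${vendor}`;
    if (part.endsWith(suffix)) {
      part = part.slice(0, -suffix.length);
      break;
    }
  }
  // OEM part numbers are stored upper-case (as in the 3HE16564AA example).
  return part.toUpperCase();
}
```

Inputs that carry neither the suffix nor a vendor name pass through unchanged apart from upper-casing, so the function is safe to apply to already-clean part numbers.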
Rene Fichtmueller
f616e0ebbe
feat: blog engine v4 (reduction+style-lock passes) + flexoptix scraper fixes
...
Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock
blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'
flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)
db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
2026-04-04 07:50:01 +02:00
Rene Fichtmueller
0ac932a304
fix: flexoptix catalog scraper — 1G SFP coverage + SKU suffix + pagination
...
- Add 1G SFP search queries ("1G SFP", "SFP LX", "SFP SX", "SFP ZX") — were completely missing
- Strip vendor-compat suffix from SKU (S.1303.10.DG:Sx → S.1303.10.DG) to match existing records
- Remove 200-product cap, use full API pagination (page >= 50 limit only)
- Result: FLEXOPTIX 1G SFP coverage 50% → 97%, overall price coverage 62% → 88%
2026-04-04 07:26:13 +02:00
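The SKU normalization in 0ac932a304 can be illustrated with a one-line helper. The name stripCompatSuffix is hypothetical, and the sketch assumes the vendor-compat suffix is always introduced by a colon, which is all the single example in the commit message shows.

```typescript
// Hypothetical helper: keep everything before the first ":" so that
// "S.1303.10.DG:Sx" matches the existing record "S.1303.10.DG".
function stripCompatSuffix(sku: string): string {
  const idx = sku.indexOf(":");
  return (idx === -1 ? sku : sku.slice(0, idx)).trim();
}
```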
Rene Fichtmueller
1026787318
feat: add proxy network, image backfill, and scraper improvements
...
- Add TIP Proxy Network (packages/proxy-agent): SOCKS5 proxy agent
for residential IP bypass of CloudFront WAF blocks
- Add /api/proxy/* routes: node registration, heartbeat, load balancing
- Add image extraction to Flexoptix catalog scraper (GraphQL small_image)
- Add image extraction to Optcore scraper (Playwright gallery img)
- Fix Fluxlight price scraping (BigCommerce HTML structure: data-product-price-without-tax)
- Add SmartOptics scraper (8 DWDM/coherent products, og:image extraction)
- Fix findOrCreateScrapedTransceiver to update image_url for existing records
- Add image backfill script (backfill-images.ts): 178 Flexoptix images added
- Fix DB connection pool: max 5, idleTimeoutMillis 10s (was unlimited, caused >100 connections)
- Add proxy.ts utility for scraper proxy rotation
2026-04-03 21:13:03 +02:00
Rene Fichtmueller
c7697308f6
feat: NOG conference talks scraper + hot topics integration
...
NOG Talks Scraper (packages/scraper/src/scrapers/nog-talks.ts):
- Crawls DENOG (15-17), NANOG (91-93), RIPE (87-89), ENOG, NLNOG, Euro-IX
- Relevance scoring: optical keywords (+3pts each), network keywords (+1pt)
Only talks scoring ≥2 stored, high-relevance (≥6) also to market_intelligence
- CtxEvent cross-DB bridge: when ctxmeet DB has ConferenceTalk rows,
pulls directly via dblink (same Postgres instance, no network hop)
- Runs weekly Monday 06:00 UTC (pg-boss schedule)
- Output: news_articles (source='NOG Talks: EVENT') + market_intelligence
Hot Topics (packages/api/src/routes/hot-topics.ts):
- SOURCE 3c: NOG talk clusters displayed as conference topics in hot list
Grouped by event (DENOG15, NANOG93...) with speaker + abstract preview
Filtered: source LIKE 'NOG Talks:%' AND relevance > 0.4 AND age < 6 months
- Limit raised to 20 topics (was 15)
- Added nog_talks to sources metadata
Scheduler & Pi fleet:
- scrape:nog-talks queue registered in scheduler.ts + index-pi.ts
- Weekly cron: Monday 06:00 UTC (every Pi can handle it independently)
- First job triggered immediately
2026-04-02 22:38:00 +02:00
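The relevance scoring from c7697308f6 can be sketched as below. The weights (+3 and +1) and the thresholds (≥2 stored, ≥6 also sent to market_intelligence) come from the commit message; the keyword lists, the names, and the choice to count each keyword at most once per talk are assumptions.

```typescript
// Keyword lists are illustrative placeholders, not the repository's lists.
const OPTICAL_KEYWORDS = ["transceiver", "400zr", "coherent", "dwdm", "optic"];
const NETWORK_KEYWORDS = ["bgp", "peering", "routing", "ethernet"];

interface TalkScore {
  score: number;
  store: boolean; // score >= 2: stored in news_articles
  marketIntelligence: boolean; // score >= 6: also written to market_intelligence
}

function scoreTalk(title: string, abstract = ""): TalkScore {
  const text = `${title} ${abstract}`.toLowerCase();
  // Each keyword counts once, however often it appears (assumption).
  const hits = (words: string[]): number =>
    words.filter((w) => text.includes(w)).length;
  const score = 3 * hits(OPTICAL_KEYWORDS) + hits(NETWORK_KEYWORDS);
  return { score, store: score >= 2, marketIntelligence: score >= 6 };
}
```

With these weights a single optical keyword is enough to store a talk, while a talk with only one network keyword is dropped.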
Rene Fichtmueller
f146ac873e
feat: add 5 form-factor coverage scrapers with worker registrations
...
Add Comms-Express, Router-Switch.com, Multimode Inc, OpticTransceiver.com,
and Wiitek scrapers covering CFP2-DCO, CFP4, OSFP224, QSFP112, CXP, GBIC,
XENPAK, CSFP, SFP-DD, SFP56, QSFP56 and other previously-uncovered form
factors. Each scheduled every 8h. Worker registrations added to scheduler.
Also export db alias in utils/db.ts to fix eBay enricher + community scrapers
crashing with 'Cannot read properties of undefined (reading query)'.
2026-04-02 08:39:17 +02:00
Rene Fichtmueller
370c1d8801
feat: 6 prediction signal scrapers + forecast engine
...
New scrapers (all registered in pg-boss, 50 total jobs):
- sec-edgar.ts : SEC EDGAR XBRL API — hyperscaler CapEx from 10-Q/10-K
- github-signals.ts : GitHub Search/Stats API — tech adoption metrics weekly
- ebay-velocity.ts : eBay completed listings — sold count + price distribution
- ai-clusters.ts : RSS feeds (6 sources) — AI cluster & DC announcements
- distributor-leads.ts : Mouser, Digi-Key, RS Components — lead time + stock
- standards-tracker.ts : IEEE 802.3, OIF, IETF — draft/ballot/published status
New utilities:
- forecast-engine.ts : Weighted signal aggregator → demand_index + price_direction
6 signal types, 4 horizons (3/9/12/18 months), 5 technologies tracked
New DB tables (migration 022):
hyperscaler_capex, distributor_lead_times, github_tech_signals,
marketplace_velocity, ai_cluster_announcements, standards_activity,
forecast_signals
Schedules:
- EDGAR: weekly Mon 06:00
- GitHub: weekly Sun 05:00
- eBay velocity: every 12h
- AI clusters: every 4h (news-speed)
- Distributor leads: daily 03:30
- Standards: weekly Wed 04:00
- Forecast engine: daily 08:00 (after all nightly scrapers)
2026-04-02 02:02:44 +02:00
Rene Fichtmueller
072978f1a4
feat: 24/7 scraping fleet — 8 new vendors + continuous schedule + Pi setup
...
New scrapers (8):
- BlueOptics (EUR, every 4h)
- ShopFiber24 (EUR, every 4h)
- T&S Communication (USD, every 4h)
- SmartOptics (catalog, every 8h)
- HUBER+SUHNER (catalog, every 8h)
- Skylane Optics (USD, every 4h)
- AscentOptics (USD, every 4h)
- GAO Tek (USD, every 4h)
Scheduler: nightly window → 24/7 continuous (42 jobs total)
- Playwright scrapers: every 8h (FS.com, 10Gtek, ATGBICS, ProLabs)
- Fetch/Cheerio: every 4h (11 lightweight vendors)
- Flexoptix catalog: every 2h (primary price source)
- eBay enrichment: every 6h
- Compatibility matrices: every 12h
- Compute jobs: every 4h
Pi fleet: scripts/pi-scraper-setup.sh for one-command Pi node setup
2026-04-02 01:09:05 +02:00
Rene Fichtmueller
732d7c3246
fix: switch seed lifecycle_status casing (Active not active)
2026-04-01 22:50:10 +02:00
Rene Fichtmueller
4020ec77d9
feat: product intelligence layer — eBay enricher, community issues, datasheets+manuals API
...
- Migration 020: product_issues table, condition/marketplace on price_observations, features JSONB
- eBay enricher: switch features/description/refurb prices + transceiver condition pricing
- Community issues scraper: Reddit/ServeTheHome/Arista/Cisco community bug reports
- 7 pre-seeded issues (DCS-7800R3, SG350, QFX5120, CRS326, USW-Pro etc.)
- API: /switches/:id/issues + /switches/:id/documents endpoints
- Dashboard switch modal: features from DB, description, eBay refurb price, issues+docs async
- Datasheet finder for Arista/Cisco/Juniper/HPE vendor pages
- Scheduler: 4 new jobs (ebay enrichment nightly, community issues weekly)
2026-04-01 22:46:27 +02:00
Rene Fichtmueller
64074f988f
feat: SMB/campus switch seed 26 models (Cisco/HPE/Ubiquiti/MikroTik/Netgear/Zyxel) + fix forecast.ts fiveYearProjection accessor
2026-04-01 22:34:58 +02:00
Rene Fichtmueller
681da54523
feat: Procurement Intelligence Engine (WS0c)
...
- Migration 019: stock_snapshots, abc_classification, reorder_signals,
product_lifecycle_events, market_intelligence, crawler_llm_log tables
- Seeded 7 market intel events (OFC 2026, AWS/Azure CapEx, Coherent lead times,
EU TED tenders, ECOC 2026, IEEE 802.3df)
- Seeded 4 lifecycle events (Cisco SFP-10G-LR EOL, Juniper EOL,
400ZR ratified, 800G MSA draft)
- Crawler LLM: core.ts (Ollama-based extractor), stock-schema.ts (typed schemas
+ vendor profiles for Flexoptix/FS.com/10Gtek/ATGBICS/ProLabs/Farnell/Mouser),
validator.ts (rule-based sanity checks + cross-validation)
- market-intelligence.ts scraper: OFC/ECOC, LightReading, IEEE 802.3, EU TED,
Farnell/Mouser lead times, FierceTelecom — weekly via pg-boss
- computeAbcClassification(): dynamic A/B/C classification from price obs +
compat count + vendor breadth
- computeReorderSignals(): buy_now/wait/hold/monitor with reasons + signal strength
- API: GET /api/procurement/overview|signals|signals/:id|abc|market-intel|
stock-trends/:id|lifecycle
- Dashboard: Procurement Intel tab with Reorder Signals, ABC table,
Market Intel cards, Lifecycle Events
2026-04-01 22:04:33 +02:00
Rene Fichtmueller
e4c89de6c0
feat: fs.com scraper Phase 2 — crawl product detail pages for verified specs
...
- New spec-updater utility: parseSpecTable() + updateVerifiedSpecs()
- fs.com scraper now has 2 phases:
Phase 1: Category pages → prices + stock (existing)
Phase 2: Product detail pages → fiber_type, connector, wavelength, power, image, datasheet
- Updates data_confidence from 'enriched_estimated' to 'scraped_unverified'
- Processes up to 200 product pages per scraper run
2026-03-31 09:18:27 +02:00
Rene Fichtmueller
a69acc4588
feat(v0.2.0): Sales Intelligence Engine — Phase 0+A
...
New API routes:
- GET /api/finder — Switch→Flexoptix transceiver finder with FlexBox coding
- GET /api/competitor-alerts — Competitor intelligence (price changes, new products, stock)
- GET /api/forecast/:technology — Sales forecast 3/9/12/18 months + buy/wait/hold signal
- POST /api/transport/plan — Transport system planner (city→city BOM with fiber providers)
New MCP tools:
- find_flexoptix_for_switch — Customer switch → Flexoptix products
- get_competitor_alerts — Competitor monitoring
- plan_transport — Network transport planning
- forecast_sales — Volume/revenue prediction
- generate_blog — Enhanced blog generation
New DB tables (migration 013):
- competitor_alerts, price_changes, flexoptix_product_map
- sales_forecasts, fiber_providers, fiber_routes, cities
- generated_datasheets, blog_series
- Views: v_price_coverage, v_image_coverage, v_switch_flexoptix_finder
Seed data (migration 014):
- 25 European cities with IX/DC locations + coordinates
- 15 fiber providers (euNetworks, Telia, DTAG, Colt, Zayo, etc.)
- 16 fiber routes with pricing (Germany focus)
Infrastructure:
- Scraper scheduler: 2h Flexoptix, 4h FS.com/Optcore (was 6-8h)
- Change detector for competitor price/stock monitoring
- Image downloader utility with coverage tracking
2026-03-31 08:51:22 +02:00
Rene Fichtmueller
0b07490114
chore: sync local changes
2026-03-31 07:32:02 +02:00
Rene Fichtmueller
2348238888
feat: add NADDOD, QSFPTEK, and AddOn Networks scrapers
...
Three new fetch-based price scrapers for compatible optics vendors:
- NADDOD: WooCommerce, USD, ~800+ SKUs
- QSFPTEK: Custom PHP shop, USD, ~1000+ SKUs
- AddOn Networks: Magento/custom, USD, ~2500 SKUs
All registered in scheduler (8-12h intervals) and index.ts --flags.
Build: 0 TypeScript errors.
2026-03-30 21:20:23 +02:00
Rene Fichtmueller
fcddd1f27b
fix: contentHash type errors + fs-com scraper improvements
...
Remove JSON.stringify wrapper from contentHash calls — function
expects Record<string,unknown>, not string. Fixes TS build for
6 scrapers. Update fs-com category URLs and add currency/lang cookies.
2026-03-30 21:07:27 +02:00
Rene Fichtmueller
814325b349
feat: dashboard v2, blog expansion, market/cable MCP tools, switch asset scrapers, scraper utilities
2026-03-30 08:07:12 +02:00
Rene Fichtmueller
6f7c834752
feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
...
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI
MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)
SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00
Rene Fichtmueller
70447def02
feat: massive scraper expansion + hype cycle engine + lifecycle prediction
...
New scrapers:
- GBICS.com (BigCommerce, GBP prices, 10 categories, 78 products)
- Juniper HCT (Next.js SSR parser, 475 transceivers with specs/EOL)
- SFPcables.com (Magento store, 16 categories, 78 products)
- Fluxlight (BigCommerce, 6 pages, 118 products)
- Champion ONE (compatible vendor scraper)
Scraper fixes:
- 10Gtek: rewritten to parse HTML spec tables (152 products)
- Flexoptix: fix price extraction from Magento Hyva HTML
- Register all scrapers in CLI (--gbics, --juniper, --sfpcables, etc.)
Hype Cycle Engine enhancements:
- Data-driven enrichment from scraped vendor/price data
- Revenue lifecycle prediction (peak year, decline, revenue index)
- Regional adoption model (NA, China, APAC, Europe, RoW with lag coefficients)
- New API endpoints: /enriched, /lifecycle, /regional/:tech
DB growth: 89 → 1,168 transceivers, 0 → 416 prices, 6 vendors
Qdrant: 1,162 products embedded with nomic-embed-text
Research: Norton-Bass model, standards-to-market timelines, hype signals
2026-03-28 02:30:19 +13:00
Rene Fichtmueller
204e99763c
feat: add Flexoptix product catalog scraper, register in CLI
...
Scrapes flexoptix.net product catalog across 9 categories (SFP through OSFP).
Extracts product names, prices, form factors, reach, fiber type, wavelength.
CLI: --flexoptix flag, integrated into --all.
2026-03-28 01:02:34 +13:00
Rene Fichtmueller
bd3a02ae4b
feat: add Flexoptix vendor scraper, 10Gtek pricing scraper, expand news feeds
...
- Flexoptix vendor scraper: 285 supported switch vendors ingested from
flexoptix.net/en/supported-vendors/ (our own data, no restrictions)
- 10Gtek Playwright scraper: Chinese OEM competitor pricing (SFP+, SFP28,
QSFP+, QSFP28, QSFP-DD categories)
- News feeds expanded: added Lightwave, Fierce Telecom, Data Center Knowledge,
SDxCentral, Cisco Blogs, Arista Blog (11 total sources)
- Scheduler updated: 8 job queues with appropriate intervals
- DB now: 297 vendors, 89 transceivers, 33 news articles (13 relevant)
2026-03-27 23:17:42 +13:00
Rene Fichtmueller
e9fb50a248
feat: TIP Phase 0+1 — monorepo, DB schema, API, scraper engine
...
Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config
Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
- Uses /wp-json/wp/v2/product to get 2000+ product URLs
- Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
- Keyword relevance scoring for transceiver/fiber topics
- xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
2026-03-27 16:27:31 +13:00
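A sketch of SHA-256 content hashing for change detection, as introduced in e9fb50a248. It uses the union signature that cf75eee8ad later describes (Record<string,unknown> | string). The sorted-key serialization and the helper names stableStringify and hasChanged are assumptions; the log does not say how the repository serializes objects.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted keys so that key order does not change the hash.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  // JSON.stringify returns undefined for undefined input; map that to "null".
  return JSON.stringify(value) ?? "null";
}

function contentHash(input: Record<string, unknown> | string): string {
  // Strings are hashed as-is to support legacy callers.
  const text = typeof input === "string" ? input : stableStringify(input);
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Skip the write when the stored hash matches the freshly computed one.
function hasChanged(
  previousHash: string | null,
  record: Record<string, unknown>,
): boolean {
  return previousHash !== contentHash(record);
}
```

Passing the object directly, rather than a JSON.stringify string, lets the hash function control serialization, which is what the later contentHash call-site fixes in this log move towards.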