9 Commits

Author SHA1 Message Date
Rene Fichtmueller
199f36be48 fix(scraper): auto-create pg-boss queues before scheduling + worker/schedule order
- scheduler: patch boss.schedule() to call createQueue() first (idempotent),
  fixing FK constraint errors after DB reset — no need to touch 277 call sites
- index: registerWorkers() before registerSchedules() since boss.work() must
  register handlers before schedules fire
- dashboard: fix switchBlogLlm() to use api() helper (adds Bearer auth token)
  instead of raw fetch() which was returning 401 Unauthorized
2026-04-29 16:14:25 +02:00
Rene Fichtmueller
b0ed54f386 feat: register fiber24 + fibermall in index, move atgbics to fetch-only section 2026-04-18 22:50:52 +02:00
Rene Fichtmueller
93d825dc04 fix: daemon stability + health monitor accuracy
- Add global unhandledRejection handler in scheduler daemon to swallow
  Crawlee's benign post-run ENOENT lock-file races (prevents process.exit(1))
- Add SKIP_FS_SCRAPER env var: skip FS.com worker on Erik where Cloudflare
  WAF blocks datacenter IPs (Mac launchd handles FS.com from residential IP)
- Remove FS.COM from health monitor EXPECTED_VENDORS (skipped on Erik)
- Health monitor: extend pg-boss lookup from 12h → 26h, add completed-job
  map; if job ran OK in last 26h + vendor has historical prices → mark
  STABLE instead of CRITICAL (fixes ATGBICS/Fluxlight hash-dedup false positives)
- Install Playwright Chromium on Erik (fixes ATGBICS BrowserLaunchError)
- Create missing Crawlee storage dirs on Erik (storage-fs-phase1/2,
  storage-ebay-transceivers) to prevent ENOENT on first Crawlee run
2026-04-18 03:16:59 +02:00
Rene Fichtmueller
45c48755e4 feat(scraper): add NADDOD/QSFPTEK/AddOn to scheduler, fix pre-existing TS build errors
- Register scrape:pricing:naddod (48 */2), qsfptek (52 */2), addon (55 */2) in pg-boss
- Add boss.work() handlers for all three (fetch-based, run on Erik)
- Fix findOrCreateScrapedTransceiver callers: remove invalid `name`/`url` params,
  fix `t.id` → `t` (function already returns string ID)
- Fix ebay-enricher: remove invalid `extractType` option, use extraction.standard_name
  instead of non-existent `.description`, fix cheerio type incompatibility
- Fix community-issues: description → summary, publishedDate → published_at
- Startup zombie cleanup already deployed (index.ts) — no changes needed
- ProLabs rewritten to fetch-based catalog scraper (no Playwright, bypasses WAF)
2026-04-11 03:17:33 +02:00
Rene Fichtmueller
2348238888 feat: add NADDOD, QSFPTEK, and AddOn Networks scrapers
Three new fetch-based price scrapers for compatible optics vendors:
- NADDOD: WooCommerce, USD, ~800+ SKUs
- QSFPTEK: Custom PHP shop, USD, ~1000+ SKUs
- AddOn Networks: Magento/custom, USD, ~2500 SKUs

All registered in scheduler (8-12h intervals) and index.ts --flags.
Build: 0 TypeScript errors.
2026-03-30 21:20:23 +02:00
Rene Fichtmueller
6f7c834752 feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
  scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
  category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI

MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)

SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00
Rene Fichtmueller
70447def02 feat: massive scraper expansion + hype cycle engine + lifecycle prediction
New scrapers:
- GBICS.com (BigCommerce, GBP prices, 10 categories, 78 products)
- Juniper HCT (Next.js SSR parser, 475 transceivers with specs/EOL)
- SFPcables.com (Magento store, 16 categories, 78 products)
- Fluxlight (BigCommerce, 6 pages, 118 products)
- Champion ONE (compatible vendor scraper)

Scraper fixes:
- 10Gtek: rewritten to parse HTML spec tables (152 products)
- Flexoptix: fix price extraction from Magento Hyva HTML
- Register all scrapers in CLI (--gbics, --juniper, --sfpcables, etc.)

Hype Cycle Engine enhancements:
- Data-driven enrichment from scraped vendor/price data
- Revenue lifecycle prediction (peak year, decline, revenue index)
- Regional adoption model (NA, China, APAC, Europe, RoW with lag coefficients)
- New API endpoints: /enriched, /lifecycle, /regional/:tech

DB growth: 89 → 1,168 transceivers, 0 → 416 prices, 6 vendors
Qdrant: 1,162 products embedded with nomic-embed-text

Research: Norton-Bass model, standards-to-market timelines, hype signals
2026-03-28 02:30:19 +13:00
Rene Fichtmueller
204e99763c feat: add Flexoptix product catalog scraper, register in CLI
Scrapes flexoptix.net product catalog across 9 categories (SFP through OSFP).
Extracts product names, prices, form factors, reach, fiber type, wavelength.
CLI: --flexoptix flag, integrated into --all.
2026-03-28 01:02:34 +13:00
Rene Fichtmueller
e9fb50a248 feat: TIP Phase 0+1 — monorepo, DB schema, API, scraper engine
Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config

Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
  - Uses /wp-json/wp/v2/product to get 2000+ product URLs
  - Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
  - Keyword relevance scoring for transceiver/fiber topics
  - xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
2026-03-27 16:27:31 +13:00