159 Commits

Author SHA1 Message Date
Rene Fichtmueller
24481b09e6 fix: eBay enricher Crawlee isolation + ephemeral queues
- Add makeCrawleeConfig isolation to CheerioCrawler instances
- Switch from named persistent RequestQueue to ephemeral null queues:
  named queues retain 'handled' state and skip all URLs on re-runs,
  causing 0 observations on every run after the first.
- Applies to both enrichSwitchFromEbay and enrichTransceiversFromEbay.
2026-04-18 01:42:08 +02:00
Rene Fichtmueller
c7d7456de9 fix: instance-level Crawlee storage isolation + eBay vendor type
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
  Crawlee Configuration with isolated localDataDirectory per scraper.
  Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
  concurrent pg-boss workers in the same process don't race on
  the shared env var.

- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
  optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
  community-issues (CheerioCrawler + RequestQueue),
  edgecore (CheerioCrawler), ufispace (CheerioCrawler),
  market-intelligence (CheerioCrawler).

- scheduler.ts: add withIsolatedStorage for optcore and market-intel
  workers (was missing, caused storage-fs path bleed from fs scraper).

- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
  satisfy vendors_type_check constraint
  ['manufacturer','distributor','oem','reseller','compatible'].
2026-04-18 01:35:57 +02:00
Rene Fichtmueller
4b751a771b fix: NADDOD stockLevel 'unknown' → 'on_request' — invalid value for price_observations check constraint 2026-04-18 01:21:31 +02:00
Rene Fichtmueller
2b770aa1a9 chore: cleanup — rename digikey→mouser, remove orphan files, gitignore Crawlee artifacts
- Rename scrapers/digikey.ts → scrapers/mouser.ts: export scrapeMouser()
  (file was Mouser API implementation mislabeled from task origin)
- Fix scheduler.ts mouser-oem worker: import scrapeMouser from ./scrapers/mouser
- Delete switch-seed-smb.ts (unreferenced, no CLI flag, no scheduler job)
- Add storage/, storage-fs/, .crawlee/ to .gitignore (Crawlee runtime artifacts)
2026-04-18 01:09:10 +02:00
Rene Fichtmueller
1c8dec52c9 feat: Price Comparison dashboard + Eoptolink OEM scraper
- Add public /api/price-comparison API (summary, top-50, per-SKU detail)
  — no auth required, 3 Express routes, DISTINCT ON latest-price logic
- Add '💲 Price Comparison' dashboard tab: stat cards, form-factor
  breakdown, top-50 SKU table (clickable rows → SKU detail), per-vendor
  price + stock + spread% lookup panel
- Add Eoptolink OEM catalog scraper (93 product-solution pages,
  part-number regex EOLO-*/EOLQ-* etc., no prices, seeds transceivers
  table as manufacturer entries)
- Register scrape:catalog:eoptolink in scheduler: schedule every 4h
  (40 */4 * * *), lazy-import worker, added to known-jobs array
2026-04-18 01:02:08 +02:00
Rene Fichtmueller
e9fcda2811 feat: wire finder.ts + switch-docs + Ollama LLM tools to MCP server
MCP Server (packages/mcp-server/src/index.ts):
- Register registerSwitchDocTools (switch-docs.ts) — switch documentation lookup
- Register finderTools dynamically (finder.ts) — find_flexoptix_for_switch, get_competitor_alerts
- Add analyze_market_with_llm tool: qwen2.5:14b via Ollama, enriched with live hype cycle + pricing + news
- Add generate_blog_post tool: fo-blog-v5 (fine-tuned) with qwen2.5:14b fallback, enriched with live pricing data
- OLLAMA_BASE_URL env var (default: https://ollama.fichtmueller.org)

Also includes scraper improvements (ascentoptics, atgbics, gbics, skylane, ebay-enricher),
API route updates (blog, blog-sll, health, hot-topics, transceivers, queries),
and dashboard hot-topics refresh.
2026-04-18 00:21:58 +02:00
Rene Fichtmueller
9d3019d0c0 feat: Norton-Bass Hype Cycle Engine — market_metrics seed + Bass fitting + Gartner phase detection 2026-04-18 00:09:08 +02:00
Rene Fichtmueller
75cea9fe90 feat: Mouser Electronics API scraper for OEM reference prices (Juniper/Cisco/Arista PIDs) 2026-04-18 00:04:35 +02:00
Rene Fichtmueller
861243ea3f feat: stock confidence badges, multi-vendor price comparison, expanded Cisco TMG + Juniper HCT
Stock API & Dashboard:
- /api/stock/summary: vendor_breakdown adds avg_confidence, currencies, conf_per_warehouse/aggregated/boolean
- /api/stock/summary: new price_comparison endpoint (multi-vendor SKUs, min/max/avg price)
- /api/stock/summary: totals adds multi_vendor_skus count
- Dashboard: 6th stat card (Multi-Vendor SKUs), confidence badge column (🟢 L3 / 🟡 L2 /  L1)
- Dashboard: price comparison table with vendor-by-vendor price breakdown
- Dashboard: subtitle updated to include QSFPTEK + NADDOD
- Dashboard: top sellers link to product URLs

Cisco TMG improvements:
- Added 5 new platform families: 8000 Series, NCS5500, NCS540, NCS560, NCS1000
- Per-device query strategy: iterates all switch model IDs from family filter
  instead of getting only 1 switch per family → 58 switches per N9300 run
- Graceful error handling per device with rate limiting (1s between requests)

Juniper HCT: ran manually → 475 Juniper-brand transceivers seeded
2026-04-17 23:33:31 +02:00
Rene Fichtmueller
5393f73c17 feat: stock quality schema + QSFPTEK/NADDOD v2 scrapers with real-time stock counts
- Migration 028 (retroactive): document warehouse columns added to stock_observations
- Migration 037: composite indexes for DISTINCT ON (transceiver_id, source_vendor_id) queries
- Migration 038: add stock_confidence (1/2/3), price_currency, price_includes_tax,
  stock_vendor_ts to stock_observations + TRUNCATE test-run data

db.ts: upsertStockObservation now accepts stockConfidence, priceCurrency,
priceIncludesTax, stockVendorTs; delta detection includes quantity_available

fs-com.ts: passes stockConfidence=3 + priceCurrency=EUR + priceIncludesTax=false

qsfptek.ts v2: Phase 1 API listing + Phase 2 detail-page stock extraction
- Parses 'X in real-time stock, DATE' from product detail pages
- Writes stock_observations with confidence=2 + stockVendorTs
- Up to 500 detail pages/run at 2s rate limit

naddod.ts v2: complete rewrite from WooCommerce to Astro sitemap-based
- Discovers products via /sitemaps/products.xml (600+ products)
- URL format: /products/XXXXX.html
- Extracts 'In Stock: X' exact counts from SSR HTML
- Writes both price + stock observations (confidence 1 or 2)
2026-04-17 22:54:40 +02:00
Rene Fichtmueller
5b35b2b8be feat(scraper+api): warehouse stock data pipeline — FS.com v2, SmartOptics v2, Stock API
Scraper changes:
- fs-com.ts v2: Playwright stealth patches + www.fs.com/de/ URL fix (de.fs.com DNS NXDOMAIN).
  Extracts DE-Lager, Global-Lager, Nachlieferung, units_sold, compatible_brands, price_net.
  Mac-side runner (run-fs-scraper-mac.sh) via SSH tunnel for residential IP access.
  Fast-fail connectivity check on datacenter IPs that are blocked by Cloudflare.
- smartoptics.ts v2: WooCommerce REST API fallback + 8 catalog categories + relative URL fix.
  Was finding only 8 products, now discovers 18+ with multi-category crawl.

DB layer:
- db.ts: add upsertStockObservation() — writes 10 new stock_observations columns
  (warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, compatible_brands,
  price_net, product_url, delivery dates) with dedup check.

API:
- routes/stock.ts: GET /api/stock, /api/stock/summary, /api/stock/:id
  Warehouse breakdowns per transceiver/vendor with top-sellers and vendor summary.
- routes/review.ts: equivalence review queue (approve/reject/bulk-approve).
- index.ts: register /api/stock and /api/review routes.

Dashboard:
- index.html: 🏭 Stock tab with stat cards (DE-Lager, Global-Lager, Nachlieferung totals),
  top-sellers table, vendor breakdown, recently-restocked events, part-number lookup.

SQL migrations:
- 034: blog-review-tag, 035: price-observations is_anomalous, 036: transceiver-equivalences.
2026-04-17 10:45:59 +02:00
Rene Fichtmueller
662cd1f90b fix(scraper): FiberMall URL schema + price parser + Flexoptix EUR comma bug
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
  currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)

Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
  (2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR

Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)

CHANGELOG: added 6 entries for 2026-04-12
2026-04-12 04:26:35 +02:00
Rene Fichtmueller
cdb8ef6e61 feat(scraper): add FiberMall/Vcelink/OpticsBay scrapers, fix QSFPTEK API migration
- New scrapers: fibermall.ts (WooCommerce), vcelink.ts (Shopify), opticsbay.ts (WooCommerce)
- QSFPTEK rewritten to use /mall/commodity/list API (old OpenCart /c/*.html paths gone 404)
  - New: attribute-based filtering by data rate (1G/10G/25G/40G/100G/200G/400G/800G)
  - Scrapes HTML fragments, extracts US$ prices and product URLs
- scheduler.ts: +3 queues/schedules/workers (fibermall, vcelink, opticsbay) → 61 total workers
- index-pi.ts: Pi fleet picks up all 3 new scrapers
2026-04-11 19:13:36 +02:00
Rene Fichtmueller
148d2e1000 fix(scraper): set CRAWLEE_PURGE_ON_START=1 in withIsolatedStorage
Crawlee's SessionPool throws 'Could not find SDK_SESSION_POOL_STATE.json'
when initializing against a freshly-created isolated storage dir.
Setting CRAWLEE_PURGE_ON_START=1 tells Crawlee to start fresh instead
of trying to load non-existent session state — fixes FS.com and ATGBICS
crashes at the start of every 2h cycle after the dirs were cleaned up.
2026-04-11 07:27:24 +02:00
Rene Fichtmueller
45c48755e4 feat(scraper): add NADDOD/QSFPTEK/AddOn to scheduler, fix pre-existing TS build errors
- Register scrape:pricing:naddod (48 */2), qsfptek (52 */2), addon (55 */2) in pg-boss
- Add boss.work() handlers for all three (fetch-based, run on Erik)
- Fix findOrCreateScrapedTransceiver callers: remove invalid `name`/`url` params,
  fix `t.id` → `t` (function already returns string ID)
- Fix ebay-enricher: remove invalid `extractType` option, use extraction.standard_name
  instead of non-existent `.description`, fix cheerio type incompatibility
- Fix community-issues: description → summary, publishedDate → published_at
- Startup zombie cleanup already deployed (index.ts) — no changes needed
- ProLabs rewritten to fetch-based catalog scraper (no Playwright, bypasses WAF)
2026-04-11 03:17:33 +02:00
Rene Fichtmueller
6febb9c88e refactor(prolabs): replace Playwright+Firefox with fetch-based catalog scraper
ProLabs uses B2B quote model - prices require reseller account and are
not shown publicly (schema.org always shows price=0.00). Fighting
CloudFront WAF with Firefox automation is pointless.

New approach:
- Sitemap-driven: downloads all 14 sitemaps to collect product URLs
- fetch-based: curl-compatible HTTP requests bypass CloudFront TLS detection
- catalog-only: writes part numbers + specs to transceivers table
- Rate-limited: 300ms between requests (~3 req/sec)
- No proxy needed: Pi nodes no longer consumed for ProLabs
2026-04-11 02:57:13 +02:00
Rene Fichtmueller
bf626f9de6 fix: route Pi-destined scrapers exclusively to Pi worker fleet
Remove boss.work() registrations for lightweight fetch/cheerio scrapers
from Erik's scheduler. Pis are now the SOLE consumers of these queues:
fluxlight, gbics, optcore, champion-one, sfpcables, blueoptics, fiber24,
tscom, skylane, ascentoptics, gaotek, smartoptics, hubersuhner, news,
market-intel.
2026-04-09 20:50:57 +02:00
Rene Fichtmueller
cf75eee8ad feat: linecard system support, Cisco 8000 accuracy, price anomaly detection
API/finder:
- Add modular chassis support: sibling linecards fetched when is_linecard=true
- Add chassis linecards when system_type=modular
- Extend switch response: system_type, is_linecard, chassis_model, slot_type,
  flexbox_compat_mode, flexbox_notes, description, switching_capacity_tbps,
  total_ports, category, lifecycle_status, features, use_cases, linecards[]

API/transceivers:
- Filter price_observations with COALESCE(is_anomalous, false) = false
  (direct prices + comparable market prices)

Scraper/db:
- Add PRICE_BOUNDS map (per form-factor min/max USD sanity bounds)
- Add isPriceAnomalous() — marks DB price_observations as is_anomalous=true
- Add competitor_verified flag: set true when valid competitor price stored
- upsertPriceObservation: skip prices outside sanity bounds, set competitor_verified

Scraper/hash:
- contentHash() now accepts Record<string,unknown> | string (union type)
  to support both structured objects and legacy string callers

Scrapers (skylane, tscom, wiitek):
- Fix contentHash() call signature: pass objects not JSON.stringify strings
- Fix wiitek: remove invalid 'name' param, fix t.id → transceiverId

Migrations:
- Add is_anomalous, competitor_verified, competitor_verified_at,
  image_primary columns
- Recreate sync_fully_verified trigger to include competitor_verified
- Add is_linecard, chassis_model, system_type, slot_type,
  flexbox_compat_mode, flexbox_notes to switches table
2026-04-09 09:06:22 +02:00
Rene Fichtmueller
240e7f46f2 feat(scraper): add SOCKS5 proxy rotation for fs-com, atgbics, gbics scrapers
Routes requests through CT130/131/132 proxy pool (192.168.178.77/76/74:1080)
when PROXY_URLS env var is set. Uses ProxyConfiguration from crawlee for
PlaywrightCrawler scrapers and socks-proxy-agent for fetch-based scrapers.
2026-04-08 08:17:49 +02:00
Rene Fichtmueller
51af249361 Merge remote-tracking branch 'github/main'
# Conflicts:
#	packages/api/src/llm/fo-blog-pipeline.ts
#	packages/api/src/routes/blog.ts
#	packages/scraper/src/scheduler.ts
#	packages/scraper/src/scrapers/fs-com.ts
#	packages/scraper/src/scrapers/gbics.ts
2026-04-06 18:03:36 +02:00
Rene Fichtmueller
2e852e0a2f fix(scrapers): replace bot User-Agents with Chrome UA + disable dead domain
- 16 commercial scrapers: replace TIP-Bot/1.0 with Chrome/120 UA
  (GBICS confirmed returning 0 bytes for bot UA, Chrome UA returns 200KB)
- gbics.ts: fix User-Agent (was returning empty HTML, now returns products)
- optictransceiver.ts: disable — domain repurposed as plant shop (2026-04-06)
  Alocasia Regal Shield is not a transceiver.
2026-04-06 02:17:50 +02:00
Rene Fichtmueller
dfe86fb347 fix(scraper): switch fs-com to de.fs.com for EUR prices as primary source
EUR prices scraped verbatim from de.fs.com — no conversion needed.
USD derivation (EUR→USD) happens downstream, not EUR←USD.
Fixes price discrepancy: TIP showed USD 999×0.92=EUR 866 vs real €948 on de.fs.com.
2026-04-06 01:24:47 +02:00
Rene Fichtmueller
e4e432d9fa fix: parsePrice requires currency symbol + uses largest number to avoid misreads
Root cause of fake prices (e.g. 1.30 for 800G OSFP):
- parsePrice accepted any bare number without currency symbol
- Could misread stock counts, page numbers, or CSS values as prices
- Also picked the first number, not the main price

Fix:
- Require explicit currency symbol or decimal format (1234.56)
- Use the LARGEST number found in the price string
- Returns price=0 (rejected) when no valid price pattern found
2026-04-06 01:19:25 +02:00
Rene Fichtmueller
1434037d29 fix(scraper): use false instead of null for image_verified on insert 2026-04-05 12:15:10 +02:00
Rene Fichtmueller
e6d042f827 fix: resolve merge conflict in index.ts + add untracked blog-sll, news, sql migration 2026-04-05 11:51:07 +02:00
Rene Fichtmueller
8cc19011f4 feat(scraper): all pricing scrapers to 2h 24/7 — full competitor coverage, no gaps 2026-04-05 01:32:08 +02:00
Rene Fichtmueller
44244a22a1 feat: 4th verification criterion (Competitor) + scraper frequency FS/10Gtek/ProLabs to 2h 2026-04-05 01:28:46 +02:00
Rene Fichtmueller
931588fffd fix(verification): 100% Verified Badge war dramatisch zu großzügig
KERNPROBLEME BEHOBEN:
1. ATGBICS part_number = URL slug statt echte OEM-Nummer
   extractOemPartNumber() entfernt -r-compatible-transceiver-* Suffix
   + trailing Vendor-Namen (nokia, cisco, juniper, ...)
   Ergebnis: 3he16564aa-nokia-r-compatible-transceiver-... → 3HE16564AA

2. reach_label = '' (leer) wurde als details_verified akzeptiert
   IS NOT NULL erlaubt leere Strings → Fix: AND reach_label != ''

3. details_verified = true trotz garbled part_number
   Neue Kriterien: NOT ILIKE '%-compatible-transceiver%'
                   NOT ILIKE '%-r-compatible%'

4. data_confidence Werte falsch in Funktion ('scraped_unverified' etc)
   Echte Werte: low/medium/high/garbage → NOT IN ('garbage','unknown')

ERGEBNIS nach recompute_all_verification():
  fully_verified: 3.654 → 581 (Badge war 6x übertrieben)
  details_verified: inflated → 1.075 (korrekt)

ATGBICS Scraper:
  - extractOemPartNumber() für collection und product detail pages
  - detectReach() jetzt auch auf URL-slug (120km im slug → reach_label)

Price Anomaly Detection:
  - API: price_anomaly field wenn max/min ratio ≥ 10x
  - Dashboard: ⚠ Preisanomalie Banner mit Ratio + EUR Range

SQL 025: Part number cleanup (30 records), reach from slug (12 records)
2026-04-04 15:41:57 +02:00
Rene Fichtmueller
f616e0ebbe feat: blog engine v4 (reduction+style-lock passes) + flexoptix scraper fixes
Blog engine (fo-blog-pipeline.ts):
- Add STEP8b_REDUCTION: cuts article 25-35%, removes repeated concepts
- Add STEP8c_STYLE_LOCK: enforces tone consistency, fixes scope/OPM confusion,
  removes inline SKUs from article flow
- Add Gold Standard 3 to calibration (Style B troubleshooting example 2026-04-04)
- Pipeline now 12 steps (was 10), version bumped to v4-reduction-stylelock

blog.ts:
- Wire STEP8b and STEP8c into pipeline between Kill-AI-Tone and QA Check
- Update progress tracking to 12 total steps
- Update pipeline_version to 'v4-reduction-stylelock'

flexoptix-catalog.ts:
- Fix contentHash call: pass object directly, not JSON.stringify(object)

db.ts:
- price_verified=true set in content_hash early-return path (no new observation)
- image_verified=true auto-set in findOrCreateScrapedTransceiver on INSERT/UPDATE
2026-04-04 07:50:01 +02:00
Rene Fichtmueller
0ac932a304 fix: flexoptix catalog scraper — 1G SFP coverage + SKU suffix + pagination
- Add 1G SFP search queries ("1G SFP", "SFP LX", "SFP SX", "SFP ZX") — were completely missing
- Strip vendor-compat suffix from SKU (S.1303.10.DG:Sx → S.1303.10.DG) to match existing records
- Remove 200-product cap, use full API pagination (page >= 50 limit only)
- Result: FLEXOPTIX 1G SFP coverage 50% → 97%, overall price coverage 62% → 88%
2026-04-04 07:26:13 +02:00
Rene Fichtmueller
c179b236d7 fix: auto-set image_verified and price_verified in db utils
- findOrCreateScrapedTransceiver now sets image_verified=true when writing image_url
- upsertPriceObservation now sets price_verified=true on the transceiver after inserting price
- Both INSERT and UPDATE paths covered for image_verified sync
- Eliminates need for manual backfill after scraper runs
2026-04-04 07:14:26 +02:00
Rene Fichtmueller
2913ad451b fix: reduce pg-boss pool size to 4, add idle_in_transaction_session_timeout
PostgreSQL max_connections was being exceeded (100/100).
- Limit pg-boss internal pool to 4 connections
- Added idle_in_transaction_session_timeout=30s to PostgreSQL config
- Already raised max_connections to 300 (container config)
System now stable at ~98/300 connections
2026-04-03 21:15:35 +02:00
Rene Fichtmueller
1026787318 feat: add proxy network, image backfill, and scraper improvements
- Add TIP Proxy Network (packages/proxy-agent): SOCKS5 proxy agent
  for residential IP bypass of CloudFront WAF blocks
- Add /api/proxy/* routes: node registration, heartbeat, load balancing
- Add image extraction to Flexoptix catalog scraper (GraphQL small_image)
- Add image extraction to Optcore scraper (Playwright gallery img)
- Fix Fluxlight price scraping (BigCommerce HTML structure: data-product-price-without-tax)
- Add SmartOptics scraper (8 DWDM/coherent products, og:image extraction)
- Fix findOrCreateScrapedTransceiver to update image_url for existing records
- Add image backfill script (backfill-images.ts): 178 Flexoptix images added
- Fix DB connection pool: max 5, idleTimeoutMillis 10s (was unlimited, caused >100 connections)
- Add proxy.ts utility for scraper proxy rotation
2026-04-03 21:13:03 +02:00
Rene Fichtmueller
c7697308f6 feat: NOG conference talks scraper + hot topics integration
NOG Talks Scraper (packages/scraper/src/scrapers/nog-talks.ts):
- Crawls DENOG (15-17), NANOG (91-93), RIPE (87-89), ENOG, NLNOG, Euro-IX
- Relevance scoring: optical keywords (+3pts each), network keywords (+1pt)
  Only talks scoring ≥2 stored, high-relevance (≥6) also to market_intelligence
- CtxEvent cross-DB bridge: when ctxmeet DB has ConferenceTalk rows,
  pulls directly via dblink (same Postgres instance, no network hop)
- Runs weekly Monday 06:00 UTC (pg-boss schedule)
- Output: news_articles (source='NOG Talks: EVENT') + market_intelligence

Hot Topics (packages/api/src/routes/hot-topics.ts):
- SOURCE 3c: NOG talk clusters displayed as conference topics in hot list
  Grouped by event (DENOG15, NANOG93...) with speaker + abstract preview
  Filtered: source LIKE 'NOG Talks:%' AND relevance > 0.4 AND < 6 months
- Limit raised to 20 topics (was 15)
- Added nog_talks to sources metadata

Scheduler & Pi fleet:
- scrape:nog-talks queue registered in scheduler.ts + index-pi.ts
- Weekly cron: Monday 06:00 UTC (every Pi can handle it independently)
- First job triggered immediately
2026-04-02 22:38:00 +02:00
Rene Fichtmueller
fe81b27248 fix: correct import paths in index-pi.ts (fs-com, tenGtek, utils/forecast-engine) 2026-04-02 09:36:51 +02:00
Rene Fichtmueller
6ccaa03932 feat: add index-pi.ts with all 44 workers for Pi fleet scraper nodes
Complete Pi scraper entry point covering all pricing, catalog, compat,
intelligence and prediction signal scrapers. Includes 5 new form-factor
coverage scrapers (comms-express, router-switch, multimode-inc,
optictransceiver, wiitek). Erik runs only API+DB, all scraping on Pis.
2026-04-02 09:34:05 +02:00
Rene Fichtmueller
f146ac873e feat: add 5 form-factor coverage scrapers with worker registrations
Add Comms-Express, Router-Switch.com, Multimode Inc, OpticTransceiver.com,
and Wiitek scrapers covering CFP2-DCO, CFP4, OSFP224, QSFP112, CXP, GBIC,
XENPAK, CSFP, SFP-DD, SFP56, QSFP56 and other previously-uncovered form
factors. Each scheduled every 8h. Worker registrations added to scheduler.

Also export db alias in utils/db.ts to fix eBay enricher + community scrapers
crashing with 'Cannot read properties of undefined (reading query)'.
2026-04-02 08:39:17 +02:00
Rene Fichtmueller
370c1d8801 feat: 6 prediction signal scrapers + forecast engine
New scrapers (all registered in pg-boss, 50 total jobs):
  - sec-edgar.ts       : SEC EDGAR XBRL API — hyperscaler CapEx from 10-Q/10-K
  - github-signals.ts  : GitHub Search/Stats API — tech adoption metrics weekly
  - ebay-velocity.ts   : eBay completed listings — sold count + price distribution
  - ai-clusters.ts     : RSS feeds (6 sources) — AI cluster & DC announcements
  - distributor-leads.ts : Mouser, Digi-Key, RS Components — lead time + stock
  - standards-tracker.ts : IEEE 802.3, OIF, IETF — draft/ballot/published status

New utilities:
  - forecast-engine.ts : Weighted signal aggregator → demand_index + price_direction
    6 signal types, 4 horizons (3/9/12/18 months), 5 technologies tracked

New DB tables (migration 022):
  hyperscaler_capex, distributor_lead_times, github_tech_signals,
  marketplace_velocity, ai_cluster_announcements, standards_activity,
  forecast_signals

Schedules:
  - EDGAR: weekly Mon 06:00
  - GitHub: weekly Sun 05:00
  - eBay velocity: every 12h
  - AI clusters: every 4h (news-speed)
  - Distributor leads: daily 03:30
  - Standards: weekly Wed 04:00
  - Forecast engine: daily 08:00 (after all nightly scrapers)
2026-04-02 02:02:44 +02:00
Rene Fichtmueller
c156e8d9f6 feat: download datasheets + manuals to Fearghas NAS in nightly sync
- downloadDocuments(): fetches PDFs from product_documents and documents tables
  using curl, organises into switches/ transceivers/ whitepapers/ other/ subdirs
- Integrated into runNightlyNasSync() — runs after JSON exports
- rsync incremental — only new/changed files transferred
- NAS dir structure: /volume1/tip-data/datasheets/{switches,transceivers,whitepapers,other}
- max-filesize 50MB guard per file
2026-04-02 01:47:16 +02:00
Rene Fichtmueller
5abe6397c4 feat: add logger utility + WireGuard setup in pi-scraper-setup.sh
- utils/logger.ts: minimal console-based logger (debug/info/warn/error)
  used by community-issues and ebay-enricher scrapers
- scripts/pi-scraper-setup.sh: step 7 adds optional WireGuard setup
  (pass WG_PRIVKEY + WG_ADDR env vars) — connects Pi to Erik for DB access
  auto-detects dead ethernet and routes WG traffic via working interface
2026-04-02 01:42:25 +02:00
Rene Fichtmueller
072978f1a4 feat: 24/7 scraping fleet — 8 new vendors + continuous schedule + Pi setup
New scrapers (8):
- BlueOptics (EUR, every 4h)
- ShopFiber24 (EUR, every 4h)
- T&S Communication (USD, every 4h)
- SmartOptics (catalog, every 8h)
- HUBER+SUHNER (catalog, every 8h)
- Skylane Optics (USD, every 4h)
- AscentOptics (USD, every 4h)
- GAO Tek (USD, every 4h)

Scheduler: nightly window → 24/7 continuous (42 jobs total)
- Playwright scrapers: every 8h (FS.com, 10Gtek, ATGBICS, ProLabs)
- Fetch/Cheerio: every 4h (11 lightweight vendors)
- Flexoptix catalog: every 2h (primary price source)
- eBay enrichment: every 6h
- Compatibility matrices: every 12h
- Compute jobs: every 4h

Pi fleet: scripts/pi-scraper-setup.sh for one-command Pi node setup
2026-04-02 01:09:05 +02:00
Rene Fichtmueller
1c1fb28189 fix: pre-create Crawlee subdirs to prevent ENOENT race in withIsolatedStorage
Remove orphan schedules (addon/naddod/qsfptek) that had no registered workers.
Pre-create request_queues/default, datasets/default, key_value_stores/default
before each scraper run to avoid ENOENT when Crawlee tries to write lock files.
2026-04-02 00:45:48 +02:00
Rene Fichtmueller
4f8170dc36 feat: register ALL scrapers in nightly 00:00-08:00 window (30 jobs)
Previously missing from scheduler:
- Champion ONE, Fluxlight, GBICs, SFPCables pricing
- Juniper HCT, SONiC HCL, Ufispace, Edgecore compatibility
- Flexoptix supported vendors
- Switch assets enrichment

Full nightly sequence now covers every scraper in the fleet.
All jobs staggered with 15-30 min gaps to respect vendor rate limits.
2026-04-01 23:39:08 +02:00
Rene Fichtmueller
48218a553d feat: nightly scraper window 00-08 + NAS Fearghas sync + procurement demo data
- All scrapers now run nightly 00:00-08:00 (staggered, every day)
- NAS sync module: rsync JSON exports + weekly pg_dump to Fearghas via WireGuard
- 07:45 daily: price_observations, switches, transceivers, signals, issues exported as JSON
- Migration 021: 200 ABC classifications, 150 reorder signals, 300 stock snapshots demo data
- 9 market intelligence entries (LightReading, FierceTelecom, Farnell, Mouser, EU TED, Arista)
- 6 lifecycle events (ZR, 800G OSFP, 100G DR4 price floor, SFP-10G-SR EOL)
2026-04-01 23:07:26 +02:00
Rene Fichtmueller
732d7c3246 fix: switch seed lifecycle_status casing (Active not active) 2026-04-01 22:50:10 +02:00
Rene Fichtmueller
4020ec77d9 feat: product intelligence layer — eBay enricher, community issues, datasheets+manuals API
- Migration 020: product_issues table, condition/marketplace on price_observations, features JSONB
- eBay enricher: switch features/description/refurb prices + transceiver condition pricing
- Community issues scraper: Reddit/ServeTheHome/Arista/Cisco community bug reports
- 7 pre-seeded issues (DCS-7800R3, SG350, QFX5120, CRS326, USW-Pro etc.)
- API: /switches/:id/issues + /switches/:id/documents endpoints
- Dashboard switch modal: features from DB, description, eBay refurb price, issues+docs async
- Datasheet finder for Arista/Cisco/Juniper/HPE vendor pages
- Scheduler: 4 new jobs (ebay enrichment nightly, community issues weekly)
2026-04-01 22:46:27 +02:00
Rene Fichtmueller
64074f988f feat: SMB/campus switch seed 26 models (Cisco/HPE/Ubiquiti/MikroTik/Netgear/Zyxel) + fix forecast.ts fiveYearProjection accessor 2026-04-01 22:34:58 +02:00
Rene Fichtmueller
681da54523 feat: Procurement Intelligence Engine (WS0c)
- Migration 019: stock_snapshots, abc_classification, reorder_signals,
  product_lifecycle_events, market_intelligence, crawler_llm_log tables
- Seeded 7 market intel events (OFC 2026, AWS/Azure CapEx, Coherent lead times,
  EU TED tenders, ECOC 2026, IEEE 802.3df)
- Seeded 4 lifecycle events (Cisco SFP-10G-LR EOL, Juniper EOL,
  400ZR ratified, 800G MSA draft)
- Crawler LLM: core.ts (Ollama-based extractor), stock-schema.ts (typed schemas
  + vendor profiles for Flexoptix/FS.com/10Gtek/ATGBICS/ProLabs/Farnell/Mouser),
  validator.ts (rule-based sanity checks + cross-validation)
- market-intelligence.ts scraper: OFC/ECOC, LightReading, IEEE 802.3, EU TED,
  Farnell/Mouser lead times, FierceTelecom — weekly via pg-boss
- computeAbcClassification(): dynamic A/B/C classification from price obs +
  compat count + vendor breadth
- computeReorderSignals(): buy_now/wait/hold/monitor with reasons + signal strength
- API: GET /api/procurement/overview|signals|signals/:id|abc|market-intel|
  stock-trends/:id|lifecycle
- Dashboard: Procurement Intel tab with Reorder Signals, ABC table,
  Market Intel cards, Lifecycle Events
2026-04-01 22:04:33 +02:00
Rene Fichtmueller
e4c89de6c0 feat: fs.com scraper Phase 2 — crawl product detail pages for verified specs
- New spec-updater utility: parseSpecTable() + updateVerifiedSpecs()
- fs.com scraper now has 2 phases:
  Phase 1: Category pages → prices + stock (existing)
  Phase 2: Product detail pages → fiber_type, connector, wavelength, power, image, datasheet
- Updates data_confidence from 'enriched_estimated' to 'scraped_unverified'
- Processes up to 200 product pages per scraper run
2026-03-31 09:18:27 +02:00
Rene Fichtmueller
a69acc4588 feat(v0.2.0): Sales Intelligence Engine — Phase 0+A
New API routes:
- GET /api/finder — Switch→Flexoptix transceiver finder with FlexBox coding
- GET /api/competitor-alerts — Competitor intelligence (price changes, new products, stock)
- GET /api/forecast/:technology — Sales forecast 3/9/12/18 months + buy/wait/hold signal
- POST /api/transport/plan — Transport system planner (city→city BOM with fiber providers)

New MCP tools:
- find_flexoptix_for_switch — Customer switch → Flexoptix products
- get_competitor_alerts — Competitor monitoring
- plan_transport — Network transport planning
- forecast_sales — Volume/revenue prediction
- generate_blog — Enhanced blog generation

New DB tables (migration 013):
- competitor_alerts, price_changes, flexoptix_product_map
- sales_forecasts, fiber_providers, fiber_routes, cities
- generated_datasheets, blog_series
- Views: v_price_coverage, v_image_coverage, v_switch_flexoptix_finder

Seed data (migration 014):
- 25 European cities with IX/DC locations + coordinates
- 15 fiber providers (euNetworks, Telia, DTAG, Colt, Zayo, etc.)
- 16 fiber routes with pricing (Germany focus)

Infrastructure:
- Scraper scheduler: 2h Flexoptix, 4h FS.com/Optcore (was 6-8h)
- Change detector for competitor price/stock monitoring
- Image downloader utility with coverage tracking
2026-03-31 08:51:22 +02:00