109 Commits

Author SHA1 Message Date
Rene Fichtmueller
e9b8cb95db feat(scraper): batch 35 OEM seeds — Sierra Wireless, Senao, EMCORE, Reflex Photonics
Added 4 new OEM transceiver catalog seed scrapers (75 PIDs total):
- sierra-wireless-oem.ts: 18 PIDs — RV55/RV50X/LX60 SFP/SFP+/QSFP+ incl. Industrial -40~85°C
- senao-oem.ts: 20 PIDs — EnGenius ECS switches 1G–100G SFP/SFP+/SFP28/QSFP28 + DAC
- emcore-oem.ts: 20 PIDs — ORION coherent ZR/ZR+/CFP2-DCO 400G + MIL-grade avionics
- reflex-photonics-oem.ts: 17 PIDs — LightABLE MIL-STD-810H + RAD-HARD space-grade

Scheduler: wired at 03:30/03:45/04:00/04:15 UTC. All 75 PIDs seeded to TIP DB.
2026-04-28 23:24:53 +02:00
Rene Fichtmueller
32d3ded169 feat: add Finisar, Acacia, Inphi OEM scrapers (batch 34)
- finisar-oem: 17 PIDs (FTLX/FTLC historical BoM series, 1G-100G, widely referenced)
- acacia-oem: 14 PIDs (AC400/AC1200 coherent CFP2-DCO/QSFP-DD/OSFP up to 1.2T)
- inphi-oem: 13 PIDs (ColorZ/COLORZ-II DWDM QSFP28/QSFP-DD + 800G OSFP)
- scheduler: wired all 3 at 02:45/03:00/03:15 UTC
2026-04-28 23:14:06 +02:00
Rene Fichtmueller
1023b24fd0 feat: add Black Box, Radiflow, DragonWave, Teledyne LeCroy OEM scrapers (batch 33)
- black-box-oem: 19 PIDs (enterprise LAN SFP/SFP+/SFP28/QSFP28 + BiDi + DAC)
- radiflow-oem: 17 PIDs (OT/ICS security, 100M-100G incl. substation BiDi, category=Industrial)
- dragonwave-oem: 17 PIDs (microwave backhaul fiber uplinks 100M-100G, market_status=Legacy)
- teledyne-lecroy-oem: 18 PIDs (T&M oscilloscopes/analyzers SFP+-QSFP-DD up to 400G ZR)
- scheduler: wired all 4 at 01:45/02:00/02:15/02:30 UTC
2026-04-28 23:07:26 +02:00
Rene Fichtmueller
7f59f445b6 feat: add Cambium Networks, Tektronix, Clearfield, Lanner OEM scrapers (batch 32)
- cambium-networks-oem: 18 PIDs (cnMatrix/PTP820 1G-100G + BiDi + DAC)
- tektronix-oem: 19 PIDs (T&M SFP/SFP+/SFP28/QSFP28/QSFP-DD up to 400G ZR coherent)
- clearfield-oem: 16 PIDs (FTTP/FTTx GPON/XGS-PON OLT+ONT + 1G-100G backhaul, heavy Telecom)
- lanner-oem: 20 PIDs (NFVI/uCPE 1G-100G + BiDi + DAC stack)
- scheduler: wired all 4 at 00:45/01:00/01:15/01:30 UTC
2026-04-28 23:04:08 +02:00
Rene Fichtmueller
22788db26b feat: add Rohde & Schwarz, L3Harris, Zhone OEM scrapers (batch 31)
- rohde-schwarz-oem: 19 PIDs (T&M optical modules, SFP/SFP+/SFP28/QSFP28/QSFP-DD up to 400G ZR coherent)
- l3harris-oem: 18 PIDs (MIL-grade ruggedized SFP/SFP+/SFP28/QSFP+/QSFP28, category=Industrial)
- zhone-oem: 18 PIDs (GPON/XGS-PON/EPON OLT+ONT plus 1G-100G uplinks, heavy Telecom set)
- scheduler: wired all 3 at 00:00/00:15/00:30 UTC with workers
2026-04-28 22:57:23 +02:00
Rene Fichtmueller
ab6888fec8 feat: add OEM seed scrapers batch 29-30 (8 vendors, 147 PIDs)
Adds scrapers for:
- AudioCodes (12 PIDs) — SBC/media gateway transceivers
- Anritsu (19 PIDs) — T&M platform optical modules
- NETSCOUT (19 PIDs) — nGenius probe + InfiniStream optics
- Curtiss-Wright (19 PIDs) — MIL-grade ruggedized transceivers
- ECI Telecom (18 PIDs) — DWDM/OTN/SONET carrier platform
- UTStarcom (17 PIDs) — GPON/XGS-PON/EPON broadband access
- Turbolink (23 PIDs) — Taiwanese OEM transceiver manufacturer
- Chelsio (20 PIDs) — iWARP RDMA NIC optical modules

Scheduler: 8 new cron slots 22:00-23:45 UTC daily.
DB: 12,937 → 13,084 transceivers, 181 → 189 vendors.
2026-04-27 00:44:18 +02:00
Rene Fichtmueller
d7144731e0 feat(scraper): add 100+ OEM seed scrapers + tip-llm-guided inference layer
New OEM transceiver seed scrapers (94 cron-scheduled, 24/7):
- Media/Broadcast: Evertz, Grass Valley, Haivision, Viasat
- Asian Optical: FiberHome, Oplink, Accelink, Hisense Broadband
- Optical Mfrs: Lumentum, II-VI/Coherent, Source Photonics, O-Net,
  InnoLight, AOI, Sumitomo Electric, NeoPhotonics
- Industrial: GE Grid, Schweitzer, Moxa Industrial, Cisco IE,
  Phoenix Contact, Beckhoff, Omron, ABB, Siemens, Schneider, Rockwell, Belden
- Enterprise/DC: Arista, Pica8, Pluribus, DriveNets, Cisco (Meraki/Catalyst/Nexus/ASR)
- Cloud: AWS, Azure, Google Cloud, Meta
- Storage: NetApp, Pure Storage, HPE Storage, IBM Storage, Dell Storage, Hitachi Vantara
- 5G/RAN: Samsung Networks, Nokia AirScale, Ericsson RAN, Mavenir
- Security: Check Point, Barracuda, Fortinet, Palo Alto
- Telecom Optical: ADVA, PacketLight, FiberHome, Accelink, Hisense

API: tip-llm-guided inference layer (strict schema + repair-retry + safe fallback)
- POST /api/tip-llm/infer|research-plan|extract|finding|health
- Hard JSON schema enforcement, create_finding=false on empty evidence
- Confidence gate (>= 0.4), validation with consistency check

Build: added incremental=true to scraper tsconfig (OOM prevention)
Scheduler: 87 → 94 registered workers
2026-04-27 00:00:14 +02:00
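The gating rules named in this commit (no finding on empty evidence, confidence gate at 0.4) could be sketched roughly as below. Only the `create_finding` flag and the 0.4 threshold come from the commit message; the `InferenceResult` shape and the `gateFinding` helper are hypothetical names chosen for illustration.

```typescript
// Hypothetical result shape; only create_finding and the 0.4 gate
// are taken from the commit message.
interface InferenceResult {
  confidence: number;
  evidence: string[];
  create_finding: boolean;
}

const CONFIDENCE_GATE = 0.4;

// Returns a copy in which create_finding is forced to false when the
// evidence list is empty (or blank) or the confidence is below the gate.
function gateFinding(result: InferenceResult): InferenceResult {
  const hasEvidence = result.evidence.some((e) => e.trim().length > 0);
  const passesGate = hasEvidence && result.confidence >= CONFIDENCE_GATE;
  return {
    confidence: result.confidence,
    evidence: result.evidence,
    create_finding: result.create_finding && passesGate,
  };
}
```

A result with high confidence but no evidence is therefore downgraded rather than rejected, which matches the "safe fallback" wording only under the assumptions stated above.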
Rene Fichtmueller
4479429b29 feat: Brocade/RUCKUS ICX OEM seed (E1MG/E10G/E40G/E100G series, 29 PIDs)
2026-04-26 20:15:54 +02:00
Rene Fichtmueller
ad295a2b4b feat: NVIDIA/Mellanox OEM seed (Spectrum + LinkX portfolio)
36 PIDs: MFM/MMA/MCP LinkX series covering SFP/SFP+/SFP28/QSFP+/
QSFP28/QSFP56/QSFP-DD/OSFP + LinkX DAC 25G-400G. Includes
SN2000-SN5600 Spectrum switch transceivers. Scheduler: 05:45 daily.
2026-04-26 19:20:12 +02:00
Rene Fichtmueller
b3a2eff776 feat: Dell EMC + Extreme Networks OEM transceiver seeds
Dell EMC (34 PIDs): PowerSwitch OS10 + legacy Force10 naming,
1G-400G QSFP-DD + DAC. Maps to Dell Technologies vendor.

Extreme Networks (33 PIDs): Summit/ExtremeSwitching 10xxH part
numbers, 1G-400G QSFP-DD + DAC. Scheduler: 05:15 + 05:30 daily.
2026-04-26 19:18:31 +02:00
Rene Fichtmueller
1c1b8e1e9d feat: Huawei OEM transceiver seed + scheduler
Add 43 Huawei OEM PIDs covering CloudEngine/NetEngine platform:
SFP-GE/SFP+-10G/SFP28-25G/QSFP+-40G/QSFP28-100G/QSFP56-200G/
QSFP-DD-400G/OSFP-800G + DAC. Includes BOM alias codes.
Scheduler: daily 05:00.
2026-04-26 19:16:16 +02:00
Rene Fichtmueller
b4c8b9b625 feat: Nokia/Alcatel-Lucent OEM seed + scheduler
Add 41 Nokia OEM transceiver PIDs: SFP-1G/SFP-10G/SFP-25G/QSFP+-40G/
QSFP28-100G/QSFPDD-200G+400G + DWDM ZR/ZR+ + DAC. Includes legacy
3HExxxxxxxx alternate part numbers in notes field.
Scheduler: daily 04:45.
2026-04-26 19:14:27 +02:00
Rene Fichtmueller
51cee266f5 feat: HPE/Aruba OEM seed + Cisco TMG upsert fix
Add 43 HPE/Aruba OEM transceiver PIDs (J/JL/JH/R series — 1G through
400G QSFP-DD + DAC/AOC). Scheduler: daily 04:30.

Cisco TMG scraper: fixed market_status/temp_range constraint violations,
switched to always-upsert pattern. Result: 423 switches, 22476 Cisco
OEM transceivers, 22476 compat entries written to DB.

Update CHANGELOG_PENDING with all session data changes.
2026-04-26 19:12:27 +02:00
Rene Fichtmueller
c9a50ad551 feat: Juniper OEM seed scraper + BlueOptics HTTP/1.1 fix
Add 59 Juniper OEM transceiver PIDs (SFP/SFP+/SFP28/QSFP+/QSFP28/
QSFP56/QSFP-DD/OSFP + DAC/AOC) to seed the transceivers table.
Register scrape:catalog:juniper-oem in scheduler (daily 04:15).

Fix BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to
bypass server bug where HTTP/2 returns empty response body. Also
update catalog path from /transceivers/ to /Transceivers_1.
2026-04-26 19:08:09 +02:00
Rene Fichtmueller
cc85d3d0f8 feat: Cisco OEM + Arista OEM transceiver catalog scrapers
- cisco-tmg.ts: upsert Cisco OEM transceivers from TMG API instead of
  SELECT-only. Parsers for formFactor/speed/reach/fiberType/tempRange.
  Fixes market_status ('EOL') + temp_range ('COM'/'IND') check constraints.
- arista-oem.ts: seed scraper for 69 Arista OEM PIDs (1G→800G,
  SFP/SFP28/QSFP+/QSFP28/QSFP-DD/OSFP/QSFP-DD800) with full specs.
- scheduler.ts: daily arista-oem seed at 04:00 UTC
2026-04-26 19:00:21 +02:00
Rene Fichtmueller
ba998f4c01 fix: vendor_compat 0%→100%, price denorm, wiitek disabled, price-denorm scheduler
- Migration 094: images for 12 Cisco 8K MPA + A9K-8HG-FLEX + ASR-9000V models
- Migration 095: price denorm refresh (EUR 679→1376, USD 166→835 with 180d window)
- Migration 096: bulk vendor_compat by form_factor — all 9013 transceivers now
  have OEM compatibility patterns (was 0/9013 because all slugs are scraped-*)
- wiitek.ts: disable dead scraper (wiitek.com unreachable since 2026-04, EAI_AGAIN)
- scheduler.ts: add compute:price-denorm job (daily 05:30 UTC) to keep
  street_price_usd/price_verified_eur fresh without manual migration runs
- seed-from-npm.ts: ON CONFLICT now also updates vendor_compat (was only updated_at)
2026-04-25 08:55:21 +02:00
Rene Fichtmueller
bbc6f560dd fix: add image filter patterns and direct URL migrations for 6 vendors
- switch-image-playwright.ts + switch-image-fetcher.ts: add filter patterns
  for /webimage-404/ (Netgear 404 hero), /Brand/ + /cybersecurity.png/
  (Moxa brand marketing images not product photos)
- sql/047: Moxa 4/4 models — CDN getattachment paths (hotlink-protected,
  Referer: moxa.com required; R2 proxy needed for production display)
- sql/048: UfiSpace 6/6 models — ufispace.com/image/<hash>/ direct PNGs;
  Brocade G720+G730 — broadcom.com og:image; ICX 7850-48FS — CommScope/Ruckus
  vistancenetworks.com ImageServer (rand param is cache-bust only, not auth)
- sql/049: NVIDIA SN-series 6/6 — docscontent.nvidia.com (SN2201/3700/4700)
  and S3 direct (SN5400/5600); SN3750-SX via uvation reseller CDN
2026-04-21 07:57:55 +02:00
Rene Fichtmueller
b65e4452db fix: add error-graphic, icon-library, illustration filters to GENERIC_IMAGE_PATTERNS
- /404[-_]error/i, /error[-_]graphic/i — Broadcom 404-ERROR-GRAPHIC.png
- /\/icon[-_]library\//i — D-Link navigation/icon-library path images
- /[-_]illustration[._]/i — Arista Cloud-Legacy_Illustration and similar diagrams
- Nokia banner, Huawei marketing, banners/ path patterns (Playwright scraper)
- Cookie consent patterns synced to switch-image-fetcher.ts (was only in Playwright)
2026-04-21 07:38:01 +02:00
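For reference, the four regular expressions listed in this commit can be exercised with a small filter like the one below. The patterns are copied from the commit message; the array contents beyond those four and the exact body of `isGenericImage` in the repository are not shown in the log, so this is a sketch rather than the real implementation.

```typescript
// Patterns copied from the commit message; the real list is longer.
const GENERIC_IMAGE_PATTERNS: RegExp[] = [
  /404[-_]error/i,          // e.g. Broadcom 404-ERROR-GRAPHIC.png
  /error[-_]graphic/i,
  /\/icon[-_]library\//i,   // D-Link navigation/icon-library paths
  /[-_]illustration[._]/i,  // Arista Cloud-Legacy_Illustration and similar
];

// True when the URL matches any pattern and should not be stored
// as a product photo. Assumed shape of the helper, not the original code.
function isGenericImage(url: string): boolean {
  return GENERIC_IMAGE_PATTERNS.some((pattern) => pattern.test(url));
}
```

None of the patterns use the `g` flag, so repeated `test()` calls do not carry `lastIndex` state between URLs.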
Rene Fichtmueller
f4afe14af4 feat: add 12 new vendor URL builders to Playwright image scraper
- Nokia, Huawei, Ciena, Moxa, D-Link, Alcatel-Lucent Enterprise,
  Asterfusion, Brocade: passthrough builders (use stored product_page_url)
- NVIDIA Networking: SN-series URL builder (sn5600 → /ethernet-switching/sn5600/)
- Netgear: lowercase model slug builder for /business/wired/switches/fully-managed/
- UfiSpace: hardcoded sitemap-verified URL map (all 6 S9xxx models)
- QCT: hardcoded URL map for T3048-LY8 and T7032-IX1
- Add Nokia banner / Huawei marketing image patterns to GENERIC_IMAGE_PATTERNS
2026-04-21 07:24:11 +02:00
Rene Fichtmueller
8f36eff956 fix(scraper): filter OneTrust/cookie-consent images + skip in img fallback
cdn.cookielaw.org logos appear as the largest DOM image on Dell/Extreme
product pages when the cookie consent overlay is present. Added to both
GENERIC_IMAGE_PATTERNS (isGenericImage filter) and img fallback skipPattern
so the next-largest actual product image can be found.
2026-04-21 06:45:41 +02:00
Rene Fichtmueller
d67fbe31da fix(scraper): fall through to img fallback when og:image is generic/logo
Previously: if og:image existed (even as a Dell logo URL), page.evaluate() returned
early and the img fallback was never tried. Now: meta tags are extracted first, then
isGenericImage() is checked in Node.js, and the img fallback runs if meta image is null
or generic. This allows vendors like Dell (og:image = logo) to still get product images
via the DOM fallback.
2026-04-21 06:36:12 +02:00
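The fall-through described here can be illustrated with a pure selection function. This is a minimal sketch: `pickProductImage` and its parameters are hypothetical, the generic-image check is passed in as a predicate, and filtering the fallback candidates as well is an assumption that goes slightly beyond what this commit states.

```typescript
// Hypothetical helper: choose between the og:image value and DOM fallback
// candidates. The fallback is consulted when the meta image is null OR generic.
function pickProductImage(
  metaImage: string | null,
  fallbackImages: string[],
  isGeneric: (url: string) => boolean,
): string | null {
  if (metaImage !== null && !isGeneric(metaImage)) {
    return metaImage;
  }
  // Assumption: fallback candidates are ordered largest first.
  for (const candidate of fallbackImages) {
    if (!isGeneric(candidate)) {
      return candidate;
    }
  }
  return null;
}
```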
Rene Fichtmueller
09d3a60b7c fix(scraper): fix Edgecore/Extreme URL builders, broaden img fallback, fix ENOENT
- buildEdgecoreUrl: /product/<slug>/ (WooCommerce, no .html) with EDGECORE_SLUG_MAP
  for AS7712-32X→as7712-32x-ec, Minipack2→minipack-as8000-open-modular-platform
- buildFortinetUrl: returns null (all pages redirect to generic, no usable og:image)
- buildExtremeUrl: direct product URL (extremenetworks.com/product/<slug>)
- img fallback: remove strict 'product/switch/router/hardware' path requirement;
  now takes largest image >=200x150px excluding flags/icons/spinners — isGenericImage()
  filters hero/banner/logo afterward
- ENOENT fix: unique per-run Crawlee storage dir (timestamp suffix) prevents
  stale request-queue file contamination between back-to-back vendor runs
2026-04-21 06:33:32 +02:00
Rene Fichtmueller
87b9416592 fix(scraper): fix Arista series-level URL builder + bypass Crawlee URL deduplication
- buildAristaUrl() now extracts series prefix (7060X5-32QS → 7060x5-series)
  instead of individual model URLs that lack og:image
- Strip trailing sub-variant 'A' so R3A → R3 series page
- Add uniqueKey: row.id to each request — prevents Crawlee from deduplicating
  models that share the same series URL (e.g. 7060x5-series)
- For Arista: always prefer fresh builder URL over stored product_page_url
  so stale individual-model URLs don't override correct series pages
2026-04-21 06:22:41 +02:00
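The series-prefix extraction can be sketched as follows. Only the two mappings named in the commit (7060X5-32QS to 7060x5-series, and R3A collapsing to R3) are taken from the source; the function name `aristaSeriesSlug`, the regular expression, and the decision to return only the slug (not a full URL) are assumptions.

```typescript
// Hypothetical sketch of the series-slug step in buildAristaUrl().
// Takes the part of the model name before the first '-', strips a
// trailing sub-variant 'A' after an R<digit> suffix, and lowercases it.
function aristaSeriesSlug(model: string): string {
  const dash = model.indexOf("-");
  const prefix = dash === -1 ? model : model.slice(0, dash);
  const base = prefix.replace(/(R\d)A$/i, "$1"); // R3A -> R3
  return `${base.toLowerCase()}-series`;
}
```

Because several models then resolve to the same series page, each request still needs its own key (the commit uses `uniqueKey: row.id`) so that the crawler does not collapse them into one.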
Rene Fichtmueller
18a9e1346e feat: Playwright image scraper for bot-blocked vendors (Arista/Dell/Edgecore/Fortinet/Extreme)
2026-04-21 06:16:05 +02:00
Rene Fichtmueller
653824f23b fix: Cisco line card URL mapping (8800/84/86 → 8000 family page, skip ASR9K logo-only)
2026-04-21 00:49:32 +02:00
Rene Fichtmueller
c9333ab5ea fix: MikroTik hardcoded slug map for "+" models (crs305/312/317/326)
2026-04-21 00:45:41 +02:00
Rene Fichtmueller
9618a4f0e0 fix: Cisco 8000 builder URL + MikroTik lowercase + new vendor builders
URL builder fixes:
- Cisco 8000: update to new /site/us/en/ URL scheme (family page, not per-model)
- MikroTik: fix to lowercase+underscore format (was uppercase, caused 404)
- Fortinet: set to null — JS-rendered pages, all redirect to generic page
- Alcatel-Lucent Enterprise slug added to dispatcher (was missing, caused 0 hits)
- Add Quanta, Allied Telesis, Ufispace, Netgear URL builders
- NVIDIA: skip ConnectX/BlueField non-switch models

Migration 044:
- Clear 35 wrong NCS-5500 URLs from Cisco 8000-series models
- Pre-set correct 8000-series family URL for 21 models without images
2026-04-21 00:41:31 +02:00
Rene Fichtmueller
9e6be570a3 feat: more switch image coverage + system health metrics + Erik monitor
switch-image-fetcher:
- Add Fortinet URL builder (11 FortiSwitch models)
- Add Quanta Cloud Technology, Allied Telesis, Ufispace, Netgear URL builders
- Fix alcatel-lucent-enterprise slug missing from URL_BUILDERS dispatcher
- Fix NVIDIA builder to skip ConnectX/BlueField adapters (not switches)
- Add aruba slug alias for hpe-aruba

health endpoint:
- Add system metrics: CPU load (1/5/15m), memory usage, disk usage
- Add load_status indicator (ok/busy/overloaded)
- Expose process RSS memory
- Used by external monitors

scripts/monitor-erik.sh:
- Cron-ready health check script for Claudi (.82) and Raspberry Pis
- Checks TIP API health endpoint (load, memory, disk, DB latency)
- Checks PM2 process state via SSH (errored/stopped detection)
- ntfy.sh push notifications (set NTFY_TOPIC env var)
- Includes systemd service + timer unit comments for auto-install
2026-04-21 00:31:43 +02:00
Rene Fichtmueller
823b64bd24 perf: load-aware scraper guard + higher rate limits + /tmp crawlee storage
2026-04-20 23:35:02 +02:00
Rene Fichtmueller
a2492d833b feat: Flexoptix order section per switch + reject generic/logo images
2026-04-20 23:31:36 +02:00
Rene Fichtmueller
ab059c2fd1 fix(community-issues): scrapeTransceiverCompatIssues falls back to ports_config when no compat entries
2026-04-20 23:00:00 +02:00
Rene Fichtmueller
4bf5c95824 feat: Flexoptix compatibility scraper + transceiver issue scanner
- Add flexoptix-compat.ts: maps switch models to compatible Flexoptix transceivers
  via search API (vendor_compat) with form-factor fallback (spec_match)
  Scheduled daily at 09:00 UTC as scrape:compat:flexoptix
- Enhance community-issues.ts: add vendor advisory sources (Cisco Field Notices,
  Juniper KB, SONiC GitHub Issues) + new scrapeTransceiverCompatIssues() that
  searches for switch+transceiver combination problems specifically
- Scheduler: 59 schedules, 78 workers
2026-04-20 22:50:57 +02:00
Rene Fichtmueller
a0a7a97d83 feat: switch image fetcher + og:image scheduler job + dashboard thumbnail column
- Add switch-image-fetcher.ts: og:image-based image discovery for all 86 seeded switches
  (covers Cisco, Arista, Juniper, NVIDIA, Edgecore, Celestica, Asterfusion, Dell,
   HPE/Aruba, Huawei, Nokia, Extreme, MikroTik, Ubiquiti, FS.COM, Supermicro)
- Wire fetchSwitchImages() into scheduler as scrape:images:switches (daily 08:30 UTC)
- Dashboard: add 48px thumbnail column to switch table (lazy img with gear icon fallback)
2026-04-20 22:44:08 +02:00
Rene Fichtmueller
aa91798e8d fix(vcelink): resolve TS 5.9 narrowing quirk with explicit cast in dead code
price?: number narrowing via typeof/!== undefined does not work for
arithmetic comparisons in TypeScript 5.9 dead code paths; use 'as number'
cast to keep the dead code compilable while the early-return guard above
prevents runtime execution entirely.
2026-04-20 22:18:13 +02:00
Rene Fichtmueller
1aba912a15 fix(scrapers): fix ATGBics theme migration, NADDOD URL, disable VCELink
- ATGBics: update HTML parser from old card--product theme to new
  card__info theme (Shopify template changed April 2026); name now
  extracted from href link text instead of aria-label
- NADDOD: correct ensureVendor shop URL from /collections/transceivers
  (404) to /collection/optical-transceivers
- VCELink: disable scraper — site pivoted from optical transceivers to
  audio/video/cable products; all collection URLs return 404
2026-04-20 22:11:24 +02:00
Rene Fichtmueller
b0ed54f386 feat: register fiber24 + fibermall in index, move atgbics to fetch-only section
2026-04-18 22:50:52 +02:00
Rene Fichtmueller
cb5a587d7e feat: rewrite ATGBICS scraper — static HTML, correct collection handles, GBP cookie
- Replaces Playwright with pure fetch() — static HTML has prices
- Correct collection handles (compatible-transceivers-sfpp-10g etc.)
- Cookie: cart_currency=GBP forces GBP pricing from any geo-IP
- Handles 35+ pages per category × 24 products = 840+ SFP+ products
- No IP-blocking with static HTML (Playwright was the trigger)
- Adds scripts/run-atgbics-mac.sh for Mac-side runner if needed
2026-04-18 22:48:29 +02:00
Rene Fichtmueller
785a6731ab fix: fiber24 stockLevel on_request (was unknown; violated DB constraint)
2026-04-18 22:26:45 +02:00
Rene Fichtmueller
d4ad9f4641 fix: ShopFiber24 sitemap-based scraping + Fibermall image extraction
ShopFiber24 (fiber24.ts):
- Complete rewrite: was using JS-rendered catalog (all prices = 0)
- New strategy: fetch sitemap_0.xml.gz → 310 product DE-URLs
- Each product page has Schema.org microdata: itemprop=price, sku, image
- Extracts: price (minPrice), SKU, image_url, name, specs
- Rate: 1 req/1.5s, no Playwright needed

FiberMall (fibermall.ts):
- Add imageUrl to Product interface
- Extract first fibermall.com/photo/*.jpg from product listing card
- Write image_url to transceivers table (has_image=true) on upsert
- SKU variants share parent product image
- 304 FiberMall transceivers will get images on next scraper run
2026-04-18 22:20:57 +02:00
Rene Fichtmueller
1da4abc488 fix: FS.com price extraction — DOM-based prices + shipping-context exclusion
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB

Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
2026-04-18 13:10:35 +02:00
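The shipping-context exclusion (Fix 2 above) could look roughly like this. The keyword list, the 40-character context window, and the name `extractEuroPrice` are assumptions for illustration; the commit only states that body-text matches near shipping keywords are skipped.

```typescript
// Assumed keyword list; the real scraper's list is not shown in the log.
const SHIPPING_KEYWORDS = /versand|shipping|delivery/i;

// Returns the first "<number> €" match whose surrounding text does not
// mention shipping, or null. Thousands separators are not handled here.
function extractEuroPrice(text: string, contextChars: number = 40): number | null {
  const pattern = /(\d+(?:[.,]\d{2})?)\s*€/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const raw = match[1];
    if (raw === undefined) continue;
    const start = Math.max(0, match.index - contextChars);
    const end = Math.min(text.length, match.index + match[0].length + contextChars);
    if (SHIPPING_KEYWORDS.test(text.slice(start, end))) continue; // shipping banner
    return parseFloat(raw.replace(",", "."));
  }
  return null;
}
```

With the banner text quoted in the commit, the 79 € threshold is skipped because "Versand" falls inside its context window, and a later product price is returned instead.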
Rene Fichtmueller
48adcd3fc9 fix: skip Optcore on Erik — Cloudflare blocks datacenter IP
optcore.net blocks Erik's IP (82.165.222.127) via Cloudflare WAF.
WP REST API returns HTML block page instead of JSON → 0 product URLs
→ 0 scraped pages every run. Add SKIP_OPTCORE_SCRAPER guard matching
the existing SKIP_FS_SCRAPER pattern. Set in ecosystem.config.js on
Erik. Residential IP (Mac launchd) would be needed to use this scraper.
2026-04-18 05:41:56 +02:00
Rene Fichtmueller
e11e351f5e fix: crawlee-config clear request queue on each run
Crawlee's FileSystemStorage marks request URLs as HANDLED (state=4,
orderNo=null) after processing. With purgeOnStart=false these entries
persist, so on the next run crawler.run(startUrls) deduplicates them
→ requestsTotal=0 → immediate finish with 0 scraped pages.

Fix: rmSync request_queues/default/ before each makeCrawleeConfig()
call. Safe: session pool state lives in key_value_stores/, not in
request_queues/. Affects all Crawlee-based scrapers (ATGBICS, Optcore,
Switch-assets, etc.).
2026-04-18 05:37:45 +02:00
Rene Fichtmueller
fcdd258369 fix: 10Gtek scraper now fetches prices from sfpcables.com
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.

Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser

Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
2026-04-18 05:27:49 +02:00
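The loop detection mentioned for `fetchAllPages()` can be sketched with an injected page fetcher. The function name comes from the commit; its signature, the `PricedPart` type, and the `maxPages` cap are hypothetical. The idea is that a paginated listing which keeps serving the last page for out-of-range page numbers yields no unseen part numbers, which ends the loop.

```typescript
// Hypothetical item shape for one product block.
interface PricedPart {
  part: string;
  priceUsd: number;
}

// Fetches pages until one contributes no unseen part numbers
// (repeated or empty page), or until maxPages is reached.
async function fetchAllPages(
  fetchPage: (page: number) => Promise<PricedPart[]>,
  maxPages: number = 50,
): Promise<PricedPart[]> {
  const seen = new Set<string>();
  const all: PricedPart[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page);
    let added = 0;
    for (const item of items) {
      if (seen.has(item.part)) continue;
      seen.add(item.part);
      all.push(item);
      added++;
    }
    if (added === 0) break; // nothing new: the listing has looped or ended
  }
  return all;
}
```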
Rene Fichtmueller
2a6ec90ecd fix: fs-com Phase 1+2 crawler.run() ENOENT guard
Crawlee catches and re-throws the post-run _isTaskReadyFunction ENOENT
internally, which rejected crawler.run() and aborted Phase 2 before it
could start. Wrap both crawler.run() calls in try/catch to swallow
ENOENT from request_queues paths; all processing is already complete
at this point.
2026-04-18 03:52:49 +02:00
Rene Fichtmueller
93d825dc04 fix: daemon stability + health monitor accuracy
- Add global unhandledRejection handler in scheduler daemon to swallow
  Crawlee's benign post-run ENOENT lock-file races (prevents process.exit(1))
- Add SKIP_FS_SCRAPER env var: skip FS.com worker on Erik where Cloudflare
  WAF blocks datacenter IPs (Mac launchd handles FS.com from residential IP)
- Remove FS.COM from health monitor EXPECTED_VENDORS (skipped on Erik)
- Health monitor: extend pg-boss lookup from 12h → 26h, add completed-job
  map; if job ran OK in last 26h + vendor has historical prices → mark
  STABLE instead of CRITICAL (fixes ATGBICS/Fluxlight hash-dedup false positives)
- Install Playwright Chromium on Erik (fixes ATGBICS BrowserLaunchError)
- Create missing Crawlee storage dirs on Erik (storage-fs-phase1/2,
  storage-ebay-transceivers) to prevent ENOENT on first Crawlee run
2026-04-18 03:16:59 +02:00
Rene Fichtmueller
8391b194a5 fix: GBICS scraper — fall back to aria-label-first pattern when href-first finds no priced products
Pattern 1 (href→aria-label) finds 127 navigation links on GBICS BigCommerce
pages — none contain GBP prices. Pattern 2 (aria-label→href) correctly
finds 16-30 product links per category page with £XX.XX prices in aria-labels.
The fallback from P1 to P2 now triggers when P1 finds results but none
contain '£', rather than only when P1 finds 0 total results.
2026-04-18 03:02:39 +02:00
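The changed fallback condition can be shown in isolation. This is a sketch under assumptions: `ProductLink` and `selectPricedLinks` are hypothetical names, and the two input arrays stand for the results of Pattern 1 (href first) and Pattern 2 (aria-label first).

```typescript
// Hypothetical shape of one extracted link.
interface ProductLink {
  href: string;
  label: string;
}

// Uses Pattern 1 results only if at least one label carries a GBP price;
// otherwise falls back to Pattern 2, even when Pattern 1 was non-empty.
function selectPricedLinks(
  hrefFirst: ProductLink[],
  ariaLabelFirst: ProductLink[],
): ProductLink[] {
  const hasGbpPrice = hrefFirst.some((link) => link.label.includes("£"));
  return hasGbpPrice ? hrefFirst : ariaLabelFirst;
}
```

The earlier behaviour corresponds to testing `hrefFirst.length === 0` instead, which never triggered on pages where Pattern 1 matched only navigation links.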
Rene Fichtmueller
24ff9822ac fix: improve scraper health monitor — tiered alerts, suppress stable-price false positives
Previous logic fired an alert whenever prices_6h=0, even when prices
were genuinely stable (content hash dedup prevents duplicate inserts).
This caused Flexoptix, ATGBICS and others to trigger alerts every 3h
despite their scrapers running successfully.

New logic:
  🔴 CRITICAL: last price > 7 days (genuine failure)
  🟡 WARNING:  last price 48h–7 days (possibly stale)
  🟢 STABLE:   last price ≤48h, 0 new (prices unchanged, scraper OK)

Also shows pg-boss job state/time alongside each vendor for faster
root-cause diagnosis. Trimmed EXPECTED_VENDORS to vendors with actual
scraper implementations (removed never-scraped placeholders).
2026-04-18 02:54:28 +02:00
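The three tiers above reduce to a comparison on the age of the most recent price. The thresholds (48 hours, 7 days) are from the commit message; the function name `classifyVendor`, the `VendorHealth` type, and the use of `Date` arguments are assumptions, and this sketch looks only at age, not at the pg-boss job state.

```typescript
type VendorHealth = "CRITICAL" | "WARNING" | "STABLE";

const HOUR_MS = 60 * 60 * 1000;

// Classifies a vendor by the age of its most recent price observation.
function classifyVendor(lastPriceAt: Date, now: Date): VendorHealth {
  const ageHours = (now.getTime() - lastPriceAt.getTime()) / HOUR_MS;
  if (ageHours > 7 * 24) return "CRITICAL"; // older than 7 days
  if (ageHours > 48) return "WARNING";      // between 48h and 7 days
  return "STABLE";                          // 48h or newer
}
```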
Rene Fichtmueller
e552e08015 fix: suppress Crawlee post-run ENOENT unhandledRejection in fs-com.ts
After PlaywrightCrawler.run() resolves, Crawlee's internal task loop
schedules one final _isTaskReadyFunction call that tries to read a
request queue .json file already cleaned up during processing. This
ENOENT fires as an unhandledRejection and calls process.exit(1),
aborting Phase 2 before prices are written to the database.

Added a targeted unhandledRejection handler in the require.main block
that swallows ENOENT errors from request_queues paths (benign Crawlee
cleanup race) while re-raising all other rejections.
2026-04-18 02:51:00 +02:00
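A targeted handler of this kind needs a predicate that recognises the benign case. The sketch below is an assumption about how that check might be written: `isBenignCrawleeEnoent` is a hypothetical name, and it relies only on the `code` and `path` fields that Node.js file-system errors carry.

```typescript
// Hypothetical predicate: true only for ENOENT errors whose path or
// message points into a request_queues directory.
function isBenignCrawleeEnoent(reason: unknown): boolean {
  if (typeof reason !== "object" || reason === null) return false;
  const err = reason as { code?: unknown; path?: unknown; message?: unknown };
  if (err.code !== "ENOENT") return false;
  const path = typeof err.path === "string" ? err.path : "";
  const message = typeof err.message === "string" ? err.message : "";
  return path.includes("request_queues") || message.includes("request_queues");
}

// Possible wiring in the entry point (shown as a comment so the block
// stays free of Node-specific globals):
//
//   process.on("unhandledRejection", (reason) => {
//     if (isBenignCrawleeEnoent(reason)) return; // swallow cleanup race
//     throw reason;                              // re-raise everything else
//   });
```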
Rene Fichtmueller
419af4a24e fix: remove all withIsolatedStorage wrappers, add makeCrawleeConfig to remaining Crawlee scrapers
- scheduler.ts: remove withIsolatedStorage from ALL scrapers (atgbics,
  optcore, ufispace, edgecore, ebay-*, market-intel, community-issues,
  cisco, juniper, sonic, 10gtek, prolabs, switch-assets, fs)
  eliminates global CRAWLEE_STORAGE_DIR race condition entirely
- fs-com.ts: replace purgeDefaultStorages() with rmSync on isolated
  storage dirs (fs-phase1, fs-phase2); pass makeCrawleeConfig to both
  PlaywrightCrawler instances
- switch-assets-crawler.ts: add makeCrawleeConfig('switch-assets')
- switch-assets-playwright.ts: add makeCrawleeConfig('switch-assets-playwright')
- naddod.ts: restore clean error logging (remove debug instrumentation)
2026-04-18 02:19:53 +02:00
Rene Fichtmueller
d9e5331161 debug: widen NADDOD error slice to 300 chars, add pre-insert logging
2026-04-18 02:00:03 +02:00