97 Commits

Author SHA1 Message Date
Rene Fichtmueller
ef225c7dc5 fix: revalidate flexoptix fs prices and images 2026-05-09 05:13:37 +02:00
Rene Fichtmueller
57e20efe49 fix: NADDOD price extraction — read from LD+JSON offers.price
NADDOD uses LD+JSON for pricing (Astro/Shopify structure):
  {"offers":{"price":"731.00","priceCurrency":"USD",...}}

Old regex (/US$\s*.../) did not match reliably → all 132 price observations
were lucky text matches, not systematic. Now: parse all ld+json blocks first,
fall back to regex.

Also broaden sitemap URL regex to capture new-style URLs without .html:
  /products/nvidia-networking/102612 (was being missed)
2026-05-06 23:55:55 +02:00
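
The LD+JSON-first strategy described in this commit can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: `extractPrice`, `LD_JSON_RE`, and `FALLBACK_PRICE_RE` are hypothetical names, and the fallback pattern is an assumed stand-in for the regex the commit mentions.

```typescript
// Sketch only: all names here are illustrative, not taken from the repository.
const LD_JSON_RE = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
// Assumed fallback pattern for visible text such as "US$ 731.00".
const FALLBACK_PRICE_RE = /US\$\s*([\d,]+(?:\.\d+)?)/;

interface PriceResult {
  price: number;
  currency: string;
  source: "ld+json" | "regex";
}

function extractPrice(html: string): PriceResult | null {
  LD_JSON_RE.lastIndex = 0; // a global regex keeps state between calls
  let m: RegExpExecArray | null;
  while ((m = LD_JSON_RE.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(m[1]);
    } catch {
      continue; // malformed block: try the next one
    }
    const offers = (data as { offers?: { price?: string | number; priceCurrency?: string } } | null)?.offers;
    const price = Number(offers?.price);
    if (offers && Number.isFinite(price) && price > 0) {
      return { price, currency: offers.priceCurrency ?? "USD", source: "ld+json" };
    }
  }
  // No usable structured data: fall back to the text pattern.
  const text = FALLBACK_PRICE_RE.exec(html);
  if (text) {
    return { price: Number(text[1].replace(/,/g, "")), currency: "USD", source: "regex" };
  }
  return null;
}
```

Trying every ld+json block before the regex means a page with several structured-data blocks still yields a price if only one of them carries `offers.price`.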
Rene Fichtmueller
1a7c928120 fix: FS.COM price extraction — use .no_tax/.price CSS selectors
FS.com changed their HTML structure; compound class names are gone.
Current layout (verified 2026-05-06):
  <div class="no_tax">5,10 € ohne MwSt.</div>  ← B2B net price (preferred)
  <div class="price">6,07 €</div>               ← gross fallback
  <div class="standard_price">6,07 €</div>      ← gross fallback

Old selectors ([class*='price-value'] etc.) matched nothing → all prices
stored as €? null. New .no_tax first gives us the correct net/B2B price.
2026-05-06 23:45:30 +02:00
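
A minimal sketch of the selector priority and the German number format handled here. The selector order and sample strings come from the commit message; `parseEuroAmount`, `pickPrice`, and the `textBySelector` input (a stand-in for whatever the DOM query returns) are hypothetical.

```typescript
// Selector order as listed in the commit: net price first, gross as fallback.
const PRICE_SELECTORS = [".no_tax", ".price", ".standard_price"] as const;

// Parse German-formatted amounts such as "5,10 € ohne MwSt." or "1.234,56 €".
function parseEuroAmount(text: string): number | null {
  const m = /(\d{1,3}(?:\.\d{3})*|\d+)(?:,(\d{1,2}))?\s*€/.exec(text);
  if (!m) return null;
  const whole = m[1].replace(/\./g, ""); // drop thousands separators
  const value = Number(`${whole}.${m[2] || "0"}`);
  return Number.isFinite(value) ? value : null;
}

// `textBySelector` maps each selector to the text found for it, if any.
function pickPrice(
  textBySelector: Record<string, string | undefined>,
): { selector: string; eur: number } | null {
  for (const selector of PRICE_SELECTORS) {
    const text = textBySelector[selector];
    const eur = text === undefined ? null : parseEuroAmount(text);
    if (eur !== null) return { selector, eur };
  }
  return null;
}
```

Returning the matched selector alongside the amount makes it possible to record whether a stored price is net or gross.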
Rene Fichtmueller
a8529d166b fix: resolve TS build errors — export backfillImages, add writeRobotExperience
- backfill-images.ts: rename main() → export backfillImages() to match index.ts import
- training-data-writer.ts: add writeRobotExperience export; remove hardcoded Gitea token
- fiber24.ts/fibermall.ts: scraper improvements from previous sessions
- image-downloader.ts/spec-updater.ts: utility updates
- robots/: add verification robots module
2026-05-06 23:39:00 +02:00
Rene Fichtmueller
5a77fce9f3 feat: NADDOD cursor rotation - covers all 7300+ URLs across ~13 runs (26h)
Previously always sliced first 600 URLs from sitemap, missing 6700+ products.
Now stores offset in naddod-cursor.json, advances by 600 per run with wrap-around.
Full sitemap coverage in ~13 runs (26h). Also adds TIP_STORAGE_DIR env support.
2026-05-06 23:26:58 +02:00
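
The offset rotation can be illustrated with a small pure function. This is a sketch under assumptions: the commit persists the offset in naddod-cursor.json, while `nextBatch` and `runsForFullPass` are hypothetical helpers that leave persistence out, and the exact wrap-around rule in the real scraper may differ.

```typescript
// Returns the slice for this run and the offset to store for the next run.
function nextBatch<T>(
  urls: readonly T[],
  offset: number,
  batchSize: number,
): { batch: T[]; nextOffset: number } {
  if (urls.length === 0) return { batch: [], nextOffset: 0 };
  const start = offset % urls.length; // tolerate a stale offset after the sitemap shrinks
  const batch = urls.slice(start, start + batchSize);
  let nextOffset = start + batchSize;
  if (nextOffset >= urls.length) nextOffset = 0; // wrap around to the start
  return { batch, nextOffset };
}

// Number of runs needed for one full pass over the sitemap.
function runsForFullPass(total: number, batchSize: number): number {
  return Math.ceil(total / batchSize);
}
```

With 7300 URLs and a batch of 600, `runsForFullPass(7300, 600)` is 13: twelve full batches plus a final partial batch of 100 before the offset wraps to 0.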
Rene Fichtmueller
efb0c24a19 feat: rewrite ATGBICS scraper to use Shopify products.json API
Static HTML collection pages return wrong results (all redirect to same 9 products).
Switch to /collections/{handle}/products.json?limit=250&page=N API which is:
- Reliable JSON (no HTML parsing)
- Correct per-collection product lists
- Clean pagination (stop at < limit results)
- Covers 11 key transceiver collections (1G, 10G, 25G, 40G, 100G, 400G)
2026-05-06 23:17:46 +02:00
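
The pagination rule (stop at fewer than `limit` results) might look like the sketch below. The URL shape is the one quoted in the commit; `fetchCollection`, the injected `fetchJson` function, the `maxPages` safety cap, and the trimmed `ShopifyProduct` shape are assumptions made so the loop can run without network access.

```typescript
// Only a few fields are modelled; real product objects carry many more.
interface ShopifyProduct { id: number; handle: string; title: string }
// Hypothetical injected fetcher, e.g. a thin wrapper around fetch().
type FetchJson = (url: string) => Promise<{ products: ShopifyProduct[] }>;

async function fetchCollection(
  baseUrl: string,
  handle: string,
  fetchJson: FetchJson,
  limit = 250,
  maxPages = 100, // safety cap, an assumption of this sketch
): Promise<ShopifyProduct[]> {
  const all: ShopifyProduct[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = `${baseUrl}/collections/${handle}/products.json?limit=${limit}&page=${page}`;
    const { products } = await fetchJson(url);
    all.push(...products);
    if (products.length < limit) break; // short page: last page reached
  }
  return all;
}
```

If a collection holds an exact multiple of `limit` products, the loop makes one extra request that returns an empty page and then stops.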
Rene Fichtmueller
5c882c3a46 fix: refresh stale price observations after 7 days + fix ATGBICS pagination wrap-around
- upsertPriceObservation: insert new observation if last one is >7 days old,
  even when price (content_hash) hasn't changed — keeps timeseries data fresh
- ATGBICS: detect Shopify catalog wrap-around by tracking per-category seen URLs;
  stop pagination when all products on a page were already seen in a prior page
- ATGBICS: improve hasNextPage to match &page=N anchored in href params
2026-05-06 23:11:15 +02:00
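
The 7-day refresh rule reduces to a small decision function. A minimal sketch, assuming a hypothetical `LastObservation` shape; the real `upsertPriceObservation` and its schema are not shown in this log.

```typescript
// Hypothetical shape of the most recent stored observation for a product.
interface LastObservation { contentHash: string; observedAt: Date }

const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

function shouldInsertObservation(
  last: LastObservation | null,
  newHash: string,
  now: Date,
): boolean {
  if (last === null) return true;                 // first observation
  if (last.contentHash !== newHash) return true;  // price changed
  // Unchanged price: insert only if the last observation is stale.
  return now.getTime() - last.observedAt.getTime() > MAX_AGE_MS;
}
```

Passing `now` in as a parameter keeps the rule deterministic and easy to test.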
Rene Fichtmueller
2466cc5d82 feat(scraper): batch 37 OEM seeds — Extreme (Legacy), Nortel, 3Com, Avaya
Added 4 legacy OEM transceiver catalog seed scrapers (72 PIDs total):
- extreme-legacy-oem.ts: 18 PIDs — Summit/BlackDiamond 10052H/10318/10325 family, Legacy
- nortel-legacy-oem.ts: 18 PIDs — Passport/BayStack AA1419xxx + XFP, incl. GBIC, Legacy
- 3com-legacy-oem.ts: 18 PIDs — Switch 5500/7750 3C17770/3CSFP9x + XFP/GBIC, Legacy
- avaya-legacy-oem.ts: 18 PIDs — ERS/VSP AA1419xxx + 700480xxx QSFP28, Legacy

Scheduler: wired at 05:30/05:45/06:00/06:15 UTC. All 72 PIDs seeded clean.
2026-04-28 23:31:13 +02:00
Rene Fichtmueller
e684d3d1c3 feat(scraper): batch 36 OEM seeds — EnGenius, Palo Alto Networks, Brocade, Foundry Networks
Added 4 new OEM transceiver catalog seed scrapers (72 PIDs total):
- engenius-oem.ts: 18 PIDs — ECS switch series 1G–100G SFP/SFP+/SFP28/QSFP28 + DAC/AOC
- paloalto-networks-oem.ts: 18 PIDs — PA-3200/5200/7000/5450 NGFW SFP/SFP+/SFP28/QSFP28 + DAC
- brocade-legacy-oem.ts: 18 PIDs — ICX/FCX/VDX/MLX E1MG/10G-SFPP family, market_status=Legacy
- foundry-networks-oem.ts: 18 PIDs — FastIron/NetIron FDR- series incl. XFP, market_status=Legacy

Scheduler: wired at 04:30/04:45/05:00/05:15 UTC. All 72 PIDs seeded clean.
2026-04-28 23:28:10 +02:00
Rene Fichtmueller
e9b8cb95db feat(scraper): batch 35 OEM seeds — Sierra Wireless, Senao, EMCORE, Reflex Photonics
Added 4 new OEM transceiver catalog seed scrapers (75 PIDs total):
- sierra-wireless-oem.ts: 18 PIDs — RV55/RV50X/LX60 SFP/SFP+/QSFP+ incl. Industrial -40~85°C
- senao-oem.ts: 20 PIDs — EnGenius ECS switches 1G–100G SFP/SFP+/SFP28/QSFP28 + DAC
- emcore-oem.ts: 20 PIDs — ORION coherent ZR/ZR+/CFP2-DCO 400G + MIL-grade avionics
- reflex-photonics-oem.ts: 17 PIDs — LightABLE MIL-STD-810H + RAD-HARD space-grade

Scheduler: wired at 03:30/03:45/04:00/04:15 UTC. All 75 PIDs seeded to TIP DB.
2026-04-28 23:24:53 +02:00
Rene Fichtmueller
32d3ded169 feat: add Finisar, Acacia, Inphi OEM scrapers (batch 34)
- finisar-oem: 17 PIDs (FTLX/FTLC historical BoM series, 1G-100G, widely referenced)
- acacia-oem: 14 PIDs (AC400/AC1200 coherent CFP2-DCO/QSFP-DD/OSFP up to 1.2T)
- inphi-oem: 13 PIDs (ColorZ/COLORZ-II DWDM QSFP28/QSFP-DD + 800G OSFP)
- scheduler: wired all 3 at 02:45/03:00/03:15 UTC
2026-04-28 23:14:06 +02:00
Rene Fichtmueller
1023b24fd0 feat: add Black Box, Radiflow, DragonWave, Teledyne LeCroy OEM scrapers (batch 33)
- black-box-oem: 19 PIDs (enterprise LAN SFP/SFP+/SFP28/QSFP28 + BiDi + DAC)
- radiflow-oem: 17 PIDs (OT/ICS security, 100M-100G incl. substation BiDi, category=Industrial)
- dragonwave-oem: 17 PIDs (microwave backhaul fiber uplinks 100M-100G, market_status=Legacy)
- teledyne-lecroy-oem: 18 PIDs (T&M oscilloscopes/analyzers SFP+-QSFP-DD up to 400G ZR)
- scheduler: wired all 4 at 01:45/02:00/02:15/02:30 UTC
2026-04-28 23:07:26 +02:00
Rene Fichtmueller
7f59f445b6 feat: add Cambium Networks, Tektronix, Clearfield, Lanner OEM scrapers (batch 32)
- cambium-networks-oem: 18 PIDs (cnMatrix/PTP820 1G-100G + BiDi + DAC)
- tektronix-oem: 19 PIDs (T&M SFP/SFP+/SFP28/QSFP28/QSFP-DD up to 400G ZR coherent)
- clearfield-oem: 16 PIDs (FTTP/FTTx GPON/XGS-PON OLT+ONT + 1G-100G backhaul, heavy Telecom)
- lanner-oem: 20 PIDs (NFVI/uCPE 1G-100G + BiDi + DAC stack)
- scheduler: wired all 4 at 00:45/01:00/01:15/01:30 UTC
2026-04-28 23:04:08 +02:00
Rene Fichtmueller
22788db26b feat: add Rohde & Schwarz, L3Harris, Zhone OEM scrapers (batch 31)
- rohde-schwarz-oem: 19 PIDs (T&M optical modules, SFP/SFP+/SFP28/QSFP28/QSFP-DD up to 400G ZR coherent)
- l3harris-oem: 18 PIDs (MIL-grade ruggedized SFP/SFP+/SFP28/QSFP+/QSFP28, category=Industrial)
- zhone-oem: 18 PIDs (GPON/XGS-PON/EPON OLT+ONT plus 1G-100G uplinks, heavy Telecom set)
- scheduler: wired all 3 at 00:00/00:15/00:30 UTC with workers
2026-04-28 22:57:23 +02:00
Rene Fichtmueller
ab6888fec8 feat: add OEM seed scrapers batch 29-30 (8 vendors, 147 PIDs)
Adds scrapers for:
- AudioCodes (12 PIDs) — SBC/media gateway transceivers
- Anritsu (19 PIDs) — T&M platform optical modules
- NETSCOUT (19 PIDs) — nGenius probe + InfiniStream optics
- Curtiss-Wright (19 PIDs) — MIL-grade ruggedized transceivers
- ECI Telecom (18 PIDs) — DWDM/OTN/SONET carrier platform
- UTStarcom (17 PIDs) — GPON/XGS-PON/EPON broadband access
- Turbolink (23 PIDs) — Taiwanese OEM transceiver manufacturer
- Chelsio (20 PIDs) — iWARP RDMA NIC optical modules

Scheduler: 8 new cron slots 22:00-23:45 UTC daily.
DB: 12,937 → 13,084 transceivers, 181 → 189 vendors.
2026-04-27 00:44:18 +02:00
Rene Fichtmueller
d7144731e0 feat(scraper): add 100+ OEM seed scrapers + tip-llm-guided inference layer
New OEM transceiver seed scrapers (94 cron-scheduled, 24/7):
- Media/Broadcast: Evertz, Grass Valley, Haivision, Viasat
- Asian Optical: FiberHome, Oplink, Accelink, Hisense Broadband
- Optical Mfrs: Lumentum, II-VI/Coherent, Source Photonics, O-Net,
  InnoLight, AOI, Sumitomo Electric, NeoPhotonics
- Industrial: GE Grid, Schweitzer, Moxa Industrial, Cisco IE,
  Phoenix Contact, Beckhoff, Omron, ABB, Siemens, Schneider, Rockwell, Belden
- Enterprise/DC: Arista, Pica8, Pluribus, DriveNets, Cisco (Meraki/Catalyst/Nexus/ASR)
- Cloud: AWS, Azure, Google Cloud, Meta
- Storage: NetApp, Pure Storage, HPE Storage, IBM Storage, Dell Storage, Hitachi Vantara
- 5G/RAN: Samsung Networks, Nokia AirScale, Ericsson RAN, Mavenir
- Security: Check Point, Barracuda, Fortinet, Palo Alto
- Telecom Optical: ADVA, PacketLight, FiberHome, Accelink, Hisense

API: tip-llm-guided inference layer (strict schema + repair-retry + safe fallback)
- POST /api/tip-llm/infer|research-plan|extract|finding|health
- Hard JSON schema enforcement, create_finding=false on empty evidence
- Confidence gate (>= 0.4), validation with consistency check

Build: added incremental=true to scraper tsconfig (OOM prevention)
Scheduler: 87 → 94 registered workers
2026-04-27 00:00:14 +02:00
Rene Fichtmueller
4479429b29 feat: Brocade/RUCKUS ICX OEM seed (E1MG/E10G/E40G/E100G series, 29 PIDs) 2026-04-26 20:15:54 +02:00
Rene Fichtmueller
ad295a2b4b feat: NVIDIA/Mellanox OEM seed (Spectrum + LinkX portfolio)
36 PIDs: MFM/MMA/MCP LinkX series covering SFP/SFP+/SFP28/QSFP+/
QSFP28/QSFP56/QSFP-DD/OSFP + LinkX DAC 25G-400G. Includes
SN2000-SN5600 Spectrum switch transceivers. Scheduler: 05:45 daily.
2026-04-26 19:20:12 +02:00
Rene Fichtmueller
b3a2eff776 feat: Dell EMC + Extreme Networks OEM transceiver seeds
Dell EMC (34 PIDs): PowerSwitch OS10 + legacy Force10 naming,
1G-400G QSFP-DD + DAC. Maps to Dell Technologies vendor.

Extreme Networks (33 PIDs): Summit/ExtremeSwitching 10xxH part
numbers, 1G-400G QSFP-DD + DAC. Scheduler: 05:15 + 05:30 daily.
2026-04-26 19:18:31 +02:00
Rene Fichtmueller
1c1b8e1e9d feat: Huawei OEM transceiver seed + scheduler
Add 43 Huawei OEM PIDs covering CloudEngine/NetEngine platform:
SFP-GE/SFP+-10G/SFP28-25G/QSFP+-40G/QSFP28-100G/QSFP56-200G/
QSFP-DD-400G/OSFP-800G + DAC. Includes BOM alias codes.
Scheduler: daily 05:00.
2026-04-26 19:16:16 +02:00
Rene Fichtmueller
b4c8b9b625 feat: Nokia/Alcatel-Lucent OEM seed + scheduler
Add 41 Nokia OEM transceiver PIDs: SFP-1G/SFP-10G/SFP-25G/QSFP+-40G/
QSFP28-100G/QSFPDD-200G+400G + DWDM ZR/ZR+ + DAC. Includes legacy
3HExxxxxxxx alternate part numbers in notes field.
Scheduler: daily 04:45.
2026-04-26 19:14:27 +02:00
Rene Fichtmueller
51cee266f5 feat: HPE/Aruba OEM seed + Cisco TMG upsert fix
Add 43 HPE/Aruba OEM transceiver PIDs (J/JL/JH/R series — 1G through
400G QSFP-DD + DAC/AOC). Scheduler: daily 04:30.

Cisco TMG scraper: fixed market_status/temp_range constraint violations,
switched to always-upsert pattern. Result: 423 switches, 22476 Cisco
OEM transceivers, 22476 compat entries written to DB.

Update CHANGELOG_PENDING with all session data changes.
2026-04-26 19:12:27 +02:00
Rene Fichtmueller
c9a50ad551 feat: Juniper OEM seed scraper + BlueOptics HTTP/1.1 fix
Add 59 Juniper OEM transceiver PIDs (SFP/SFP+/SFP28/QSFP+/QSFP28/
QSFP56/QSFP-DD/OSFP + DAC/AOC) to seed the transceivers table.
Register scrape:catalog:juniper-oem in scheduler (daily 04:15).

Fix BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to
bypass server bug where HTTP/2 returns empty response body. Also
update catalog path from /transceivers/ to /Transceivers_1.
2026-04-26 19:08:09 +02:00
Rene Fichtmueller
cc85d3d0f8 feat: Cisco OEM + Arista OEM transceiver catalog scrapers
- cisco-tmg.ts: upsert Cisco OEM transceivers from TMG API instead of
  SELECT-only. Parsers for formFactor/speed/reach/fiberType/tempRange.
  Fixes market_status ('EOL') + temp_range ('COM'/'IND') check constraints.
- arista-oem.ts: seed scraper for 69 Arista OEM PIDs (1G→800G,
  SFP/SFP28/QSFP+/QSFP28/QSFP-DD/OSFP/QSFP-DD800) with full specs.
- scheduler.ts: daily arista-oem seed at 04:00 UTC
2026-04-26 19:00:21 +02:00
Rene Fichtmueller
ba998f4c01 fix: vendor_compat 0%→100%, price denorm, wiitek disabled, price-denorm scheduler
- Migration 094: images for 12 Cisco 8K MPA + A9K-8HG-FLEX + ASR-9000V models
- Migration 095: price denorm refresh (EUR 679→1376, USD 166→835 with 180d window)
- Migration 096: bulk vendor_compat by form_factor — all 9013 transceivers now
  have OEM compatibility patterns (was 0/9013 because all slugs are scraped-*)
- wiitek.ts: disable dead scraper (wiitek.com unreachable since 2026-04, EAI_AGAIN)
- scheduler.ts: add compute:price-denorm job (daily 05:30 UTC) to keep
  street_price_usd/price_verified_eur fresh without manual migration runs
- seed-from-npm.ts: ON CONFLICT now also updates vendor_compat (was only updated_at)
2026-04-25 08:55:21 +02:00
Rene Fichtmueller
bbc6f560dd fix: add image filter patterns and direct URL migrations for 6 vendors
- switch-image-playwright.ts + switch-image-fetcher.ts: add filter patterns
  for /webimage-404/ (Netgear 404 hero), /Brand/ + /cybersecurity.png/
  (Moxa brand marketing images not product photos)
- sql/047: Moxa 4/4 models — CDN getattachment paths (hotlink-protected,
  Referer: moxa.com required; R2 proxy needed for production display)
- sql/048: UfiSpace 6/6 models — ufispace.com/image/<hash>/ direct PNGs;
  Brocade G720+G730 — broadcom.com og:image; ICX 7850-48FS — CommScope/Ruckus
  vistancenetworks.com ImageServer (rand param is cache-bust only, not auth)
- sql/049: NVIDIA SN-series 6/6 — docscontent.nvidia.com (SN2201/3700/4700)
  and S3 direct (SN5400/5600); SN3750-SX via uvation reseller CDN
2026-04-21 07:57:55 +02:00
Rene Fichtmueller
b65e4452db fix: add error-graphic, icon-library, illustration filters to GENERIC_IMAGE_PATTERNS
- /404[-_]error/i, /error[-_]graphic/i — Broadcom 404-ERROR-GRAPHIC.png
- /\/icon[-_]library\//i — D-Link navigation/icon-library path images
- /[-_]illustration[._]/i — Arista Cloud-Legacy_Illustration and similar diagrams
- Nokia banner, Huawei marketing, banners/ path patterns (Playwright scraper)
- Cookie consent patterns synced to switch-image-fetcher.ts (was only in Playwright)
2026-04-21 07:38:01 +02:00
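
For illustration, the four patterns quoted in this commit can be combined into a filter like the one below. The names `GENERIC_IMAGE_PATTERNS` and `isGenericImage` appear in the log, but this body is an assumed sketch and the real list contains more patterns.

```typescript
// Only the patterns quoted in the commit message above.
const GENERIC_IMAGE_PATTERNS: RegExp[] = [
  /404[-_]error/i,          // e.g. 404-ERROR-GRAPHIC.png
  /error[-_]graphic/i,
  /\/icon[-_]library\//i,   // navigation/icon-library path images
  /[-_]illustration[._]/i,  // e.g. Cloud-Legacy_Illustration.png
];

// True when the URL looks like a placeholder, icon, or diagram
// rather than a product photo.
function isGenericImage(url: string): boolean {
  return GENERIC_IMAGE_PATTERNS.some((re) => re.test(url));
}
```

None of the patterns use the `g` flag, so `test()` carries no state between calls.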
Rene Fichtmueller
f4afe14af4 feat: add 12 new vendor URL builders to Playwright image scraper
- Nokia, Huawei, Ciena, Moxa, D-Link, Alcatel-Lucent Enterprise,
  Asterfusion, Brocade: passthrough builders (use stored product_page_url)
- NVIDIA Networking: SN-series URL builder (sn5600 → /ethernet-switching/sn5600/)
- Netgear: lowercase model slug builder for /business/wired/switches/fully-managed/
- UfiSpace: hardcoded sitemap-verified URL map (all 6 S9xxx models)
- QCT: hardcoded URL map for T3048-LY8 and T7032-IX1
- Add Nokia banner / Huawei marketing image patterns to GENERIC_IMAGE_PATTERNS
2026-04-21 07:24:11 +02:00
Rene Fichtmueller
8f36eff956 fix(scraper): filter OneTrust/cookie-consent images + skip in img fallback
cdn.cookielaw.org logos appear as the largest DOM image on Dell/Extreme
product pages when the cookie consent overlay is present. Added to both
GENERIC_IMAGE_PATTERNS (isGenericImage filter) and img fallback skipPattern
so the next-largest actual product image can be found.
2026-04-21 06:45:41 +02:00
Rene Fichtmueller
d67fbe31da fix(scraper): fall through to img fallback when og:image is generic/logo
Previously: if og:image existed (even as a Dell logo URL), page.evaluate() returned
early and the img fallback was never tried. Now: meta tags are extracted first, then
isGenericImage() is checked in Node.js, and the img fallback runs if meta image is null
or generic. This allows vendors like Dell (og:image = logo) to still get product images
via the DOM fallback.
2026-04-21 06:36:12 +02:00
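
The fall-through described above can be sketched as a pure selection step that runs after the page has been evaluated. `chooseProductImage` and `ImageCandidate` are hypothetical names, and the generic-image check is injected as a parameter so the sketch does not depend on the real `isGenericImage()` implementation.

```typescript
interface ImageCandidate { url: string; width: number; height: number }

function chooseProductImage(
  metaImage: string | null,
  domImages: ImageCandidate[],
  isGeneric: (url: string) => boolean,
): string | null {
  // Use the meta image only if it exists and is not a logo or banner.
  if (metaImage !== null && !isGeneric(metaImage)) return metaImage;
  // Otherwise fall through to the largest non-generic DOM image.
  const usable = domImages
    .filter((img) => !isGeneric(img.url))
    .sort((a, b) => b.width * b.height - a.width * a.height);
  return usable.length > 0 ? usable[0].url : null;
}
```

Because the meta tag is checked before returning, a vendor whose og:image is a logo no longer blocks the DOM fallback.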
Rene Fichtmueller
09d3a60b7c fix(scraper): fix Edgecore/Extreme URL builders, broaden img fallback, fix ENOENT
- buildEdgecoreUrl: /product/<slug>/ (WooCommerce, no .html) with EDGECORE_SLUG_MAP
  for AS7712-32X→as7712-32x-ec, Minipack2→minipack-as8000-open-modular-platform
- buildFortinetUrl: returns null (all pages redirect to generic, no usable og:image)
- buildExtremeUrl: direct product URL (extremenetworks.com/product/<slug>)
- img fallback: remove strict 'product/switch/router/hardware' path requirement;
  now takes largest image >=200x150px excluding flags/icons/spinners — isGenericImage()
  filters hero/banner/logo afterward
- ENOENT fix: unique per-run Crawlee storage dir (timestamp suffix) prevents
  stale request-queue file contamination between back-to-back vendor runs
2026-04-21 06:33:32 +02:00
Rene Fichtmueller
87b9416592 fix(scraper): fix Arista series-level URL builder + bypass Crawlee URL deduplication
- buildAristaUrl() now extracts series prefix (7060X5-32QS → 7060x5-series)
  instead of individual model URLs that lack og:image
- Strip trailing sub-variant 'A' so R3A → R3 series page
- Add uniqueKey: row.id to each request — prevents Crawlee from deduplicating
  models that share the same series URL (e.g. 7060x5-series)
- For Arista: always prefer fresh builder URL over stored product_page_url
  so stale individual-model URLs don't override correct series pages
2026-04-21 06:22:41 +02:00
Rene Fichtmueller
18a9e1346e feat: Playwright image scraper for bot-blocked vendors (Arista/Dell/Edgecore/Fortinet/Extreme) 2026-04-21 06:16:05 +02:00
Rene Fichtmueller
653824f23b fix: Cisco line card URL mapping (8800/84/86 → 8000 family page, skip ASR9K logo-only) 2026-04-21 00:49:32 +02:00
Rene Fichtmueller
c9333ab5ea fix: MikroTik hardcoded slug map for + models (crs305/312/317/326) 2026-04-21 00:45:41 +02:00
Rene Fichtmueller
9618a4f0e0 fix: Cisco 8000 builder URL + MikroTik lowercase + new vendor builders
URL builder fixes:
- Cisco 8000: update to new /site/us/en/ URL scheme (family page, not per-model)
- MikroTik: fix to lowercase+underscore format (was uppercase, caused 404)
- Fortinet: set to null — JS-rendered pages, all redirect to generic page
- Alcatel-Lucent Enterprise slug added to dispatcher (was missing, caused 0 hits)
- Add Quanta, Allied Telesis, Ufispace, Netgear URL builders
- NVIDIA: skip ConnectX/BlueField non-switch models

Migration 044:
- Clear 35 wrong NCS-5500 URLs from Cisco 8000-series models
- Pre-set correct 8000-series family URL for 21 models without images
2026-04-21 00:41:31 +02:00
Rene Fichtmueller
9e6be570a3 feat: more switch image coverage + system health metrics + Erik monitor
switch-image-fetcher:
- Add Fortinet URL builder (11 FortiSwitch models)
- Add Quanta Cloud Technology, Allied Telesis, Ufispace, Netgear URL builders
- Fix alcatel-lucent-enterprise slug missing from URL_BUILDERS dispatcher
- Fix NVIDIA builder to skip ConnectX/BlueField adapters (not switches)
- Add aruba slug alias for hpe-aruba

health endpoint:
- Add system metrics: CPU load (1/5/15m), memory usage, disk usage
- Add load_status indicator (ok/busy/overloaded)
- Expose process RSS memory
- Used by external monitors

scripts/monitor-erik.sh:
- Cron-ready health check script for Claudi (.82) and Raspberry Pis
- Checks TIP API health endpoint (load, memory, disk, DB latency)
- Checks PM2 process state via SSH (errored/stopped detection)
- ntfy.sh push notifications (set NTFY_TOPIC env var)
- Includes systemd service + timer unit comments for auto-install
2026-04-21 00:31:43 +02:00
Rene Fichtmueller
823b64bd24 perf: load-aware scraper guard + higher rate limits + /tmp crawlee storage 2026-04-20 23:35:02 +02:00
Rene Fichtmueller
a2492d833b feat: Flexoptix order section per switch + reject generic/logo images 2026-04-20 23:31:36 +02:00
Rene Fichtmueller
ab059c2fd1 fix(community-issues): scrapeTransceiverCompatIssues falls back to ports_config when no compat entries 2026-04-20 23:00:00 +02:00
Rene Fichtmueller
4bf5c95824 feat: Flexoptix compatibility scraper + transceiver issue scanner
- Add flexoptix-compat.ts: maps switch models to compatible Flexoptix transceivers
  via search API (vendor_compat) with form-factor fallback (spec_match)
  Scheduled daily at 09:00 UTC as scrape:compat:flexoptix
- Enhance community-issues.ts: add vendor advisory sources (Cisco Field Notices,
  Juniper KB, SONiC GitHub Issues) + new scrapeTransceiverCompatIssues() that
  searches for switch+transceiver combination problems specifically
- Scheduler: 59 schedules, 78 workers
2026-04-20 22:50:57 +02:00
Rene Fichtmueller
a0a7a97d83 feat: switch image fetcher + og:image scheduler job + dashboard thumbnail column
- Add switch-image-fetcher.ts: og:image-based image discovery for all 86 seeded switches
  (covers Cisco, Arista, Juniper, NVIDIA, Edgecore, Celestica, Asterfusion, Dell,
   HPE/Aruba, Huawei, Nokia, Extreme, MikroTik, Ubiquiti, FS.COM, Supermicro)
- Wire fetchSwitchImages() into scheduler as scrape:images:switches (daily 08:30 UTC)
- Dashboard: add 48px thumbnail column to switch table (lazy img with gear icon fallback)
2026-04-20 22:44:08 +02:00
Rene Fichtmueller
aa91798e8d fix(vcelink): resolve TS 5.9 narrowing quirk with explicit cast in dead code
price?: number narrowing via typeof/!== undefined does not work for
arithmetic comparisons in TypeScript 5.9 dead code paths; use 'as number'
cast to keep the dead code compilable while the early-return guard above
prevents runtime execution entirely.
2026-04-20 22:18:13 +02:00
Rene Fichtmueller
1aba912a15 fix(scrapers): fix ATGBICS theme migration, NADDOD URL, disable VCELink
- ATGBICS: update HTML parser from old card--product theme to new
  card__info theme (Shopify template changed April 2026); name now
  extracted from href link text instead of aria-label
- NADDOD: correct ensureVendor shop URL from /collections/transceivers
  (404) to /collection/optical-transceivers
- VCELink: disable scraper — site pivoted from optical transceivers to
  audio/video/cable products; all collection URLs return 404
2026-04-20 22:11:24 +02:00
Rene Fichtmueller
cb5a587d7e feat: rewrite ATGBICS scraper — static HTML, correct collection handles, GBP cookie
- Replaces Playwright with pure fetch() — static HTML has prices
- Correct collection handles (compatible-transceivers-sfpp-10g etc.)
- Cookie: cart_currency=GBP forces GBP pricing from any geo-IP
- Handles 35+ pages per category × 24 products = 840+ SFP+ products
- No IP-blocking with static HTML (Playwright was the trigger)
- Adds scripts/run-atgbics-mac.sh for Mac-side runner if needed
2026-04-18 22:48:29 +02:00
Rene Fichtmueller
785a6731ab fix: fiber24 stockLevel on_request (was unknown — violated DB constraint) 2026-04-18 22:26:45 +02:00
Rene Fichtmueller
d4ad9f4641 fix: ShopFiber24 sitemap-based scraping + Fibermall image extraction
ShopFiber24 (fiber24.ts):
- Complete rewrite: was using JS-rendered catalog (all prices = 0)
- New strategy: fetch sitemap_0.xml.gz → 310 product DE-URLs
- Each product page has Schema.org microdata: itemprop=price, sku, image
- Extracts: price (minPrice), SKU, image_url, name, specs
- Rate: 1 req/1.5s, no Playwright needed

FiberMall (fibermall.ts):
- Add imageUrl to Product interface
- Extract first fibermall.com/photo/*.jpg from product listing card
- Write image_url to transceivers table (has_image=true) on upsert
- SKU variants share parent product image
- 304 FiberMall transceivers will get images on next scraper run
2026-04-18 22:20:57 +02:00
Rene Fichtmueller
1da4abc488 fix: FS.com price extraction — DOM-based prices + shipping-context exclusion
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB

Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
2026-04-18 13:10:35 +02:00
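
The shipping-context exclusion (Fix 2) can be illustrated as follows. This is a sketch under assumptions: the keyword list, the 40-character window, and the name `firstProductPrice` are invented for the example, and the simple pattern does not handle thousands separators.

```typescript
// Assumed keyword list and window size, for illustration only.
const SHIPPING_KEYWORDS = ["versand", "shipping", "lieferung"];
const CONTEXT_WINDOW = 40; // characters inspected on each side of a match

// Returns the first euro amount that is not surrounded by shipping wording.
function firstProductPrice(bodyText: string): number | null {
  const priceRe = /(\d+(?:[.,]\d{1,2})?)\s*€/g;
  let m: RegExpExecArray | null;
  while ((m = priceRe.exec(bodyText)) !== null) {
    const start = Math.max(0, m.index - CONTEXT_WINDOW);
    const end = Math.min(bodyText.length, m.index + m[0].length + CONTEXT_WINDOW);
    const context = bodyText.slice(start, end).toLowerCase();
    // Skip matches such as "Gratis Versand ab 79 €" (a banner, not a product price).
    if (SHIPPING_KEYWORDS.some((k) => context.indexOf(k) !== -1)) continue;
    return Number(m[1].replace(",", "."));
  }
  return null;
}
```

A window that is too wide can also reject a real price printed close to the banner, which is one reason the commit prefers DOM-based extraction (Fix 1) and keeps the text patterns as a secondary path.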
Rene Fichtmueller
fcdd258369 fix: 10Gtek scraper now fetches prices from sfpcables.com
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.

Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser

Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
2026-04-18 05:27:49 +02:00
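
Loop detection via seen-part dedup can be sketched as below, assuming the store keeps serving already-listed products once the page number runs past the end. `fetchAllPages` is named in the commit; `takeNewParts`, `collectAllParts`, and the synchronous `loadPage` stand-in are hypothetical and exist only to show the stopping rule.

```typescript
// Returns the parts on this page that were not seen before, and records them.
function takeNewParts(seen: Set<string>, pageParts: string[]): string[] {
  const fresh: string[] = [];
  for (const part of pageParts) {
    if (!seen.has(part)) {
      seen.add(part);
      fresh.push(part);
    }
  }
  return fresh;
}

// `loadPage` is a synchronous stand-in for the real (asynchronous) page fetch.
function collectAllParts(loadPage: (page: number) => string[], maxPages = 50): string[] {
  const seen = new Set<string>();
  const all: string[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const fresh = takeNewParts(seen, loadPage(page));
    if (fresh.length === 0) break; // empty page or a repeat of earlier results
    all.push(...fresh);
  }
  return all;
}
```

Stopping on the first page that contributes nothing new handles both an empty page and a page that repeats earlier products.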
Rene Fichtmueller
2a6ec90ecd fix: fs-com Phase 1+2 crawler.run() ENOENT guard
Crawlee catches and re-throws the post-run _isTaskReadyFunction ENOENT
internally, which rejected crawler.run() and aborted Phase 2 before it could
start. Wrap both crawler.run() calls in try/catch to swallow ENOENT from
request_queues paths; all processing is already complete at this point.
2026-04-18 03:52:49 +02:00
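
A guard of this kind might look like the sketch below. `runIgnoringQueueEnoent` is a hypothetical helper, `run` stands in for a `crawler.run()` call, and the `request_queues` check mirrors the wording of the commit message rather than the actual source.

```typescript
// Resolves true on a clean run, false when a request-queue ENOENT was swallowed.
async function runIgnoringQueueEnoent(run: () => Promise<unknown>): Promise<boolean> {
  try {
    await run();
    return true;
  } catch (err) {
    const e = (err ?? {}) as { code?: string; path?: string; message?: string };
    const where = `${e.path ?? ""} ${e.message ?? ""}`;
    if (e.code === "ENOENT" && where.indexOf("request_queues") !== -1) {
      return false; // post-run storage race: processing has already finished
    }
    throw err; // anything else is a real failure
  }
}
```

Matching on both the error code and the path keeps unrelated ENOENT errors, such as a missing config file, visible to the caller.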