Add 59 Juniper OEM transceiver PIDs (SFP/SFP+/SFP28/QSFP+/QSFP28/
QSFP56/QSFP-DD/OSFP + DAC/AOC) to seed the transceivers table.
Register scrape:catalog:juniper-oem in scheduler (daily 04:15).
Fix BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to
bypass server bug where HTTP/2 returns empty response body. Also
update catalog path from /transceivers/ to /Transceivers_1.
- cisco-tmg.ts: upsert Cisco OEM transceivers from TMG API instead of
SELECT-only. Parsers for formFactor/speed/reach/fiberType/tempRange.
Fixes market_status ('EOL') + temp_range ('COM'/'IND') check constraints.
- arista-oem.ts: seed scraper for 69 Arista OEM PIDs (1G→800G,
SFP/SFP28/QSFP+/QSFP28/QSFP-DD/OSFP/QSFP-DD800) with full specs.
- scheduler.ts: daily arista-oem seed at 04:00 UTC
cdn.cookielaw.org logos appear as the largest DOM image on Dell/Extreme
product pages when the cookie consent overlay is present. Added to both
GENERIC_IMAGE_PATTERNS (isGenericImage filter) and img fallback skipPattern
so the next-largest actual product image can be found.
Previously: if og:image existed (even as a Dell logo URL), page.evaluate() returned
early and the img fallback was never tried. Now: meta tags are extracted first, then
isGenericImage() is checked in Node.js, and the img fallback runs if meta image is null
or generic. This allows vendors like Dell (og:image = logo) to still get product images
via the DOM fallback.
- buildAristaUrl() now extracts series prefix (7060X5-32QS → 7060x5-series)
instead of individual model URLs that lack og:image
- Strip trailing sub-variant 'A' so R3A → R3 series page
- Add uniqueKey: row.id to each request — prevents Crawlee from deduplicating
models that share the same series URL (e.g. 7060x5-series)
- For Arista: always prefer fresh builder URL over stored product_page_url
so stale individual-model URLs don't override correct series pages
switch-image-fetcher:
- Add Fortinet URL builder (11 FortiSwitch models)
- Add Quanta Cloud Technology, Allied Telesis, Ufispace, Netgear URL builders
- Fix alcatel-lucent-enterprise slug missing from URL_BUILDERS dispatcher
- Fix NVIDIA builder to skip ConnectX/BlueField adapters (not switches)
- Add aruba slug alias for hpe-aruba
health endpoint:
- Add system metrics: CPU load (1/5/15m), memory usage, disk usage
- Add load_status indicator (ok/busy/overloaded)
- Expose process RSS memory
- Used by external monitors
scripts/monitor-erik.sh:
- Cron-ready health check script for Claudi (.82) and Raspberry Pis
- Checks TIP API health endpoint (load, memory, disk, DB latency)
- Checks PM2 process state via SSH (errored/stopped detection)
- ntfy.sh push notifications (set NTFY_TOPIC env var)
- Includes systemd service + timer unit comments for auto-install
- Add flexoptix-compat.ts: maps switch models to compatible Flexoptix transceivers
via search API (vendor_compat) with form-factor fallback (spec_match)
Scheduled daily at 09:00 UTC as scrape:compat:flexoptix
- Enhance community-issues.ts: add vendor advisory sources (Cisco Field Notices,
Juniper KB, SONiC GitHub Issues) + new scrapeTransceiverCompatIssues() that
searches for switch+transceiver combination problems specifically
- Scheduler: 59 schedules, 78 workers
price?: number narrowing via typeof/!== undefined does not work for
arithmetic comparisons in TypeScript 5.9 dead code paths; use 'as number'
cast to keep the dead code compilable while the early-return guard above
prevents runtime execution entirely.
- ATGBics: update HTML parser from old card--product theme to new
card__info theme (Shopify template changed April 2026); name now
extracted from href link text instead of aria-label
- NADDOD: correct ensureVendor shop URL from /collections/transceivers
(404) to /collection/optical-transceivers
- VCELink: disable scraper — site pivoted from optical transceivers to
audio/video/cable products; all collection URLs return 404
- Replaces Playwright with pure fetch() — static HTML has prices
- Correct collection handles (compatible-transceivers-sfpp-10g etc.)
- Cookie: cart_currency=GBP forces GBP pricing from any geo-IP
- Handles 35+ pages per category × 24 products = 840+ SFP+ products
- No IP-blocking with static HTML (Playwright was the trigger)
- Adds scripts/run-atgbics-mac.sh for Mac-side runner if needed
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB
Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.
Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser
Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
Pattern 1 (href→aria-label) finds 127 navigation links on GBICS BigCommerce
pages — none contain GBP prices. Pattern 2 (aria-label→href) correctly
finds 16-30 product links per category page with £XX.XX prices in aria-labels.
The fallback from P1 to P2 now triggers when P1 finds results but none
contain '£', rather than only when P1 finds 0 total results.
After PlaywrightCrawler.run() resolves, Crawlee's internal task loop
schedules one final _isTaskReadyFunction call that tries to read a
request queue .json file already cleaned up during processing. This
ENOENT fires as an unhandledRejection and calls process.exit(1),
aborting Phase 2 before prices are written to the database.
Added a targeted unhandledRejection handler in the require.main block
that swallows ENOENT errors from request_queues paths (benign Crawlee
cleanup race) while re-raising all other rejections.
- Add makeCrawleeConfig isolation to CheerioCrawler instances
- Switch from named persistent RequestQueue to ephemeral null queues:
named queues retain 'handled' state and skip all URLs on re-runs,
causing 0 observations on every run after the first.
- Applies to both enrichSwitchFromEbay and enrichTransceiversFromEbay.
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
Crawlee Configuration with isolated localDataDirectory per scraper.
Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
concurrent pg-boss workers in the same process don't race on
the shared env var.
- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
community-issues (CheerioCrawler + RequestQueue),
edgecore (CheerioCrawler), ufispace (CheerioCrawler),
market-intelligence (CheerioCrawler).
- scheduler.ts: add withIsolatedStorage for optcore and market-intel
workers (was missing, caused storage-fs path bleed from fs scraper).
- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
satisfy vendors_type_check constraint
['manufacturer','distributor','oem','reseller','compatible'].
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)
Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
(2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR
Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)
CHANGELOG: added 6 entries for 2026-04-12
- New scrapers: fibermall.ts (WooCommerce), vcelink.ts (Shopify), opticsbay.ts (WooCommerce)
- QSFPTEK rewritten to use /mall/commodity/list API (old OpenCart /c/*.html paths gone 404)
- New: attribute-based filtering by data rate (1G/10G/25G/40G/100G/200G/400G/800G)
- Scrapes HTML fragments, extracts US$ prices and product URLs
- scheduler.ts: +3 queues/schedules/workers (fibermall, vcelink, opticsbay) → 61 total workers
- index-pi.ts: Pi fleet picks up all 3 new scrapers