Static HTML collection pages return wrong results (all redirect to same 9 products).
Switch to the /collections/{handle}/products.json?limit=250&page=N API, which provides:
- Reliable JSON (no HTML parsing)
- Correct per-collection product lists
- Clean pagination (stop when a page returns fewer than limit results)
- Coverage of 11 key transceiver collections (1G, 10G, 25G, 40G, 100G, 400G)
- upsertPriceObservation: insert new observation if last one is >7 days old,
even when price (content_hash) hasn't changed — keeps timeseries data fresh
- ATGBICS: detect Shopify catalog wrap-around by tracking per-category seen URLs;
stop pagination when all products on a page were already seen in a prior page
- ATGBICS: improve hasNextPage to match &page=N anchored in href params
- scheduler: patch boss.schedule() to call createQueue() first (idempotent),
fixing FK constraint errors after DB reset — no need to touch 277 call sites
- index: call registerWorkers() before registerSchedules(), since boss.work() must
register handlers before schedules fire
- dashboard: fix switchBlogLlm() to use api() helper (adds Bearer auth token)
instead of raw fetch() which was returning 401 Unauthorized
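Sketch of the upsertPriceObservation freshness rule described above. This is not the actual implementation: the LastObservation shape and the function name shouldInsertObservation are assumptions made for illustration; only the 7-day threshold and the content_hash comparison come from the notes.

```typescript
// Hypothetical shape of the most recent stored observation for a product.
interface LastObservation {
  contentHash: string;
  observedAt: Date;
}

// 7 days, per the rule above.
const MAX_AGE_MS = 7 * 24 * 60 * 60 * 1000;

// Decide whether a new observation row should be inserted.
function shouldInsertObservation(
  last: LastObservation | null,
  contentHash: string,
  now: Date,
): boolean {
  // No history yet: always insert.
  if (last === null) return true;
  // Price (content hash) changed: insert.
  if (last.contentHash !== contentHash) return true;
  // Unchanged price: insert only if the last row is older than 7 days,
  // so the timeseries keeps getting fresh points.
  return now.getTime() - last.observedAt.getTime() > MAX_AGE_MS;
}
```

With an unchanged hash, a 4-day-old observation is left alone and an 8-day-old one triggers a new row.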
Add 59 Juniper OEM transceiver PIDs (SFP/SFP+/SFP28/QSFP+/QSFP28/
QSFP56/QSFP-DD/OSFP + DAC/AOC) to seed the transceivers table.
Register scrape:catalog:juniper-oem in scheduler (daily 04:15).
Fix BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to
work around a server bug where HTTP/2 returns an empty response body. Also
update catalog path from /transceivers/ to /Transceivers_1.
- cisco-tmg.ts: upsert Cisco OEM transceivers from the TMG API instead of
running SELECT-only. Add parsers for formFactor/speed/reach/fiberType/tempRange.
Fix market_status ('EOL') and temp_range ('COM'/'IND') check constraint violations.
- arista-oem.ts: seed scraper for 69 Arista OEM PIDs (1G→800G,
SFP/SFP28/QSFP+/QSFP28/QSFP-DD/OSFP/QSFP-DD800) with full specs.
- scheduler.ts: daily arista-oem seed at 04:00 UTC
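Sketch of a tempRange parser that only emits values allowed by the temp_range check constraint ('COM'/'IND'). The function name normalizeTempRange and the accepted input spellings are assumptions; the actual TMG API values are not shown in these notes.

```typescript
type TempRange = "COM" | "IND";

// Map a free-form temperature-range string to a constraint-safe value.
// Returns null for anything unrecognized, so the caller can store NULL
// instead of violating the check constraint.
function normalizeTempRange(raw: string): TempRange | null {
  const s = raw.trim().toLowerCase();
  if (s.startsWith("com")) return "COM"; // e.g. "COM", "Commercial" (assumed spellings)
  if (s.startsWith("ind")) return "IND"; // e.g. "IND", "Industrial" (assumed spellings)
  return null;
}
```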
cdn.cookielaw.org logos appear as the largest DOM image on Dell/Extreme
product pages when the cookie consent overlay is present. Add the domain to both
GENERIC_IMAGE_PATTERNS (isGenericImage filter) and the img fallback skipPattern
so the next-largest actual product image can be found.
Previously: if og:image existed (even as a Dell logo URL), page.evaluate() returned
early and the img fallback was never tried. Now: meta tags are extracted first, then
isGenericImage() is checked in Node.js, and the img fallback runs if meta image is null
or generic. This allows vendors like Dell (og:image = logo) to still get product images
via the DOM fallback.
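Sketch of the selection order described above: meta image first, generic check in Node.js, then the DOM fallback. The names pickProductImage and DomImage, and the /logo/ pattern, are assumptions for illustration; only GENERIC_IMAGE_PATTERNS, isGenericImage and the cdn.cookielaw.org entry come from the notes.

```typescript
const GENERIC_IMAGE_PATTERNS: RegExp[] = [
  /cdn\.cookielaw\.org/i, // cookie consent logos (from the notes above)
  /logo/i,                // assumption: vendor logo URLs contain "logo"
];

function isGenericImage(url: string): boolean {
  return GENERIC_IMAGE_PATTERNS.some((pattern) => pattern.test(url));
}

// Hypothetical shape of an <img> collected from the page.
interface DomImage {
  src: string;
  width: number;
  height: number;
}

function pickProductImage(metaImage: string | null, domImages: DomImage[]): string | null {
  // 1. Use the meta image (og:image etc.) unless it is missing or generic.
  if (metaImage !== null && !isGenericImage(metaImage)) return metaImage;
  // 2. Fall back to the largest non-generic DOM image.
  const candidates = domImages
    .filter((img) => !isGenericImage(img.src))
    .sort((a, b) => b.width * b.height - a.width * a.height);
  return candidates[0]?.src ?? null;
}
```

A page whose og:image is a vendor logo and whose largest img is the cookie consent logo still resolves to the next-largest product image.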
- buildAristaUrl() now extracts series prefix (7060X5-32QS → 7060x5-series)
instead of individual model URLs that lack og:image
- Strip trailing sub-variant 'A' so R3A → R3 series page
- Add uniqueKey: row.id to each request — prevents Crawlee from deduplicating
models that share the same series URL (e.g. 7060x5-series)
- For Arista: always prefer fresh builder URL over stored product_page_url
so stale individual-model URLs don't override correct series pages
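Sketch of the series-slug extraction and per-row uniqueKey described above. The helper names, the SwitchRow shape, the base URL parameter and the exact "strip a trailing A after a digit" rule are assumptions; the notes only give the 7060X5-32QS → 7060x5-series and R3A → R3 examples.

```typescript
// Derive the series slug from a model name, e.g. "7060X5-32QS" -> "7060x5-series".
function aristaSeriesSlug(model: string): string {
  // Series prefix is everything before the first hyphen.
  const prefix = model.split("-")[0] ?? model;
  // Strip a trailing sub-variant "A" that follows a digit (R3A -> R3).
  const base = prefix.replace(/(\d)A$/i, "$1");
  return `${base.toLowerCase()}-series`;
}

// Hypothetical row and request shapes.
interface SwitchRow {
  id: number;
  model: string;
}

interface ImageRequest {
  url: string;
  uniqueKey: string;
}

// uniqueKey is taken from the row id so that two models sharing one
// series URL are not collapsed into a single request by the crawler.
function buildAristaRequest(row: SwitchRow, baseUrl: string): ImageRequest {
  return {
    url: `${baseUrl}/${aristaSeriesSlug(row.model)}`,
    uniqueKey: String(row.id),
  };
}
```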
switch-image-fetcher:
- Add Fortinet URL builder (11 FortiSwitch models)
- Add Quanta Cloud Technology, Allied Telesis, Ufispace, Netgear URL builders
- Fix alcatel-lucent-enterprise slug missing from URL_BUILDERS dispatcher
- Fix NVIDIA builder to skip ConnectX/BlueField adapters (not switches)
- Add aruba slug alias for hpe-aruba
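Sketch of a URL_BUILDERS dispatcher with a slug alias and an adapter skip, along the lines of the fixes above. The builder bodies, the example.invalid URLs, SLUG_ALIASES and buildProductUrl are placeholders, not the real vendor URL schemes or helper names.

```typescript
type UrlBuilder = (model: string) => string | null;

// Alias slugs resolve to a canonical vendor slug before dispatch.
const SLUG_ALIASES = new Map<string, string>([["aruba", "hpe-aruba"]]);

const URL_BUILDERS = new Map<string, UrlBuilder>([
  // Placeholder URLs; the real builders are not shown in the notes.
  ["hpe-aruba", (model) => `https://example.invalid/hpe-aruba/${encodeURIComponent(model)}`],
  [
    "nvidia",
    (model) =>
      // ConnectX/BlueField are adapters, not switches: skip them.
      /connectx|bluefield/i.test(model)
        ? null
        : `https://example.invalid/nvidia/${encodeURIComponent(model)}`,
  ],
]);

function buildProductUrl(vendorSlug: string, model: string): string | null {
  const slug = SLUG_ALIASES.get(vendorSlug) ?? vendorSlug;
  const builder = URL_BUILDERS.get(slug);
  // A slug missing from the dispatcher yields null rather than throwing.
  return builder ? builder(model) : null;
}
```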
health endpoint:
- Add system metrics: CPU load (1/5/15m), memory usage, disk usage
- Add load_status indicator (ok/busy/overloaded)
- Expose process RSS memory
- Used by external monitors
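Sketch of how a load_status value could be derived from the 1-minute load average. The per-core thresholds (0.8 and 1.5) and the function name classifyLoad are assumptions; the notes only name the three states.

```typescript
type LoadStatus = "ok" | "busy" | "overloaded";

// Classify system load relative to the number of CPU cores.
function classifyLoad(load1m: number, cpuCount: number): LoadStatus {
  const perCore = load1m / Math.max(1, cpuCount);
  if (perCore >= 1.5) return "overloaded"; // assumed threshold
  if (perCore >= 0.8) return "busy";       // assumed threshold
  return "ok";
}
```

In Node.js the inputs would typically come from os.loadavg()[0] and os.cpus().length.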
scripts/monitor-erik.sh:
- Cron-ready health check script for Claudi (.82) and Raspberry Pis
- Checks TIP API health endpoint (load, memory, disk, DB latency)
- Checks PM2 process state via SSH (errored/stopped detection)
- ntfy.sh push notifications (set NTFY_TOPIC env var)
- Includes systemd service + timer unit comments for auto-install
- Add flexoptix-compat.ts: maps switch models to compatible Flexoptix transceivers
via search API (vendor_compat) with form-factor fallback (spec_match);
scheduled daily at 09:00 UTC as scrape:compat:flexoptix
- Enhance community-issues.ts: add vendor advisory sources (Cisco Field Notices,
Juniper KB, SONiC GitHub Issues) + new scrapeTransceiverCompatIssues() that
searches for switch+transceiver combination problems specifically
- Scheduler: 59 schedules, 78 workers
Narrowing an optional price (price?: number) via typeof or !== undefined is not
applied to arithmetic comparisons in code paths that TypeScript 5.9 treats as dead.
Use an 'as number' cast to keep the dead code compilable; the early-return guard
above it prevents that code from ever running.
- ATGBICS: update HTML parser from old card--product theme to new
card__info theme (Shopify template changed April 2026); name now
extracted from href link text instead of aria-label
- NADDOD: correct ensureVendor shop URL from /collections/transceivers
(404) to /collection/optical-transceivers
- VCELink: disable scraper — site pivoted from optical transceivers to
audio/video/cable products; all collection URLs return 404
- Replaces Playwright with pure fetch() — static HTML has prices
- Correct collection handles (compatible-transceivers-sfpp-10g etc.)
- Cookie: cart_currency=GBP forces GBP pricing from any geo-IP
- Handles 35+ pages per category × 24 products = 840+ SFP+ products
- No IP-blocking with static HTML (Playwright was the trigger)
- Adds scripts/run-atgbics-mac.sh for Mac-side runner if needed
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB
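Sketch of the idea behind Fix 2 only: skip price matches that sit close to a shipping keyword. The keyword list, the 30-character context window, the German number format handling and the function name extractProductPrice are assumptions, not the scraper's actual patterns.

```typescript
const SHIPPING_KEYWORDS = /versand|shipping|lieferung/i; // assumed keyword list
const CONTEXT_CHARS = 30; // assumed look-behind window

// Return the first euro price in the text that is not part of a shipping notice.
function extractProductPrice(bodyText: string): number | null {
  // German format: "1.299,00 €" or "79 €".
  const pricePattern = /\b(\d{1,3}(?:\.\d{3})*(?:,\d{2})?)\s*€/g;
  let match: RegExpExecArray | null;
  while ((match = pricePattern.exec(bodyText)) !== null) {
    const context = bodyText.slice(Math.max(0, match.index - CONTEXT_CHARS), match.index);
    // e.g. "Gratis Versand ab 79 €" is a shipping threshold, not a price.
    if (SHIPPING_KEYWORDS.test(context)) continue;
    const normalized = (match[1] ?? "").replace(/\./g, "").replace(",", ".");
    return Number.parseFloat(normalized);
  }
  return null;
}
```

A text-only filter like this stays heuristic, which is presumably why the DOM extraction in Fix 1 runs first.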
Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
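Sketch of the fiber type inference rule above. Matching on hyphen-separated tokens with an optional lane-count suffix (LR4, SR4, ...) and the function name inferFiberType are assumptions; the notes only give the DR/FR/LR/ER/ZR → SMF and SR → MMF mapping.

```typescript
type FiberType = "SMF" | "MMF";

// Infer the fiber type from a part number or reach designation.
function inferFiberType(partOrReach: string): FiberType | null {
  // Split on anything that is not a letter or digit, e.g. "QSFP-100G-LR4".
  const tokens = partOrReach.toUpperCase().split(/[^A-Z0-9]+/);
  for (const token of tokens) {
    if (/^(DR|FR|LR|ER|ZR)\d*$/.test(token)) return "SMF";
    if (/^SR\d*$/.test(token)) return "MMF";
  }
  // Unknown designations (copper, DAC, LRM, ...) are left undecided.
  return null;
}
```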
optcore.net blocks Erik's IP (82.165.222.127) via Cloudflare WAF.
WP REST API returns HTML block page instead of JSON → 0 product URLs
→ 0 scraped pages every run. Add SKIP_OPTCORE_SCRAPER guard matching
the existing SKIP_FS_SCRAPER pattern. Set in ecosystem.config.js on
Erik. Residential IP (Mac launchd) would be needed to use this scraper.
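Sketch of an environment guard in the style of SKIP_OPTCORE_SCRAPER / SKIP_FS_SCRAPER. The accepted values ("1", "true") and the helper name isScraperSkipped are assumptions; the notes do not show how the flag is read.

```typescript
// Returns true when the given skip flag is set in the environment.
function isScraperSkipped(
  env: Record<string, string | undefined>,
  flag: string,
): boolean {
  const value = (env[flag] ?? "").trim().toLowerCase();
  return value === "1" || value === "true"; // assumed truthy spellings
}
```

A scraper entry point could then return early when isScraperSkipped(process.env, "SKIP_OPTCORE_SCRAPER") is true.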
Crawlee's FileSystemStorage marks request URLs as HANDLED (state=4,
orderNo=null) after processing. With purgeOnStart=false these entries
persist, so on the next run crawler.run(startUrls) deduplicates them
→ requestsTotal=0 → immediate finish with 0 scraped pages.
Fix: rmSync request_queues/default/ before each makeCrawleeConfig()
call. Safe: session pool state lives in key_value_stores/, not in
request_queues/. Affects all Crawlee-based scrapers (ATGBICS, Optcore,
Switch-assets, etc.).
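Sketch of the purge step described above, run here against a throwaway directory. The function name purgeDefaultRequestQueue and the storageDir parameter are assumptions; only the request_queues/default/ and key_value_stores/ layout comes from the notes.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Remove only the default request queue so previously HANDLED URLs are
// crawled again; key_value_stores/ (session pool state) is left untouched.
function purgeDefaultRequestQueue(storageDir: string): void {
  const queueDir = join(storageDir, "request_queues", "default");
  // force: true makes this a no-op when the directory does not exist.
  rmSync(queueDir, { recursive: true, force: true });
}

// Usage sketch against a temporary storage directory.
const storageDir = mkdtempSync(join(tmpdir(), "crawlee-storage-"));
mkdirSync(join(storageDir, "request_queues", "default"), { recursive: true });
mkdirSync(join(storageDir, "key_value_stores", "default"), { recursive: true });
writeFileSync(join(storageDir, "request_queues", "default", "abc.json"), "{}");

purgeDefaultRequestQueue(storageDir);

const queueGone = !existsSync(join(storageDir, "request_queues", "default"));
const kvKept = existsSync(join(storageDir, "key_value_stores", "default"));
```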
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.
Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US$XX.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser
Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
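Sketch of pagination with loop detection via seen-part dedup, as used by fetchAllPages() above. The Product shape, the takeUnseen helper, the injected fetchPage callback and the maxPages cap are assumptions; the real parser and HTTP calls are omitted.

```typescript
// Hypothetical parsed product.
interface Product {
  part: string;
  priceUsd: number;
}

// Return the products whose part number has not been seen yet,
// recording them in `seen` as a side effect.
function takeUnseen(seen: Set<string>, page: Product[]): Product[] {
  const fresh: Product[] = [];
  for (const product of page) {
    if (seen.has(product.part)) continue;
    seen.add(product.part);
    fresh.push(product);
  }
  return fresh;
}

// Walk pages until one contributes nothing new. That covers both an empty
// page and a catalog that wraps around to earlier products.
async function fetchAllPages(
  fetchPage: (page: number) => Promise<Product[]>,
  maxPages = 100, // safety cap (assumed)
): Promise<Product[]> {
  const seen = new Set<string>();
  const all: Product[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const fresh = takeUnseen(seen, await fetchPage(page));
    if (fresh.length === 0) break;
    all.push(...fresh);
  }
  return all;
}
```

Stopping on "no new part numbers" rather than on an HTTP error matters when out-of-range page numbers return a valid page of already-listed products.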