- Fix OSFP-DR8-1.6T-FL and OSFP-2FR4-1.6T-FL: speed_gbps was 200, now 1600
→ FS.com 1.6T products now correctly match as comparables for Flexoptix O.1316T.C.05.M
- API: extend comparable price query to return comp_form_factor, comp_speed_gbps,
comp_reach_meters, comp_reach_label, comp_fiber_type, comp_wavelengths
- Dashboard: replace plain comparable price row with side-by-side spec comparison card
showing Flexoptix vs. competitor: Form Factor, Speed, Reach, Fiber, Wavelengths
with color coding (green=match, orange=mismatch) and savings badge (−45% günstiger)
- client.ts: add claude-code provider routing BLOG_LLM_PROVIDER=claude-code
to claude-bridge (flat-rate, no API billing via Claude Code subscription)
- checkHealth() now pings /health on claude-bridge for real availability check
- Default OLLAMA_LLM_MODEL changed from qwen2.5:14b to fo-blog-v5
- Dashboard: add claude-code card (EMPFOHLEN), rename fo-blog-v3 → fo-blog-v5
- loadBlogLLMStatus() handles all 3 providers: claude-code/anthropic/ollama
- Grid expanded from 3 to 4 columns to accommodate new card
- ecosystem.config.js + .env on Erik: OLLAMA_LLM_MODEL=fo-blog-v5 confirmed
- All 247 FS.com prices were €79 (shipping threshold, not product prices)
- Root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' banner matched first
- Fix 1: DOM price extraction in page.evaluate with bad-parent skip list
- Fix 2: bodyText qualified patterns skip matches near shipping keywords
- Fix 3: waitForSelector for price DOM element before evaluate
- Fix 4: Deleted 247 invalid €79 observations from DB
Also included from previous session:
- db.ts: set has_image=true on image writes (fix 632 desync rows)
- spec-updater.ts: DR/FR/LR/ER/ZR → SMF, SR → MMF fiber type inference
Both generateBlog() and generateBlogManual() were calling
POST /api/blog/generate without an Authorization: Bearer header.
The requireAuth middleware correctly returned 401, which appeared
as 'Unauthorized — please log in' toast in the dashboard.
Fix: read loadToken() before each fetch and include the token in
the Authorization header. Also add r.status===401 guard to redirect
to login page when token expires, instead of showing error toast.
optcore.net blocks Erik's IP (82.165.222.127) via Cloudflare WAF.
WP REST API returns HTML block page instead of JSON → 0 product URLs
→ 0 scraped pages every run. Add SKIP_OPTCORE_SCRAPER guard matching
the existing SKIP_FS_SCRAPER pattern. Set in ecosystem.config.js on
Erik. Residential IP (Mac launchd) would be needed to use this scraper.
Crawlee's FileSystemStorage marks request URLs as HANDLED (state=4,
orderNo=null) after processing. With purgeOnStart=false these entries
persist, so on the next run crawler.run(startUrls) deduplicates them
→ requestsTotal=0 → immediate finish with 0 scraped pages.
Fix: rmSync request_queues/default/ before each makeCrawleeConfig()
call. Safe: session pool state lives in key_value_stores/, not in
request_queues/. Affects all Crawlee-based scrapers (ATGBICS, Optcore,
Switch-assets, etc.).
10gtek.com main site only exposes technical spec tables with no prices.
sfpcables.com is 10Gtek's own retail store and has both Model numbers
and USD prices in standard Magento product listings.
Changes:
- Switch scraping target from www.10gtek.com to sfpcables.com
- Parse Model: <part> + US.XX per product block (Magento structure)
- XFP fallback: extract part number from title after '|' separator
- Add fetchAllPages() with Magento loop-detection via seen-part dedup
- Remove QSFP-DD category (not available on sfpcables.com)
- Drop XFP-less categories from old 10gtek.com spec-table parser
Verified: 10/10 SFP prices, 10/10 SFP+ prices, 4/4 XFP prices on live site.
- Add global unhandledRejection handler in scheduler daemon to swallow
Crawlee's benign post-run ENOENT lock-file races (prevents process.exit(1))
- Add SKIP_FS_SCRAPER env var: skip FS.com worker on Erik where Cloudflare
WAF blocks datacenter IPs (Mac launchd handles FS.com from residential IP)
- Remove FS.COM from health monitor EXPECTED_VENDORS (skipped on Erik)
- Health monitor: extend pg-boss lookup from 12h → 26h, add completed-job
map; if job ran OK in last 26h + vendor has historical prices → mark
STABLE instead of CRITICAL (fixes ATGBICS/Fluxlight hash-dedup false positives)
- Install Playwright Chromium on Erik (fixes ATGBICS BrowserLaunchError)
- Create missing Crawlee storage dirs on Erik (storage-fs-phase1/2,
storage-ebay-transceivers) to prevent ENOENT on first Crawlee run
Pattern 1 (href→aria-label) finds 127 navigation links on GBICS BigCommerce
pages — none contain GBP prices. Pattern 2 (aria-label→href) correctly
finds 16-30 product links per category page with £XX.XX prices in aria-labels.
The fallback from P1 to P2 now triggers when P1 finds results but none
contain '£', rather than only when P1 finds 0 total results.
Previous logic fired an alert whenever prices_6h=0, even when prices
were genuinely stable (content hash dedup prevents duplicate inserts).
This caused Flexoptix, ATGBICS and others to trigger alerts every 3h
despite their scrapers running successfully.
New logic:
🔴 CRITICAL: last price > 7 days (genuine failure)
🟡 WARNING: last price 48h–7 days (possibly stale)
✅ STABLE: last price ≤48h, 0 new (prices unchanged, scraper OK)
Also shows pg-boss job state/time alongside each vendor for faster
root-cause diagnosis. Trimmed EXPECTED_VENDORS to vendors with actual
scraper implementations (removed never-scraped placeholders).
After PlaywrightCrawler.run() resolves, Crawlee's internal task loop
schedules one final _isTaskReadyFunction call that tries to read a
request queue .json file already cleaned up during processing. This
ENOENT fires as an unhandledRejection and calls process.exit(1),
aborting Phase 2 before prices are written to the database.
Added a targeted unhandledRejection handler in the require.main block
that swallows ENOENT errors from request_queues paths (benign Crawlee
cleanup race) while re-raising all other rejections.
Adds /tmp/tip-fs-scraper.lock PID file to prevent launchd from running
a second instance while the previous one is still active (e.g. 2am
schedule fires, runs past 10am when launchd fires again). Without this,
concurrent instances caused rmSync(storage-fs-phase1) in one instance
to delete the Crawlee storage dir while the other was still using it,
resulting in ENOENT crashes.
- Add makeCrawleeConfig isolation to CheerioCrawler instances
- Switch from named persistent RequestQueue to ephemeral null queues:
named queues retain 'handled' state and skip all URLs on re-runs,
causing 0 observations on every run after the first.
- Applies to both enrichSwitchFromEbay and enrichTransceiversFromEbay.
- Add utils/crawlee-config.ts: makeCrawleeConfig(name) returns a
Crawlee Configuration with isolated localDataDirectory per scraper.
Uses storageClientOptions (not global CRAWLEE_STORAGE_DIR) so
concurrent pg-boss workers in the same process don't race on
the shared env var.
- Apply makeCrawleeConfig to all 6 Crawlee-based scrapers:
optcore (PlaywrightCrawler), atgbics (PlaywrightCrawler),
community-issues (CheerioCrawler + RequestQueue),
edgecore (CheerioCrawler), ufispace (CheerioCrawler),
market-intelligence (CheerioCrawler).
- scheduler.ts: add withIsolatedStorage for optcore and market-intel
workers (was missing, caused storage-fs path bleed from fs scraper).
- ebay-enricher.ts: fix vendor type 'marketplace' -> 'reseller' to
satisfy vendors_type_check constraint
['manufacturer','distributor','oem','reseller','compatible'].
FiberMall:
- Correct /store-XXXXX-name.htm category URLs (was /c/xxx/ → HTTP 404)
- Parser: split on new_proList_mainListLi, price from data-price on
currency_price span — fix 0.00 false-match from SKU variant items
- Also scrape SKU brand variant links from .sku_item divs
- Result: 3,410 prices now in DB (was 0)
Flexoptix:
- Fix extractPrice regex for EUR thousand-separator format
(2,921.60 EUR was parsed as 2 EUR)
- Add OSFP224 / 1.6T search queries (4 new, form factor was missing)
- Fix O.138HG2.C.05 stale price 3009.60→2921.60 EUR
Schema: competitor_verified + competitor_verified_at columns
added via ALTER TABLE (were referenced in code but missing in DB)
CHANGELOG: added 6 entries for 2026-04-12
- New scrapers: fibermall.ts (WooCommerce), vcelink.ts (Shopify), opticsbay.ts (WooCommerce)
- QSFPTEK rewritten to use /mall/commodity/list API (old OpenCart /c/*.html paths gone 404)
- New: attribute-based filtering by data rate (1G/10G/25G/40G/100G/200G/400G/800G)
- Scrapes HTML fragments, extracts US$ prices and product URLs
- scheduler.ts: +3 queues/schedules/workers (fibermall, vcelink, opticsbay) → 61 total workers
- index-pi.ts: Pi fleet picks up all 3 new scrapers
Crawlee's SessionPool throws 'Could not find SDK_SESSION_POOL_STATE.json'
when initializing against a freshly-created isolated storage dir.
Setting CRAWLEE_PURGE_ON_START=1 tells Crawlee to start fresh instead
of trying to load non-existent session state — fixes FS.com and ATGBICS
crashes at the start of every 2h cycle after the dirs were cleaned up.
ProLabs uses B2B quote model - prices require reseller account and are
not shown publicly (schema.org always shows price=0.00). Fighting
CloudFront WAF with Firefox automation is pointless.
New approach:
- Sitemap-driven: downloads all 14 sitemaps to collect product URLs
- fetch-based: curl-compatible HTTP requests bypass CloudFront TLS detection
- catalog-only: writes part numbers + specs to transceivers table
- Rate-limited: 300ms between requests (~3 req/sec)
- No proxy needed: Pi nodes no longer consumed for ProLabs
Replace hard-coded purple/green colors with theme CSS variables.
Dark code blocks (#1e1e1e bg), orange accent for active borders/badges,
dark green for status text, amber for warnings — all readable on white.
Remove boss.work() registrations for lightweight fetch/cheerio scrapers
from Erik's scheduler. Pis are now the SOLE consumers of these queues:
fluxlight, gbics, optcore, champion-one, sfpcables, blueoptics, fiber24,
tscom, skylane, ascentoptics, gaotek, smartoptics, hubersuhner, news,
market-intel.