CODEX-TASK-flexoptix-reference-matching.md — comprehensive plan to fix
zero-match gap for ATGBICS/NADDOD/10Gtek/ShopFiber24 (8.260+ products
with 0 Flexoptix equivalences).
Root cause: 30-day price_observation window excludes vendors whose
scrapers ran >30 days ago. Solution: catalog-reconcile robot (full
bulk match, no time limit), form_factor normalization (SQL 108),
30→90 day window fix in nightly matcher, on-demand API endpoint.
Expected: coverage from 22% → 45-60% after one reconcile run.
- index-pi.ts: removed Playwright scrapers (FS.COM, eBay enricher, switch assets)
added NADDOD (fetch-based, benefits from residential IP)
now 32 fetch-only queues safe for ARM/Pi without Chromium
- index-fs-only.ts: new dedicated FS.COM + NADDOD worker for Erik
routes through Pi SOCKS5 via PROXY_URLS=socks5://10.10.0.6:1080
Crawlee ProxyConfiguration automatically applies to Playwright crawler
- pi-scraper-setup.sh: removed inline index-pi.ts override (repo version now authoritative)
- CODEX-TASK-pi-scraper-deploy.md: full 9-step Codex spec for Pi fleet setup
covers WireGuard keypair, Erik peer config, setup script, ecosystem.config.js
- CODEX-TASK-zero-manual-review.md: deterministic equivalence matcher spec
NADDOD uses LD+JSON for pricing (Astro/Shopify structure):
{"offers":{"price":"731.00","priceCurrency":"USD",...}}
Old regex (/US$\s*.../) never matched → all 132 price obs were lucky
text matches, not systematic. Now: parse all ld+json blocks first,
fall back to regex.
Also broaden sitemap URL regex to capture new-style URLs without .html:
/products/nvidia-networking/102612 (was being missed)
FS.com changed their HTML structure; compound class names are gone.
Current layout (verified 2026-05-06):
<div class="no_tax">5,10 € ohne MwSt.</div> ← B2B net price (preferred)
<div class="price">6,07 €</div> ← gross fallback
<div class="standard_price">6,07 €</div> ← gross fallback
Old selectors ([class*='price-value'] etc.) matched nothing → all prices
stored as €? null. New .no_tax first gives us the correct net/B2B price.
Previously always sliced first 600 URLs from sitemap, missing 6700+ products.
Now stores offset in naddod-cursor.json, advances by 600 per run with wrap-around.
Full sitemap coverage in ~13 runs (26h). Also adds TIP_STORAGE_DIR env support.
Static HTML collection pages return wrong results (all redirect to same 9 products).
Switch to /collections/{handle}/products.json?limit=250&page=N API which is:
- Reliable JSON (no HTML parsing)
- Correct per-collection product lists
- Clean pagination (stop at < limit results)
- Covers 11 key transceiver collections (1G, 10G, 25G, 40G, 100G, 400G)
- upsertPriceObservation: insert new observation if last one is >7 days old,
even when price (content_hash) hasn't changed — keeps timeseries data fresh
- ATGBICS: detect Shopify catalog wrap-around by tracking per-category seen URLs;
stop pagination when all products on a page were already seen in a prior page
- ATGBICS: improve hasNextPage to match &page=N anchored in href params
- scheduler: patch boss.schedule() to call createQueue() first (idempotent),
fixing FK constraint errors after DB reset — no need to touch 277 call sites
- index: registerWorkers() before registerSchedules() since boss.work() must
register handlers before schedules fire
- dashboard: fix switchBlogLlm() to use api() helper (adds Bearer auth token)
instead of raw fetch() which was returning 401 Unauthorized