FS.com / Fiberstore Targeted Verification Push
Date: 2026-05-09
Intent
Continue TIP data completion for FS.com/Fiberstore after Flexoptix. The operator asked for price, image and product information to be researched deeply enough that manual validation is not needed, while keeping Erik safe and writing every crawler/scraper/robot learning into the TIPLLM training pool.
Code Changed
packages/scraper/src/scrapers/fs-com.ts
- added FS_DB_DETAIL_ONLY=1
  - targets existing FS.COM DB product URLs with missing verification signals
  - avoids broad category discovery while known product URLs still need work
- improved reach parsing for comma/decimal values
- added deterministic fiber type fallback from product name, part number and specs
- writes the product URL to transceivers.product_page_url
  - stores the real FS.com product URL as the detail verification source
- added
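The reach parsing and fiber-type fallback listed above can be illustrated with a minimal TypeScript sketch. The helper names parseReachMeters and inferFiberType, the regular expressions and the SMF/MMF hint lists are assumptions for illustration, not the code in fs-com.ts. The sketch treats a comma as a decimal separator (as on localized pages), so a thousands separator such as "1,000 m" would be misread.

```typescript
// Illustrative sketch only: helper names, regexes and hint lists are assumptions,
// not the actual implementation in packages/scraper/src/scrapers/fs-com.ts.

type FiberType = "SMF" | "MMF";

/** Parse a reach label such as "10km", "1,5 km" or "300 m" into metres. */
function parseReachMeters(label: string): number | null {
  // Accept a comma or a dot as the decimal separator ("1,5 km" is read as 1.5 km).
  const match = label.match(/(\d+(?:[.,]\d+)?)\s*(km|m)\b/i);
  if (!match) return null;
  const value = Number.parseFloat(match[1].replace(",", "."));
  const metres = match[2].toLowerCase() === "km" ? value * 1000 : value;
  return Math.round(metres);
}

/**
 * Infer fiber type from the given sources (e.g. product name, part number, specs),
 * checked in order; the first source that yields a hint wins, so the result is deterministic.
 */
function inferFiberType(...sources: Array<string | undefined>): FiberType | null {
  for (const source of sources) {
    if (!source) continue;
    const text = source.toUpperCase();
    // Multimode hints: OM grades, MMF/multimode wording, SR optics, 850 nm.
    if (/\bOM[1-5]\b|\bMMF\b|MULTI[- ]?MODE|\bSR\d*\b|850\s*NM/.test(text)) return "MMF";
    // Single-mode hints: OS grades, SMF/single-mode wording, LR/ER/ZR optics, 1310/1550 nm.
    if (/\bOS[12]\b|\bSMF\b|SINGLE[- ]?MODE|\b(?:LR|ER|ZR)\d*\b|1310\s*NM|1550\s*NM/.test(text)) return "SMF";
  }
  return null;
}
```

Checking the multimode hints first within each source is an arbitrary but fixed precedence; any fixed order keeps the fallback reproducible between runs.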
Live Runs
All runs were on Erik with:
- Playwright concurrency 1
- nice -n 10
- no broad category crawl
- DB-detail-only mode
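The DB-detail-only mode and the single-concurrency setting can be sketched as a target filter followed by a strictly sequential visit loop. The TransceiverRow shape, the "/products/" check and the missing-signal rule are assumptions, not the real schema or query; gating on FS_DB_DETAIL_ONLY, Playwright page handling and nice -n 10 are left out.

```typescript
// Minimal sketch, assuming a row shape and a "missing signal" rule; the real
// scraper's DB schema and selection query are not shown in this log.
interface TransceiverRow {
  id: number;
  productUrl: string | null;
  hasImage: boolean;
  hasDetails: boolean;
  fiberType: string | null;
  reachLabel: string | null;
}

/** Rows worth a detail-page visit: a known product URL plus at least one missing signal. */
function selectDetailTargets(rows: TransceiverRow[], limit: number): TransceiverRow[] {
  return rows
    .filter((r) => r.productUrl !== null && r.productUrl.includes("/products/"))
    .filter((r) => !r.hasImage || !r.hasDetails || !r.fiberType || !r.reachLabel)
    .slice(0, limit);
}

/**
 * Visit targets one at a time (concurrency 1). A thrown error counts as failed;
 * a false return counts as neither scraped nor failed.
 */
async function runSequentially<T>(
  targets: T[],
  visit: (target: T) => Promise<boolean>,
): Promise<{ scraped: number; failed: number }> {
  let scraped = 0;
  let failed = 0;
  for (const target of targets) {
    try {
      if (await visit(target)) scraped++;
    } catch {
      failed++;
    }
  }
  return { scraped, failed };
}
```

Awaiting each visit inside a plain for loop is the simplest way to guarantee that only one page is in flight at a time, which keeps load on the host predictable.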
Batch results:
- Batch 1: target 80, scraped 80, failed 0, new prices 17, stock 18, specs 24
- Batch 2: target 80, scraped 79, failed 0, new prices 6, stock 8, specs 23
- Batch 3: target 90, scraped 89, failed 0, new prices 21, stock 24, specs 47
- Batch 4 (closure): target 42, scraped 42, failed 0, new prices 5, stock 3, specs 25
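Summing the four batches makes their combined effect easier to read. The figures are copied from the batch list; the BatchCounts type and sumBatches helper are only illustrative.

```typescript
// Figures copied from the batch list above; the type and helper are illustrative.
interface BatchCounts {
  target: number;
  scraped: number;
  failed: number;
  newPrices: number;
  stock: number;
  specs: number;
}

const batches: Record<string, BatchCounts> = {
  "Batch 1": { target: 80, scraped: 80, failed: 0, newPrices: 17, stock: 18, specs: 24 },
  "Batch 2": { target: 80, scraped: 79, failed: 0, newPrices: 6, stock: 8, specs: 23 },
  "Batch 3": { target: 90, scraped: 89, failed: 0, newPrices: 21, stock: 24, specs: 47 },
  "Batch 4 (closure)": { target: 42, scraped: 42, failed: 0, newPrices: 5, stock: 3, specs: 25 },
};

function sumBatches(rows: BatchCounts[]): BatchCounts {
  return rows.reduce<BatchCounts>(
    (acc, b) => ({
      target: acc.target + b.target,
      scraped: acc.scraped + b.scraped,
      failed: acc.failed + b.failed,
      newPrices: acc.newPrices + b.newPrices,
      stock: acc.stock + b.stock,
      specs: acc.specs + b.specs,
    }),
    { target: 0, scraped: 0, failed: 0, newPrices: 0, stock: 0, specs: 0 },
  );
}

const total = sumBatches(Object.values(batches));
// Targets reported as neither scraped nor failed.
const unaccounted = total.target - total.scraped - total.failed;
console.log(total, { unaccounted });
```

This gives 292 targets, 290 scraped, 0 failed, 49 new prices, 53 stock updates and 119 spec updates, with 2 targets (one each in batches 2 and 3) reported as neither scraped nor failed.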
pnpm -C packages/scraper build passed on Erik after the scraper change.
FS.com Counters
Before:
- total rows: 383
- price verified: 379
- image verified: 299
- details verified: 108
- price+image+details: 108
- fully verified: 3
- missing URL: 76
- missing image URL: 84
- missing reach label: 9
- missing fiber type: 323
- HTML product-like complete: 106
After closure:
- total rows: 383
- price verified: 379
- image verified: 299
- details verified: 260
- price+image+details: 260
- fully verified: 205
- missing URL: 76
- missing image URL: 84
- missing reach label: 9
- missing fiber type: 123
- HTML product-like rows: 299
- HTML product-like complete: 258
- no-url rows: 76
- category rows: 4
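A small diff over the counters shared by the Before and After closure lists shows which numbers actually moved. Values are copied from the two lists; the camel-cased key names are illustrative, and the three counters reported only after closure (HTML product-like rows, no-url rows, category rows) are left out.

```typescript
// Counter values copied from the "Before" and "After closure" lists; key names are illustrative.
const before = {
  totalRows: 383, priceVerified: 379, imageVerified: 299, detailsVerified: 108,
  priceImageDetails: 108, fullyVerified: 3, missingUrl: 76, missingImageUrl: 84,
  missingReachLabel: 9, missingFiberType: 323, htmlProductLikeComplete: 106,
};

const after: typeof before = {
  totalRows: 383, priceVerified: 379, imageVerified: 299, detailsVerified: 260,
  priceImageDetails: 260, fullyVerified: 205, missingUrl: 76, missingImageUrl: 84,
  missingReachLabel: 9, missingFiberType: 123, htmlProductLikeComplete: 258,
};

type Counter = keyof typeof before;

/** Return only the counters whose value changed, as after minus before. */
function deltas(a: Record<Counter, number>, b: Record<Counter, number>): Partial<Record<Counter, number>> {
  const out: Partial<Record<Counter, number>> = {};
  for (const key of Object.keys(a) as Counter[]) {
    const d = b[key] - a[key];
    if (d !== 0) out[key] = d;
  }
  return out;
}

const changed = deltas(before, after);
console.log(changed);
```

Only five counters change: details verified and price+image+details each +152, fully verified +202, missing fiber type -200 and HTML product-like complete +152. Missing URL, missing image URL and missing reach label are unchanged, which is consistent with the remaining gaps listed under Next.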
TIP health after closure:
- status: healthy
- load status: ok
- memory used: 13%
- transceivers: 17647
- vendors: 478
- switches: 680
- fully verified globally: 8522
Training Pool
FS.com batches were written to /tmp/tip-training-data and pushed to Gitea.
Training pool commits:
- 28cac05 crawl: add fscom db detail batch learning record
- a0a6be3 crawl: add fscom db detail batch 2 learning record
- 38736ae crawl: add fscom db detail batch 3 learning record
- 2c25bf3 crawl: add fscom db detail closure learning record
Next
Do not rerun the same DB-detail-only FS.com crawl on Erik. The fourth batch, a clean closure run, did not increase details_verified, so the remaining gaps need a different strategy:
- source discovery/classification for the 76 no-url rows
- parser/source diagnostics for the remaining 41 HTML product-like rows missing details or fiber/image signals
- explicit classification for the 4 category rows
- likely cleanup of historical/malformed /de/de/products/... URLs and no-text pages
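The next strategy could start with a triage pass that sorts each FS.com row into the buckets named in the list above. The GapRow fields, the bucket names and the rule that treats a repeated locale segment (such as /de/de/products/) as malformed are assumptions, not existing TIP code; detection of no-text pages is left out.

```typescript
// Hypothetical triage for the next pass; row fields and the URL rule are assumptions.
type Bucket = "no-url" | "category" | "malformed-url" | "html-incomplete" | "complete";

interface GapRow {
  url: string | null;
  isCategoryPage: boolean;
  detailsVerified: boolean;
  hasImage: boolean;
  hasFiberType: boolean;
}

// Assumed rule: a repeated two-letter locale segment before /products/ is malformed.
const REPEATED_LOCALE = /^https?:\/\/[^/]+\/([a-z]{2})\/\1\/products\//i;

function classify(row: GapRow): Bucket {
  if (!row.url) return "no-url";
  if (row.isCategoryPage) return "category";
  if (REPEATED_LOCALE.test(row.url)) return "malformed-url";
  if (!row.detailsVerified || !row.hasImage || !row.hasFiberType) return "html-incomplete";
  return "complete";
}

/** Count rows per bucket so each bucket can get its own follow-up job. */
function tally(rows: GapRow[]): Record<Bucket, number> {
  const counts: Record<Bucket, number> = {
    "no-url": 0, category: 0, "malformed-url": 0, "html-incomplete": 0, complete: 0,
  };
  for (const row of rows) counts[classify(row)]++;
  return counts;
}
```

Classifying before crawling keeps each follow-up job narrow, so no bucket triggers another broad DB-detail-only pass.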
Truth rule: do not claim FS.com is complete. Current honest status is 258/299 HTML product-like rows complete and 205/383 fully verified overall.
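The truth-rule status line can be generated from raw counts so that the fractions and percentages always agree with the counters. The ratio helper is hypothetical; it formats done/total with one decimal place.

```typescript
// Hypothetical formatter for the truth-rule status line; counts come from the closure counters.
function ratio(done: number, total: number): string {
  if (total <= 0) throw new RangeError("total must be positive");
  return `${done}/${total} (${((done / total) * 100).toFixed(1)}%)`;
}

const honestStatus =
  `HTML product-like complete: ${ratio(258, 299)}; fully verified overall: ${ratio(205, 383)}`;
console.log(honestStatus);
```

With the closure counters this yields 258/299 (86.3%) and 205/383 (53.5%).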