transceiver-db/sync/history/2026-05-09-fscom-targeted-verification-push.md
2026-05-09 11:15:46 +02:00

# FS.com / Fiberstore Targeted Verification Push
Date: 2026-05-09
## Intent
Continue TIP data completion for FS.com/Fiberstore after Flexoptix. The operator asked for price, image and product information to be researched thoroughly enough that no manual validation is needed, while keeping Erik safe and recording every crawler/scraper/robot learning in the TIPLLM training pool.
## Code Changed
- `packages/scraper/src/scrapers/fs-com.ts`
  - added `FS_DB_DETAIL_ONLY=1`
  - targets existing FS.com DB product URLs with missing verification signals
  - avoids broad category discovery while known product URLs still need work
  - improved reach parsing for comma/decimal values
  - added deterministic fiber type fallback from product name, part number and specs
  - writes product URL to `transceivers.product_page_url`
  - stores the real FS.com product URL as detail verification source
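
The reach-parsing and fiber-type-fallback changes listed above can be illustrated with a minimal sketch. This is not the code in `fs-com.ts`: the function names, the separator heuristics and the keyword rules are all assumptions chosen for illustration. In particular, a lone dot is always treated as a decimal point here, and the `SR`/`LR`/`ER`/`ZR` mapping follows common transceiver naming rather than anything stated in this log.

```typescript
// Hypothetical helpers; names and rules are illustrative, not taken from fs-com.ts.

type FiberType = "SMF" | "MMF";

// Turn "0,3", "1,000", "1.234,5" or "1.5" into a number.
// Assumption: a lone comma is a thousands separator only in the form "1,000".
function normalizeNumber(text: string): number | null {
  let s = text;
  const hasComma = s.indexOf(",") !== -1;
  const hasDot = s.indexOf(".") !== -1;
  if (hasComma && hasDot) {
    // Whichever separator appears last is taken as the decimal mark.
    if (s.lastIndexOf(",") > s.lastIndexOf(".")) {
      s = s.replace(/\./g, "").replace(",", ".");
    } else {
      s = s.replace(/,/g, "");
    }
  } else if (hasComma) {
    const looksLikeThousands = /^[1-9]\d{0,2}(,\d{3})+$/.test(s);
    s = looksLikeThousands ? s.replace(/,/g, "") : s.replace(",", ".");
  }
  const value = Number(s);
  return isFinite(value) ? value : null;
}

// Find the first "<number> km|m" in a string and return the reach in whole metres.
function parseReachMeters(raw: string): number | null {
  const match = raw.match(/(\d[\d.,]*)\s*(km|m)\b/i);
  if (!match) return null;
  const value = normalizeNumber(match[1] || "");
  if (value === null) return null;
  const factor = (match[2] || "").toLowerCase() === "km" ? 1000 : 1;
  return Math.round(value * factor);
}

// Rules are checked in this fixed order, so the result is deterministic.
const FIBER_RULES: Array<{ pattern: RegExp; fiber: FiberType }> = [
  { pattern: /\b(smf|single[- ]?mode|os2)\b/i, fiber: "SMF" },
  { pattern: /\b(mmf|multi[- ]?mode|om[1-5])\b/i, fiber: "MMF" },
  { pattern: /\b(lr|er|zr)\d*\b/i, fiber: "SMF" },
  { pattern: /\bsr\d*\b/i, fiber: "MMF" },
];

// Check product name first, then part number, then specs; the first hit wins.
function inferFiberType(name: string, partNumber: string, specs: string): FiberType | null {
  for (const source of [name, partNumber, specs]) {
    for (const rule of FIBER_RULES) {
      if (rule.pattern.test(source)) return rule.fiber;
    }
  }
  return null;
}
```

Returning `null` instead of guessing keeps an unparseable row visible as "missing" in the counters, which matches the log's rule of not overstating completeness.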
## Live Runs
All runs were on Erik with:
- Playwright concurrency `1`
- `nice -n 10`
- no broad category crawl
- DB-detail-only mode

Batch results:
- Batch 1: target `80`, scraped `80`, failed `0`, new prices `17`, stock `18`, specs `24`
- Batch 2: target `80`, scraped `79`, failed `0`, new prices `6`, stock `8`, specs `23`
- Batch 3: target `90`, scraped `89`, failed `0`, new prices `21`, stock `24`, specs `47`
- Batch 4 closure: target `42`, scraped `42`, failed `0`, new prices `5`, stock `3`, specs `25`

`pnpm -C packages/scraper build` passed on Erik after the scraper change.
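
The four batch results above can be totalled with a short sketch like the one below. The figures are copied from this log; the type and field names are illustrative assumptions. The `unaccounted` value is simply target minus scraped minus failed, and it makes visible that batches 2 and 3 each list one target that was neither scraped nor failed, which the log does not explain.

```typescript
// Figures copied from the batch list in this log; field names are illustrative.
interface BatchResult {
  label: string;
  target: number;
  scraped: number;
  failed: number;
  newPrices: number;
  stock: number;
  specs: number;
}

const batches: BatchResult[] = [
  { label: "Batch 1", target: 80, scraped: 80, failed: 0, newPrices: 17, stock: 18, specs: 24 },
  { label: "Batch 2", target: 80, scraped: 79, failed: 0, newPrices: 6, stock: 8, specs: 23 },
  { label: "Batch 3", target: 90, scraped: 89, failed: 0, newPrices: 21, stock: 24, specs: 47 },
  { label: "Batch 4 closure", target: 42, scraped: 42, failed: 0, newPrices: 5, stock: 3, specs: 25 },
];

interface BatchTotals {
  target: number;
  scraped: number;
  failed: number;
  newPrices: number;
  stock: number;
  specs: number;
  unaccounted: number;
}

function totalBatches(rows: BatchResult[]): BatchTotals {
  const totals: BatchTotals = {
    target: 0, scraped: 0, failed: 0, newPrices: 0, stock: 0, specs: 0, unaccounted: 0,
  };
  for (const row of rows) {
    totals.target += row.target;
    totals.scraped += row.scraped;
    totals.failed += row.failed;
    totals.newPrices += row.newPrices;
    totals.stock += row.stock;
    totals.specs += row.specs;
  }
  // Targets that were neither scraped nor reported as failed.
  totals.unaccounted = totals.target - totals.scraped - totals.failed;
  return totals;
}

const batchTotals = totalBatches(batches);
// batchTotals: target 292, scraped 290, failed 0,
// newPrices 49, stock 53, specs 119, unaccounted 2
```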
## FS.com Counters
Before:
- total rows: `383`
- price verified: `379`
- image verified: `299`
- details verified: `108`
- price+image+details: `108`
- fully verified: `3`
- missing URL: `76`
- missing image URL: `84`
- missing reach label: `9`
- missing fiber type: `323`
- HTML product-like complete: `106`

After closure:
- total rows: `383`
- price verified: `379`
- image verified: `299`
- details verified: `260`
- price+image+details: `260`
- fully verified: `205`
- missing URL: `76`
- missing image URL: `84`
- missing reach label: `9`
- missing fiber type: `123`
- HTML product-like rows: `299`
- HTML product-like complete: `258`
- no-url rows: `76`
- category rows: `4`

TIP health after closure:
- status: `healthy`
- load status: `ok`
- memory used: `13%`
- transceivers: `17647`
- vendors: `478`
- switches: `680`
- fully verified globally: `8522`
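
To make the effect of the push easier to read, the FS.com counters before and after closure can be diffed. The sketch below copies the values from this log; the key names and the helper are illustrative assumptions, and only counters that appear in both lists are compared.

```typescript
// Values copied from the "Before" and "After closure" lists; key names are illustrative.
type Counters = Record<string, number>;

const countersBefore: Counters = {
  totalRows: 383,
  priceVerified: 379,
  imageVerified: 299,
  detailsVerified: 108,
  fullyVerified: 3,
  missingUrl: 76,
  missingImageUrl: 84,
  missingReachLabel: 9,
  missingFiberType: 323,
  htmlProductLikeComplete: 106,
};

const countersAfter: Counters = {
  totalRows: 383,
  priceVerified: 379,
  imageVerified: 299,
  detailsVerified: 260,
  fullyVerified: 205,
  missingUrl: 76,
  missingImageUrl: 84,
  missingReachLabel: 9,
  missingFiberType: 123,
  htmlProductLikeComplete: 258,
};

// Return after - before for every key present in both snapshots.
function counterDeltas(a: Counters, b: Counters): Counters {
  const out: Counters = {};
  for (const key of Object.keys(a)) {
    const prev = a[key];
    const next = b[key];
    if (prev !== undefined && next !== undefined) out[key] = next - prev;
  }
  return out;
}

const deltas = counterDeltas(countersBefore, countersAfter);
// detailsVerified +152, fullyVerified +202, missingFiberType -200,
// htmlProductLikeComplete +152; every other counter is unchanged.

// HTML product-like rows still incomplete after closure (299 rows, 258 complete).
const remainingHtmlRows = 299 - countersAfter.htmlProductLikeComplete;
```

The unchanged `missingUrl`, `missingImageUrl` and `missingReachLabel` values show where the DB-detail-only mode made no progress.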
## Training Pool
Learning records for the FS.com batches were written to `/tmp/tip-training-data` and pushed to Gitea.

Training pool commits:
- `28cac05 crawl: add fscom db detail batch learning record`
- `a0a6be3 crawl: add fscom db detail batch 2 learning record`
- `38736ae crawl: add fscom db detail batch 3 learning record`
- `2c25bf3 crawl: add fscom db detail closure learning record`
## Next
Do not rerun the same DB-detail-only FS.com crawl on Erik. The fourth batch (closure) ran cleanly but did not increase `details_verified`, so the remaining gaps need a different strategy:
- source-discovery/classification for `76` no-url rows
- parser/source diagnostics for the remaining `41` HTML product-like rows missing details or fiber/image signals
- explicit classification for `4` category rows
- likely cleanup of historical/malformed `/de/de/products/...` URLs and no-text pages

Truth rule: do not claim FS.com is complete. Current honest status is `258/299` HTML product-like rows complete and `205/383` fully verified overall.
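
As a starting point for the no-url classification and the `/de/de/products/...` cleanup listed above, rows could first be bucketed by what their stored URL looks like. The sketch below is an assumption about how that triage might be done: the bucket names, the helper and the regular expression are hypothetical, and the example URLs in the comments are made up for illustration.

```typescript
// Hypothetical triage helper; bucket names and the pattern are illustrative.
type UrlBucket = "no-url" | "duplicated-locale" | "other";

// Matches a two-letter locale segment repeated directly before /products/,
// e.g. ".../de/de/products/..." (the backreference \1 repeats the first segment).
const DUPLICATED_LOCALE = /\/([a-z]{2})\/\1\/products\//i;

function bucketProductUrl(url: string | null): UrlBucket {
  if (url === null || url.trim() === "") return "no-url";
  return DUPLICATED_LOCALE.test(url) ? "duplicated-locale" : "other";
}

// Made-up example inputs:
// bucketProductUrl(null)                                      -> "no-url"
// bucketProductUrl("https://example.test/de/de/products/1")   -> "duplicated-locale"
// bucketProductUrl("https://example.test/de/products/1")      -> "other"
```

The helper only labels rows. Whether a duplicated-locale URL should be rewritten, re-discovered or dropped is left open, since the log itself describes that cleanup only as likely.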