transceiver-db/sync/history/2026-05-09-fscom-targeted-verification-push.md
2026-05-09 11:15:46 +02:00

FS.com / Fiberstore Targeted Verification Push

Date: 2026-05-09

Intent

Continue TIP data completion for FS.com/Fiberstore after Flexoptix. The operator asked for price, image and product information to be researched thoroughly enough that no manual validation is needed, while keeping Erik safe and recording every crawler/scraper/robot lesson learned in the TIPLLM training pool.

Code Changed

  • packages/scraper/src/scrapers/fs-com.ts
    • added FS_DB_DETAIL_ONLY=1 mode
    • in that mode, targets existing FS.com product URLs in the DB that are missing verification signals
    • avoids broad category discovery while known product URLs still need work
    • improved reach parsing for comma/decimal values
    • added deterministic fiber type fallback from product name, part number and specs
    • writes the product URL to transceivers.product_page_url
    • stores the real FS.com product URL as the detail verification source
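
The two parsing changes (comma/decimal reach values and the deterministic fiber type fallback) can be illustrated with a minimal TypeScript sketch. This is not the code in fs-com.ts: the function names parseReachMeters and inferFiberType, the thousands-versus-decimal comma rule and the keyword/wavelength rules are all assumptions chosen for illustration. "Deterministic" here means the same name, part number and specs always produce the same answer; in this sketch, conflicting or missing evidence yields no fiber type rather than a guess.

```typescript
// Hypothetical sketch of the two parsing changes; not the actual fs-com.ts code.

type FiberType = "SMF" | "MMF";

// Parse a reach label such as "300m", "1,5 km", "1.5km" or "10,000 m" into meters.
// Assumed rule: a comma followed by groups of exactly three digits is a thousands
// separator; any other comma is a decimal separator.
function parseReachMeters(text: string): number | null {
  const match = text.match(/(\d+(?:[.,]\d+)*)\s*(km|m)\b/i);
  if (!match) return null;
  let digits = match[1];
  if (/^\d{1,3}(,\d{3})+$/.test(digits)) {
    digits = digits.replace(/,/g, "");
  } else {
    digits = digits.replace(",", ".");
  }
  const value = Number(digits);
  if (!Number.isFinite(value)) return null;
  return match[2].toLowerCase() === "km" ? value * 1000 : value;
}

// Assumed keyword and wavelength rules; a real rule set would come from FS.com data.
const FIBER_RULES: Array<[RegExp, FiberType]> = [
  [/\bOM[1-5]\b/i, "MMF"],
  [/\bmulti-?mode\b|\bMMF\b/i, "MMF"],
  [/\b850\s?nm\b/i, "MMF"],
  [/\bOS[12]\b/i, "SMF"],
  [/\bsingle-?mode\b|\bSMF\b/i, "SMF"],
  [/\b(?:1310|1550)\s?nm\b/i, "SMF"],
];

// Deterministic fallback over name, part number and spec values.
// Returns null when no rule fires or when SMF and MMF evidence conflict.
function inferFiberType(
  name: string,
  partNumber: string,
  specs: Record<string, string>,
): FiberType | null {
  const specValues = Object.keys(specs).map((key) => specs[key]);
  const text = [name, partNumber, ...specValues].join(" ");
  let sawSmf = false;
  let sawMmf = false;
  for (const [pattern, type] of FIBER_RULES) {
    if (pattern.test(text)) {
      if (type === "SMF") sawSmf = true;
      else sawMmf = true;
    }
  }
  if (sawSmf === sawMmf) return null;
  return sawSmf ? "SMF" : "MMF";
}
```

Returning null on conflict keeps the fallback conservative: a row stays in the "missing fiber type" count instead of receiving a possibly wrong value.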

Live Runs

All runs were executed on Erik with:

  • Playwright concurrency 1
  • nice -n 10
  • no broad category crawl
  • DB-detail-only mode
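
DB-detail-only mode with Playwright concurrency 1 amounts to two steps: pick known product URLs that still lack verification signals, then visit them strictly one at a time. A minimal sketch of both steps, assuming a hypothetical TransceiverRow shape and a generic worker function standing in for the Playwright page visit; the nice -n 10 priority is applied when the process is launched and is not modeled here.

```typescript
// Hypothetical row shape; column names in the real transceivers table may differ.
interface TransceiverRow {
  id: number;
  productPageUrl: string | null;
  detailsVerified: boolean;
  fiberType: string | null;
  reachLabel: string | null;
}

// Detail-only targeting: keep rows that already have a product URL but still lack
// at least one signal. Rows without a URL are skipped (no category discovery here).
function selectDetailTargets(rows: TransceiverRow[], limit: number): TransceiverRow[] {
  return rows
    .filter((row) => row.productPageUrl !== null && row.productPageUrl.trim() !== "")
    .filter((row) => !row.detailsVerified || row.fiberType === null || row.reachLabel === null)
    .slice(0, limit);
}

// Concurrency 1: each item is awaited before the next one starts.
async function runSequentially<T, R>(
  items: T[],
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const item of items) {
    results.push(await worker(item));
  }
  return results;
}
```

A plain for...of loop with await is the simplest way to guarantee a single page in flight; a pool or Promise.all would be needed only if the concurrency were raised.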

Batch results:

  • Batch 1: target 80, scraped 80, failed 0, new prices 17, stock 18, specs 24
  • Batch 2: target 80, scraped 79, failed 0, new prices 6, stock 8, specs 23
  • Batch 3: target 90, scraped 89, failed 0, new prices 21, stock 24, specs 47
  • Batch 4 closure: target 42, scraped 42, failed 0, new prices 5, stock 3, specs 25
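
Summing the four batches gives the totals for the whole push: 292 targets, 290 scraped, 0 failed, 49 new prices, 53 stock and 119 specs. The arithmetic also shows that 2 targets (one in batch 2, one in batch 3) were neither scraped nor counted as failed. A small TypeScript check that reproduces these totals; the field names are hypothetical and the values are copied from the batch results list.

```typescript
// Per-batch counters copied from the batch results list; field names are hypothetical.
interface BatchResult {
  name: string;
  target: number;
  scraped: number;
  failed: number;
  newPrices: number;
  stock: number;
  specs: number;
}

const batches: BatchResult[] = [
  { name: "Batch 1", target: 80, scraped: 80, failed: 0, newPrices: 17, stock: 18, specs: 24 },
  { name: "Batch 2", target: 80, scraped: 79, failed: 0, newPrices: 6, stock: 8, specs: 23 },
  { name: "Batch 3", target: 90, scraped: 89, failed: 0, newPrices: 21, stock: 24, specs: 47 },
  { name: "Batch 4 closure", target: 42, scraped: 42, failed: 0, newPrices: 5, stock: 3, specs: 25 },
];

type Counter = Exclude<keyof BatchResult, "name">;

// Sum one numeric counter across all batches.
function total(counter: Counter): number {
  return batches.reduce((sum, batch) => sum + batch[counter], 0);
}

// Targets that were neither scraped nor counted as failed.
function unaccounted(): number {
  return total("target") - total("scraped") - total("failed");
}

console.log(`target ${total("target")}, scraped ${total("scraped")}, unaccounted ${unaccounted()}`);
// prints "target 292, scraped 290, unaccounted 2"
```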

pnpm -C packages/scraper build passed on Erik after the scraper change.

FS.com Counters

Before:

  • total rows: 383
  • price verified: 379
  • image verified: 299
  • details verified: 108
  • price+image+details: 108
  • fully verified: 3
  • missing URL: 76
  • missing image URL: 84
  • missing reach label: 9
  • missing fiber type: 323
  • HTML product-like complete: 106

After closure:

  • total rows: 383
  • price verified: 379
  • image verified: 299
  • details verified: 260
  • price+image+details: 260
  • fully verified: 205
  • missing URL: 76
  • missing image URL: 84
  • missing reach label: 9
  • missing fiber type: 123
  • HTML product-like rows: 299
  • HTML product-like complete: 258
  • no-url rows: 76
  • category rows: 4
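
The before/after counters reduce to a few deltas and ratios: details verified rose by 152, fully verified by 202 and HTML product-like complete by 152, while missing fiber type fell by 200. The two ratios used in the truth rule work out to 86.3% (258/299) and 53.5% (205/383). The three row buckets (299 HTML product-like, 76 no-url, 4 category) add up to 379, leaving 4 of the 383 rows outside this breakdown. A short sketch of that arithmetic, with the counters copied into plain objects under hypothetical key names:

```typescript
// FS.com counters copied from the Before / After closure lists; key names are hypothetical.
const before = { detailsVerified: 108, fullyVerified: 3, missingFiberType: 323, htmlComplete: 106 };
const after = { detailsVerified: 260, fullyVerified: 205, missingFiberType: 123, htmlComplete: 258 };

type Metric = keyof typeof before;

// Change from the Before list to the After closure list.
function delta(metric: Metric): number {
  return after[metric] - before[metric];
}

// Ratio formatted with one decimal place.
function percent(part: number, whole: number): string {
  return ((100 * part) / whole).toFixed(1) + "%";
}

const totalRows = 383;
const bucketedRows = 299 + 76 + 4; // HTML product-like + no-url + category

console.log(delta("fullyVerified"), percent(258, 299), percent(205, totalRows));
// prints "202 86.3% 53.5%"
```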

TIP health after closure:

  • status: healthy
  • load status: ok
  • memory used: 13%
  • transceivers: 17647
  • vendors: 478
  • switches: 680
  • fully verified globally: 8522

Training Pool

The FS.com batch learning records were written to /tmp/tip-training-data and pushed to Gitea.

Training pool commits:

  • 28cac05 crawl: add fscom db detail batch learning record
  • a0a6be3 crawl: add fscom db detail batch 2 learning record
  • 38736ae crawl: add fscom db detail batch 3 learning record
  • 2c25bf3 crawl: add fscom db detail closure learning record

Next

Do not rerun the same DB-detail-only FS.com crawl on Erik. The fourth batch, a clean closure run with 0 failures, did not increase details_verified, so the remaining gaps need a different strategy:

  • source-discovery/classification for 76 no-url rows
  • parser/source diagnostics for the remaining 41 HTML product-like rows (299 minus 258 complete) that are missing details or fiber/image signals
  • explicit classification for 4 category rows
  • probable cleanup of historical or malformed /de/de/products/... URLs and of pages with no text
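
Triage of the remaining rows could start from the URL alone. The sketch below sorts rows into hypothetical no-url, malformed-url, category and product-like buckets. The URL patterns are assumptions, not verified FS.com rules: a doubled locale segment such as /de/de/products/ is flagged as malformed, and a /c/ path segment is treated as a category page. No-text pages cannot be detected from the URL and would need the fetched page content.

```typescript
// Hypothetical triage buckets for the remaining FS.com rows.
type Bucket = "no-url" | "malformed-url" | "category" | "product-like";

// Assumed URL patterns, to be checked against real FS.com URLs before use:
// a doubled locale segment such as /de/de/products/ is treated as malformed,
// and a /c/ path segment is treated as a category page.
const DOUBLED_LOCALE = /\/([a-z]{2})\/\1\/products\//i;
const CATEGORY_PATH = /\/c\//i;

function classifyUrl(url: string | null): Bucket {
  if (url === null || url.trim() === "") return "no-url";
  if (DOUBLED_LOCALE.test(url)) return "malformed-url";
  if (CATEGORY_PATH.test(url)) return "category";
  return "product-like";
}

// Count rows per bucket so each group can get its own follow-up strategy.
function countBuckets(urls: Array<string | null>): Record<Bucket, number> {
  const counts: Record<Bucket, number> = {
    "no-url": 0,
    "malformed-url": 0,
    category: 0,
    "product-like": 0,
  };
  for (const url of urls) {
    counts[classifyUrl(url)] += 1;
  }
  return counts;
}
```

Keeping the URL checks in one pure function means the triage could be re-run against a DB export without launching Playwright on Erik.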

Truth rule: do not claim FS.com is complete. Current honest status is 258/299 HTML product-like rows complete and 205/383 fully verified overall.