transceiver-db/sync/history/2026-05-09-near-complete-detail-queue-closure.md
2026-05-09 18:22:09 +02:00

Near-Complete Detail Queue Closure

Date: 2026-05-09
Scope: TIP transceiver detail verification for rows already backed by price, image, and competitor evidence

Goal

Close the remaining near-complete rows without manual approval and without launching heavy crawler/browser workloads on Erik.

Implemented

  • Added packages/scraper/src/scrapers/atgbics-detail-pages.ts
    • lightweight Shopify product.js fetcher
    • no browser, no Playwright
    • strict parser for form factor, speed, reach, media, wavelength, connector, and product class
  • Added packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts
    • lightweight static HTML fetcher
    • FiberMall uses Schema.org Product JSON-LD
    • ShopFiber24 uses static title/meta/description evidence
  • Added package scripts:
    • scrape:atgbics:details
    • scrape:vendors:details
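
A minimal sketch of the static JSON-LD path, assuming the Product node sits at the top level of a `<script type="application/ld+json">` block (or in a top-level array). The names `ProductEvidence`, `asText`, `isProductType`, and `extractProductJsonLd` are hypothetical and are not taken from shopfiber24-fibermall-detail-pages.ts; nested `@graph` layouts are deliberately not handled.

```typescript
// Hypothetical shape: only the Schema.org Product fields this sketch reads.
interface ProductEvidence {
  name: string | undefined;
  mpn: string | undefined;
  sku: string | undefined;
  description: string | undefined;
}

// Keep a value only when it is a non-empty string.
function asText(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;
}

// True when a JSON-LD @type is "Product" or an array containing it.
function isProductType(type: unknown): boolean {
  if (type === "Product") return true;
  return Array.isArray(type) && type.indexOf("Product") !== -1;
}

// Scan static HTML for JSON-LD script blocks and return the first Product node.
// Assumption: the Product is top-level or in a top-level array, not under @graph.
function extractProductJsonLd(html: string): ProductEvidence | null {
  const scriptRe =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1] ?? "");
    } catch {
      continue; // malformed JSON-LD: skip instead of guessing
    }
    const nodes: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const node of nodes) {
      if (typeof node !== "object" || node === null) continue;
      const record = node as Record<string, unknown>;
      if (!isProductType(record["@type"])) continue;
      return {
        name: asText(record["name"]),
        mpn: asText(record["mpn"]),
        sku: asText(record["sku"]),
        description: asText(record["description"]),
      };
    }
  }
  return null;
}
```

Returning `null` when no Product node is found keeps the row unverified rather than filling fields from weaker evidence.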

Results

  • ATGBICS:
    • first product.js run: fetched 107, updated 97, skipped 10, promoted 97
    • parser patch: a Max Distance_N/A placeholder no longer blocks distance evidence found in the title or body
    • final product.js run: fetched 10, updated 10, skipped 0, promoted 10
    • near-complete missing details: 0
  • FiberMall + ShopFiber24:
    • first detail run: fetched 116, updated 112, skipped 4, promoted 112
    • final semantic closure: fetched 4, updated 4, skipped 0, promoted 4
    • FiberMall near-complete missing details: 0
    • ShopFiber24 near-complete missing details: 0
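
The parser patch can be sketched roughly as follows. This assumes the `Max Distance_…` marker arrives as a plain string in a list of product tags, which the log does not confirm; `parseDistance`, `resolveReach`, and the placeholder list are hypothetical. The point is only the fallback order: a placeholder value is treated as "no evidence" rather than as a final answer.

```typescript
// Values treated as "no evidence" (assumed list, compared in lowercase).
const PLACEHOLDER_VALUES: string[] = ["n/a", "na", "-", ""];

// Extract the first distance such as "10km" or "300 m"; null when none is present.
function parseDistance(text: string): string | null {
  const m = /\b(\d+(?:\.\d+)?)\s*(km|m)\b/i.exec(text);
  if (m === null) return null;
  const value = m[1];
  const unit = m[2];
  if (value === undefined || unit === undefined) return null;
  return value + unit.toLowerCase();
}

// Prefer a concrete Max Distance tag; otherwise fall back to title, then body.
function resolveReach(tags: string[], title: string, bodyText: string): string | null {
  const prefix = "Max Distance_";
  for (const tag of tags) {
    if (tag.indexOf(prefix) !== 0) continue;
    const raw = tag.slice(prefix.length).trim();
    if (PLACEHOLDER_VALUES.indexOf(raw.toLowerCase()) !== -1) break; // placeholder: ignore it
    const fromTag = parseDistance(raw);
    if (fromTag !== null) return fromTag;
  }
  return parseDistance(title) ?? parseDistance(bodyText);
}
```

With this ordering, `Max Distance_N/A` on a product titled "10GBASE-LR SFP+ 1310nm 10km" resolves to `10km` instead of being skipped.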

Truth Rules

  • Do not turn a variable AOC/DAC or category page into a fake fixed-distance transceiver.
  • Record reach as Variant for source-backed product families whose distance depends on the selected variant.
  • Classify switches, media converters, muxes, and adapters as their actual product class.
  • Classify 100G DWDM DCO as Coherent DWDM with line-system-dependent reach when the source states no fixed reach.
  • FiberMall source titles can repair brand-only part numbers when the source page provides a concrete MPN/product code.
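
A rough sketch of how the first four rules could be encoded, assuming classification works from the product title plus an already-parsed reach. The class labels other than Coherent DWDM, the keyword patterns, and the `classify` signature are illustrative assumptions, not the project's actual parser.

```typescript
type ProductClass =
  | "Transceiver"
  | "AOC/DAC"
  | "Switch"
  | "Media Converter"
  | "Mux"
  | "Adapter"
  | "Coherent DWDM";

interface Classification {
  productClass: ProductClass;
  reach: string | null; // null means no source-backed reach
}

// Keyword patterns for products that are not transceivers (assumed, not exhaustive).
const NON_TRANSCEIVER: Array<[RegExp, ProductClass]> = [
  [/\bmedia converter\b/i, "Media Converter"],
  [/\bswitch\b/i, "Switch"],
  [/\b(?:mux|demux|multiplexer)\b/i, "Mux"],
  [/\badapter\b/i, "Adapter"],
];

function classify(
  title: string,
  statedReach: string | null,
  lengthVaries: boolean,
): Classification {
  // Rule: switches, media converters, muxes, and adapters keep their real class.
  for (const [pattern, productClass] of NON_TRANSCEIVER) {
    if (pattern.test(title)) return { productClass, reach: null };
  }
  // Rule: 100G DWDM DCO is Coherent DWDM; reach depends on the line system
  // unless the source states one.
  if (/\b100G\b/i.test(title) && /\bDWDM\b/i.test(title) && /\bDCO\b/i.test(title)) {
    return { productClass: "Coherent DWDM", reach: statedReach ?? "line-system dependent" };
  }
  // Rule: a variable-length AOC/DAC family gets Variant reach, never a made-up distance.
  if (/\b(?:AOC|DAC)\b/i.test(title)) {
    return { productClass: "AOC/DAC", reach: lengthVaries ? "Variant" : statedReach };
  }
  return { productClass: "Transceiver", reach: statedReach };
}
```

The non-transceiver check runs first so that, for example, an "SFP Media Converter" is never promoted as an SFP transceiver.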

Final Live State

  • details_verified=12253
  • fully_verified=10976
  • near-complete queue:
    • price_verified=true
    • image_verified=true
    • competitor_verified=true
    • details_verified=false
    • result: 0
  • Public health:
    • status: healthy
    • load status: ok
    • memory used: 12%
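
The near-complete queue condition listed above can be expressed as a simple predicate. This is a sketch under the assumption that each row carries the four flags as booleans; the `VerificationRow` shape, its `id` field, and the sample rows in the usage are hypothetical.

```typescript
// Hypothetical row shape: the four verification flags named in this log.
interface VerificationRow {
  id: string; // hypothetical identifier
  price_verified: boolean;
  image_verified: boolean;
  competitor_verified: boolean;
  details_verified: boolean;
}

// A row is near-complete when only the detail verification is still missing.
function isNearComplete(row: VerificationRow): boolean {
  return (
    row.price_verified &&
    row.image_verified &&
    row.competitor_verified &&
    !row.details_verified
  );
}

// The queue is every row matching the predicate; "result: 0" means it is empty.
function nearCompleteQueue(rows: VerificationRow[]): VerificationRow[] {
  return rows.filter(isNearComplete);
}
```

A closed queue corresponds to `nearCompleteQueue(rows).length === 0`.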

Safety

  • No external AI was used.
  • No browser crawler was started.
  • The SSH connection to Erik flapped several times; work paused between retries instead of hammering the host.
  • All crawler/parser learnings were mirrored into the TIPLLM training pool.
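
For reference, pausing between retries can be sketched as a capped exponential delay. This is an assumption about one reasonable policy, not a record of what was actually run against Erik; `retryDelayMs`, `withPacedRetries`, and all parameters are hypothetical, and `maxAttempts` is assumed to be at least 1.

```typescript
// Delay before the retry that follows failed attempt `attempt` (0-based):
// doubles each time and never exceeds capMs.
function retryDelayMs(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

type Sleep = (ms: number) => Promise<void>;

// Default sleep based on setTimeout; tests can inject a fake instead.
const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });

// Run `task`, pausing between failed attempts instead of retrying immediately.
async function withPacedRetries<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  baseMs: number,
  capMs: number,
  sleep: Sleep = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No pause after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) {
        await sleep(retryDelayMs(attempt, baseMs, capMs));
      }
    }
  }
  throw lastError;
}
```

The cap keeps the longest pause bounded, so a flapping connection is retried at a steady, low rate rather than in bursts.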