Near-Complete Detail Queue Closure
Date: 2026-05-09
Scope: TIP transceiver detail verification for rows already backed by price, image, and competitor evidence
Goal
Close the remaining near-complete rows without manual approval and without launching heavy crawler/browser workloads on Erik.
Implemented
- Added `packages/scraper/src/scrapers/atgbics-detail-pages.ts`
  - lightweight Shopify `product.js` fetcher
  - no browser, no Playwright
  - strict parser for form factor, speed, reach, media, wavelength, connector, and product class
- Added `packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts`
  - lightweight static HTML fetcher
  - FiberMall uses Schema.org Product JSON-LD
  - ShopFiber24 uses static title/meta/description evidence
- Added package scripts:
  - `scrape:atgbics:details`
  - `scrape:vendors:details`
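
The FiberMall path reads Schema.org Product JSON-LD from static HTML, with no browser involved. A minimal sketch of that kind of extraction follows, assuming the page embeds a `<script type="application/ld+json">` block. The names `extractProductJsonLd`, `ProductLd`, and `isRecord` are hypothetical and are not taken from `shopfiber24-fibermall-detail-pages.ts`.

```typescript
// Hypothetical shape: only the Schema.org Product fields this sketch reads.
interface ProductLd {
  name: string | undefined;
  mpn: string | undefined;
  sku: string | undefined;
  description: string | undefined;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Keep a value only if it is a string; anything else counts as missing.
function str(v: unknown): string | undefined {
  return typeof v === "string" ? v : undefined;
}

// Scan every JSON-LD script block in the HTML and return the first node
// whose @type is (or includes) "Product". Returns null if none is found.
function extractProductJsonLd(html: string): ProductLd | null {
  const scriptRe =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(match[1]);
    } catch {
      continue; // malformed block: skip it rather than guess
    }
    const roots: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
    for (const root of roots) {
      if (!isRecord(root)) continue;
      // Some pages wrap their nodes in an @graph array.
      const graph = root["@graph"];
      const nodes: unknown[] = Array.isArray(graph) ? graph : [root];
      for (const node of nodes) {
        if (!isRecord(node)) continue;
        const type = node["@type"];
        const types: unknown[] = Array.isArray(type) ? type : [type];
        if (types.indexOf("Product") === -1) continue;
        return {
          name: str(node["name"]),
          mpn: str(node["mpn"]),
          sku: str(node["sku"]),
          description: str(node["description"]),
        };
      }
    }
  }
  return null;
}
```

Returning `null` instead of a partially guessed record keeps the caller free to skip the row, which matches the "skipped" counts reported in the results.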
Results
- ATGBICS:
  - first product.js run: fetched 107, updated 97, skipped 10, promoted 97
  - parser patch: `Max Distance_N/A` no longer blocks title/body distance evidence
  - final product.js run: fetched 10, updated 10, skipped 0, promoted 10
  - concurrent price-verification exposed another AOC batch; follow-up run fetched 23, updated 23, skipped 0, promoted 23
  - near-complete missing details: 0
- FiberMall + ShopFiber24:
  - first detail run: fetched 116, updated 112, skipped 4, promoted 112
  - final semantic closure: fetched 4, updated 4, skipped 0, promoted 4
  - FiberMall near-complete missing details: 0
  - ShopFiber24 near-complete missing details: 0
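
The ATGBICS parser patch concerns a placeholder "N/A" distance value that used to block the title/body distance evidence. One way to express that fallback is sketched below. This is an illustration, not the actual patch: `isPlaceholder`, `findDistance`, and `resolveReach` are hypothetical names, and the set of placeholder strings is an assumption.

```typescript
// Assumed placeholder spellings; treated as "no evidence", not as an answer.
function isPlaceholder(value: string): boolean {
  const v = value.trim().toLowerCase();
  return v === "" || v === "n/a" || v === "na" || v === "-";
}

// Find a distance such as "10km", "300 m", or "0.5m" in free text.
// Wavelengths like "1310nm" do not match because a digit must directly
// precede the unit (optionally separated by whitespace).
function findDistance(text: string): string | null {
  const m = /(\d+(?:\.\d+)?)\s*(km|m)\b/i.exec(text);
  return m ? `${m[1]}${m[2].toLowerCase()}` : null;
}

// Prefer the structured field when it holds a real distance; otherwise
// fall back to the title, then the body.
function resolveReach(
  maxDistance: string | undefined,
  title: string,
  body: string,
): string | null {
  if (maxDistance !== undefined && !isPlaceholder(maxDistance)) {
    const fromField = findDistance(maxDistance);
    if (fromField !== null) return fromField;
  }
  return findDistance(title) ?? findDistance(body);
}
```

With this ordering a row is skipped only when none of the three sources states a distance, which is consistent with the skipped count dropping to 0 after the patch.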
Truth Rules
- Do not turn a variable AOC/DAC or category page into a fake fixed-distance transceiver.
- Use `Variant` reach for source-backed product families.
- Classify switches, media converters, muxes, and adapters as their actual product class.
- Classify 100G DWDM DCO as `Coherent DWDM` with line-system-dependent reach when no normal reach is stated.
- FiberMall source titles can repair brand-only part numbers when the source page provides a concrete MPN/product code.
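
The classification rules in this section can be sketched as a small decision function. Everything here is an assumption made for illustration: the `Evidence` and `Classification` shapes, the keyword patterns, the class labels other than `Coherent DWDM`, and the exact string used for line-system-dependent reach are not taken from the scraper code.

```typescript
// Hypothetical evidence gathered from a source page.
interface Evidence {
  title: string;
  statedReach: string | null; // a reach the source states explicitly
  variantLengths: string[];   // selectable lengths on a family page
}

interface Classification {
  productClass: string;
  reach: string | null;
}

// Assumed keyword patterns for non-transceiver hardware.
const HARDWARE: Array<[RegExp, string]> = [
  [/\bmedia converter\b/, "Media Converter"],
  [/\bswitch\b/, "Switch"],
  [/\b(mux|demux)\b/, "Mux"],
  [/\badapter\b/, "Adapter"],
];

function classify(e: Evidence): Classification {
  const t = e.title.toLowerCase();
  // Non-transceiver hardware keeps its actual product class.
  for (const [re, cls] of HARDWARE) {
    if (re.test(t)) return { productClass: cls, reach: e.statedReach };
  }
  // DWDM DCO: coherent, reach depends on the line system unless stated.
  if (/\bdco\b/.test(t) && /\bdwdm\b/.test(t)) {
    return {
      productClass: "Coherent DWDM",
      reach: e.statedReach ?? "line-system dependent",
    };
  }
  // A cable family with several lengths gets "Variant", never one
  // arbitrarily chosen fixed distance.
  if (/\b(aoc|dac)\b/.test(t)) {
    const reach =
      e.variantLengths.length > 1
        ? "Variant"
        : (e.statedReach ?? e.variantLengths[0] ?? null);
    return { productClass: "AOC/DAC", reach };
  }
  return { productClass: "Transceiver", reach: e.statedReach };
}
```

The order of the checks matters in this sketch: hardware keywords are tested first so that, for example, a media converter whose title mentions a transceiver form factor is not promoted as a transceiver.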
Final Live State
- `price_verified=11582`
- `details_verified=12276`
- `fully_verified=11001`
- near-complete queue:
  - `price_verified=true`
  - `image_verified=true`
  - `competitor_verified=true`
  - `details_verified=false`
  - result: 0
- Public health:
  - status: healthy
  - load status: ok
  - memory used: 12%
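
The near-complete queue is defined by four flags: price, image, and competitor evidence verified, details not yet verified. A minimal sketch of that predicate over in-memory rows follows; the `RowFlags` shape and both function names are hypothetical, and the real check presumably runs as a database query rather than in application code.

```typescript
// Hypothetical row shape carrying only the four verification flags.
interface RowFlags {
  price_verified: boolean;
  image_verified: boolean;
  competitor_verified: boolean;
  details_verified: boolean;
}

// A row is near-complete when everything except details is verified.
function isNearComplete(row: RowFlags): boolean {
  return (
    row.price_verified &&
    row.image_verified &&
    row.competitor_verified &&
    !row.details_verified
  );
}

// Queue size; 0 means the near-complete queue is closed.
function nearCompleteCount(rows: RowFlags[]): number {
  return rows.filter(isNearComplete).length;
}
```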
Safety
- No external AI was used.
- No browser crawler was started.
- Erik SSH flapped several times; work paused between retries instead of hammering the host.
- All crawler/parser learnings were mirrored into the TIPLLM training pool.