# Near-Complete Detail Queue Closure

Date: 2026-05-09

Scope: TIP transceiver detail verification for rows already backed by price, image, and competitor evidence

## Goal

Close the remaining near-complete rows without manual approval and without launching heavy crawler/browser workloads on Erik.

## Implemented

- Added `packages/scraper/src/scrapers/atgbics-detail-pages.ts`
  - lightweight Shopify `product.js` fetcher
  - no browser, no Playwright
  - strict parser for form factor, speed, reach, media, wavelength, connector, and product class
- Added `packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts`
  - lightweight static HTML fetcher
  - FiberMall uses Schema.org Product JSON-LD
  - ShopFiber24 uses static title/meta/description evidence
- Added package scripts:
  - `scrape:atgbics:details`
  - `scrape:vendors:details`

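As an illustration of the static JSON-LD path, here is a minimal sketch of pulling a Schema.org `Product` node out of fetched HTML without a browser. The names `ProductLd`, `extractJsonLd`, and `findProduct` are hypothetical and are not taken from the scraper source; the field list is an assumption about which properties a vendor page might expose.

```typescript
// Hypothetical shape: only the fields this sketch reads.
interface ProductLd {
  "@type": string | string[];
  name?: string;
  mpn?: string;
  sku?: string;
  description?: string;
}

// Collect every <script type="application/ld+json"> payload from static HTML.
function extractJsonLd(html: string): unknown[] {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const out: unknown[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      out.push(JSON.parse(m[1]));
    } catch {
      // Malformed JSON-LD is skipped rather than guessed at.
    }
  }
  return out;
}

// Return the first node typed as Product, looking inside arrays and @graph.
function findProduct(nodes: unknown[]): ProductLd | null {
  const queue: unknown[] = nodes.slice();
  while (queue.length > 0) {
    const node = queue.shift();
    if (Array.isArray(node)) {
      queue.push(...node);
      continue;
    }
    if (typeof node !== "object" || node === null) continue;
    const rec = node as Record<string, unknown>;
    const t = rec["@type"];
    const types = Array.isArray(t) ? t : [t];
    if (types.some((x) => x === "Product")) return rec as unknown as ProductLd;
    const graph = rec["@graph"];
    if (Array.isArray(graph)) queue.push(...graph);
  }
  return null;
}
```

A caller would pass the response body of a plain HTTP fetch to `extractJsonLd` and treat a `null` result from `findProduct` as "no structured evidence", leaving the row untouched instead of guessing.
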
## Results

- ATGBICS:
  - first `product.js` run: fetched `107`, updated `97`, skipped `10`, promoted `97`
  - parser patch: `Max Distance_N/A` no longer blocks title/body distance evidence
  - final `product.js` run: fetched `10`, updated `10`, skipped `0`, promoted `10`
  - concurrent price-verification exposed another AOC batch; follow-up run fetched `23`, updated `23`, skipped `0`, promoted `23`
  - near-complete missing details: `0`
- FiberMall + ShopFiber24:
  - first detail run: fetched `116`, updated `112`, skipped `4`, promoted `112`
  - final semantic closure: fetched `4`, updated `4`, skipped `0`, promoted `4`
  - FiberMall near-complete missing details: `0`
  - ShopFiber24 near-complete missing details: `0`

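The ATGBICS parser patch can be pictured roughly as follows: a placeholder `Max Distance_N/A` attribute is treated as "no attribute evidence" and the parser falls through to the title and body text. This is a sketch under assumptions, not the patched code. It assumes attributes arrive as `Key_Value` strings (as the `Max Distance_N/A` spelling suggests), and `resolveReach`, `isPlaceholder`, and `DISTANCE_RE` are hypothetical names.

```typescript
// Matches free-text distances such as "10km", "300 m", or "0.5m".
const DISTANCE_RE = /(\d+(?:\.\d+)?)\s?(km|m)\b/i;

// Values that carry no information and must not block other evidence.
function isPlaceholder(value: string): boolean {
  const v = value.trim().toLowerCase();
  return v === "" || v === "n/a" || v === "na" || v === "-";
}

function formatDistance(match: RegExpExecArray): string {
  return `${match[1]}${match[2].toLowerCase()}`;
}

// Assumption: attributes look like "Max Distance_10km" or "Max Distance_N/A".
function resolveReach(attrs: string[], title: string, body: string): string | null {
  for (const attr of attrs) {
    const sep = attr.indexOf("_");
    if (sep < 0) continue;
    if (attr.slice(0, sep) !== "Max Distance") continue;
    const value = attr.slice(sep + 1);
    if (isPlaceholder(value)) break; // placeholder: fall through to text evidence
    const m = DISTANCE_RE.exec(value);
    if (m) return formatDistance(m);
  }
  // Title first, then body, so the more specific source wins.
  for (const text of [title, body]) {
    const m = DISTANCE_RE.exec(text);
    if (m) return formatDistance(m);
  }
  return null; // no evidence anywhere: leave reach unset
}
```

With this ordering, a row tagged `Max Distance_N/A` whose title states a distance still resolves, while a row with no distance in any source stays unresolved and is skipped.
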
## Truth Rules

- Do not turn a variable AOC/DAC or category page into a fake fixed-distance transceiver.
- Use `Variant` reach for source-backed product families.
- Classify switches, media converters, muxes, and adapters as their actual product class.
- Classify 100G DWDM DCO as `Coherent DWDM` with line-system-dependent reach when no normal reach is stated.
- FiberMall source titles can repair brand-only part numbers when the source page provides a concrete MPN/product code.

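The first four rules can be read as an ordered classifier. The sketch below is illustrative only: `classify`, the `ProductClass` labels other than `Coherent DWDM`, and the `"line-system dependent"` reach string are assumptions, and it keys on title keywords alone, which is looser than a strict parser would be.

```typescript
type ProductClass =
  | "Transceiver" | "AOC" | "DAC" | "Coherent DWDM"
  | "Switch" | "Media Converter" | "Mux" | "Adapter";

interface Classification {
  productClass: ProductClass;
  reach: string | null; // null means "no reach applies or none stated"
}

const LENGTH_RE = /(\d+(?:\.\d+)?)\s?(km|m)\b/i;

function classify(title: string): Classification {
  const t = title.toLowerCase();
  const m = LENGTH_RE.exec(title);
  const stated = m ? `${m[1]}${m[2].toLowerCase()}` : null;

  // Non-transceiver hardware keeps its actual product class.
  if (/\bswitch\b/.test(t)) return { productClass: "Switch", reach: null };
  if (/\bmedia converter\b/.test(t)) return { productClass: "Media Converter", reach: null };
  if (/\b(mux|demux)\b/.test(t)) return { productClass: "Mux", reach: null };
  if (/\badapter\b/.test(t)) return { productClass: "Adapter", reach: null };

  // Coherent DCO modules: reach depends on the line system unless stated.
  if (/\bdco\b/.test(t) && /\bdwdm\b/.test(t)) {
    return { productClass: "Coherent DWDM", reach: stated ?? "line-system dependent" };
  }

  // Variable-length cable families get `Variant`, never an invented length.
  if (/\baoc\b/.test(t)) return { productClass: "AOC", reach: stated ?? "Variant" };
  if (/\bdac\b/.test(t)) return { productClass: "DAC", reach: stated ?? "Variant" };

  return { productClass: "Transceiver", reach: stated };
}
```

Title keywords alone would misfire on compatibility text such as "for X switches", so this ordering shows the precedence of the rules and not a production-grade matcher.
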
## Final Live State

- `price_verified=11582`
- `details_verified=12276`
- `fully_verified=11001`
- near-complete queue:
  - `price_verified=true`
  - `image_verified=true`
  - `competitor_verified=true`
  - `details_verified=false`
  - result: `0`
- Public health:
  - status: `healthy`
  - load status: `ok`
  - memory used: `12%`

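The near-complete queue definition above amounts to a four-flag predicate. A minimal sketch, assuming each row carries the four verification flags as booleans (the `VerificationFlags` interface and both function names are hypothetical):

```typescript
// Assumed row shape: one boolean per verification dimension.
interface VerificationFlags {
  price_verified: boolean;
  image_verified: boolean;
  competitor_verified: boolean;
  details_verified: boolean;
}

// A row is near-complete when only detail verification is missing.
function isNearComplete(row: VerificationFlags): boolean {
  return (
    row.price_verified &&
    row.image_verified &&
    row.competitor_verified &&
    !row.details_verified
  );
}

// Queue size; the closure target for this work is 0.
function nearCompleteCount(rows: VerificationFlags[]): number {
  return rows.filter(isNearComplete).length;
}
```
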
## Safety

- No external AI was used.
- No browser crawler was started.
- Erik SSH flapped several times; work paused between retries instead of hammering the host.
- All crawler/parser learnings were mirrored into the TIPLLM training pool.