transceiver-db/sync/history/2026-05-09-atgbics-json-rerun-and-copper-wavelength-semantics.md
2026-05-09 16:41:18 +02:00

# ATGBICS JSON Rerun And Copper Wavelength Semantics
Date: 2026-05-09
Actor: Codex
Scope: ATGBICS scraper and highspeed wavelength completeness
Mode: Erik-safe Shopify JSON, no browser crawler
## Code Change
Patched `packages/scraper/src/scrapers/atgbics.ts`:
- Copper/DAC/Twinax/Base-T/RJ45 products now produce `wavelengths=N/A`
- CWDM4 now produces `1271,1291,1311,1331`
- SR/SR4/SR8/SRBD/VR/ESR/CSR family now produces `850`
- DR/FR/LR/ER/PSM family now produces `1310`
The scraper source was synced to active `/opt/tip`, and `pnpm -C packages/scraper build` passed on Erik.
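
A minimal sketch of the rule order described above, assuming the classification is driven by the product title. The function name `inferWavelengths` and the regular expressions are hypothetical illustrations; the actual patch in `atgbics.ts` may match on different fields or patterns.

```typescript
// Hypothetical helper, not the actual code in atgbics.ts.
// Returns a wavelengths string, or null when no rule matches.
function inferWavelengths(title: string): string | null {
  const t = title.toUpperCase();

  // Copper media has no optical wavelength, so it is checked first.
  if (/\b(COPPER|DAC|TWINAX|RJ45)\b|BASE-?T\b/.test(t)) return "N/A";

  // CWDM4 is checked before the generic single-mode family.
  if (/\bCWDM4\b/.test(t)) return "1271,1291,1311,1331";

  // Short-reach family: SR, SR4, SR8, SRBD, VR, ESR, CSR.
  if (/\b(SRBD|SR|VR|ESR|CSR)\d*\b/.test(t)) return "850";

  // Single-mode family: DR, FR, LR, ER, PSM.
  if (/\b(DR|FR|LR|ER|PSM)\d*\b/.test(t)) return "1310";

  // Coherent/ZR/DCO and other classes fall through unresolved.
  return null;
}
```

Checking Copper first keeps a DAC title from being assigned an optical wavelength by a later rule. Returning `null` leaves unmatched products visible as gaps instead of guessing a value.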
## Runtime
Ran one light ATGBICS Shopify `products.json` pass:
- products processed: `7946`
- price updates: `61`
- image observations/updates: `7943`
- no Playwright/browser crawler
- command ran under `nice -n 10`
## Result
ATGBICS counters stayed effectively unchanged after the pass:
- total: `8269`
- price verified: `8241`
- image verified: `8257`
- details verified: `7435`
- fully verified: `7428`
- highspeed missing wavelengths: `663`
Reason: remaining ATGBICS highspeed wavelength gaps include many Cable/Copper and coherent/ZR/DCO/C-band variants. These need targeted classification and parser work rather than another broad JSON pass.
## Copper Truth Correction
Copper/DAC products do not have optical wavelengths. Copper rows with an empty `wavelengths` value were set to `N/A`:
- rows updated: `1044`
- highspeed missing-wavelength count before Copper correction: `1908`
- highspeed missing-wavelength count after Copper correction: `1360`
- highspeed Copper missing after correction: `0`
- remaining optical/non-Copper highspeed missing: `1220`
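
The correction and its before/after counters can be sketched as below. The `ProductRow` shape, its field names, and the helper functions are assumptions made for illustration; the real TIP schema and the update that was actually run are not shown in this note.

```typescript
// Hypothetical row shape; the real TIP schema may use different columns.
interface ProductRow {
  id: number;
  media: "Copper" | "Optical";
  highspeed: boolean;
  wavelengths: string | null;
}

// A wavelength is "missing" only when null or blank; "N/A" counts as answered.
function isMissing(w: string | null): boolean {
  return w === null || w.trim() === "";
}

// Set empty Copper wavelengths to "N/A" in place; return the number of rows changed.
function backfillCopper(rows: ProductRow[]): number {
  let updated = 0;
  for (const row of rows) {
    if (row.media === "Copper" && isMissing(row.wavelengths)) {
      row.wavelengths = "N/A";
      updated++;
    }
  }
  return updated;
}

// Count highspeed rows that still lack a wavelength value.
function countHighspeedMissing(rows: ProductRow[]): number {
  return rows.filter((r) => r.highspeed && isMissing(r.wavelengths)).length;
}
```

Calling `countHighspeedMissing` before and after `backfillCopper` gives the two counters for a set of rows. Optical rows are never touched by the backfill.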
## Health
Public TIP health after run/update:
- status: `healthy`
- load status: `ok`
- memory used: `14%`
## Training Note
TIPLLM should not count Copper/DAC/Twinax products as missing optical wavelengths; their `wavelengths` value is `N/A`. For ATGBICS, another broad Shopify JSON pass is low-risk but low-yield. The next useful work is targeted parsing of URL/title classes such as `ZR`, `DCO`, `C-band`, `LAN-WDM`, `CR8`, and `breakout`, plus cable form-factor correction.
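
One way to size that targeted work is to bucket the titles that are still missing wavelengths by class. The sketch below is illustrative only: `TARGET_CLASSES`, `tallyClasses`, and the patterns are hypothetical and would need tuning against real ATGBICS titles and URLs.

```typescript
// Hypothetical class patterns, checked in order; the first match wins.
const TARGET_CLASSES: Array<[string, RegExp]> = [
  ["ZR", /\bZR\d*\b/i],
  ["DCO", /\bDCO\b/i],
  ["C-band", /\bC-?BAND\b/i],
  ["LAN-WDM", /\bLAN-?WDM\b/i],
  ["CR8", /\bCR8\b/i],
  ["breakout", /\bBREAKOUT\b/i],
];

// Count how many titles fall into each class; unmatched titles go to "unclassified".
function tallyClasses(titles: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const title of titles) {
    const hit = TARGET_CLASSES.find(([, pattern]) => pattern.test(title));
    const name = hit ? hit[0] : "unclassified";
    counts[name] = (counts[name] ?? 0) + 1;
  }
  return counts;
}
```

The resulting tally shows which class covers the most gaps and therefore which parser rule to write first.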