# TIP Verification Artifact Cleanup And Vendor Completion — 2026-05-09

## Scope

- Continue TIP verification with deterministic robots only.
- Keep Erik safe by avoiding broad parallel crawl waves.
- Do not use external AI; TIPLLM training receives the lessons, not runtime inference.
- Sync all learnings into Gitea for Claude/Codex handoff.
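
One way to read "avoiding broad parallel crawl waves" is to cap how many crawl jobs run at once and to run the resulting waves strictly one after another. The sketch below illustrates that idea only; `planWaves`, `runWaves`, and the `maxParallel` knob are hypothetical names and are not taken from the TIP codebase.

```typescript
// Hypothetical helper: split crawl jobs into small waves.
// `maxParallel` is an assumed knob, not a setting named in this note.
function planWaves<T>(jobs: T[], maxParallel: number = 1): T[][] {
  if (!Number.isInteger(maxParallel) || maxParallel < 1) {
    throw new RangeError("maxParallel must be a positive integer");
  }
  const waves: T[][] = [];
  for (let i = 0; i < jobs.length; i += maxParallel) {
    waves.push(jobs.slice(i, i + maxParallel));
  }
  return waves;
}

// Run the waves sequentially; only jobs inside one wave run concurrently.
async function runWaves<T>(
  waves: T[][],
  run: (job: T) => Promise<void>,
): Promise<void> {
  for (const wave of waves) {
    await Promise.all(wave.map((job) => run(job)));
  }
}
```

With the default `maxParallel` of 1 every job runs alone, which is the most conservative setting for a host that must stay responsive.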

## Implemented

- Added `verify:quarantine:non-transceivers`.
  - Excludes obvious non-transceiver artifacts from active product verification.
  - Clears the price, image, details, competitor, and fully verified flags on those rows.
  - Covers GAO, Ascent, FS.com, Flexoptix, Arista, ShopFiber24, and Coherent artifact patterns.
- Added `verify:normalize:product-urls`.
  - Repairs duplicated Mouser URL prefixes.
- Added `scrape:gaotek:details`.
  - Lightweight fetch+cheerio verifier for GAO product pages.
- Hardened Ascent parser.
  - Skips category/family rows before they enter the database.
- Repaired 10Gtek/SFPcables scraper.
  - Passes product URL and image URL into the common verification path.
  - Adds deterministic reach parsing for common meter/range text.
- Hardened scheduler reconcile.
  - Does not promote excluded non-transceiver categories into `details_verified`.
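
The quarantine step and the reconcile guard described above can be sketched together. This is a minimal illustration, not the actual implementation: the `ProductRow` shape, the camelCase flag names, and the two URL patterns are assumptions, since the real schema and the per-vendor artifact rules are not listed in this note.

```typescript
// Hypothetical row shape; the real TIP schema is not shown in this note.
interface ProductRow {
  vendor: string;
  url: string;
  excluded: boolean;
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
  competitorVerified: boolean;
  fullyVerified: boolean;
}

// Illustrative patterns only (assumed): filter-listing URLs and category pages.
const ARTIFACT_URL_PATTERNS: RegExp[] = [/[?&]filter=/i, /\/category\//i];

function isNonTransceiverArtifact(row: ProductRow): boolean {
  return ARTIFACT_URL_PATTERNS.some((pattern) => pattern.test(row.url));
}

// Quarantine: mark the row as excluded and clear every verification flag.
function quarantine(row: ProductRow): ProductRow {
  return {
    ...row,
    excluded: true,
    priceVerified: false,
    imageVerified: false,
    detailsVerified: false,
    competitorVerified: false,
    fullyVerified: false,
  };
}

// Reconcile guard: an excluded row is never promoted to details-verified.
function reconcileDetails(row: ProductRow, hasCompleteSpecs: boolean): ProductRow {
  if (row.excluded || !hasCompleteSpecs) {
    return row;
  }
  return { ...row, detailsVerified: true };
}
```

Keeping the guard inside reconcile means a later scheduler pass cannot undo the quarantine, even if the excluded page happens to carry complete specs.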
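
Deterministic reach parsing for meter/range text, as mentioned for the 10Gtek/SFPcables scraper, could look roughly like the following. The function name `parseReachMeters` and the accepted text shapes ("300m", "10 km", "2-10 km") are assumptions for illustration; the scraper's real rules are not shown here.

```typescript
// Returns the reach in meters, or null when the text has no recognizable distance.
// For a range such as "2-10 km" the upper bound is used (an assumed convention).
function parseReachMeters(text: string): number | null {
  const pattern = /(?:(\d+(?:\.\d+)?)\s*(?:-|to)\s*)?(\d+(?:\.\d+)?)\s*(km|m)\b/i;
  const match = pattern.exec(text);
  if (!match) {
    return null;
  }
  const value = Number(match[2]);
  const unit = match[3].toLowerCase();
  const meters = unit === "km" ? value * 1000 : value;
  return Math.round(meters);
}
```

Because the pattern requires a word boundary after the unit, wavelength text such as "850nm" and rate text such as "100 Mbps" yield no match rather than a guessed value.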

## Live Runs

- Non-transceiver cleanup:
  - 121 artifacts quarantined.
  - 103 Flexoptix filter URL artifacts quarantined.
  - 68 Ascent/category artifacts quarantined.
  - 38 FS/Flex/Arista/ShopFiber/Coherent artifacts quarantined.
  - 6 final FS/Flex redirect/no-source artifacts quarantined.
- GAO detail verifier:
  - 245 product pages inspected.
  - 181 rows updated and details verified.
  - 64 skipped because the source still lacked complete deterministic specs.
- Mouser URL normalizer:
  - 388 malformed `mouser.de` URLs repaired.
- 10Gtek/SFPcables:
  - 50 products parsed after URL/image propagation fix.
- Ascent:
  - 237 genuine products kept after category filtering.
- FS.com:
  - 1 remaining DB detail page scraped.
  - 1 price observation and 1 spec verification written.
- Reconcile completed.
- Equivalence matcher completed at `2026-05-09 20:11:39 UTC`.
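
For the Mouser URL repair, a minimal sketch of collapsing a duplicated prefix is shown below. The exact malformed shape is not given in this note, so the example assumes the origin was simply concatenated twice (for instance `https://www.mouser.dehttps://www.mouser.de/...`); `MOUSER_PREFIX`, `collapseDuplicatedPrefix`, and `normalizeAll` are hypothetical names.

```typescript
// Assumed canonical prefix; the real normalizer may handle more variants.
const MOUSER_PREFIX = "https://www.mouser.de";

// Strip leading copies of the prefix until only one remains.
function collapseDuplicatedPrefix(url: string, prefix: string = MOUSER_PREFIX): string {
  let result = url;
  while (result.startsWith(prefix + prefix)) {
    result = result.slice(prefix.length);
  }
  return result;
}

// Normalize a batch and count how many URLs actually changed.
function normalizeAll(urls: string[]): { urls: string[]; repaired: number } {
  let repaired = 0;
  const normalized = urls.map((url) => {
    const fixed = collapseDuplicatedPrefix(url);
    if (fixed !== url) {
      repaired += 1;
    }
    return fixed;
  });
  return { urls: normalized, repaired };
}
```

The function is idempotent: running it again on already clean URLs changes nothing, so a repeated normalizer run would report zero repairs.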

## Final Observed State

- TIP health: healthy.
- Load: ok.
- Memory used: 13%.
- Active total: 17,405.
- Price verified: 11,523.
- Image verified: 12,125.
- Details verified: 16,810.
- Fully verified: 10,758.
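
The counts above are easier to compare as shares of the active total. The sketch below derives those shares from the listed numbers; the percentages are computed values, not figures reported by TIP, and `coveragePercent` is a hypothetical helper.

```typescript
// Counts copied from the list above.
const observed = {
  activeTotal: 17405,
  priceVerified: 11523,
  imageVerified: 12125,
  detailsVerified: 16810,
  fullyVerified: 10758,
};

// Share of `total` covered by `part`, rounded to one decimal place.
function coveragePercent(part: number, total: number): number {
  if (total <= 0) {
    throw new RangeError("total must be positive");
  }
  return Math.round((part / total) * 1000) / 10;
}

const coverage = {
  price: coveragePercent(observed.priceVerified, observed.activeTotal),
  image: coveragePercent(observed.imageVerified, observed.activeTotal),
  details: coveragePercent(observed.detailsVerified, observed.activeTotal),
  fully: coveragePercent(observed.fullyVerified, observed.activeTotal),
};

// Active rows that are not yet fully verified.
const notFullyVerified = observed.activeTotal - observed.fullyVerified;

console.log(coverage, notFullyVerified);
```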

## Vendor Truth

- Flexoptix:
  - Active products have price/image/details complete.
  - Remaining not-full rows are competitor-match only.
- FS.com:
  - Active products have price/image/details complete.
  - Remaining not-full rows are competitor-match only.
- GAO Tek:
  - Quote-only/no public prices in crawled catalog.
  - Prices were not fabricated.
- OEM-heavy vendors:
  - Juniper, Cisco, Eoptolink, Ascent and similar vendors remain blocked mostly by missing public price/image/competitor evidence.
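
The vendor notes above separate product-data completeness (price, image, details) from full verification, which additionally needs a competitor match. A minimal sketch of that distinction follows; it assumes "fully verified" means all four kinds of evidence are present, and the `EvidenceFlags` shape and function names are hypothetical.

```typescript
// Hypothetical flag set; assumes full verification requires all four.
interface EvidenceFlags {
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
  competitorMatched: boolean;
}

function isProductDataComplete(flags: EvidenceFlags): boolean {
  return flags.priceVerified && flags.imageVerified && flags.detailsVerified;
}

function isFullyVerified(flags: EvidenceFlags): boolean {
  return isProductDataComplete(flags) && flags.competitorMatched;
}

// List what is still missing, so a row is never "completed" with invented data.
function missingEvidence(flags: EvidenceFlags): string[] {
  const missing: string[] = [];
  if (!flags.priceVerified) missing.push("price");
  if (!flags.imageVerified) missing.push("image");
  if (!flags.detailsVerified) missing.push("details");
  if (!flags.competitorMatched) missing.push("competitor");
  return missing;
}
```

Under this model a Flexoptix or FS.com row with complete product data reports only `competitor` as missing, while a quote-only GAO Tek row keeps `price` in the missing list instead of receiving a made-up value.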

## Training Pool

- Appended four TIPLLM lessons to `training-data/tip-llm-capabilities-v1.jsonl`.
- Lessons cover:
  - quote-only truthfulness
  - non-transceiver artifact quarantine
  - Erik-safe crawler operation
  - Flexoptix/FS distinction between product-data completeness and competitor-match completeness
- JSONL validation passed.
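
A JSONL validation pass of the kind mentioned above can be as simple as parsing each line on its own. The sketch below is one possible check, not the validator that was actually run; it assumes every lesson is a JSON object on a single line and that blank lines are tolerated. `validateJsonl` is a hypothetical name.

```typescript
interface JsonlError {
  line: number; // 1-based line number in the file
  error: string;
}

// Parse every non-blank line independently and collect per-line errors.
function validateJsonl(text: string): JsonlError[] {
  const errors: JsonlError[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    if (raw.trim() === "") {
      return; // tolerate blank and trailing lines (assumption)
    }
    try {
      const parsed: unknown = JSON.parse(raw);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
        errors.push({ line: index + 1, error: "expected a JSON object" });
      }
    } catch (e) {
      errors.push({ line: index + 1, error: e instanceof Error ? e.message : String(e) });
    }
  });
  return errors;
}
```

An empty result means every line parsed as an object; any entry points at the offending line so a bad append can be fixed before the file is synced.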