TIP Verification Artifact Cleanup And Vendor Completion — 2026-05-09
Scope
- Continue TIP verification with deterministic robots only.
- Keep Erik safe by avoiding broad parallel crawl waves.
- Do not use external AI; the lessons go into TIPLLM training, not runtime inference.
- Sync all learnings into Gitea for Claude/Codex handoff.
Implemented
- Added verify:quarantine:non-transceivers.
- Excludes obvious non-transceiver artifacts from active product verification.
- Clears the price, image, details, competitor, and fully-verified flags on those rows.
- Covers GAO, Ascent, FS.com, Flexoptix, Arista, ShopFiber24, and Coherent artifact patterns.
- Added verify:normalize:product-urls.
- Repairs duplicated Mouser URL prefixes.
- Added scrape:gaotek:details.
- Lightweight fetch+cheerio verifier for GAO product pages.
- Hardened Ascent parser.
- Skips category/family rows before they enter the database.
- Repaired 10Gtek/SFPcables scraper.
- Passes product URL and image URL into the common verification path.
- Adds deterministic reach parsing for common meter/range text.
- Hardened scheduler reconcile.
- Does not promote excluded non-transceiver categories into details_verified.
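The quarantine step and the reconcile guard listed above can be sketched together. This is a minimal illustration, not the project's code: the row shape, the field names, the URL patterns, and the category list are all assumptions, since the note does not show the real vendor-specific rules.

```typescript
// Hypothetical row shape; field names are assumptions, not the real TIP schema.
interface ProductRow {
  url: string;
  category: string;
  quarantined: boolean;
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
  competitorVerified: boolean;
  fullyVerified: boolean;
}

// Illustrative patterns only; the real per-vendor rules are not listed in this note.
const ARTIFACT_URL_PATTERNS: RegExp[] = [/[?&]filter=/i, /\/category\//i];
const EXCLUDED_CATEGORIES: string[] = ["category", "family", "accessory"];

// True when a row looks like a filter page, category page, or other non-product artifact.
function isNonTransceiverArtifact(row: ProductRow): boolean {
  if (EXCLUDED_CATEGORIES.indexOf(row.category.toLowerCase()) !== -1) return true;
  for (let i = 0; i < ARTIFACT_URL_PATTERNS.length; i++) {
    if (ARTIFACT_URL_PATTERNS[i].test(row.url)) return true;
  }
  return false;
}

// Returns a copy of the row with every verification flag cleared and the row quarantined.
function quarantineRow(row: ProductRow): ProductRow {
  return {
    url: row.url,
    category: row.category,
    quarantined: true,
    priceVerified: false,
    imageVerified: false,
    detailsVerified: false,
    competitorVerified: false,
    fullyVerified: false,
  };
}

// Reconcile guard: excluded rows are returned unchanged and never promoted.
function reconcileDetails(row: ProductRow, hasCompleteSpecs: boolean): ProductRow {
  if (row.quarantined || isNonTransceiverArtifact(row)) return row;
  const promoted = quarantineRow(row); // start from a copy, then restore real values
  promoted.quarantined = false;
  promoted.priceVerified = row.priceVerified;
  promoted.imageVerified = row.imageVerified;
  promoted.competitorVerified = row.competitorVerified;
  promoted.fullyVerified = row.fullyVerified;
  promoted.detailsVerified = row.detailsVerified || hasCompleteSpecs;
  return promoted;
}
```

Keeping the exclusion check inside the reconcile path as well as in the quarantine script means a later scheduler pass cannot re-promote a row that the cleanup already removed from active verification.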
Live Runs
- Non-transceiver cleanup:
- 121 artifacts quarantined.
- 103 Flexoptix filter URL artifacts quarantined.
- 68 Ascent/category artifacts quarantined.
- 38 FS/Flex/Arista/ShopFiber/Coherent artifacts quarantined.
- 6 final FS/Flex redirect/no-source artifacts quarantined.
- GAO detail verifier:
- 245 product pages inspected.
- 181 rows updated and details verified.
- 64 skipped because the source still lacked complete deterministic specs.
- Mouser URL normalizer:
- 388 malformed mouser.de URLs repaired.
- 10Gtek/SFPcables:
- 50 products parsed after URL/image propagation fix.
- Ascent:
- 237 genuine products kept after category filtering.
- FS.com:
- 1 remaining DB detail page scraped.
- 1 price observation and 1 spec verification written.
- Reconcile completed.
- Equivalence matcher completed at 2026-05-09 20:11:39 UTC.
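For the Mouser URL repair reported above, a duplicated-prefix fix could look roughly like the following. The exact malformed shape is not shown in this note, so the prefix value and the example URL are assumptions; the function name is hypothetical.

```typescript
// Assumed prefix; the note only says the affected URLs were mouser.de URLs.
const MOUSER_PREFIX = "https://www.mouser.de";

// Collapses a repeated leading prefix down to a single occurrence.
// URLs that do not start with the doubled prefix are returned unchanged.
function collapseDuplicatedPrefix(url: string, prefix: string): string {
  if (prefix === "") return url; // guard: an empty prefix would never terminate
  const doubled = prefix + prefix;
  let out = url;
  while (out.indexOf(doubled) === 0) {
    out = out.slice(prefix.length);
  }
  return out;
}
```

The function is idempotent, so re-running the normalizer over already repaired rows leaves them untouched.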
Final Observed State
- TIP health: healthy.
- Load: ok.
- Memory used: 13%.
- Active total: 17,405.
- Price verified: 11,523.
- Image verified: 12,125.
- Details verified: 16,810.
- Fully verified: 10,758.
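The counts above are easier to compare as shares of the active total. The sketch below assumes each verified count is a subset of the 17,405 active rows, which the note implies but does not state.

```typescript
// Counts copied from the observed state above.
const ACTIVE_TOTAL = 17405;
const VERIFIED_COUNTS = {
  price: 11523,
  image: 12125,
  details: 16810,
  fully: 10758,
};

// Share of the total as a percentage, rounded to one decimal place.
function sharePercent(count: number, total: number): number {
  return Math.round((count / total) * 1000) / 10;
}

const coverage = {
  price: sharePercent(VERIFIED_COUNTS.price, ACTIVE_TOTAL),     // 66.2
  image: sharePercent(VERIFIED_COUNTS.image, ACTIVE_TOTAL),     // 69.7
  details: sharePercent(VERIFIED_COUNTS.details, ACTIVE_TOTAL), // 96.6
  fully: sharePercent(VERIFIED_COUNTS.fully, ACTIVE_TOTAL),     // 61.8
};
```

Read this way, details coverage is close to complete, while price and image evidence remain the main gaps.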
Vendor Truth
- Flexoptix:
- Active products have price/image/details complete.
- Remaining not-full rows are competitor-match only.
- FS.com:
- Active products have price/image/details complete.
- Remaining not-full rows are competitor-match only.
- GAO Tek:
- Quote-only: the crawled catalog shows no public prices.
- Prices were not fabricated.
- OEM-heavy vendors:
- Juniper, Cisco, Eoptolink, Ascent and similar vendors remain blocked mostly by missing public price/image/competitor evidence.
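The vendor notes above distinguish rows that only lack a competitor match from rows that lack product data. A small sketch of that distinction follows, assuming that "fully verified" means all four of the price, image, details, and competitor flags are set. That definition is inferred from the flag names in this note and is not confirmed by it; the type and function names are hypothetical.

```typescript
// Hypothetical flag set; "fully verified" is assumed to mean all four are true.
interface VerificationFlags {
  price: boolean;
  image: boolean;
  details: boolean;
  competitor: boolean;
}

type Blocker = "price" | "image" | "details" | "competitor";

// Lists which evidence is still missing for a row.
function blockers(flags: VerificationFlags): Blocker[] {
  const missing: Blocker[] = [];
  if (!flags.price) missing.push("price");
  if (!flags.image) missing.push("image");
  if (!flags.details) missing.push("details");
  if (!flags.competitor) missing.push("competitor");
  return missing;
}

// True when product data is complete and only the competitor match is missing.
function isCompetitorMatchOnly(flags: VerificationFlags): boolean {
  const missing = blockers(flags);
  return missing.length === 1 && missing[0] === "competitor";
}
```

Under this reading, a quote-only vendor row keeps "price" in its blocker list indefinitely, because no price is written unless the source publishes one.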
Training Pool
- Appended four TIPLLM lessons to training-data/tip-llm-capabilities-v1.jsonl.
- Lessons cover:
- quote-only truthfulness
- non-transceiver artifact quarantine
- Erik-safe crawler operation
- Flexoptix/FS distinction between product-data completeness and competitor-match completeness
- JSONL validation passed.
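A JSONL check of the kind mentioned above can be as simple as parsing each non-empty line on its own. The note does not say which validator was used, so this is only a sketch; the requirement that every line be a JSON object is an assumption, and the names are hypothetical.

```typescript
interface JsonlIssue {
  line: number;    // 1-based line number in the file
  message: string;
}

// Validates JSONL text: every non-empty line must parse as a JSON object.
function validateJsonl(text: string): JsonlIssue[] {
  const issues: JsonlIssue[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const raw = (lines[i] || "").trim();
    if (raw === "") continue; // tolerate a trailing newline or blank lines
    try {
      const value = JSON.parse(raw);
      if (value === null || typeof value !== "object" || Array.isArray(value)) {
        issues.push({ line: i + 1, message: "expected a JSON object" });
      }
    } catch (err) {
      issues.push({ line: i + 1, message: "invalid JSON" });
    }
  }
  return issues;
}
```

An empty result means the file passed. Reporting line numbers rather than stopping at the first error makes it easier to fix several appended lessons in one pass.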