transceiver-db/sync/history/2026-05-09-tip-verification-artifact-cleanup-and-vendor-completion.md
2026-05-09 22:16:29 +02:00

TIP Verification Artifact Cleanup And Vendor Completion — 2026-05-09

Scope

  • Continue TIP verification with deterministic robots only.
  • Keep Erik safe by avoiding broad parallel crawl waves.
  • Do not use external AI; the lessons feed TIPLLM training, not runtime inference.
  • Sync all learnings into Gitea for Claude/Codex handoff.

Implemented

  • Added verify:quarantine:non-transceivers.
    • Excludes obvious non-transceiver artifacts from active product verification.
    • Clears the price, image, details, competitor, and fully-verified flags on those rows.
    • Covers GAO, Ascent, FS.com, Flexoptix, Arista, ShopFiber24, and Coherent artifact patterns.
  • Added verify:normalize:product-urls.
    • Repairs duplicated Mouser URL prefixes.
  • Added scrape:gaotek:details.
    • Lightweight fetch+cheerio verifier for GAO product pages.
  • Hardened Ascent parser.
    • Skips category/family rows before they enter the database.
  • Repaired 10Gtek/SFPcables scraper.
    • Passes product URL and image URL into the common verification path.
    • Adds deterministic reach parsing for common meter/range text.
  • Hardened scheduler reconcile.
    • Does not promote excluded non-transceiver categories into details_verified.
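The quarantine step and the reconcile guard listed above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `ProductRow` shape, the flag names, and the URL patterns are all assumptions, since the real schema and vendor patterns are not shown in this log.

```typescript
// Hypothetical row shape; field names are assumptions, not the real schema.
interface ProductRow {
  url: string;
  active: boolean;
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
  competitorVerified: boolean;
  fullyVerified: boolean;
}

// Illustrative patterns only (category pages, filter URLs); the real
// vendor-specific artifact patterns are not documented here.
const ARTIFACT_PATTERNS: RegExp[] = [/\/category\//i, /[?&]filter=/i];

function isNonTransceiverArtifact(row: ProductRow): boolean {
  return ARTIFACT_PATTERNS.some((pattern) => pattern.test(row.url));
}

// Quarantine: take the row out of active verification and clear every flag.
function quarantine(row: ProductRow): ProductRow {
  return {
    ...row,
    active: false,
    priceVerified: false,
    imageVerified: false,
    detailsVerified: false,
    competitorVerified: false,
    fullyVerified: false,
  };
}

// Reconcile guard: an artifact is never promoted to details-verified,
// even if a scraper reported complete specs for it.
function reconcileDetails(row: ProductRow, hasCompleteSpecs: boolean): ProductRow {
  if (isNonTransceiverArtifact(row)) {
    return quarantine(row);
  }
  return { ...row, detailsVerified: row.detailsVerified || hasCompleteSpecs };
}
```

Checking the artifact test inside reconcile, rather than only in the cleanup script, means a later scheduler pass cannot silently undo the quarantine.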

Live Runs

  • Non-transceiver cleanup:
    • 121 artifacts quarantined.
    • 103 Flexoptix filter URL artifacts quarantined.
    • 68 Ascent/category artifacts quarantined.
    • 38 FS/Flex/Arista/ShopFiber/Coherent artifacts quarantined.
    • 6 final FS/Flex redirect/no-source artifacts quarantined.
  • GAO detail verifier:
    • 245 product pages inspected.
    • 181 rows updated and details verified.
    • 64 skipped because the source still lacked complete deterministic specs.
  • Mouser URL normalizer:
    • 388 malformed mouser.de URLs repaired.
  • 10Gtek/SFPcables:
    • 50 products parsed after URL/image propagation fix.
  • Ascent:
    • 237 genuine products kept after category filtering.
  • FS.com:
    • 1 remaining DB detail page scraped.
    • 1 price observation and 1 spec verification written.
  • Reconcile completed.
  • Equivalence matcher completed at 2026-05-09 20:11:39 UTC.
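As an illustration of the Mouser URL repair reported above, here is a minimal sketch of collapsing a duplicated URL prefix. The exact malformed shape is not recorded in this log, so the sketch assumes the scheme-plus-host prefix was simply repeated; `repairDuplicatedPrefix` is a hypothetical name.

```typescript
// Assumed malformed shape (not confirmed by the log):
//   "https://www.mouser.dehttps://www.mouser.de/ProductDetail/X"
// The leading text up to the second scheme is dropped only when the remainder
// starts with that same text, so URLs that legitimately embed another URL
// (for example in a query parameter) are left untouched.
function repairDuplicatedPrefix(raw: string): string {
  let url = raw.trim();
  for (;;) {
    // Look for a second "http://" or "https://" after the first character.
    const offset = url.slice(1).search(/https?:\/\//);
    if (offset === -1) {
      return url;
    }
    const second = offset + 1;
    const prefix = url.slice(0, second);
    const rest = url.slice(second);
    if (!rest.startsWith(prefix)) {
      return url; // embedded URL, not a duplicated prefix
    }
    url = rest; // drop one copy of the prefix and check again
  }
}
```

The loop makes the repair idempotent: running it again on an already repaired URL returns the same string, which suits a deterministic normalizer that may be re-run.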

Final Observed State

  • TIP health: healthy.
  • Load: ok.
  • Memory used: 13%.
  • Active total: 17,405.
  • Price verified: 11,523.
  • Image verified: 12,125.
  • Details verified: 16,810.
  • Fully verified: 10,758.
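To put the counts above in proportion, the following sketch computes each verified count as a share of the active total. The numbers are copied from this log; the object and function names are illustrative.

```typescript
// Counts copied from the "Final Observed State" list above.
const finalState = {
  activeTotal: 17405,
  priceVerified: 11523,
  imageVerified: 12125,
  detailsVerified: 16810,
  fullyVerified: 10758,
};

// Share of the active total, as a percentage rounded to one decimal place.
function coveragePercent(count: number, total: number): number {
  if (total <= 0) {
    throw new Error("total must be positive");
  }
  return Math.round((count / total) * 1000) / 10;
}

const coverage = {
  price: coveragePercent(finalState.priceVerified, finalState.activeTotal),
  image: coveragePercent(finalState.imageVerified, finalState.activeTotal),
  details: coveragePercent(finalState.detailsVerified, finalState.activeTotal),
  fully: coveragePercent(finalState.fullyVerified, finalState.activeTotal),
};

console.log(coverage);
```

This works out to roughly 66.2% price, 69.7% image, 96.6% details, and 61.8% fully verified. Details coverage is therefore no longer the main constraint on full verification.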

Vendor Truth

  • Flexoptix:
    • Active products have price/image/details complete.
    • Rows that are not yet fully verified lack only a competitor match.
  • FS.com:
    • Active products have price/image/details complete.
    • Rows that are not yet fully verified lack only a competitor match.
  • GAO Tek:
    • Quote-only: the crawled catalog exposes no public prices.
    • Prices were not fabricated.
  • OEM-heavy vendors:
    • Juniper, Cisco, Eoptolink, Ascent, and similar vendors remain blocked, mostly by missing public price, image, or competitor evidence.
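One way to keep these vendor distinctions explicit in code is to model a quote-only price as its own state instead of a missing or zero amount, and to report which evidence a row still lacks. The types and names below are hypothetical and only sketch the idea; they are not the project's schema.

```typescript
// Hypothetical types for illustration only.
type PriceState =
  | { kind: "observed"; amount: number; currency: string }
  | { kind: "quote-only" } // vendor publishes no public price; nothing is invented
  | { kind: "unknown" };

interface Evidence {
  price: PriceState;
  imageVerified: boolean;
  detailsVerified: boolean;
  competitorMatched: boolean;
}

type Gap = "price" | "image" | "details" | "competitor";

// List the evidence a row still lacks before it can count as fully verified.
function missingEvidence(e: Evidence): Gap[] {
  const gaps: Gap[] = [];
  if (e.price.kind !== "observed") gaps.push("price");
  if (!e.imageVerified) gaps.push("image");
  if (!e.detailsVerified) gaps.push("details");
  if (!e.competitorMatched) gaps.push("competitor");
  return gaps;
}

// True when product data is complete and only the competitor match is open,
// which is the situation described for Flexoptix and FS.com above.
function isCompetitorOnlyGap(e: Evidence): boolean {
  const gaps = missingEvidence(e);
  return gaps.length === 1 && gaps[0] === "competitor";
}
```

Under this model a GAO Tek row would carry `{ kind: "quote-only" }` and keep reporting a price gap, so it can never be counted as price verified by accident.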

Training Pool

  • Appended four TIPLLM lessons to training-data/tip-llm-capabilities-v1.jsonl.
  • Lessons cover:
    • quote-only truthfulness
    • non-transceiver artifact quarantine
    • Erik-safe crawler operation
    • Flexoptix/FS distinction between product-data completeness and competitor-match completeness
  • JSONL validation passed.
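The log does not say how the JSONL validation was performed. A minimal sketch of such a check, assuming every non-empty line must parse as a standalone JSON object, could look like this (`validateJsonl` and `JsonlIssue` are hypothetical names):

```typescript
interface JsonlIssue {
  line: number; // 1-based line number
  message: string;
}

// Assumption: each line holds exactly one JSON object. A single trailing
// newline at the end of the file is tolerated; other blank lines are reported.
function validateJsonl(text: string): JsonlIssue[] {
  const issues: JsonlIssue[] = [];
  const lines = text.split(/\r?\n/);
  lines.forEach((raw, index) => {
    const lineNumber = index + 1;
    if (raw.trim() === "") {
      if (index !== lines.length - 1) {
        issues.push({ line: lineNumber, message: "blank line" });
      }
      return;
    }
    try {
      const value: unknown = JSON.parse(raw);
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        issues.push({ line: lineNumber, message: "not a JSON object" });
      }
    } catch (err) {
      issues.push({
        line: lineNumber,
        message: err instanceof Error ? err.message : String(err),
      });
    }
  });
  return issues;
}
```

An empty result means the file passed. Reporting line numbers keeps a failed append easy to locate in an append-only training pool.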