transceiver-db/sync/history/2026-05-09-tip-equivalence-auto-research.md
2026-05-09 07:48:11 +02:00

TIP Equivalence Automated Research

Date: 2026-05-09

Goal

Remove manual equivalence validation as a required workflow for TIP product verification. Low-confidence matches should be researched and either confirmed or rejected automatically.

Findings

  • The dashboard had a large Approved + Re-Research backlog.
  • approve-all was marking low-confidence rows approved, then setting re_research_due_at.
  • The re-research worker only checked whether the competitor still had a recent price; it did not re-check technical equivalence quality.
  • Many low-confidence rows were objectively bad matches:
    • reach mismatches
    • wavelength mismatches
    • missing reach evidence
    • fiber mismatches

Code Changes

  • packages/api/src/routes/review.ts

    • approve-all now approves only confidence >= 0.73.
    • Weak rows stay pending and get queued for automated research.
    • needs_research now includes pending rows queued for research.
    • Added POST /api/review/run-research.
  • packages/scraper/src/scheduler.ts

    • Added a deterministic equivalence evaluator.
    • Confirms a match only when all of the following are present:
      • recent competitor price
      • matching form factor
      • matching speed
      • matching fiber type
      • matching wavelength
      • compatible reach
      • confidence >= 0.73
    • Rejects stale, incomplete, contradictory, or low-confidence matches automatically.
    • Confirmed matches get a 30-day recheck.
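
The approve-all split and the confirmation rule listed above can be sketched as two small pure functions. This is a minimal illustration, not the code in review.ts or scheduler.ts: the `Spec` and `MatchEvidence` shapes, all field and function names, the 30-day price freshness window, and the reading of "compatible reach" as "competitor reach is at least ours" are assumptions. Only the 0.73 threshold, the checked attributes, the rejection categories, and the 30-day recheck come from this note.

```typescript
// Hypothetical shapes: field names are illustrative, not taken from scheduler.ts.
interface Spec {
  formFactor?: string;   // e.g. "QSFP28"
  speedGbps?: number;
  fiberType?: string;    // e.g. "SMF" or "MMF"
  wavelengthNm?: number;
  reachMeters?: number;
}

interface MatchEvidence {
  ours: Spec;
  competitor: Spec;
  confidence: number;        // match score in 0..1
  lastPriceSeenAt?: Date;    // most recent competitor price observation
}

type Verdict =
  | { status: "confirmed"; recheckAt: Date }
  | { status: "rejected"; reason: string };

const MIN_CONFIDENCE = 0.73;     // threshold from this note
const RECHECK_DAYS = 30;         // recheck interval from this note
const PRICE_MAX_AGE_DAYS = 30;   // assumption: the note only says "recent"
const DAY_MS = 24 * 60 * 60 * 1000;

// approve-all split: rows at or above the threshold are approved,
// weaker rows stay pending and are queued for automated research.
function splitForApproveAll<T extends { confidence: number }>(rows: T[]) {
  return {
    approve: rows.filter((r) => r.confidence >= MIN_CONFIDENCE),
    research: rows.filter((r) => r.confidence < MIN_CONFIDENCE),
  };
}

// Returns a rejection reason, or null when both values are present and equal.
function compareField<T>(a: T | undefined, b: T | undefined, name: string): string | null {
  if (a === undefined || b === undefined) return `missing ${name} evidence`;
  return a === b ? null : `${name} mismatch`;
}

function evaluateEquivalence(m: MatchEvidence, now: Date): Verdict {
  const reject = (reason: string): Verdict => ({ status: "rejected", reason });
  const { ours, competitor } = m;

  // Stale or missing price observations are rejected first.
  if (!m.lastPriceSeenAt ||
      now.getTime() - m.lastPriceSeenAt.getTime() > PRICE_MAX_AGE_DAYS * DAY_MS) {
    return reject("stale price");
  }

  // Exact-match attributes; the first failing check decides the reason.
  const reason =
    compareField(ours.formFactor, competitor.formFactor, "form factor") ??
    compareField(ours.speedGbps, competitor.speedGbps, "speed") ??
    compareField(ours.fiberType, competitor.fiberType, "fiber") ??
    compareField(ours.wavelengthNm, competitor.wavelengthNm, "wavelength");
  if (reason) return reject(reason);

  if (ours.reachMeters === undefined || competitor.reachMeters === undefined) {
    return reject("missing reach evidence");
  }
  // Assumption: "compatible" means the competitor part reaches at least as far as ours.
  if (competitor.reachMeters < ours.reachMeters) return reject("reach mismatch");

  if (m.confidence < MIN_CONFIDENCE) return reject("low confidence");

  return { status: "confirmed", recheckAt: new Date(now.getTime() + RECHECK_DAYS * DAY_MS) };
}
```

Because the evaluator takes `now` as a parameter and reads nothing else, the same inputs always give the same verdict, which is what makes it usable for a bulk cleanup over already-crawled specs and price observations. In this sketch the confidence check runs last, so a row with a technical mismatch is reported under that mismatch instead of under "low confidence".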

Deployment

  • Synced code to /opt/tip on Erik.
  • Built on Erik:
    • pnpm -C packages/api build
    • pnpm -C packages/scraper build
  • Restarted:
    • tip-api
    • tip-scraper-daemon
  • Both were online after restart.

Live Data Cleanup

No heavy crawler wave was started. Cleanup used existing crawled specs and price observations.

Processed pending and due re-research rows:

  • total: 144103
  • rejected fiber mismatch: 958
  • rejected reach mismatch: 82128
  • rejected missing reach evidence: 31151
  • rejected wavelength mismatch: 29865
  • rejected low confidence: 1

Processed old approved rows:

  • confirmed: 1986
  • rejected fiber mismatch: 184
  • rejected reach mismatch: 1704
  • rejected missing reach evidence: 1117
  • rejected wavelength mismatch: 993
  • rejected low confidence: 2

Processed old auto-approved rows:

  • confirmed: 32080
  • rejected reach mismatch: 260

Final State

  • pending: 0
  • approved: 1986
  • auto_approved: 32080
  • rejected: 148367
  • due re-research now: 0
  • scheduled 30-day rechecks: 34066
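
The cleanup and final-state counts can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from this note; the variable names are illustrative. The rejection buckets of the pending pass sum to the reported total of 144103, and the confirmed rows from the two approved passes sum to the 34066 scheduled rechecks. The three passes together account for 148363 rejections. The reported rejected count of 148367 is 4 higher, and this note does not break that difference down (possibly rows that were already rejected before the cleanup, but that is an assumption).

```typescript
// Counts copied from the three cleanup passes described in this note.
const pendingRejected = [958, 82128, 31151, 29865, 1];  // pending + due re-research
const oldApprovedRejected = [184, 1704, 1117, 993, 2];  // old approved rows
const oldAutoApprovedRejected = [260];                  // old auto-approved rows
const confirmedApproved = 1986;
const confirmedAutoApproved = 32080;

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Pending pass: every row fell into a rejection bucket.
const pendingTotal = sum(pendingRejected);                           // 144103

// Confirmed rows are the ones that get a 30-day recheck.
const scheduledRechecks = confirmedApproved + confirmedAutoApproved; // 34066

// Rejections attributable to this cleanup.
const rejectedInCleanup =
  sum(pendingRejected) + sum(oldApprovedRejected) + sum(oldAutoApprovedRejected); // 148363

// Gap to the reported final rejected count.
const reportedRejected = 148367;
const unexplained = reportedRejected - rejectedInCleanup;            // 4

console.log({ pendingTotal, scheduledRechecks, rejectedInCleanup, unexplained });
```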

Product verification counters after reconcile:

  • competitor_verified: 11137
  • fully_verified: 290
  • price_verified: 11549
  • image_verified: 10629
  • details_verified: 9538

Next Work

Products rejected for missing reach/details should be enriched by targeted vendor crawlers. Keep Erik light; use Proxmox/Pi workers for heavier crawl waves. The TIPLLM-only policy remains active for crawler/robot research and learning records.