transceiver-db/sync/history/2026-05-09-tip-equivalence-auto-research.md
2026-05-09 07:48:11 +02:00

# TIP Equivalence Automated Research
Date: 2026-05-09
## Goal
Remove manual equivalence validation as a required workflow for TIP product verification. Low-confidence matches should be researched and either confirmed or rejected automatically.
## Findings
- The dashboard had a large `Approved + Re-Research` backlog.
- `approve-all` was marking low-confidence rows approved, then setting `re_research_due_at`.
- The re-research worker only checked whether the competitor still had a recent price; it did not re-check technical equivalence quality.
- Many low-confidence rows were objectively bad matches:
  - reach mismatches
  - wavelength mismatches
  - missing reach evidence
  - fiber mismatches
## Code Changes
- `packages/api/src/routes/review.ts`
  - `approve-all` now approves only rows with confidence >= `0.73`.
  - Weak rows stay pending and get queued for automated research.
  - `needs_research` includes pending research rows.
  - Added `POST /api/review/run-research`.
- `packages/scraper/src/scheduler.ts`
  - Added a deterministic equivalence evaluator.
  - Confirms a match only when all of the following hold:
    - recent competitor price
    - matching form factor
    - matching speed
    - matching fiber type
    - matching wavelength
    - compatible reach
    - confidence >= `0.73`
  - Rejects stale, incomplete, contradictory, or low-confidence matches automatically.
  - Confirmed matches get a 30-day recheck.
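
The evaluator rules above could be sketched roughly as follows. This is a minimal illustration, not the code in `scheduler.ts`: the `Spec` and `MatchRow` shapes, the reason strings, the check order, the 30-day price-freshness window, and the reading of "compatible reach" as "competitor reach is at least ours" are all assumptions. Only the `0.73` threshold and the 30-day recheck come from the notes.

```typescript
// Illustrative shapes only: these field names are assumptions, not the real schema.
interface Spec {
  formFactor?: string;
  speedGbps?: number;
  fiberType?: string;
  wavelengthNm?: number;
  reachM?: number;
}

interface MatchRow {
  ours: Spec;
  theirs: Spec;
  confidence: number;
  lastPriceAt?: Date;
}

type Verdict =
  | { status: "confirmed"; recheckAt: Date }
  | { status: "rejected"; reason: string };

const MIN_CONFIDENCE = 0.73;   // threshold from the notes
const RECHECK_DAYS = 30;       // recheck interval from the notes
const MAX_PRICE_AGE_DAYS = 30; // assumption: the notes do not define "recent"
const DAY_MS = 24 * 60 * 60 * 1000;

// True only when both values are present and identical.
function same<T>(a: T | undefined, b: T | undefined): boolean {
  return a !== undefined && a === b;
}

// Pure function of the row and the current time, so the same input always
// yields the same verdict.
function evaluateMatch(row: MatchRow, now: Date): Verdict {
  const reject = (reason: string): Verdict => ({ status: "rejected", reason });
  const { ours, theirs, confidence, lastPriceAt } = row;

  if (
    lastPriceAt === undefined ||
    now.getTime() - lastPriceAt.getTime() > MAX_PRICE_AGE_DAYS * DAY_MS
  ) {
    return reject("stale_price");
  }
  if (!same(ours.formFactor, theirs.formFactor)) return reject("form_factor_mismatch");
  if (!same(ours.speedGbps, theirs.speedGbps)) return reject("speed_mismatch");
  if (!same(ours.fiberType, theirs.fiberType)) return reject("fiber_mismatch");
  if (!same(ours.wavelengthNm, theirs.wavelengthNm)) return reject("wavelength_mismatch");
  if (ours.reachM === undefined || theirs.reachM === undefined) {
    return reject("missing_reach_evidence");
  }
  // Assumption: "compatible" means the competitor part reaches at least as far as ours.
  if (theirs.reachM < ours.reachM) return reject("reach_mismatch");
  if (confidence < MIN_CONFIDENCE) return reject("low_confidence");

  return {
    status: "confirmed",
    recheckAt: new Date(now.getTime() + RECHECK_DAYS * DAY_MS),
  };
}

// Usage: a high-confidence row whose competitor part has a much shorter reach.
const sampleSpec: Spec = {
  formFactor: "SFP+",
  speedGbps: 10,
  fiberType: "SMF",
  wavelengthNm: 1310,
  reachM: 10000,
};
const verdict = evaluateMatch(
  {
    ours: sampleSpec,
    theirs: { ...sampleSpec, reachM: 300 },
    confidence: 0.9,
    lastPriceAt: new Date("2026-05-01T00:00:00Z"),
  },
  new Date("2026-05-09T00:00:00Z"),
);
console.log(verdict); // rejected with reason "reach_mismatch"
```

In this sketch the confidence check runs last, so a row with a concrete spec conflict is rejected for that conflict rather than for low confidence. That ordering is a guess.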
## Deployment
- Synced code to `/opt/tip` on Erik.
- Built on Erik:
  - `pnpm -C packages/api build`
  - `pnpm -C packages/scraper build`
- Restarted:
  - `tip-api`
  - `tip-scraper-daemon`
- Both were online after restart.
## Live Data Cleanup
No heavy crawler wave was started. Cleanup used existing crawled specs and price observations.
Processed pending + due re-research:
- total: `144103`
- rejected fiber mismatch: `958`
- rejected reach mismatch: `82128`
- rejected missing reach evidence: `31151`
- rejected wavelength mismatch: `29865`
- rejected low confidence: `1`
Processed old approved rows:
- confirmed: `1986`
- rejected fiber mismatch: `184`
- rejected reach mismatch: `1704`
- rejected missing reach evidence: `1117`
- rejected wavelength mismatch: `993`
- rejected low confidence: `2`
Processed old auto-approved rows:
- confirmed: `32080`
- rejected reach mismatch: `260`
## Final State
- pending: `0`
- approved: `1986`
- auto_approved: `32080`
- rejected: `148367`
- due re-research now: `0`
- scheduled 30-day rechecks: `34066`
Product verification counters after reconcile:
- competitor_verified: `11137`
- fully_verified: `290`
- price_verified: `11549`
- image_verified: `10629`
- details_verified: `9538`
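
The cleanup tallies can be cross-checked against the final-state counters with plain arithmetic. The sketch below copies the numbers from this note and does not query the database; the variable names are illustrative.

```typescript
// Counts copied from the "Live Data Cleanup" and "Final State" sections.
const sumOf = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Pending + due re-research: fiber, reach, missing reach, wavelength, low confidence.
const pendingRejected = sumOf([958, 82128, 31151, 29865, 1]);

// Old approved rows: same rejection buckets, plus 1986 confirmed.
const approvedRejected = sumOf([184, 1704, 1117, 993, 2]);
const approvedConfirmed = 1986;

// Old auto-approved rows.
const autoConfirmed = 32080;
const autoRejected = 260;

const confirmedTotal = approvedConfirmed + autoConfirmed;
const rejectedThisRun = pendingRejected + approvedRejected + autoRejected;

console.log(pendingRejected); // 144103, equals the reported batch total
console.log(confirmedTotal);  // 34066, equals the scheduled 30-day rechecks
console.log(rejectedThisRun); // 148363, versus 148367 reported as rejected
```

Every row in the pending batch ended up rejected, and the confirmed total matches the number of scheduled rechecks. The run accounts for 148363 rejections, 4 fewer than the final `rejected` count. Those 4 rows were presumably already rejected before the cleanup, but that is an inference, not something these notes state.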
## Next Work
Products rejected for missing reach or details should be enriched by targeted vendor crawlers. Keep Erik light; use Proxmox/Pi workers for heavier crawl waves. The TIPLLM-only policy remains active for crawler/robot research and learning records.