# TIP Equivalence Automated Research

Date: 2026-05-09

## Goal

Remove manual equivalence validation as a required workflow for TIP product verification. Low-confidence matches should be researched and either confirmed or rejected automatically.
## Findings

- The dashboard had a large `Approved + Re-Research` backlog.
- `approve-all` was marking low-confidence rows approved, then setting `re_research_due_at`.
- The re-research worker only checked whether the competitor still had a recent price; it did not re-check technical equivalence quality.
- Many low-confidence rows were objectively bad matches:
  - reach mismatches
  - wavelength mismatches
  - missing reach evidence
  - fiber mismatches
## Code Changes

- `packages/api/src/routes/review.ts`
  - `approve-all` now approves only rows with confidence >= `0.73`.
  - Weak rows stay pending and get queued for automated research.
  - `needs_research` includes pending research rows.
  - Added `POST /api/review/run-research`.

- `packages/scraper/src/scheduler.ts`
  - Added a deterministic equivalence evaluator.
  - Confirms a match only when all of the following hold:
    - recent competitor price
    - matching form factor
    - matching speed
    - matching fiber type
    - matching wavelength
    - compatible reach
    - confidence >= `0.73`
  - Rejects stale, incomplete, contradictory, or low-confidence matches automatically.
  - Confirmed matches get a 30-day recheck.
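The rules listed above can be sketched as one pure function plus a small split helper. This is a minimal illustration under stated assumptions, not the code in `scheduler.ts` or `review.ts`: every type, field, and function name is hypothetical, the 30-day price window is a guess at what "recent" means, and "compatible reach" is assumed to mean the competitor part reaches at least as far as ours. Only the `0.73` threshold and the 30-day recheck come from this note.

```typescript
// Hypothetical shapes: field names are illustrative, not taken from the real code.
interface Spec {
  formFactor?: string;
  speedGbps?: number;
  fiberType?: string;
  wavelengthNm?: number;
  reachM?: number;
}

interface Candidate {
  product: Spec;
  competitor: Spec;
  confidence: number; // matcher score in [0, 1]
  lastPriceSeenAt?: Date; // latest competitor price observation, if any
}

type Verdict =
  | { status: "confirmed"; recheckAt: Date }
  | { status: "rejected"; reason: string };

const MIN_CONFIDENCE = 0.73; // threshold stated in this note
const RECHECK_DAYS = 30; // recheck interval stated in this note
const MAX_PRICE_AGE_DAYS = 30; // assumption: the note does not define "recent"
const DAY_MS = 24 * 60 * 60 * 1000;

// True only when both values are present and equal, so a missing
// attribute is treated the same way as a contradictory one.
function same<T>(a: T | undefined, b: T | undefined): boolean {
  return a !== undefined && b !== undefined && a === b;
}

function evaluateEquivalence(c: Candidate, now: Date): Verdict {
  const reject = (reason: string): Verdict => ({ status: "rejected", reason });
  const p = c.product;
  const q = c.competitor;

  const priceAgeMs = c.lastPriceSeenAt
    ? now.getTime() - c.lastPriceSeenAt.getTime()
    : Infinity;
  if (priceAgeMs > MAX_PRICE_AGE_DAYS * DAY_MS) return reject("stale price");
  if (!same(p.formFactor, q.formFactor)) return reject("form factor mismatch");
  if (!same(p.speedGbps, q.speedGbps)) return reject("speed mismatch");
  if (!same(p.fiberType, q.fiberType)) return reject("fiber mismatch");
  if (!same(p.wavelengthNm, q.wavelengthNm)) return reject("wavelength mismatch");
  if (p.reachM === undefined || q.reachM === undefined) {
    return reject("missing reach evidence");
  }
  // Assumption: "compatible reach" means the competitor covers at least our reach.
  if (q.reachM < p.reachM) return reject("reach mismatch");
  if (c.confidence < MIN_CONFIDENCE) return reject("low confidence");

  return {
    status: "confirmed",
    recheckAt: new Date(now.getTime() + RECHECK_DAYS * DAY_MS),
  };
}

// approve-all side: rows at or above the threshold are approved;
// the rest stay pending and are queued for automated research.
function splitForApproveAll<T extends { confidence: number }>(rows: T[]) {
  return {
    approve: rows.filter((r) => r.confidence >= MIN_CONFIDENCE),
    research: rows.filter((r) => r.confidence < MIN_CONFIDENCE),
  };
}
```

Because the function takes `now` as a parameter and reads no external state, the same row always yields the same verdict, which is what makes a cleanup pass over already crawled data repeatable.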
## Deployment

- Synced code to `/opt/tip` on Erik.
- Built on Erik:
  - `pnpm -C packages/api build`
  - `pnpm -C packages/scraper build`
- Restarted:
  - `tip-api`
  - `tip-scraper-daemon`
- Both were online after restart.
## Live Data Cleanup

No heavy crawler wave was started. Cleanup used existing crawled specs and price observations.

Processed pending + due re-research:

- total: `144103`
- rejected fiber mismatch: `958`
- rejected reach mismatch: `82128`
- rejected missing reach evidence: `31151`
- rejected wavelength mismatch: `29865`
- rejected low confidence: `1`

Processed old approved rows:

- confirmed: `1986`
- rejected fiber mismatch: `184`
- rejected reach mismatch: `1704`
- rejected missing reach evidence: `1117`
- rejected wavelength mismatch: `993`
- rejected low confidence: `2`

Processed old auto-approved rows:

- confirmed: `32080`
- rejected reach mismatch: `260`
## Final State

- pending: `0`
- approved: `1986`
- auto_approved: `32080`
- rejected: `148367`
- due re-research now: `0`
- scheduled 30-day rechecks: `34066`
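As a cross-check, the batch counts from the cleanup can be summed and compared with these totals. The sketch below uses only numbers copied from this note; the `Batch` shape and variable names are illustrative. The first batch's rejections add up to its reported total (`144103`), and the confirmed rows add up to the scheduled rechecks (`34066`). The rejections across the three batches add up to `148363`, four fewer than the reported `148367`. The note does not say where those four rows come from; rows already rejected before this cleanup would be one possible explanation, but that is an assumption.

```typescript
// Counts copied from this note, grouped by cleanup batch.
interface Batch {
  name: string;
  confirmed: number;
  rejected: [string, number][]; // [reason, count]
}

const batches: Batch[] = [
  {
    name: "pending + due re-research",
    confirmed: 0,
    rejected: [
      ["fiber mismatch", 958],
      ["reach mismatch", 82128],
      ["missing reach evidence", 31151],
      ["wavelength mismatch", 29865],
      ["low confidence", 1],
    ],
  },
  {
    name: "old approved",
    confirmed: 1986,
    rejected: [
      ["fiber mismatch", 184],
      ["reach mismatch", 1704],
      ["missing reach evidence", 1117],
      ["wavelength mismatch", 993],
      ["low confidence", 2],
    ],
  },
  {
    name: "old auto-approved",
    confirmed: 32080,
    rejected: [["reach mismatch", 260]],
  },
];

// Sum of all rejection counts in one batch.
const rejectedIn = (b: Batch): number =>
  b.rejected.reduce((sum, [, n]) => sum + n, 0);

const confirmedTotal = batches.reduce((sum, b) => sum + b.confirmed, 0);
const rejectedTotal = batches.reduce((sum, b) => sum + rejectedIn(b), 0);

console.log(rejectedIn(batches[0])); // 144103, the reported total for the first batch
console.log(confirmedTotal); // 34066, the reported number of 30-day rechecks
console.log(rejectedTotal); // 148363, versus 148367 reported as rejected
```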
Product verification counters after reconcile:

- competitor_verified: `11137`
- fully_verified: `290`
- price_verified: `11549`
- image_verified: `10629`
- details_verified: `9538`
## Next Work

Products rejected for missing reach/details should be enriched by targeted vendor crawlers. Keep Erik light; use Proxmox/Pi workers for heavier crawl waves. TIPLLM-only policy remains active for crawler/robot research and learning records.