sync: record no-valid matcher closure

Rene Fichtmueller 2026-05-09 23:24:55 +02:00
parent 79a57a5ac6
commit 7b8e229cf0
3 changed files with 46 additions and 8 deletions

@@ -1,6 +1,6 @@
 # Current TIP Sync State
-Updated: 2026-05-09 21:13 UTC
+Updated: 2026-05-09 21:24 UTC
 ## Newest Work
@@ -34,11 +34,18 @@ Updated: 2026-05-09 21:13 UTC
 - active competitor status:
 - `matched=11158`
 - `no_valid_match=73`
-- `ambiguous=0`
-- `needs_research=6183`
+- `ambiguous=192`
+- `needs_research=5991`
 - operational note:
-- `tip-scraper-daemon` was not restarted during active pricing jobs
-- restart daemon only after active pg-boss jobs finish, then run reconcile/matcher again
+- `tip-scraper-daemon` was initially not restarted while QSFPTEK/NADDOD pricing jobs were active
+- after those jobs cleared, `tip-scraper-daemon` was restarted once
+- `maintenance:reconcile-verification` completed
+- `maintenance:find-equivalences` completed
+- matcher correctly moved `192` products into `ambiguous` instead of inventing unsafe matches
+- remaining fully populated product rows with `needs_research`:
+- `FS.COM=74`
+- `Flexoptix=15`
+- `ATGBICS=2`
 - TIPLLM training pool:
 - appended deterministic no-valid-match resolver lessons
 - JSONL must remain valid after every append
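
The rule that the JSONL must remain valid after every append can be checked mechanically. The sketch below is one possible check, not the project's own tooling: `validate_jsonl` is a hypothetical helper name, and the only assumption about the training pool is that each physical line holds exactly one JSON object.

```python
import json


def validate_jsonl(text: str) -> list[str]:
    """Return a list of problems; an empty list means every line is one JSON object.

    Hypothetical helper for illustration. It splits on "\n" only, because a
    JSONL record must occupy exactly one physical line.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline after the last record is fine

    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: {exc.msg}")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
    return problems


if __name__ == "__main__":
    good = '{"id": "a", "messages": []}\n{"id": "b", "messages": []}\n'
    # A record whose string content was wrapped onto a second physical line.
    wrapped = '{"id": "c", "content": "first half \nsecond half"}\n'
    print(validate_jsonl(good))
    print(validate_jsonl(wrapped))
```

A record that wraps inside a string value fails on both physical lines, which is why a check like this is worth running after each append and before committing.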

@@ -49,7 +49,7 @@ On Erik:
 - `73` products earned `fully_verified`
 - evidence ledger now has `73` `competitor_no_match` records
-Live health after the run:
+Live health after the resolver run:
 - active products: `17414`
 - price verified: `11523`
@@ -62,8 +62,38 @@ Live health after the run:
 - `ambiguous=0`
 - `needs_research=6183`
+## Post-Restart Reconcile / Matcher
+After QSFPTEK/NADDOD pricing jobs cleared:
+- restarted `tip-scraper-daemon` once
+- queued `maintenance:reconcile-verification`
+- queued `maintenance:find-equivalences`
+- both jobs completed
+Live health after matcher:
+- active products: `17414`
+- price verified: `11523`
+- image verified: `12125`
+- details verified: `16814`
+- fully verified: `10831`
+- competitor status:
+- `matched=11158`
+- `no_valid_match=73`
+- `ambiguous=192`
+- `needs_research=5991`
+The matcher moved `192` products into `ambiguous`. This is intentional: ambiguous is safer than a false 1:1 match.
+Remaining fully populated product rows with `needs_research`:
+- `FS.COM=74`
+- `Flexoptix=15`
+- `ATGBICS=2`
 ## Operational Notes
-- `tip-scraper-daemon` was not restarted because pricing jobs were active.
-- Restart only after active pg-boss jobs complete, then queue reconcile and matcher.
+- `tip-scraper-daemon` was only restarted after active pg-boss jobs completed.
+- After restart, queue reconcile and matcher.
 - This is the correct pattern for rare products: explicit no-match evidence beats false 1:1 matches.
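
The competitor-status counts recorded before and after the matcher run can be cross-checked with simple arithmetic. The sketch below assumes, without confirmation from the source, that the four statuses partition the active products; the recorded numbers are consistent with that reading. `CompetitorStatus` is a hypothetical container used only for this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompetitorStatus:
    """Hypothetical container for the four competitor-status counts."""
    matched: int
    no_valid_match: int
    ambiguous: int
    needs_research: int

    def total(self) -> int:
        return (self.matched + self.no_valid_match
                + self.ambiguous + self.needs_research)


ACTIVE_PRODUCTS = 17414

# Counts as recorded after the resolver run and after the matcher run.
before = CompetitorStatus(matched=11158, no_valid_match=73,
                          ambiguous=0, needs_research=6183)
after = CompetitorStatus(matched=11158, no_valid_match=73,
                         ambiguous=192, needs_research=5991)

# Assumption: every active product carries exactly one competitor status.
assert before.total() == ACTIVE_PRODUCTS
assert after.total() == ACTIVE_PRODUCTS

# Products that left needs_research should equal those that entered ambiguous.
moved = before.needs_research - after.needs_research
assert moved == after.ambiguous - before.ambiguous == 192
```

Under that assumption, the run only reclassified products from `needs_research` to `ambiguous`; `matched` and `no_valid_match` did not change.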

@@ -40,3 +40,4 @@
 {"id":"tip-20260509-evidence-ledger-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Every verification decision should be auditable."},{"role":"user","content":"What should TIP write when a robot marks price, image, details, competitor match, no-match, or artifact quarantine as verified?"},{"role":"assistant","content":"Write an append-only row into transceiver_verification_evidence. Required fields: transceiver_id, verification_type, source_url when available, source_vendor_id when available, evidence_value JSON, evidence_hash, robot_name, confidence, and created_at. For artifact quarantine, record the rule reason and clear verification flags. For price/image/details, store the exact source-backed value. For competitor match/no-match, store the match basis or no-match reason. This lets TIP explain every green badge and every deliberate non-green state."}]}
 {"id":"tip-20260509-no-valid-competitor-resolver-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Prefer conservative deterministic verification over fake equivalence."},{"role":"user","content":"When may TIP mark a product's competitor requirement as resolved even though no 1:1 competitor product exists?"},{"role":"assistant","content":"Only use `competitor_status=no_valid_match` after deterministic research proves the absence of a strict source-backed competitor candidate. Require price_verified, image_verified, and details_verified first. Exclude artifacts and non-transceiver categories. Search other vendors for candidates with the same form factor, speed, fiber, reach within max(25m, 5%), and compatible wavelength when both sides expose it. Also require no pending or approved equivalence above confidence 0.50. When these gates pass, set competitor_verified true, store no_match_reason/no_match_verified_at, write a `competitor_no_match` evidence record, then recalculate fully_verified."}]}
 {"id":"tip-20260509-no-valid-competitor-resolver-002","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Operate TIP crawlers safely on Erik and make every robot run reversible until evidence is checked."},{"role":"user","content":"What is the safe runbook for the no-valid-competitor resolver?"},{"role":"assistant","content":"Run dry-run first with a narrow vendor scope, for example `NO_VALID_MATCH_VENDOR=Flexoptix pnpm -C packages/scraper run verify:no-valid-competitor`. Inspect candidates for normalization mistakes, especially speed, reach, form factor, fiber, wavelength, cable/AOC/breakout artifacts, and category pages. Only apply with `NO_VALID_MATCH_APPLY=1` after the dry-run looks clean. Do not restart scraper daemons while pg-boss jobs are active. After apply, check `/api/health`, evidence counts, and protected review stats, then document the result in `sync/`."}]}
+{"id":"tip-20260509-ambiguous-equivalence-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Ambiguity is a valid safety outcome, not a failure."},{"role":"user","content":"What should TIP do when the matcher finds possible competitor candidates but cannot prove a safe 1:1 equivalence?"},{"role":"assistant","content":"Set `competitor_status=ambiguous` and keep the product out of fully_verified. Do not choose the cheapest or visually similar candidate. Ambiguous means there is enough evidence to reject `no_valid_match`, but not enough evidence to approve `matched`. Typical causes include breakout vs non-breakout, AOC/DAC vs optical module, reach mismatch, MMF vs SMF mismatch, wavelength mismatch, connector/channel mismatch, or too many candidate products in the same broad class. Ambiguous rows need targeted vendor research or stricter extractor normalization."}]}
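
The strict candidate search described in the resolver lesson (same form factor, speed, and fiber, reach within max(25m, 5%), compatible wavelength when both sides expose it) can be sketched as a small filter. This is an illustrative reading, not the matcher's actual code: the `Transceiver` type, the function names, and the example products are hypothetical. It also assumes that the 5% is taken from the product's own reach and that "compatible wavelength" means equal. The other gates (verified price, image, and details, artifact exclusion, no equivalence above confidence 0.50) are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transceiver:
    """Hypothetical normalized product row; field names are illustrative."""
    vendor: str
    form_factor: str
    speed_gbps: float
    fiber: str                      # e.g. "SMF" or "MMF"
    reach_m: float
    wavelength_nm: Optional[float] = None


def reach_tolerance_m(reach_m: float) -> float:
    """max(25m, 5%), with the 5% assumed to apply to the product's own reach."""
    return max(25.0, 0.05 * reach_m)


def is_strict_candidate(product: Transceiver, other: Transceiver) -> bool:
    """True if `other` is a strict candidate from a different vendor."""
    if other.vendor == product.vendor:
        return False
    if (other.form_factor, other.speed_gbps, other.fiber) != (
            product.form_factor, product.speed_gbps, product.fiber):
        return False
    if abs(other.reach_m - product.reach_m) > reach_tolerance_m(product.reach_m):
        return False
    # Wavelength is only compared when both sides expose it.
    # Assumption: "compatible" is modelled as equal.
    if product.wavelength_nm is not None and other.wavelength_nm is not None:
        return product.wavelength_nm == other.wavelength_nm
    return True


def strict_candidates(product: Transceiver,
                      catalog: list[Transceiver]) -> list[Transceiver]:
    return [o for o in catalog if is_strict_candidate(product, o)]


if __name__ == "__main__":
    product = Transceiver("Flexoptix", "SFP+", 10, "SMF", 10_000, 1310)
    catalog = [
        Transceiver("FS.COM", "SFP+", 10, "SMF", 10_400, 1310),   # within 5%
        Transceiver("FS.COM", "SFP+", 10, "SMF", 10_600, 1310),   # reach too far off
        Transceiver("ATGBICS", "SFP+", 10, "MMF", 10_000, 1310),  # fiber mismatch
        Transceiver("ATGBICS", "SFP+", 10, "SMF", 10_000),        # wavelength not exposed
    ]
    for candidate in strict_candidates(product, catalog):
        print(candidate)
```

An empty result from a filter like this is only one precondition for `no_valid_match`. A non-empty result does not by itself justify `matched`, which is where the `ambiguous` outcome described in the last lesson applies.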