diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index ec9024d..a57882c 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 21:13 UTC
+Updated: 2026-05-09 21:24 UTC
 
 ## Newest Work
 
@@ -34,11 +34,18 @@ Updated: 2026-05-09 21:13 UTC
 - active competitor status:
   - `matched=11158`
   - `no_valid_match=73`
-  - `ambiguous=0`
-  - `needs_research=6183`
+  - `ambiguous=192`
+  - `needs_research=5991`
 - operational note:
-  - `tip-scraper-daemon` was not restarted during active pricing jobs
-  - restart daemon only after active pg-boss jobs finish, then run reconcile/matcher again
+  - `tip-scraper-daemon` was initially not restarted while QSFPTEK/NADDOD pricing jobs were active
+  - after those jobs cleared, `tip-scraper-daemon` was restarted once
+  - `maintenance:reconcile-verification` completed
+  - `maintenance:find-equivalences` completed
+  - matcher correctly moved `192` products into `ambiguous` instead of inventing unsafe matches
+  - remaining fully populated product rows with `needs_research`:
+    - `FS.COM=74`
+    - `Flexoptix=15`
+    - `ATGBICS=2`
 - TIPLLM training pool:
   - appended deterministic no-valid-match resolver lessons
   - JSONL must remain valid after every append
diff --git a/sync/history/2026-05-09-tip-no-valid-match-resolver.md b/sync/history/2026-05-09-tip-no-valid-match-resolver.md
index 3967a16..c7e8888 100644
--- a/sync/history/2026-05-09-tip-no-valid-match-resolver.md
+++ b/sync/history/2026-05-09-tip-no-valid-match-resolver.md
@@ -49,7 +49,7 @@ On Erik:
 - `73` products earned `fully_verified`
 - evidence ledger now has `73` `competitor_no_match` records
 
-Live health after the run:
+Live health after the resolver run:
 
 - active products: `17414`
 - price verified: `11523`
@@ -62,8 +62,38 @@ Live health after the run:
   - `ambiguous=0`
   - `needs_research=6183`
 
+## Post-Restart Reconcile / Matcher
+
+After QSFPTEK/NADDOD pricing jobs cleared:
+
+- restarted `tip-scraper-daemon` once
+- queued `maintenance:reconcile-verification`
+- queued `maintenance:find-equivalences`
+- both jobs completed
+
+Live health after matcher:
+
+- active products: `17414`
+- price verified: `11523`
+- image verified: `12125`
+- details verified: `16814`
+- fully verified: `10831`
+- competitor status:
+  - `matched=11158`
+  - `no_valid_match=73`
+  - `ambiguous=192`
+  - `needs_research=5991`
+
+The matcher moved `192` products into `ambiguous`. This is intentional: ambiguous is safer than a false 1:1 match.
+
+Remaining fully populated product rows with `needs_research`:
+
+- `FS.COM=74`
+- `Flexoptix=15`
+- `ATGBICS=2`
+
 ## Operational Notes
 
-- `tip-scraper-daemon` was not restarted because pricing jobs were active.
-- Restart only after active pg-boss jobs complete, then queue reconcile and matcher.
+- `tip-scraper-daemon` was only restarted after active pg-boss jobs completed.
+- After restart, queue reconcile and matcher.
 - This is the correct pattern for rare products: explicit no-match evidence beats false 1:1 matches.
diff --git a/training-data/tip-llm-capabilities-v1.jsonl b/training-data/tip-llm-capabilities-v1.jsonl
index 0dcaa6b..e026586 100644
--- a/training-data/tip-llm-capabilities-v1.jsonl
+++ b/training-data/tip-llm-capabilities-v1.jsonl
@@ -40,3 +40,4 @@
 {"id":"tip-20260509-evidence-ledger-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Every verification decision should be auditable."},{"role":"user","content":"What should TIP write when a robot marks price, image, details, competitor match, no-match, or artifact quarantine as verified?"},{"role":"assistant","content":"Write an append-only row into transceiver_verification_evidence. Required fields: transceiver_id, verification_type, source_url when available, source_vendor_id when available, evidence_value JSON, evidence_hash, robot_name, confidence, and created_at. For artifact quarantine, record the rule reason and clear verification flags. For price/image/details, store the exact source-backed value. For competitor match/no-match, store the match basis or no-match reason. This lets TIP explain every green badge and every deliberate non-green state."}]}
 {"id":"tip-20260509-no-valid-competitor-resolver-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Prefer conservative deterministic verification over fake equivalence."},{"role":"user","content":"When may TIP mark a product's competitor requirement as resolved even though no 1:1 competitor product exists?"},{"role":"assistant","content":"Only use `competitor_status=no_valid_match` after deterministic research proves the absence of a strict source-backed competitor candidate. Require price_verified, image_verified, and details_verified first. Exclude artifacts and non-transceiver categories. Search other vendors for candidates with the same form factor, speed, fiber, reach within max(25m, 5%), and compatible wavelength when both sides expose it. Also require no pending or approved equivalence above confidence 0.50. When these gates pass, set competitor_verified true, store no_match_reason/no_match_verified_at, write a `competitor_no_match` evidence record, then recalculate fully_verified."}]}
 {"id":"tip-20260509-no-valid-competitor-resolver-002","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Operate TIP crawlers safely on Erik and make every robot run reversible until evidence is checked."},{"role":"user","content":"What is the safe runbook for the no-valid-competitor resolver?"},{"role":"assistant","content":"Run dry-run first with a narrow vendor scope, for example `NO_VALID_MATCH_VENDOR=Flexoptix pnpm -C packages/scraper run verify:no-valid-competitor`. Inspect candidates for normalization mistakes, especially speed, reach, form factor, fiber, wavelength, cable/AOC/breakout artifacts, and category pages. Only apply with `NO_VALID_MATCH_APPLY=1` after the dry-run looks clean. Do not restart scraper daemons while pg-boss jobs are active. After apply, check `/api/health`, evidence counts, and protected review stats, then document the result in `sync/`."}]}
+{"id":"tip-20260509-ambiguous-equivalence-001","source":"codex-tip-verification-2026-05-09","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Ambiguity is a valid safety outcome, not a failure."},{"role":"user","content":"What should TIP do when the matcher finds possible competitor candidates but cannot prove a safe 1:1 equivalence?"},{"role":"assistant","content":"Set `competitor_status=ambiguous` and keep the product out of fully_verified. Do not choose the cheapest or visually similar candidate. Ambiguous means there is enough evidence to reject `no_valid_match`, but not enough evidence to approve `matched`. Typical causes include breakout vs non-breakout, AOC/DAC vs optical module, reach mismatch, MMF vs SMF mismatch, wavelength mismatch, connector/channel mismatch, or too many candidate products in the same broad class. Ambiguous rows need targeted vendor research or stricter extractor normalization."}]}
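
The candidate gates and the three-way outcome described in the resolver and ambiguity lessons can be sketched roughly as follows. This is a minimal illustration, not the matcher's actual code: the `Product` fields, the function names, the wavelength tolerance, the choice to take 5% of the larger reach, and the handling of a pending equivalence above `0.50` as `ambiguous` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Product:
    # Hypothetical normalized fields; the real schema is not shown in the diff.
    form_factor: str
    speed_gbps: float
    fiber: str  # e.g. "SMF" or "MMF"
    reach_m: float
    wavelength_nm: Optional[float] = None


def reach_compatible(a_m: float, b_m: float) -> bool:
    # Tolerance is max(25 m, 5%). Applying 5% to the larger reach is an assumption.
    tolerance = max(25.0, 0.05 * max(a_m, b_m))
    return abs(a_m - b_m) <= tolerance


def is_strict_candidate(product: Product, other: Product,
                        wavelength_tolerance_nm: float = 10.0) -> bool:
    # Same form factor, speed, and fiber, plus reach within tolerance.
    if (product.form_factor, product.speed_gbps, product.fiber) != (
            other.form_factor, other.speed_gbps, other.fiber):
        return False
    if not reach_compatible(product.reach_m, other.reach_m):
        return False
    # Wavelength is compared only when both sides expose it.
    # The 10 nm default tolerance is a placeholder, not a documented value.
    if product.wavelength_nm is not None and other.wavelength_nm is not None:
        return abs(product.wavelength_nm - other.wavelength_nm) <= wavelength_tolerance_nm
    return True


def competitor_status(product: Product, others: Sequence[Product], *,
                      prerequisites_verified: bool,
                      best_equivalence_confidence: float = 0.0,
                      approved_match: Optional[Product] = None) -> str:
    if approved_match is not None:
        return "matched"
    candidates = [o for o in others if is_strict_candidate(product, o)]
    # Candidates without a proven 1:1 equivalence stay ambiguous.
    # Treating an open equivalence above 0.50 the same way is an assumption.
    if candidates or best_equivalence_confidence > 0.50:
        return "ambiguous"
    # price/image/details must be verified before no_valid_match is allowed.
    if not prerequisites_verified:
        return "needs_research"
    return "no_valid_match"
```

The sketch never picks a winner among candidates: `matched` only results from an equivalence that was approved elsewhere, which mirrors the rule that ambiguous is safer than a false 1:1 match.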