fix: quarantine GAO catalog artifacts
parent cbb2580e60
commit 7e36236d2b
@@ -51,6 +51,10 @@ async function quarantine(): Promise<void> {
     OR COALESCE(t.product_page_url, '') ~* '(100g-qsfp28/?$|100g-qsfp28-i-temp|10gb-industrial-temp|1g-sfp-i-temp|200g-qsfp56/?$|25g-sfp28-i-temp|400g-qsfp-dd/?$|400g-qsfp112/?$|40g-qsfp-i-temp|i-temp-tranceivers|osfp-[0-9]+g-gigabit|qsfp.*gigabit|400g-qsfp112-2)'
   )
 )
+OR (
+  v.name = 'GAO Tek'
+  AND t.part_number ~* '(Handheld|Wireless|Marine|Cabling|Splicers|Other-Transceivers|GBIC-Transceivers|QSFP-Transceivers|SFP\\+?-Transceivers|Fiber-Transceivers|Copper-SFP-Transceivers|Mobile-Transceiver|Dual-Band|All-Mode-Portable)'
+)
 OR (
   v.name = 'T&S Communication'
   AND (
@@ -106,6 +110,10 @@ async function quarantine(): Promise<void> {
     OR COALESCE(t.product_page_url, '') ~* '(100g-qsfp28/?$|100g-qsfp28-i-temp|10gb-industrial-temp|1g-sfp-i-temp|200g-qsfp56/?$|25g-sfp28-i-temp|400g-qsfp-dd/?$|400g-qsfp112/?$|40g-qsfp-i-temp|i-temp-tranceivers|osfp-[0-9]+g-gigabit|qsfp.*gigabit|400g-qsfp112-2)'
   )
 )
+OR (
+  v.name = 'GAO Tek'
+  AND t.part_number ~* '(Handheld|Wireless|Marine|Cabling|Splicers|Other-Transceivers|GBIC-Transceivers|QSFP-Transceivers|SFP\\+?-Transceivers|Fiber-Transceivers|Copper-SFP-Transceivers|Mobile-Transceiver|Dual-Band|All-Mode-Portable)'
+)
 OR (
   v.name = 'T&S Communication'
   AND (
@@ -19,17 +19,19 @@ Updated: 2026-05-10 07:38 UTC
 - live apply matched `9` 10Gtek rows, wrote `9` datasheet URLs and `9` details evidence records
 - rebuilt scraper package on Erik and restarted `tip-scraper-daemon` after confirming pg-boss queue was empty
 - live health after cleanup:
-  - active products: `16245`
+  - active products: `16236`
   - price verified: `10851`
-  - price status: `public_price=10851`, `no_public_price=5394`, `needs_research=0`, `ambiguous=0`
-  - image verified: `11611`
-  - details verified: `16006`
+  - price status: `public_price=10851`, `no_public_price=5385`, `needs_research=0`, `ambiguous=0`
+  - image verified: `11602`
+  - details verified: `16005`
   - fully verified: `10600`
-  - competitor status: `matched=10838`, `ambiguous=5`, `needs_research=5402`
+  - competitor status: `matched=10838`, `ambiguous=5`, `needs_research=5393`
 - Cisco official-page follow-up:
   - first large batch added `5` official Cisco images
   - second batch added `0` images, so the current Cisco product-page image path is exhausted
   - Open competitor status dry-run after the batch found `0` new candidates
+- GAO Tek follow-up:
+  - quarantined `16` obvious non-optic/category artifacts such as Handheld/Wireless/Marine/Cabling/Family pages
 - interpretation:
   - TIP active base is now much cleaner: cable/breakout products are no longer counted as transceiver-module verification debt
   - remaining largest gaps are real OEM/catalog image availability and competitor state, not public price research
@@ -23,13 +23,13 @@ Continued TIP verification after the price queue was closed. Focus: remove DAC/A

 ## Health Snapshot

-- active products: `16245`
+- active products: `16236`
 - price verified: `10851`
-- price status: `public_price=10851`, `no_public_price=5394`, `needs_research=0`, `ambiguous=0`
-- image verified: `11606`
-- details verified: `16004`
+- price status: `public_price=10851`, `no_public_price=5385`, `needs_research=0`, `ambiguous=0`
+- image verified: `11602`
+- details verified: `16005`
 - fully verified: `10599`
-- competitor status: `matched=10838`, `ambiguous=5`, `needs_research=5402`
+- competitor status: `matched=10838`, `ambiguous=5`, `needs_research=5393`

 Follow-up Cisco official-page batches:

@@ -38,6 +38,11 @@ Follow-up Cisco official-page batches:
 - latest health moved to `image_verified=11611`, `details_verified=16006`, `fully_verified=10600`
 - Open competitor status dry-run found `0` new candidates after the batch

+GAO Tek follow-up:
+
+- quarantined `16` obvious non-optic/category artifacts such as Handheld/Wireless/Marine/Cabling/Family pages
+- latest active product count is `16236`
+
 ## Interpretation

 The active product base is cleaner and no longer counts cable/breakout rows as transceiver-module verification debt. Remaining gaps are dominated by OEM/catalog image availability and competitor state, not public price research.
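
The updated health snapshot can be sanity-checked with simple arithmetic. The sketch below assumes that the price-status and competitor-status buckets each partition the active product base. The notes do not state this, but the reported numbers are consistent with it. The `HealthSnapshot` type and the helper names are hypothetical and not part of the TIP codebase.

```typescript
// Shape of a health snapshot; field names are illustrative, not the real TIP schema.
interface HealthSnapshot {
  activeProducts: number;
  priceStatus: Record<string, number>;
  competitorStatus: Record<string, number>;
}

// Values copied from the updated lines of the health snapshot in this commit.
const after: HealthSnapshot = {
  activeProducts: 16236,
  priceStatus: { public_price: 10851, no_public_price: 5385, needs_research: 0, ambiguous: 0 },
  competitorStatus: { matched: 10838, ambiguous: 5, needs_research: 5393 },
};

// Sum every bucket of a status breakdown.
function total(buckets: Record<string, number>): number {
  return Object.values(buckets).reduce((sum, n) => sum + n, 0);
}

// Return the names of breakdowns whose buckets do not add up to the active count.
// Assumption: each breakdown is meant to cover every active product exactly once.
function inconsistentBreakdowns(s: HealthSnapshot): string[] {
  const issues: string[] = [];
  if (total(s.priceStatus) !== s.activeProducts) issues.push("priceStatus");
  if (total(s.competitorStatus) !== s.activeProducts) issues.push("competitorStatus");
  return issues;
}

console.log(inconsistentBreakdowns(after)); // prints [] for the values above
```

Both breakdowns sum to `16236` (`10851 + 5385` and `10838 + 5 + 5393`), matching the reported active product count.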
@@ -58,3 +58,4 @@
{"id":"tip-20260510-oem-image-detail-probing-001","source":"codex-tip-verification-2026-05-10","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Official OEM product pages may verify details without providing usable product images."},{"role":"user","content":"Cisco and Juniper still have image/detail gaps after price closure. Should TIP invent or reuse generic product images?"},{"role":"assistant","content":"No. Use official product pages and only mark images when a concrete product image URL is extracted. Cisco pages may expose official DOCX/PDF rendition images and can be marked image_verified when the URL is source-backed. Some Cisco pages return HTTP 403. Juniper product pages in sampled batches often expose no useful product image URL; keep image_verified=false rather than using a generic transceiver picture. Treat DAC/AOC/cable-like OEM rows as a separate modeling/quarantine decision, not as optical module image gaps."}]}
{"id":"tip-20260510-cable-breakout-quarantine-001","source":"codex-tip-verification-2026-05-10","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Do not compare cable/breakout products as if they were standalone optical transceiver modules."},{"role":"user","content":"How should TIP handle DAC, AOC, Twinax, direct-attach and breakout rows in the active transceiver verification base?"},{"role":"assistant","content":"Move them out of the active transceiver-module base unless TIP intentionally models a separate cable/AOC product class. Deterministic indicators include part numbers or URLs containing DAC, AOC, DACBL, XDAC, Direct Attach, Twinax, Breakout or active-optical. These rows should not block transceiver-module completion and must not be used for 1:1 optical module competitor matching. Quarantine them with artifact_quarantine evidence and clear price/image/details/competitor/fully verified flags."}]}
{"id":"tip-20260510-10gtek-datasheet-verifier-001","source":"codex-tip-verification-2026-05-10","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. Use official datasheet indexes for source-backed detail evidence when retail pages are absent."},{"role":"user","content":"10Gtek has active rows with no SFPcables retail URL but the main 10gtek.com site exposes PDF datasheets. What should the robot do?"},{"role":"assistant","content":"Fetch the official 10Gtek transceiver page, extract PDF datasheet links, normalize PDF filenames and part numbers, and only match when the normalized filename and part number contain each other or are exact. Write the PDF URL to product_page_url/details_source_url and mark details only if form_factor, speed_gbps, reach_label and fiber_type are already present. Record details evidence with robot_name `verify:10gtek-datasheets`. Do not infer prices from PDFs."}]}
+{"id":"tip-20260510-gaotek-artifact-quarantine-001","source":"codex-tip-verification-2026-05-10","kind":"sft-jsonl","messages":[{"role":"system","content":"You are TIP_LLM. GAO Tek search/catalog pages can contain many non-optical or family-level rows."},{"role":"user","content":"Which GAO Tek rows should TIP quarantine from the optical transceiver active base?"},{"role":"assistant","content":"Quarantine obvious non-optic and family/category artifacts such as Handheld, Wireless, Marine, Mobile, Dual-Band, All-Mode-Portable, Cabling, Splicers, Other-Transceivers, GBIC-Transceivers, QSFP-Transceivers, SFP-Transceivers, SFP+-Transceivers, Fiber-Transceivers and Copper-SFP-Transceivers. These are category/family or radio/cabling products, not source-backed optical transceiver module SKUs. Keep narrower GAO optical rows active only when they provide enough product evidence."}]}
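
The GAO Tek rule added in this commit can be tried out locally before touching the database. The sketch below restates the SQL alternation as a JavaScript `RegExp`, with the `i` flag standing in for PostgreSQL's case-insensitive `~*` operator. For a plain alternation like this one the two regex dialects should behave the same, but that is an assumption, not something verified against the live query. The helper name `isGaoTekArtifact` and the sample part numbers are made up for illustration.

```typescript
// Same alternation as the SQL pattern in this commit. In the TypeScript source the
// plus sign is written as `\\+` inside a string; in a regex literal it is just `\+`.
const GAO_TEK_ARTIFACT_PATTERN =
  /(Handheld|Wireless|Marine|Cabling|Splicers|Other-Transceivers|GBIC-Transceivers|QSFP-Transceivers|SFP\+?-Transceivers|Fiber-Transceivers|Copper-SFP-Transceivers|Mobile-Transceiver|Dual-Band|All-Mode-Portable)/i;

// Hypothetical helper mirroring `v.name = 'GAO Tek' AND t.part_number ~* '...'`.
function isGaoTekArtifact(vendorName: string, partNumber: string): boolean {
  return vendorName === "GAO Tek" && GAO_TEK_ARTIFACT_PATTERN.test(partNumber);
}

// Illustrative rows only; these part numbers are not taken from the TIP database.
const samples: Array<[string, string]> = [
  ["GAO Tek", "Handheld-Optical-Power-Meter"],
  ["GAO Tek", "sfp+-transceivers"],
  ["GAO Tek", "GAO-SFP-10G-LR"],
  ["T&S Communication", "Handheld-Optical-Power-Meter"],
];

for (const [vendor, part] of samples) {
  console.log(`${vendor} / ${part}: ${isGaoTekArtifact(vendor, part)}`);
}
// The first two rows print true; the last two print false.
```

The pattern is unanchored, so any part number that merely contains one of the tokens is quarantined. That is worth keeping in mind if a real optic SKU ever includes a word such as `Wireless` or `Marine`.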