crawl: add atgbics followup learning
parent b979358ff0
commit 79651febbe
@@ -32,3 +32,4 @@
{"event":"db_evidence_backfill","observed_at":"2026-05-09T16:05:00Z","actor":"codex-atgbics-deterministic-special-case-backfill","profile":"erik-safe-db-only","wave":"atgbics-special-case-closure","vendor":"ATGBICS","summary":"Backfilled 32 ATGBICS special-case rows using deterministic protocol/product-class rules. Promoted 32 additional rows to fully_verified.","input":{"precheck":{"atgbics_near_complete_missing_details":139,"safe_special_case_rows":32,"patterns":["loopback","10GBASE-T/RJ45","LRM","BX60/BXD-60/BXU-60","CWDM 10G 60km","CSR"]}},"decision":{"rules":["Loopback/test modules are non-optical test products with N/A reach/fiber/wavelength semantics.","10GBASE-T/RJ45 SFP+ uses 30m Copper with N/A wavelength.","LRM uses 220m MMF at 1310nm.","BX60 uses 60km SMF with directional BiDi wavelength evidence.","CWDM 10G 60 uses 60km SMF and source wavelength.","CSR uses 400m MMF at 850nm."],"runtime_policy":"DB-only update; no crawler wave; no external AI."},"outcome":{"updated":{"atgbics_detail_rows":32,"fully_verified_promoted":32},"postcheck":{"atgbics_near_complete_missing_details":107},"global_after":{"total":17647,"details_verified":12030,"fully_verified":10753},"tip_health":{"status":"healthy","load_status":"ok","memory_used_pct":12}},"truth_policy":"Remaining ATGBICS rows need detail-page extraction because the URL slug no longer carries enough reach evidence.","safety_notes":["No external AI was used.","No browser crawler was started.","Erik public health stayed healthy."]}
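The backfill's `decision.rules` work as an ordered pattern-to-fields lookup. The sketch below shows how those deterministic special-case rules could be expressed in TypeScript. It assumes rows are matched on their URL slug and that the detail columns are plain strings. `DetailFields`, `classifySpecialCase` and the regular expressions are hypothetical illustrations, not the code that performed this backfill, which is not part of this commit. Two rules, BX60 and CWDM 10G 60km, take their wavelength from source evidence, so they return `null` instead of guessing when no wavelength is supplied.

```typescript
// Hypothetical shape of the detail columns; the real DB schema is not shown here.
type Fiber = "SMF" | "MMF" | "Copper" | "N/A";

interface DetailFields {
  reach: string;
  fiber: Fiber;
  wavelength: string;
}

// One entry per backfill rule, checked in order. needsSourceWavelength marks
// rules whose wavelength must come from source evidence, never from the rule.
interface SpecialCaseRule {
  name: string;
  pattern: RegExp;
  fields: DetailFields;
  needsSourceWavelength?: boolean;
}

const RULES: SpecialCaseRule[] = [
  { name: "loopback", pattern: /loopback/i,
    fields: { reach: "N/A", fiber: "N/A", wavelength: "N/A" } },
  // Assumed slug forms such as "10gbase-t" or "10g-t"; a bare "rj45" is not
  // enough, so 1G copper modules are not given the 10G 30m semantics.
  { name: "10GBASE-T/RJ45", pattern: /\b10g(?:base)?-?t\b/i,
    fields: { reach: "30m", fiber: "Copper", wavelength: "N/A" } },
  { name: "LRM", pattern: /\blrm\b/i,
    fields: { reach: "220m", fiber: "MMF", wavelength: "1310nm" } },
  { name: "BX60/BXD-60/BXU-60", pattern: /\bbx[du]?-?60\b/i,
    fields: { reach: "60km", fiber: "SMF", wavelength: "" }, needsSourceWavelength: true },
  { name: "CWDM 10G 60km", pattern: /\bcwdm\b.*\b10g\b.*\b60(?:km)?\b/i,
    fields: { reach: "60km", fiber: "SMF", wavelength: "" }, needsSourceWavelength: true },
  { name: "CSR", pattern: /\bcsr\b/i,
    fields: { reach: "400m", fiber: "MMF", wavelength: "850nm" } },
];

// Returns the backfilled fields, or null when no rule matches or when a rule
// needs a wavelength the source does not provide (the row stays in the queue).
function classifySpecialCase(slug: string, sourceWavelength?: string): DetailFields | null {
  for (const rule of RULES) {
    if (!rule.pattern.test(slug)) continue;
    if (rule.needsSourceWavelength) {
      if (!sourceWavelength) return null;
      return { ...rule.fields, wavelength: sourceWavelength };
    }
    return { ...rule.fields };
  }
  return null;
}
```

Ordering matters here. Loopback is checked first so a test module is never promoted as an optic. Any slug that matches no rule, such as a plain SR optic, falls through to `null` and is left for the detail-page extraction that the `truth_policy` calls for.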
{"event":"targeted_detail_verifier","observed_at":"2026-05-09T16:12:00Z","actor":"codex-atgbics-product-js-detail-verifier","profile":"erik-safe-lightweight-fetch","wave":"atgbics-product-js-closure","vendor":"ATGBICS","summary":"Added and ran a lightweight Shopify product.js verifier for ATGBICS near-complete rows. Closed the ATGBICS near-complete detail queue.","code":{"repo_path":"packages/scraper/src/scrapers/atgbics-detail-pages.ts","script":"scrape:atgbics:details","fetch_mode":"one product.js JSON endpoint per product page","browser_used":false},"input":{"precheck":{"atgbics_near_complete_missing_details":107},"source_evidence":["Shopify product.js title","Shopify product.js description","Shopify tags such as Max Data Rate, Max Distance, Cable Type, Wavelength, Interface, Product Category"]},"decision":{"rules":["Prefer structured Shopify tags but fall back to title/body when a tag is unparseable or says N/A.","Never accept distance without media evidence.","Loopback/test products use N/A reach/fiber/wavelength semantics.","Do not use external AI; parser rules only."],"bug_learned":"Max Distance_N/A tags can hide real reach in product title/description; parser must use parsed tag value, not tag presence, as the gate.","runtime_policy":"Lightweight source fetch; no browser crawler; paced requests."},"outcome":{"runs":[{"fetched":107,"updated":97,"skipped":10,"promoted":97},{"fetched":10,"updated":10,"skipped":0,"promoted":10}],"postcheck":{"atgbics_near_complete_missing_details":0},"global_after":{"total":17647,"details_verified":12137,"fully_verified":10860},"tip_health":{"status":"healthy","load_status":"ok","memory_used_pct":13}},"truth_policy":"ATGBICS product pages are accepted as source evidence only when detail fields can be parsed from tags/title/body; unparseable rows remain skipped until parser evidence is improved.","safety_notes":["No external AI was used.","No Playwright/browser crawler was started.","SSH refused intermittently; retries were paused to protect Erik."]}
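The `bug_learned` note concerns which condition gates the title/body fallback. The sketch below shows the corrected gate. It assumes Shopify tags arrive as `Key_Value` strings in the `tags` array of the product.js response, as the `Max Distance_N/A` example suggests. The helper names, the regexes and the `ReachEvidence` shape are hypothetical, and the deployed `atgbics-detail-pages.ts` may be structured differently.

```typescript
// Subset of the Shopify product.js response the verifier reads.
interface ShopifyProductJs {
  title: string;
  description: string; // HTML body
  tags: string[];
}

interface ReachEvidence {
  reach: string;
  media: string;
}

// Assumed tag format "Key_Value", e.g. "Max Distance_10km" or "Max Distance_N/A".
function tagValue(tags: string[], key: string): string | undefined {
  const prefix = `${key}_`;
  const tag = tags.find((t) => t.startsWith(prefix));
  return tag?.slice(prefix.length).trim();
}

// Illustrative patterns only; a production parser would need more cases.
const DISTANCE_RE = /(\d+(?:\.\d+)?)\s*(km|m)\b/i;
const MEDIA_RE = /\b(SMF|MMF|single[- ]?mode|multi[- ]?mode|copper|OM[1-5]|OS[12])\b/i;

function parseDistance(text: string | undefined): string | null {
  if (text === undefined) return null;
  const m = DISTANCE_RE.exec(text);
  return m ? `${m[1]}${m[2].toLowerCase()}` : null;
}

function resolveReach(p: ShopifyProductJs): ReachEvidence | null {
  const body = `${p.title} ${p.description}`;
  // The gate is the parsed tag value, not tag presence: "Max Distance_N/A"
  // parses to null, so the title/body fallback still runs.
  const reach = parseDistance(tagValue(p.tags, "Max Distance")) ?? parseDistance(body);
  const media = MEDIA_RE.exec(tagValue(p.tags, "Cable Type") ?? "") ?? MEDIA_RE.exec(body);
  // Never accept a distance without media evidence: leave the row skipped.
  if (reach === null || media === null) return null;
  return { reach, media: media[1] };
}
```

Gating on tag presence, for example `tagValue(p.tags, "Max Distance") !== undefined`, would reproduce the bug. The `N/A` tag is present, so a reach stated only in the title would never be read.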
{"event":"targeted_detail_verifier","observed_at":"2026-05-09T16:20:00Z","actor":"codex-shopfiber24-fibermall-detail-verifier","profile":"erik-safe-lightweight-fetch","wave":"shopfiber24-fibermall-near-complete-closure","vendor":"ShopFiber24+FiberMall","summary":"Added and ran a lightweight static detail verifier for the remaining ShopFiber24 and FiberMall near-complete rows. Closed the global near-complete queue.","code":{"repo_path":"packages/scraper/src/scrapers/shopfiber24-fibermall-detail-pages.ts","script":"scrape:vendors:details","fetch_mode":"one static HTML page per product URL","browser_used":false},"input":{"precheck":{"fibermall_near_complete_missing_details":24,"shopfiber24_near_complete_missing_details":92,"global_near_complete_missing_details":116},"source_evidence":["FiberMall Schema.org Product JSON-LD name/description/mpn","ShopFiber24 title/meta description/OG metadata","source URL and existing DB price/image/competitor evidence"]},"decision":{"rules":["FiberMall JSON-LD Product blocks are primary source evidence for name, MPN, media, speed, reach, wavelength, connector and image provenance.","ShopFiber24 static title/meta text is accepted when it contains deterministic protocol/reach/media evidence.","Variable AOC/DAC/category pages must be classified as product families with Variant reach, not a fake fixed meter value.","Switches, media converters, muxes and adapters must be classified as their product class, not as optical equivalents.","100G DWDM DCO rows without a normal reach should be classified as Coherent DWDM with line-system-dependent reach.","10GBASE-T copper SFP+ rows use RJ45 Copper 30m standard semantics when source identifies 10G Base-T copper but omits distance."],"runtime_policy":"Lightweight source fetch; no browser crawler; no external AI."},"outcome":{"runs":[{"fetched":116,"updated":112,"skipped":4,"promoted":112},{"fetched":4,"updated":4,"skipped":0,"promoted":4}],"postcheck":{"fibermall_near_complete_missing_details":0,"shopfiber24_near_complete_missing_details":0,"global_near_complete_missing_details":0},"global_after":{"total":17647,"details_verified":12253,"fully_verified":10976},"tip_health":{"status":"healthy","load_status":"ok","memory_used_pct":12}},"truth_policy":"Closing a queue does not mean inventing optics. Families/accessories/converters are verified as the product they actually are, and should not be used as 1:1 optical competitor equivalents unless a separate equivalence rule allows it.","safety_notes":["No external AI was used.","No Playwright/browser crawler was started.","Erik public health stayed healthy.","SSH refused intermittently; retries were delayed."]}
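This wave treats FiberMall's Schema.org `Product` JSON-LD as primary evidence and uses classification rules to decide which reach value a row may carry. The sketch below covers both parts under two assumptions. First, JSON-LD is pulled out of `<script type="application/ld+json">` blocks with a regex rather than a real HTML parser. Second, `lengthOptions`, the number of selectable cable lengths on the page, is a hypothetical input that stands in for however the verifier actually detects a variable AOC/DAC page. `extractLdProducts` and `resolveVendorReach` are illustrative names, not functions from `shopfiber24-fibermall-detail-pages.ts`.

```typescript
// Subset of Schema.org Product fields the log names as source evidence.
interface LdProduct {
  name?: string;
  description?: string;
  mpn?: string;
}

const str = (v: unknown): string | undefined => (typeof v === "string" ? v : undefined);

// Collect every Schema.org Product node from the page's JSON-LD script blocks.
function extractLdProducts(html: string): LdProduct[] {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const products: LdProduct[] = [];
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(html)) !== null) {
    let data: unknown;
    try {
      data = JSON.parse(m[1]);
    } catch {
      continue; // malformed block: not usable as evidence
    }
    // Top level may be one node, an array, or an object with an @graph array.
    const queue: unknown[] = Array.isArray(data) ? [...data] : [data];
    while (queue.length > 0) {
      const node = queue.shift();
      if (typeof node !== "object" || node === null) continue;
      const obj = node as Record<string, unknown>;
      const graph = obj["@graph"];
      if (Array.isArray(graph)) queue.push(...graph);
      const type = obj["@type"];
      const isProduct = type === "Product" || (Array.isArray(type) && type.includes("Product"));
      if (!isProduct) continue;
      products.push({ name: str(obj["name"]), description: str(obj["description"]), mpn: str(obj["mpn"]) });
    }
  }
  return products;
}

// Reach rules from this wave; returns null when no deterministic evidence exists.
function resolveVendorReach(text: string, parsedReach: string | null, lengthOptions: number): string | null {
  const t = text.toLowerCase();
  // Variable AOC/DAC page: a product family, not a fake fixed meter value.
  if (/\b(aoc|dac)\b/.test(t) && lengthOptions > 1) return "Variant";
  if (parsedReach !== null) return parsedReach;
  // 100G DWDM DCO without a normal reach: Coherent DWDM, line-system-dependent.
  if (/\bdwdm\b/.test(t) && /\bdco\b/.test(t)) return "line-system-dependent";
  // Source identifies 10G Base-T copper but omits distance: RJ45 Copper 30m.
  if (/10g\s?base-?t\b/.test(t) && /\b(rj-?45|copper)\b/.test(t)) return "30m";
  return null; // leave the row skipped
}
```

Returning `null` for the fall-through case mirrors the wave's second run: rows without deterministic evidence stay in the queue rather than receiving an invented reach.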
{"event":"targeted_detail_verifier_followup","observed_at":"2026-05-09T16:25:00Z","actor":"codex-atgbics-product-js-detail-verifier","profile":"erik-safe-lightweight-fetch","wave":"atgbics-aoc-late-price-followup","vendor":"ATGBICS","summary":"A concurrent price-verification process exposed 23 new ATGBICS AOC near-complete rows after the first closure. Re-ran the ATGBICS product.js verifier and returned the global near-complete queue to zero.","input":{"precheck":{"new_atgbics_near_complete_missing_details":23,"reason":"price_verified increased from 11557 to 11582 after the first queue closure"}},"decision":{"rules":["Use the already-deployed ATGBICS product.js verifier for newly exposed rows.","AOC product URLs/titles with explicit m length, MMF and 850nm are deterministic source evidence.","Pause between SSH retries when Erik refuses connections."],"runtime_policy":"Lightweight source fetch; no browser crawler; no external AI."},"outcome":{"runs":[{"fetched":23,"updated":23,"skipped":0,"promoted":23}],"postcheck":{"global_near_complete_missing_details":0},"global_after":{"total":17647,"price_verified":11582,"details_verified":12276,"fully_verified":11001},"tip_health":{"status":"healthy","load_status":"ok","memory_used_pct":12}},"truth_policy":"Queue closure must be rechecked after concurrent crawler/price jobs because newly price-verified rows can enter the near-complete set after a previous zero state.","safety_notes":["No external AI was used.","No Playwright/browser crawler was started.","Erik public health stayed healthy."]}
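The follow-up's `truth_policy` amounts to this: a zero queue can be trusted only after a fresh recount. The sketch below shows that loop. It assumes "near-complete" means price-verified but still missing verified details, which the log implies without defining, and it uses in-memory rows in place of the database. `Row`, `nearCompleteQueue` and `closeQueue` are hypothetical names.

```typescript
// Hypothetical row flags; the real DB columns are not shown in this commit.
interface Row {
  id: string;
  priceVerified: boolean;
  detailsVerified: boolean;
}

// Assumed definition: near-complete = price verified, details still missing.
function nearCompleteQueue(rows: Row[]): Row[] {
  return rows.filter((r) => r.priceVerified && !r.detailsVerified);
}

// Re-run the verifier until a fresh recount finds an empty queue instead of
// trusting the first zero. `verify` returns true when the row was updated.
function closeQueue(
  rows: Row[],
  verify: (r: Row) => boolean,
  maxRounds = 5,
): { rounds: number; remaining: number } {
  let rounds = 0;
  let queue = nearCompleteQueue(rows);
  while (queue.length > 0 && rounds < maxRounds) {
    rounds++;
    let updated = 0;
    for (const row of queue) {
      if (verify(row)) {
        row.detailsVerified = true;
        updated++;
      }
    }
    // Recount: a concurrent price job may have exposed new rows meanwhile.
    queue = nearCompleteQueue(rows);
    // Same parser, same evidence: re-running cannot help, so stop.
    if (updated === 0) break;
  }
  return { rounds, remaining: queue.length };
}
```

The loop stops early when a round updates nothing. Rows that stay skipped at that point need a parser fix, as the ATGBICS `bug_learned` entry illustrates, not another identical pass.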