From 7da78a999d80fcb4b841b922c103d6a914e1b740 Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sat, 9 May 2026 14:14:48 +0200
Subject: [PATCH] sync: record immediate equivalence revalidation

---
 sync/CURRENT.md                               |  61 +++++++++-
 ...alence-revalidation-and-crawlee-binding.md | 106 ++++++++++++++++++
 2 files changed, 166 insertions(+), 1 deletion(-)
 create mode 100644 sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 333e78b..c44321d 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,68 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 11:59 UTC
+Updated: 2026-05-09 12:16 UTC
 
 ## Newest Work
 
+- Immediate full TIP equivalence revalidation on 2026-05-09:
+  - operator requested all open TIP validation to be completed immediately and all product matches checked for true 1:1 equivalence
+  - live preflight:
+    - equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`, `due_research=0`
+    - active matches scheduled for future 30-day recheck: `34066`
+  - strict DB preflight over all active matches found:
+    - no recent-price gaps: `0`
+    - hard technical mismatches: `0`
+    - missing critical 1:1 evidence: `0`
+    - hard criteria checked: form factor, speed, fiber type, reach ratio, primary wavelength and recent competitor price evidence
+  - action:
+    - marked all `34066` active `approved/auto_approved` equivalences as due immediately
+    - queued `18` existing PgBoss `maintenance:re-research-equivalences` jobs
+    - used the existing DB-only TIP re-research worker; no browser crawler wave and no external AI
+  - result:
+    - all `18/18` jobs completed
+    - `due_research=0`
+    - `active_researched_today=34066`
+    - no automated-research rejections in this immediate pass
+    - final equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`
+  - transceiver verification counters after the pass:
+    - `competitor_verified=11470`
+    - `price_verified=11557`
+    - `image_verified=10711`
+    - `details_verified=9929`
+    - `fully_verified=9135`
+    - total transceivers `17647`
+  - TIP health after run:
+    - status `healthy`
+    - load status `ok`
+    - memory used `13%`
+    - API/DB connected
+  - truth:
+    - the manual equivalence queue is empty and all active matches have just been rechecked by deterministic 1:1 evidence rules
+    - this does not mean every product row in TIP is complete; largest product verification gaps remain vendor-specific crawler/enrichment work, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent/Eoptolink and other vendor/catalog rows
+
+- Crawlee integration/binding on 2026-05-09:
+  - operator asked to install, use and bind Crawlee/Crawlee-Python after priority evaluation
+  - pushed TIP commits:
+    - `60531b6 feat: add crawlee python worker integration`
+    - `49f0871 chore: ignore crawlee python build artifacts`
+  - TypeScript TIP core remains the production crawler core using `crawlee` and Playwright
+  - added scraper scripts:
+    - `pnpm -C packages/scraper scrape:fs:db-detail`
+    - `pnpm -C packages/scraper scrape:fs:url-discovery`
+  - added optional isolated Python worker:
+    - `packages/crawlee-python/`
+    - `scripts/setup-crawlee-python-worker.sh`
+    - `docs/TIP_CRAWLEE_RUNTIME.md`
+  - Python worker policy:
+    - Crawlee-Python is for Pi/Proxmox/residential side workers and extraction experiments
+    - writes JSONL evidence only
+    - no direct DB writes
+    - no replacement for the TypeScript TIP scraper core
+  - smoke test:
+    - installed `crawlee==1.6.3` into `/tmp/tip-crawlee-python-venv`
+    - ran `tip_crawlee_worker` against `https://crawlee.dev`
+    - JSONL evidence output succeeded
+
 - Priority Crawlee evaluation + FS.com URL discovery on 2026-05-09:
   - operator asked whether these repos help:
     - `https://github.com/apify/crawlee`
diff --git a/sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md b/sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md
new file mode 100644
index 0000000..7d053d9
--- /dev/null
+++ b/sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md
@@ -0,0 +1,106 @@
+# TIP Immediate Equivalence Revalidation + Crawlee Binding
+
+Date: 2026-05-09
+Actor: Codex
+
+## Operator Request
+
+The operator asked to immediately verify and validate all open TIP work and to check whether products really match 1:1. The operator also asked to install, use and bind Crawlee/Crawlee-Python, with all crawler/scraper/robot learning recorded for TIPLLM.
+
+## Crawlee Binding
+
+Pushed to Gitea:
+
+- `60531b6 feat: add crawlee python worker integration`
+- `49f0871 chore: ignore crawlee python build artifacts`
+
+Added:
+
+- `packages/crawlee-python/`
+- `scripts/setup-crawlee-python-worker.sh`
+- `docs/TIP_CRAWLEE_RUNTIME.md`
+- scraper scripts:
+  - `pnpm -C packages/scraper scrape:fs:db-detail`
+  - `pnpm -C packages/scraper scrape:fs:url-discovery`
+
+Policy:
+
+- TypeScript Crawlee/Playwright remains the TIP production crawler core.
+- Crawlee-Python is optional for Pi/Proxmox/residential workers and writes JSONL evidence only.
+- Crawlee-Python does not write directly to TIP DB.
+- No external AI was used.
+
+Smoke test:
+
+- Installed `crawlee==1.6.3` in `/tmp/tip-crawlee-python-venv`.
+- Ran `tip_crawlee_worker` against `https://crawlee.dev`.
+- JSONL evidence output succeeded.
+
+## Equivalence Revalidation
+
+Preflight:
+
+- `pending=0`
+- `approved=1986`
+- `auto_approved=32080`
+- `rejected=148367`
+- `due_research=0`
+- active approved/auto-approved matches: `34066`
+
+Strict DB preflight over all active matches:
+
+- no recent-price gaps: `0`
+- hard technical mismatches: `0`
+- missing critical 1:1 evidence: `0`
+
+Hard criteria checked:
+
+- recent competitor price evidence
+- form factor
+- speed
+- fiber type
+- reach ratio
+- primary wavelength
+
+Action:
+
+- Marked all `34066` active `approved/auto_approved` equivalences as immediately due.
+- Queued `18` PgBoss jobs for `maintenance:re-research-equivalences`.
+- Used the existing DB-only TIP research worker.
+- No browser crawler wave was started.
+
+Result:
+
+- `18/18` jobs completed.
+- `due_research=0`
+- `active_researched_today=34066`
+- no automated-research rejections in this immediate pass
+- final queue:
+  - `pending=0`
+  - `approved=1986`
+  - `auto_approved=32080`
+  - `rejected=148367`
+
+Final product verification counters:
+
+- `competitor_verified=11470`
+- `price_verified=11557`
+- `image_verified=10711`
+- `details_verified=9929`
+- `fully_verified=9135`
+- total transceivers: `17647`
+
+TIP health after run:
+
+- status: `healthy`
+- load status: `ok`
+- memory used: `13%`
+- API/DB connected
+
+## Truth For Next Agent
+
+The manual equivalence queue is empty and all active equivalence matches have just been rechecked by deterministic 1:1 rules.
+
+This does not mean every product row in TIP is fully complete. Product verification gaps remain vendor-specific crawler/enrichment work. Largest remaining gaps are outside the already-focused Flexoptix and FS.com passes, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent Optics, Eoptolink and other vendor/catalog rows.
+
+Do not start a broad browser crawler wave on Erik. Continue vendor-targeted, low-concurrency jobs or move heavier discovery to Pi/Proxmox workers.
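
For the next agent, the hard criteria listed under "Hard criteria checked" could be expressed as one deterministic function per match. The sketch below is illustrative only. The `Transceiver` fields, the `hard_mismatches` name, the 30-day price window and the reach-ratio band are all assumptions made here, not the TIP schema or the thresholds the re-research worker actually uses.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transceiver:
    # Hypothetical field names; the real TIP schema is not shown in this log.
    form_factor: str
    speed_gbps: float
    fiber_type: str
    reach_m: float
    wavelength_nm: int


def hard_mismatches(
    own: Transceiver,
    competitor: Transceiver,
    last_price_seen: Optional[date],
    today: date,
    max_price_age_days: int = 30,                      # assumed window
    reach_ratio_band: Tuple[float, float] = (0.9, 1.1),  # assumed tolerance
) -> List[str]:
    """Return the names of the hard criteria that fail for one match."""
    failed = []
    if own.form_factor != competitor.form_factor:
        failed.append("form_factor")
    if own.speed_gbps != competitor.speed_gbps:
        failed.append("speed")
    if own.fiber_type != competitor.fiber_type:
        failed.append("fiber_type")
    # Reach is compared as a ratio so small spec differences can be tolerated.
    ratio = competitor.reach_m / own.reach_m if own.reach_m else float("inf")
    low, high = reach_ratio_band
    if not (low <= ratio <= high):
        failed.append("reach_ratio")
    if own.wavelength_nm != competitor.wavelength_nm:
        failed.append("primary_wavelength")
    # A match without a recent competitor price counts as missing evidence.
    max_age = timedelta(days=max_price_age_days)
    if last_price_seen is None or today - last_price_seen > max_age:
        failed.append("recent_price")
    return failed
```

An empty list means the match passes every hard criterion. Returning the failed criterion names instead of a boolean keeps the result explainable if a match is later rejected.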