sync: record immediate equivalence revalidation

Rene Fichtmueller 2026-05-09 14:14:48 +02:00
parent 49f0871720
commit 7da78a999d
2 changed files with 166 additions and 1 deletions

@@ -1,9 +1,68 @@
# Current TIP Sync State
Updated: 2026-05-09 11:59 UTC
Updated: 2026-05-09 12:16 UTC
## Newest Work
- Immediate full TIP equivalence revalidation on 2026-05-09:
  - operator requested all open TIP validation to be completed immediately and all product matches checked for true 1:1 equivalence
  - live preflight:
    - equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`, `due_research=0`
    - active matches scheduled for future 30-day recheck: `34066`
  - strict DB preflight over all active matches found:
    - recent-price gaps: `0`
    - hard technical mismatches: `0`
    - missing critical 1:1 evidence: `0`
  - hard criteria checked: form factor, speed, fiber type, reach ratio, primary wavelength and recent competitor price evidence
  - action:
    - marked all `34066` active `approved/auto_approved` equivalences as due immediately
    - queued `18` existing PgBoss `maintenance:re-research-equivalences` jobs
    - used the existing DB-only TIP re-research worker; no browser crawler wave and no external AI
  - result:
    - all `18/18` jobs completed
    - `due_research=0`
    - `active_researched_today=34066`
    - no automated-research rejections in this immediate pass
    - final equivalence queue: `pending=0`, `approved=1986`, `auto_approved=32080`, `rejected=148367`
  - transceiver verification counters after the pass:
    - `competitor_verified=11470`
    - `price_verified=11557`
    - `image_verified=10711`
    - `details_verified=9929`
    - `fully_verified=9135`
    - total transceivers `17647`
  - TIP health after run:
    - status `healthy`
    - load status `ok`
    - memory used `13%`
    - API/DB connected
  - truth:
    - the manual equivalence queue is empty and all active matches have just been rechecked by deterministic 1:1 evidence rules
    - this does not mean every product row in TIP is complete; the largest product verification gaps remain vendor-specific crawler/enrichment work, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent/Eoptolink and other vendor/catalog rows
- Crawlee integration/binding on 2026-05-09:
  - operator asked to install, use and bind Crawlee/Crawlee-Python after priority evaluation
  - pushed TIP commits:
    - `60531b6 feat: add crawlee python worker integration`
    - `49f0871 chore: ignore crawlee python build artifacts`
  - TypeScript TIP core remains the production crawler core using `crawlee` and Playwright
  - added scraper scripts:
    - `pnpm -C packages/scraper scrape:fs:db-detail`
    - `pnpm -C packages/scraper scrape:fs:url-discovery`
  - added optional isolated Python worker:
    - `packages/crawlee-python/`
    - `scripts/setup-crawlee-python-worker.sh`
    - `docs/TIP_CRAWLEE_RUNTIME.md`
  - Python worker policy:
    - Crawlee-Python is for Pi/Proxmox/residential side workers and extraction experiments
    - writes JSONL evidence only
    - no direct DB writes
    - no replacement for the TypeScript TIP scraper core
  - smoke test:
    - installed `crawlee==1.6.3` into `/tmp/tip-crawlee-python-venv`
    - ran `tip_crawlee_worker` against `https://crawlee.dev`
    - JSONL evidence output succeeded
- Priority Crawlee evaluation + FS.com URL discovery on 2026-05-09:
  - operator asked whether these repos help:
    - `https://github.com/apify/crawlee`

@@ -0,0 +1,106 @@
# TIP Immediate Equivalence Revalidation + Crawlee Binding
Date: 2026-05-09
Actor: Codex
## Operator Request
The operator asked to immediately verify and validate all open TIP work and to check whether products really match 1:1. The operator also asked to install, use and bind Crawlee/Crawlee-Python, with all crawler/scraper/robot learning recorded for TIPLLM.
## Crawlee Binding
Pushed to Gitea:
- `60531b6 feat: add crawlee python worker integration`
- `49f0871 chore: ignore crawlee python build artifacts`
Added:
- `packages/crawlee-python/`
- `scripts/setup-crawlee-python-worker.sh`
- `docs/TIP_CRAWLEE_RUNTIME.md`
- scraper scripts:
  - `pnpm -C packages/scraper scrape:fs:db-detail`
  - `pnpm -C packages/scraper scrape:fs:url-discovery`
Policy:
- TypeScript Crawlee/Playwright remains the TIP production crawler core.
- Crawlee-Python is optional for Pi/Proxmox/residential workers and writes JSONL evidence only.
- Crawlee-Python does not write directly to TIP DB.
- No external AI was used.
Smoke test:
- Installed `crawlee==1.6.3` in `/tmp/tip-crawlee-python-venv`.
- Ran `tip_crawlee_worker` against `https://crawlee.dev`.
- JSONL evidence output succeeded.
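Sketch of the JSONL-only policy, for orientation. This is not the code of `tip_crawlee_worker`; the record fields (`url`, `fetched_at`, `fields`) and the helper names `append_evidence` / `read_evidence` are assumptions made for illustration. The point is only that the side worker appends one JSON object per line to a file and never opens a DB connection.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(path: Path, url: str, fields: dict) -> dict:
    """Append one evidence record as a single JSON line (hypothetical schema)."""
    record = {
        "url": url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "fields": fields,
    }
    # Append mode: earlier evidence lines are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_evidence(path: Path) -> list[dict]:
    """Read all records back; every non-empty line is one JSON object."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A separate importer on the TIP side would then decide what, if anything, from such a file reaches the DB.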
## Equivalence Revalidation
Preflight:
- `pending=0`
- `approved=1986`
- `auto_approved=32080`
- `rejected=148367`
- `due_research=0`
- active approved/auto-approved matches: `34066`
Strict DB preflight over all active matches:
- recent-price gaps: `0`
- hard technical mismatches: `0`
- missing critical 1:1 evidence: `0`
Hard criteria checked:
- recent competitor price evidence
- form factor
- speed
- fiber type
- reach ratio
- primary wavelength
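Sketch of how these hard criteria could be expressed as one deterministic check. This is not the TIP worker's code: the field names, the 30-day price window and the reach-ratio limit of `1.0` are assumptions, since these notes name the criteria but not their thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Transceiver:
    form_factor: str                        # e.g. "SFP+"
    speed_gbps: float
    fiber_type: str                         # e.g. "SMF" or "MMF"
    reach_m: float
    wavelength_nm: int                      # primary wavelength
    last_price_seen: Optional[date] = None  # newest competitor price evidence


def mismatch_reasons(
    ours: Transceiver,
    theirs: Transceiver,
    today: date,
    max_price_age_days: int = 30,   # assumed window
    max_reach_ratio: float = 1.0,   # assumed limit: reaches must be equal
) -> list[str]:
    """Return the failed hard criteria; an empty list means a 1:1 match."""
    reasons = []
    if ours.form_factor != theirs.form_factor:
        reasons.append("form_factor")
    if ours.speed_gbps != theirs.speed_gbps:
        reasons.append("speed")
    if ours.fiber_type != theirs.fiber_type:
        reasons.append("fiber_type")
    # Reach ratio: longer reach divided by shorter reach.
    shorter, longer = sorted((ours.reach_m, theirs.reach_m))
    if shorter <= 0 or longer / shorter > max_reach_ratio:
        reasons.append("reach_ratio")
    if ours.wavelength_nm != theirs.wavelength_nm:
        reasons.append("primary_wavelength")
    # Price evidence must exist and be recent enough.
    seen = theirs.last_price_seen
    if seen is None or today - seen > timedelta(days=max_price_age_days):
        reasons.append("recent_price")
    return reasons
```

Returning the list of failed criteria instead of a bare boolean keeps a rejection explainable to the next agent.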
Action:
- Marked all `34066` active `approved/auto_approved` equivalences as immediately due.
- Queued `18` PgBoss jobs for `maintenance:re-research-equivalences`.
- Used the existing DB-only TIP re-research worker.
- No browser crawler wave was started.
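These notes do not record how the `34066` matches were distributed across the `18` jobs. One plausible scheme, shown purely as an assumption, is to split the due IDs into near-equal contiguous batches; `split_into_batches` is a hypothetical helper, not a PgBoss or TIP function.

```python
from math import ceil


def split_into_batches(ids: list[int], job_count: int) -> list[list[int]]:
    """Split ids into at most job_count contiguous batches of near-equal size."""
    if job_count <= 0:
        raise ValueError("job_count must be positive")
    if not ids:
        return []
    size = ceil(len(ids) / job_count)
    return [ids[i:i + size] for i in range(0, len(ids), size)]
```

Under this assumed scheme, `34066` IDs and `18` jobs give batches of at most `1893` IDs each.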
Result:
- `18/18` jobs completed.
- `due_research=0`
- `active_researched_today=34066`
- no automated-research rejections in this immediate pass
- final queue:
  - `pending=0`
  - `approved=1986`
  - `auto_approved=32080`
  - `rejected=148367`
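The reported numbers can be cross-checked with a few lines. The sketch below uses only counts stated in this note; the dictionary layout and helper names are illustrative.

```python
PREFLIGHT = {"pending": 0, "approved": 1986, "auto_approved": 32080,
             "rejected": 148367, "due_research": 0}
FINAL = {"pending": 0, "approved": 1986, "auto_approved": 32080,
         "rejected": 148367, "due_research": 0}
ACTIVE_RESEARCHED_TODAY = 34066


def active_matches(queue: dict) -> int:
    """Active matches are the approved plus auto-approved equivalences."""
    return queue["approved"] + queue["auto_approved"]


def changed_states(before: dict, after: dict) -> dict:
    """Return {state: (before, after)} for every count that moved."""
    return {k: (before[k], after.get(k)) for k in before if before[k] != after.get(k)}
```

An empty result from `changed_states(PREFLIGHT, FINAL)` matches the statement that the pass produced no automated-research rejections, and `active_matches(FINAL)` should equal `active_researched_today`.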
Final product verification counters:
- `competitor_verified=11470`
- `price_verified=11557`
- `image_verified=10711`
- `details_verified=9929`
- `fully_verified=9135`
- total transceivers: `17647`
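To see how far each verification step is from full coverage, the counters can be turned into ratios. The sketch uses only the figures listed here; the names `COUNTERS`, `coverage` and `open_rows` are illustrative.

```python
COUNTERS = {
    "competitor_verified": 11470,
    "price_verified": 11557,
    "image_verified": 10711,
    "details_verified": 9929,
    "fully_verified": 9135,
}
TOTAL_TRANSCEIVERS = 17647


def coverage(counters: dict, total: int) -> dict:
    """Share of all transceiver rows that passed each verification step."""
    return {name: count / total for name, count in counters.items()}


def open_rows(counters: dict, total: int) -> dict:
    """Number of rows still missing each verification step."""
    return {name: total - count for name, count in counters.items()}


if __name__ == "__main__":
    gaps = open_rows(COUNTERS, TOTAL_TRANSCEIVERS)
    ratios = coverage(COUNTERS, TOTAL_TRANSCEIVERS)
    # Print the weakest step first.
    for name in sorted(ratios, key=ratios.get):
        print(f"{name}: {ratios[name]:.1%} covered, {gaps[name]} rows open")
```

Roughly half of the transceiver rows are fully verified, which is consistent with the note that product-row completeness is a separate question from equivalence revalidation.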
TIP health after run:
- status: `healthy`
- load status: `ok`
- memory used: `13%`
- API/DB connected
## Truth For Next Agent
The manual equivalence queue is empty and all active equivalence matches have just been rechecked by deterministic 1:1 rules.
This does not mean every product row in TIP is fully complete. Product verification gaps remain vendor-specific crawler/enrichment work. Largest remaining gaps are outside the already-focused Flexoptix and FS.com passes, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent Optics, Eoptolink and other vendor/catalog rows.
Do not start a broad browser crawler wave on Erik. Continue vendor-targeted, low-concurrency jobs or move heavier discovery to Pi/Proxmox workers.