transceiver-db/sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md
2026-05-09 14:14:48 +02:00

# TIP Immediate Equivalence Revalidation + Crawlee Binding
Date: 2026-05-09
Actor: Codex
## Operator Request
The operator asked for immediate verification and validation of all open TIP work, including a check that matched products are truly 1:1 equivalents. The operator also asked to install, use, and bind Crawlee/Crawlee-Python, and to record all crawler/scraper/robot learnings for TIPLLM.
## Crawlee Binding
Pushed to Gitea:
- `60531b6 feat: add crawlee python worker integration`
- `49f0871 chore: ignore crawlee python build artifacts`
Added:
- `packages/crawlee-python/`
- `scripts/setup-crawlee-python-worker.sh`
- `docs/TIP_CRAWLEE_RUNTIME.md`
- scraper scripts:
  - `pnpm -C packages/scraper scrape:fs:db-detail`
  - `pnpm -C packages/scraper scrape:fs:url-discovery`
Policy:
- TypeScript Crawlee/Playwright remains the TIP production crawler core.
- Crawlee-Python is optional for Pi/Proxmox/residential workers and writes JSONL evidence only.
- Crawlee-Python does not write directly to TIP DB.
- No external AI was used.
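
The "JSONL evidence only" rule can be illustrated with a minimal sketch using only the Python standard library. This is not the code of `tip_crawlee_worker`; the function names and the record fields (`url`, `fetched_at`, `fields`) are assumptions chosen for illustration. The point is that the worker appends one JSON object per line to a file and never opens a DB connection.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(path: Path, url: str, fields: dict) -> dict:
    """Append one evidence record as a single JSON line and return it.

    The record layout is hypothetical; the real worker schema is not
    documented in this note.
    """
    record = {
        "url": url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "fields": fields,
    }
    # Append mode keeps earlier evidence intact; one object per line.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_evidence(path: Path) -> list:
    """Read all evidence records back, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A separate, trusted import step on the TIP side could then read such a file and decide what reaches the DB, which keeps remote Pi/Proxmox/residential workers free of DB credentials.
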
Smoke test:
- Installed `crawlee==1.6.3` in `/tmp/tip-crawlee-python-venv`.
- Ran `tip_crawlee_worker` against `https://crawlee.dev`.
- JSONL evidence output succeeded.
## Equivalence Revalidation
Preflight:
- `pending=0`
- `approved=1986`
- `auto_approved=32080`
- `rejected=148367`
- `due_research=0`
- active approved/auto-approved matches: `34066`
Strict DB preflight over all active matches:
- recent-price gaps: `0`
- hard technical mismatches: `0`
- missing critical 1:1 evidence: `0`
Hard criteria checked:
- recent competitor price evidence
- form factor
- speed
- fiber type
- reach ratio
- primary wavelength
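
A deterministic check over the criteria above might look roughly like the following sketch. It is not the TIP research worker's code: the `Transceiver` fields, the 30-day price window, and the accepted reach-ratio range are all assumptions, since this note does not record the actual thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Transceiver:
    form_factor: str
    speed_gbps: float
    fiber_type: str
    reach_m: float
    wavelength_nm: int


# Hypothetical thresholds, not taken from the TIP codebase.
MAX_PRICE_AGE = timedelta(days=30)
REACH_RATIO_RANGE = (0.5, 2.0)


def hard_mismatches(
    ours: Transceiver,
    theirs: Transceiver,
    last_price_seen: Optional[date],
    today: date,
) -> list:
    """Return the names of the hard criteria that fail; empty means 1:1."""
    problems = []
    if last_price_seen is None or today - last_price_seen > MAX_PRICE_AGE:
        problems.append("recent_price")
    if ours.form_factor.lower() != theirs.form_factor.lower():
        problems.append("form_factor")
    if ours.speed_gbps != theirs.speed_gbps:
        problems.append("speed")
    if ours.fiber_type.lower() != theirs.fiber_type.lower():
        problems.append("fiber_type")
    # Reach is compared as a ratio so that small spec differences pass.
    ratio = theirs.reach_m / ours.reach_m if ours.reach_m else 0.0
    if not (REACH_RATIO_RANGE[0] <= ratio <= REACH_RATIO_RANGE[1]):
        problems.append("reach_ratio")
    if ours.wavelength_nm != theirs.wavelength_nm:
        problems.append("primary_wavelength")
    return problems
```

Returning the list of failed criteria, rather than a single boolean, makes it possible to report counters such as "hard technical mismatches" and "missing critical 1:1 evidence" separately.
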
Action:
- Marked all `34066` active `approved/auto_approved` equivalences as immediately due.
- Queued `18` PgBoss jobs for `maintenance:re-research-equivalences`.
- Used the existing DB-only TIP research worker.
- No browser crawler wave was started.
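
How `34066` equivalences map to `18` jobs is not stated in this note. One plausible reading is fixed-size batching; the sketch below assumes a batch size of `2000`, which happens to yield 18 batches but is not confirmed by the source. `plan_batches` is a hypothetical helper, and the actual enqueueing through PgBoss is not shown.

```python
def plan_batches(total: int, batch_size: int) -> list:
    """Return (offset, limit) pairs that cover `total` rows exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (offset, min(batch_size, total - offset))
        for offset in range(0, total, batch_size)
    ]


# Assumed batch size; any value from 1893 to 2003 would also give 18 jobs.
batches = plan_batches(34066, 2000)
print(len(batches), batches[-1])
```
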
Result:
- `18/18` jobs completed.
- `due_research=0`
- `active_researched_today=34066`
- no automated-research rejections in this immediate pass
- final queue:
  - `pending=0`
  - `approved=1986`
  - `auto_approved=32080`
  - `rejected=148367`
Final product verification counters:
- `competitor_verified=11470`
- `price_verified=11557`
- `image_verified=10711`
- `details_verified=9929`
- `fully_verified=9135`
- total transceivers: `17647`
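
The counters above can be cross-checked with a few lines of arithmetic. The numbers are copied from this note; the variable names are only for illustration.

```python
# Queue counters from this note.
approved, auto_approved, active = 1986, 32080, 34066
queue_consistent = approved + auto_approved == active

# Product verification counters from this note.
total = 17647
verified = {
    "competitor_verified": 11470,
    "price_verified": 11557,
    "image_verified": 10711,
    "details_verified": 9929,
    "fully_verified": 9135,
}

# Rows still missing each verification, and coverage in percent.
gaps = {name: total - count for name, count in verified.items()}
coverage = {name: 100 * count / total for name, count in verified.items()}

print("queue consistent:", queue_consistent)
for name in verified:
    print(f"{name}: {coverage[name]:.1f}% covered, {gaps[name]} rows open")
```

The `fully_verified` gap works out to `17647 - 9135 = 8512` rows, which is the product-level work that an empty equivalence queue does not cover.
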
TIP health after run:
- status: `healthy`
- load status: `ok`
- memory used: `13%`
- API/DB connected
## Truth For Next Agent
The manual equivalence queue is empty and all active equivalence matches have just been rechecked by deterministic 1:1 rules.
This does not mean every product row in TIP is fully complete. The remaining product verification gaps are vendor-specific crawler/enrichment work. The largest gaps lie outside the already-focused Flexoptix and FS.com passes, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent Optics, Eoptolink and other vendor/catalog rows.
Do not start a broad browser crawler wave on Erik. Continue vendor-targeted, low-concurrency jobs or move heavier discovery to Pi/Proxmox workers.