# TIP Immediate Equivalence Revalidation + Crawlee Binding

Date: 2026-05-09
Actor: Codex

## Operator Request

The operator asked to immediately verify and validate all open TIP work and to check whether products really match 1:1. The operator also asked to install, use and bind Crawlee/Crawlee-Python, with all crawler/scraper/robot learning recorded for TIPLLM.

## Crawlee Binding

Pushed to Gitea:

- `60531b6 feat: add crawlee python worker integration`
- `49f0871 chore: ignore crawlee python build artifacts`

Added:

- `packages/crawlee-python/`
- `scripts/setup-crawlee-python-worker.sh`
- `docs/TIP_CRAWLEE_RUNTIME.md`
- scraper scripts:
  - `pnpm -C packages/scraper scrape:fs:db-detail`
  - `pnpm -C packages/scraper scrape:fs:url-discovery`

Policy:

- TypeScript Crawlee/Playwright remains the TIP production crawler core.
- Crawlee-Python is optional for Pi/Proxmox/residential workers and writes JSONL evidence only.
- Crawlee-Python does not write directly to TIP DB.
- No external AI was used.

Smoke test:

- Installed `crawlee==1.6.3` in `/tmp/tip-crawlee-python-venv`.
- Ran `tip_crawlee_worker` against `https://crawlee.dev`.
- JSONL evidence output succeeded.

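The JSONL-only policy above can be illustrated with a minimal, standard-library sketch. This is not the actual `tip_crawlee_worker` code: the function names `append_evidence` and `read_evidence` and the record fields `url`, `fetched_at` and `payload` are assumptions made for illustration. The point is only that a worker appends one JSON object per line to a file and never touches the TIP DB.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_evidence(path: Path, url: str, payload: dict) -> dict:
    """Append one evidence record as a single JSON line and return it.

    Field names are hypothetical; the real worker schema may differ.
    """
    record = {
        "url": url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier evidence intact; one record per line.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_evidence(path: Path) -> list[dict]:
    """Read all evidence records back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because the file is append-only and line-delimited, a separate importer on the TIP side could validate each record before anything reaches the database.
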
## Equivalence Revalidation

Preflight:

- `pending=0`
- `approved=1986`
- `auto_approved=32080`
- `rejected=148367`
- `due_research=0`
- active approved/auto-approved matches: `34066`

Strict DB preflight over all active matches:

- recent-price gaps: `0`
- hard technical mismatches: `0`
- missing critical 1:1 evidence: `0`

Hard criteria checked:

- recent competitor price evidence
- form factor
- speed
- fiber type
- reach ratio
- primary wavelength

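A deterministic check over the criteria listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the TIP research worker: the `Transceiver` fields, the `MAX_REACH_RATIO` tolerance and the `MAX_PRICE_AGE` window are all hypothetical values, since this log does not record the real thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

MAX_REACH_RATIO = 1.25               # hypothetical tolerance, not from the log
MAX_PRICE_AGE = timedelta(days=30)   # hypothetical "recent" window


@dataclass(frozen=True)
class Transceiver:
    form_factor: str                 # e.g. "QSFP28"
    speed_gbps: float
    fiber_type: str                  # e.g. "SMF" or "MMF"
    reach_m: float
    wavelength_nm: int
    price_seen_on: Optional[date] = None


def hard_mismatches(ours: Transceiver, theirs: Transceiver, today: date) -> List[str]:
    """Return the names of the hard criteria that fail; empty means a 1:1 match."""
    problems: List[str] = []
    if theirs.price_seen_on is None or today - theirs.price_seen_on > MAX_PRICE_AGE:
        problems.append("recent_price")
    if ours.form_factor.upper() != theirs.form_factor.upper():
        problems.append("form_factor")
    if ours.speed_gbps != theirs.speed_gbps:
        problems.append("speed")
    if ours.fiber_type.upper() != theirs.fiber_type.upper():
        problems.append("fiber_type")
    # Compare reach as a ratio of the longer to the shorter distance.
    lo, hi = sorted((ours.reach_m, theirs.reach_m))
    if lo <= 0 or hi / lo > MAX_REACH_RATIO:
        problems.append("reach_ratio")
    if ours.wavelength_nm != theirs.wavelength_nm:
        problems.append("primary_wavelength")
    return problems
```

Returning the list of failed criteria, rather than a single boolean, keeps a rejection explainable when the result is recorded for later review.
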
Action:

- Marked all `34066` active `approved/auto_approved` equivalences as immediately due.
- Queued `18` PgBoss jobs for `maintenance:re-research-equivalences`.
- Used the existing DB-only TIP research worker.
- No browser crawler wave was started.

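The split of `34066` equivalences into `18` jobs is consistent with fixed-size batching. The sketch below shows the arithmetic only; the batch size of `2000` is an assumption chosen because it yields 18 batches for 34066 ids, and the log does not state the real batch size or job payload shape. The helper name `batched` is likewise hypothetical.

```python
from typing import Iterator, Sequence


def batched(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical figures: 34066 ids and an assumed batch size of 2000.
batches = list(batched(range(34066), 2000))
job_count = len(batches)            # one queue job per batch
last_batch_size = len(batches[-1])  # the final batch is usually smaller
```

With these assumed numbers, 17 full batches cover 34000 ids and one final batch covers the remaining 66.
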
Result:

- `18/18` jobs completed.
- `due_research=0`
- `active_researched_today=34066`
- no automated-research rejections in this immediate pass
- final queue:
  - `pending=0`
  - `approved=1986`
  - `auto_approved=32080`
  - `rejected=148367`

Final product verification counters:

- `competitor_verified=11470`
- `price_verified=11557`
- `image_verified=10711`
- `details_verified=9929`
- `fully_verified=9135`
- total transceivers: `17647`

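To put the counters above in proportion, the following sketch derives coverage percentages and remaining gaps from the reported numbers. The counter values are copied from this log; the helper names `coverage` and `remaining` are hypothetical.

```python
TOTAL_TRANSCEIVERS = 17647

COUNTERS = {
    "competitor_verified": 11470,
    "price_verified": 11557,
    "image_verified": 10711,
    "details_verified": 9929,
    "fully_verified": 9135,
}


def coverage(counters: dict, total: int) -> dict:
    """Percentage of rows verified per counter, rounded to one decimal."""
    return {name: round(100 * count / total, 1) for name, count in counters.items()}


def remaining(counters: dict, total: int) -> dict:
    """Number of rows still unverified per counter."""
    return {name: total - count for name, count in counters.items()}


if __name__ == "__main__":
    gaps = remaining(COUNTERS, TOTAL_TRANSCEIVERS)
    for name, pct in coverage(COUNTERS, TOTAL_TRANSCEIVERS).items():
        print(f"{name}: {pct}% verified, {gaps[name]} remaining")
```

By this calculation about 51.8% of transceivers are fully verified, leaving 8512 rows with at least one open verification gap.
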
TIP health after run:

- status: `healthy`
- load status: `ok`
- memory used: `13%`
- API/DB connected

## Truth For Next Agent

The manual equivalence queue is empty, and all active equivalence matches have just been rechecked against the deterministic 1:1 rules.

This does not mean every product row in TIP is complete. The remaining product verification gaps are vendor-specific crawler/enrichment work. The largest gaps lie outside the Flexoptix and FS.com passes that were already the focus, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent Optics, Eoptolink and other vendor/catalog rows.

Do not start a broad browser crawler wave on Erik. Continue with vendor-targeted, low-concurrency jobs, or move heavier discovery to Pi/Proxmox workers.