transceiver-db/sync/history/2026-05-09-tip-immediate-equivalence-revalidation-and-crawlee-binding.md
2026-05-09 14:14:48 +02:00

TIP Immediate Equivalence Revalidation + Crawlee Binding

Date: 2026-05-09
Actor: Codex

Operator Request

The operator asked to immediately verify and validate all open TIP work and to check whether matched products really are 1:1 equivalents. The operator also asked to install, use, and bind Crawlee/Crawlee-Python, and to record all crawler/scraper/robot learnings for TIPLLM.

Crawlee Binding

Pushed to Gitea:

  • 60531b6 feat: add crawlee python worker integration
  • 49f0871 chore: ignore crawlee python build artifacts

Added:

  • packages/crawlee-python/
  • scripts/setup-crawlee-python-worker.sh
  • docs/TIP_CRAWLEE_RUNTIME.md
  • scraper scripts:
    • pnpm -C packages/scraper scrape:fs:db-detail
    • pnpm -C packages/scraper scrape:fs:url-discovery

Policy:

  • TypeScript Crawlee/Playwright remains the TIP production crawler core.
  • Crawlee-Python is optional for Pi/Proxmox/residential workers and writes JSONL evidence only.
  • Crawlee-Python does not write directly to TIP DB.
  • No external AI was used.
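The JSONL-only rule can be pictured as a worker that appends one JSON object per crawled page to a file and holds no database connection at all. The sketch below is illustrative and stdlib-only. The record fields (url, vendor, title, fetched_at) and the helpers make_record and append_evidence are hypothetical and are not taken from packages/crawlee-python/.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import TextIO


@dataclass(frozen=True)
class EvidenceRecord:
    # Hypothetical evidence schema; the real worker's fields may differ.
    url: str
    vendor: str
    title: str
    fetched_at: str


def make_record(url: str, vendor: str, title: str) -> EvidenceRecord:
    # UTC ISO 8601 timestamp so records from different workers are comparable.
    return EvidenceRecord(url, vendor, title, datetime.now(timezone.utc).isoformat())


def append_evidence(out: TextIO, record: EvidenceRecord) -> None:
    # Exactly one compact JSON object per line (JSONL); nothing is written to a DB.
    out.write(json.dumps(asdict(record), ensure_ascii=False, separators=(",", ":")) + "\n")


if __name__ == "__main__":
    import sys

    append_evidence(sys.stdout, make_record("https://crawlee.dev", "example", "Crawlee"))
```

Keeping the worker file-only means that loading evidence into TIP remains a separate, reviewable step.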

Smoke test:

  • Installed crawlee==1.6.3 in /tmp/tip-crawlee-python-venv.
  • Ran tip_crawlee_worker against https://crawlee.dev.
  • JSONL evidence output succeeded.
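The smoke test's pass condition ("JSONL evidence output succeeded") can be stated explicitly. The output must contain at least one record, and every non-blank line must parse as a JSON object that carries the expected keys. This is a hypothetical check, not the worker's actual test. Both validate_jsonl and the required key set are assumptions.

```python
import json
from collections.abc import Iterable


def validate_jsonl(lines: Iterable[str], required: set[str]) -> int:
    """Count valid evidence records; raise ValueError on the first bad line."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = sorted(key for key in required if key not in obj)
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        count += 1
    if count == 0:
        raise ValueError("no evidence records found")  # an empty file fails the smoke test
    return count
```

An open text file can be passed directly as lines, because iterating over it yields one line at a time. Treating an empty output as a failure is a deliberate choice for smoke tests.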

Equivalence Revalidation

Preflight:

  • pending=0
  • approved=1986
  • auto_approved=32080
  • rejected=148367
  • due_research=0
  • active approved/auto-approved matches: 34066
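These counters are internally consistent: the active set is exactly the two approved states, 1986 + 32080 = 34066. A check along these lines could gate future runs. The function name check_preflight and the key active_matches are hypothetical. Requiring pending=0 and due_research=0 simply mirrors the state this run started from.

```python
from collections.abc import Mapping


def check_preflight(counters: Mapping[str, int]) -> list[str]:
    """Return human-readable problems; an empty list means the preflight passes."""
    problems: list[str] = []
    active = counters["approved"] + counters["auto_approved"]
    if active != counters["active_matches"]:
        problems.append(
            f"active_matches={counters['active_matches']} but approved+auto_approved={active}"
        )
    # Assumed gate: an empty manual queue and nothing already overdue.
    for key in ("pending", "due_research"):
        if counters[key] != 0:
            problems.append(f"{key}={counters[key]} (expected 0)")
    return problems


preflight = {
    "pending": 0,
    "approved": 1986,
    "auto_approved": 32080,
    "rejected": 148367,
    "due_research": 0,
    "active_matches": 34066,
}
print(check_preflight(preflight))  # []
```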

Strict DB preflight over all active matches:

  • matches without recent price evidence: 0
  • hard technical mismatches: 0
  • missing critical 1:1 evidence: 0

Hard criteria checked:

  • recent competitor price evidence
  • form factor
  • speed
  • fiber type
  • reach ratio
  • primary wavelength
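The hard criteria can be read as a deterministic predicate over a (TIP product, competitor product) pair, with failures sorted into the same three buckets as the strict preflight. This is a hypothetical sketch. The field names, the 30-day price window, and the 0.8 to 1.25 reach-ratio band are placeholders, not TIP's real thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical thresholds, for illustration only.
PRICE_MAX_AGE = timedelta(days=30)
REACH_RATIO_MIN, REACH_RATIO_MAX = 0.8, 1.25
EXACT_FIELDS = ("form_factor", "speed_gbps", "fiber_type", "wavelength_nm")
CRITICAL_FIELDS = EXACT_FIELDS + ("reach_m",)


@dataclass(frozen=True)
class Transceiver:
    # Hypothetical field names; None means "not yet captured by enrichment".
    form_factor: Optional[str]    # e.g. "SFP28"
    speed_gbps: Optional[int]
    fiber_type: Optional[str]     # e.g. "SMF" or "MMF"
    reach_m: Optional[int]
    wavelength_nm: Optional[int]  # primary wavelength
    last_price_date: Optional[date] = None  # competitor price evidence


def classify_match(ours: Transceiver, theirs: Transceiver, today: date) -> list[str]:
    """Return failure buckets; an empty list means the pair passes the hard 1:1 rules."""
    failures: list[str] = []
    if theirs.last_price_date is None or today - theirs.last_price_date > PRICE_MAX_AGE:
        failures.append("no_recent_price")
    missing = any(
        getattr(ours, f) is None or getattr(theirs, f) is None for f in CRITICAL_FIELDS
    )
    if missing or not ours.reach_m:  # a zero reach would make the ratio undefined
        failures.append("missing_critical_evidence")
        return failures  # nothing reliable left to compare
    exact_mismatch = any(getattr(ours, f) != getattr(theirs, f) for f in EXACT_FIELDS)
    reach_ratio = theirs.reach_m / ours.reach_m
    if exact_mismatch or not (REACH_RATIO_MIN <= reach_ratio <= REACH_RATIO_MAX):
        failures.append("hard_mismatch")
    return failures
```

Rules of this shape depend only on stored fields, so they can be rerun without any browser crawl.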

Action:

  • Marked all 34066 active approved/auto_approved equivalences as immediately due.
  • Queued 18 PgBoss jobs for maintenance:re-research-equivalences.
  • Used the existing DB-only TIP research worker.
  • No browser crawler wave was started.
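The entry does not record how the 34066 due matches were divided among the 18 PgBoss jobs. Under an even split, 34066 = 18 * 1892 + 10, so ten jobs would carry 1893 matches and eight would carry 1892. The hypothetical helper split_into_jobs below shows only that arithmetic. Enqueuing the batches is handled by pg-boss, a Node.js job queue, and is not shown.

```python
from collections.abc import Sequence


def split_into_jobs(ids: Sequence[int], n_jobs: int) -> list[list[int]]:
    """Split ids into n_jobs contiguous batches whose sizes differ by at most one."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    base, extra = divmod(len(ids), n_jobs)
    batches: list[list[int]] = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # the first `extra` batches get one more
        batches.append(list(ids[start:start + size]))
        start += size
    return batches


batches = split_into_jobs(range(34066), 18)
print(len(batches), max(map(len, batches)), min(map(len, batches)))  # 18 1893 1892
```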

Result:

  • 18/18 jobs completed.
  • due_research=0
  • active_researched_today=34066
  • no automated-research rejections in this immediate pass
  • final queue:
    • pending=0
    • approved=1986
    • auto_approved=32080
    • rejected=148367
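The result can be checked mechanically against the preflight. Every active match was researched today, nothing is left due, and the status counters did not move. Unchanged counters are consistent with, though not proof of, this pass producing no automated-research rejections. The snapshot keys mirror the counter names in this log. The function post_run_problems is a hypothetical helper.

```python
from collections.abc import Mapping

STATUS_KEYS = ("pending", "approved", "auto_approved", "rejected")


def post_run_problems(before: Mapping[str, int], after: Mapping[str, int]) -> list[str]:
    """Compare queue snapshots taken before and after the re-research pass."""
    problems: list[str] = []
    active = before["approved"] + before["auto_approved"]
    if after["due_research"] != 0:
        problems.append(f"due_research={after['due_research']} (expected 0)")
    if after["active_researched_today"] != active:
        problems.append(f"researched {after['active_researched_today']} of {active} active matches")
    for key in STATUS_KEYS:
        if before[key] != after[key]:
            problems.append(f"{key} changed: {before[key]} -> {after[key]}")
    return problems


before = {"pending": 0, "approved": 1986, "auto_approved": 32080, "rejected": 148367}
after = {**before, "due_research": 0, "active_researched_today": 34066}
print(post_run_problems(before, after))  # []
```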

Final product verification counters:

  • competitor_verified=11470
  • price_verified=11557
  • image_verified=10711
  • details_verified=9929
  • fully_verified=9135
  • total transceivers: 17647
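Read against the 17647 transceiver rows, these counters show how much work is still open: 17647 - 9135 = 8512 rows are not yet fully verified. The snippet below derives per-flag coverage and gaps from the numbers in this entry. It assumes that every counter is measured against the same total and that fully_verified implies the other four flags. The log suggests both but does not state them.

```python
TOTAL_TRANSCEIVERS = 17647
COUNTERS = {
    "competitor_verified": 11470,
    "price_verified": 11557,
    "image_verified": 10711,
    "details_verified": 9929,
    "fully_verified": 9135,
}


def coverage(counters: dict[str, int], total: int) -> dict[str, tuple[float, int]]:
    """Map each counter to (share of total, rows still missing that flag)."""
    return {name: (n / total, total - n) for name, n in counters.items()}


# Assumption: fully_verified means all four flags are set, so it cannot exceed any of them.
assert COUNTERS["fully_verified"] <= min(
    n for name, n in COUNTERS.items() if name != "fully_verified"
)

for name, (share, gap) in coverage(COUNTERS, TOTAL_TRANSCEIVERS).items():
    print(f"{name:<20} {share:6.1%}  gap={gap}")
```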

TIP health after run:

  • status: healthy
  • load status: ok
  • memory used: 13%
  • API/DB connected

Truth For Next Agent

The manual equivalence queue is empty and all active equivalence matches have just been rechecked by deterministic 1:1 rules.

This does not mean every product row in TIP is complete. The remaining product verification gaps are vendor-specific crawler/enrichment work. The largest of them lie outside the vendors already covered by the focused Flexoptix and FS.com passes, especially ATGBICS, NADDOD, GAO Tek, Juniper/Cisco, Ascent Optics, Eoptolink and other vendor/catalog rows.

Do not start a broad browser crawler wave on Erik. Continue with vendor-targeted, low-concurrency jobs, or move heavier discovery to Pi/Proxmox workers.
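In practice, a vendor-targeted, low-concurrency job caps how many requests are in flight for one vendor at a time. The asyncio sketch below does this with a semaphore. The function crawl_vendor, the default limit of 2, and the vendor.example URLs are hypothetical. A production crawler would more likely rely on Crawlee's own concurrency settings than on a hand-rolled limiter.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def crawl_vendor(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 2,
) -> list[str]:
    """Fetch one vendor's URLs with at most max_concurrency requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> str:
        async with sem:  # waits while max_concurrency fetches are already running
            return await fetch(url)

    # gather returns results in the same order as the input URLs
    return list(await asyncio.gather(*(guarded(u) for u in urls)))


async def demo() -> int:
    in_flight = peak = 0

    async def fake_fetch(url: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for network latency
        in_flight -= 1
        return url

    urls = [f"https://vendor.example/p/{i}" for i in range(10)]
    await crawl_vendor(urls, fake_fetch, max_concurrency=2)
    return peak


print(asyncio.run(demo()))  # 2
```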