# 2026-04-29 Codex Full Session Handoff

This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.

## Scope

Covered here:

- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with `rene/llm-gateway`.

Not covered:

- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.

## User Intent

Rene wants TIP to move toward completion:

- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.

## Live TIP Snapshot

Last checked live on 2026-04-29:

- API: healthy.
- `tip-api`: online.
- `tip-scraper-daemon`: online.
- Total transceivers: `13,546`.
- Price verified: `7,250`.
- Image verified: `7,025`.
- Details verified: `6,243`.
- Fully verified: `5,812`.
- Last price observation: `2026-04-29 19:15:53 UTC`.
- Last stock observation: `2026-04-29 19:15:56 UTC`.
- Transceiver updates in last 24h at time of check: `5,175`.
- New transceivers in last 24h at time of check: `462`.

## Verification Blockers

At the DB blocker check:

- Missing price: `6,296`.
- Missing image: `6,521`.
- Missing details: `7,303`.
- Near-full but missing details: `797`.
- Near-full but missing image: `237`.
- Near-full but missing price: `43`.

Top vendor blockers included:

- Juniper Networks: `464` not fully verified, mostly images.
- GAO Tek: `414`, mostly details.
- FS.COM: `378`, details/images.
- Cisco Systems: `330`, all signals missing.
- Ascent Optics: `305`, all signals missing.
- Eoptolink: `287`, all signals missing.
- ATGBICS: about `250` not fully verified.
- Flexoptix: about `119` details.
- FiberMall: about `72` details.

Recommended verification strategy:

1. Details fast lane first, because near-full missing-details rows convert fastest.
2. Then targeted image backfill for large OEMs.
3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.

## Product Verification Work Completed

Implemented verification pipeline changes:

- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
- Scraped product pages now preserve/backfill `product_page_url`.
- Maintenance reconcile promotes old data into verification flags.
- CLI exposes `--backfill-images`.
- Migration added:
  - `sql/102-product-verification-reconcile.sql`

Important touched paths:

- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `packages/scraper/src/index.ts`
- `packages/scraper/src/scheduler.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `sql/102-product-verification-reconcile.sql`

Migration result on Erik:

- Total: `13,084` at that earlier time.
- Image verified: `6,423`.
- Details verified: `6,231`.
- Fully verified: `5,704`.

Then image backfill ran:

- GAO Tek: `313` updated, `6` no-image, `95` errors/404s.
- Other vendors: `289 / 309` updated.
- Total new images: `602`.
- Backfill elapsed: about `1369.1s`.

After restart at that time:

- Image verified: `7,025`.
- Fully verified: `5,812`.

## Blog Engine Hot Topics Work Completed

User reported:

- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.

Root causes found:

- Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
- Rotating research/evergreen topics existed but were lower priority and often invisible.
- Dashboard sent `customTitle` / `customAngle`, but API expected `custom_title` / `additional_context`.
- `blog_title_created` badge existed in UI but API did not populate it.

Implemented:

- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
- Refresh shuffle via query seed.
- Already-created topics demoted via recent `blog_drafts`.
- API returns:
  - `blog_title_created`
  - `last_blog_created_at`
  - `rank_score`
  - `llm_context`
- Dashboard passes:
  - `custom_title`
  - `additional_context`
- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.

Important paths:

- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/api/src/routes/blog.ts`

Verified live:

- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
- `llm_context` is present.
- API remained healthy after restart.

## TIPLLM Robot Policy

User explicitly requested:

- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If TIPLLM training pool does not exist, create it.
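
The request to write robot experiences into a training pool can be pictured as appending one JSON document per line to a dated file. The following is a minimal sketch only: the `RobotExperience` fields and the helper names `experiencePath` and `toJsonlLine` are hypothetical, since this handoff does not show the real record schema. Only the `robot-experiences/<date>.jsonl` naming is taken from the files listed in the training pool.

```typescript
// Hypothetical record shape: the fields actually written by the project's
// writer are not shown in this handoff.
interface RobotExperience {
  timestamp: string; // ISO 8601 in UTC
  robot: string;
  planner: "tipllm"; // TIPLLM-only policy expressed in the type
  jobsStarted: number;
  notes: string[];
}

// One file per UTC day, mirroring `robot-experiences/2026-04-29.jsonl`.
function experiencePath(exp: RobotExperience): string {
  return `robot-experiences/${exp.timestamp.slice(0, 10)}.jsonl`;
}

// JSONL means one JSON document per line, so every record ends with "\n".
function toJsonlLine(exp: RobotExperience): string {
  return JSON.stringify(exp) + "\n";
}

// Example values are illustrative; only the timestamp comes from the commit
// message quoted in this handoff.
const example: RobotExperience = {
  timestamp: "2026-04-29T20:11:24.091Z",
  robot: "verification-robots",
  planner: "tipllm",
  jobsStarted: 0,
  notes: ["Erik is controller-only", "Heavy work goes to Proxmox/Pi"],
};

console.log(experiencePath(example)); // robot-experiences/2026-04-29.jsonl
console.log(toJsonlLine(example).trimEnd());
```

Keeping each record on its own line means new experiences can be appended without rewriting the file, which suits a git-backed pool where small diffs are easier to review.
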

Implemented local code:

- `packages/scraper/src/robots/verification-robots.ts`
  - `--status`
  - `--tipllm-plan --limit=N`
  - `--enqueue=details-fast-lane|priority-vendors|all`
  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
  - `--dry-run`
  - `--max-queues=N`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - added `writeRobotExperience`.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed hardcoded Gitea token fallback.
  - uses existing git remote when no `GITEA_TOKEN` env var is set.
- `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
  - documented robot experience pool and safety defaults.
- `packages/scraper/package.json`
  - added `robots:verification`.

Safety defaults:

- Default profile: `erik-safe`.
- `erik-safe` max queues: `3`.
- `erik-safe` excludes heavy Playwright/discovery queues.
- `pi-fetch` excludes heavy/discovery queues.
- `proxmox-heavy` is explicit and intended for heavy crawler work.

No crawler jobs were started while building this.
No queue waves were enqueued while building this.

## Gitea TIPLLM Training Pool

Found local clone:

- `/tmp/tip-training-data`
- remote: `rene/tip-training-data`

Erik did not have `/tmp/tip-training-data/.git` at the time of check.

Wrote first robot experience record locally and pushed to Gitea:

```text
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
```

Files in Gitea training pool:

- `qa-pairs/robot-control-high.jsonl`
- `robot-experiences/2026-04-29.jsonl`

This record encodes:

- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.

## Erik Notes

Synced robot/training code to `/opt/tip`.

Did not:

- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.

Remote scraper TypeScript build initially failed because of stale misplaced remote-only duplicate files:

- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`

These did not exist locally and had wrong relative imports. Removed only these duplicates. Remote scraper build passed afterward.

PM2 status after this:

- `tip-api`: online.
- `tip-scraper-daemon`: online.

## Cross-Repo Sync

Claude Code created a similar sync handoff in `rene/llm-gateway`.

From user screenshot:

```text
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
```

Gitea path shown:

```text
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
```

Rule:

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Sync Folder Work

Created in this repo:

- `sync/README.md`
- `sync/CURRENT.md`
- `sync/history/2026-04-29-tipllm-robot-learning.md`
- `sync/history/2026-04-29-cross-repo-sync.md`
- this file.

Already pushed earlier:

```text
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
```

## Current Dirty Worktree

As of this handoff, many non-sync files remain modified/untracked:

- `CHANGELOG_PENDING.md`
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/mcp-server/src/index.ts`
- `packages/scraper/package.json`
- `packages/scraper/src/crawler-llm/core.ts`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `scripts/tip-learning-pool-build.ts`
- `packages/scraper/src/robots/`
- `packages/scraper/src/scrapers/audiocodes-oem.ts`
- `packages/scraper/src/seed-batch35.ts`
- `packages/scraper/src/seed-batch36.ts`
- `packages/scraper/src/seed-batch37.ts`
- `sql/102-product-verification-reconcile.sql`

Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.

## Safe Commands

Read-only/status:

```bash
npm run robots:verification -w packages/scraper -- --status
```

TIPLLM planning only, no crawl jobs:

```bash
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
```

Dry-run queue plan only:

```bash
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

Build checks:

```bash
npm run build -w packages/scraper
npm run build -w packages/api
```

## Next Recommended Steps

1. Pull both sync folders from Gitea:
   - `rene/transceiver-db`
   - `rene/llm-gateway`
2. Review dirty worktree before committing code.
3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
4. If running robots, start with TIPLLM planning only.
5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.
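
The routing in step 5, together with the safety defaults listed under TIPLLM Robot Policy, could be pictured roughly as below. This is a sketch under assumptions, not the code in `verification-robots.ts`: the `QueueSpec` shape, the `selectQueues` helper, and the queue names are hypothetical, and treating `--max-queues` as something that can only tighten the `erik-safe` cap of 3 is an assumption.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";

// Hypothetical queue descriptor; the real queue model is not shown here.
interface QueueSpec {
  name: string;
  heavy: boolean; // e.g. Playwright-driven crawls
  discovery: boolean; // vendor/product discovery crawls
}

const DEFAULT_PROFILE: Profile = "erik-safe";
const ERIK_SAFE_MAX_QUEUES = 3;

function selectQueues(
  queues: QueueSpec[],
  profile: Profile = DEFAULT_PROFILE,
  maxQueues: number = Infinity,
): QueueSpec[] {
  // Only the explicit proxmox-heavy profile may take heavy/discovery queues.
  const allowed =
    profile === "proxmox-heavy"
      ? queues
      : queues.filter((q) => !q.heavy && !q.discovery);
  // Assumption: a caller-supplied limit can lower the erik-safe cap, never raise it.
  const profileCap = profile === "erik-safe" ? ERIK_SAFE_MAX_QUEUES : Infinity;
  return allowed.slice(0, Math.min(maxQueues, profileCap));
}

// Illustrative queue names only.
const queues: QueueSpec[] = [
  { name: "details-a", heavy: false, discovery: false },
  { name: "playwright-images", heavy: true, discovery: false },
  { name: "details-b", heavy: false, discovery: false },
  { name: "vendor-discovery", heavy: false, discovery: true },
  { name: "details-c", heavy: false, discovery: false },
  { name: "details-d", heavy: false, discovery: false },
];

console.log(selectQueues(queues).map((q) => q.name));
console.log(selectQueues(queues, "proxmox-heavy").map((q) => q.name));
```

Making `erik-safe` the default argument means a caller who forgets to pick a profile gets the most conservative behaviour, which matches the intent that Erik stays a controller rather than a heavy worker.
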