2026-04-29 Codex Full Session Handoff
This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.
Scope
Covered here:
- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with rene/llm-gateway.
Not covered:
- Chats or history that are not present in this Codex thread and not already written into this repository or sibling sync/ folders.
User Intent
Rene wants TIP to move toward completion:
- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in sync/ folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.
Live TIP Snapshot
Last checked live on 2026-04-29:
- API: healthy.
- tip-api: online.
- tip-scraper-daemon: online.
- Total transceivers: 13,546.
- Price verified: 7,250.
- Image verified: 7,025.
- Details verified: 6,243.
- Fully verified: 5,812.
- Last price observation: 2026-04-29 19:15:53 UTC.
- Last stock observation: 2026-04-29 19:15:56 UTC.
- Transceiver updates in last 24h at time of check: 5,175.
- New transceivers in last 24h at time of check: 462.
Verification Blockers
At the DB blocker check:
- Missing price: 6,296.
- Missing image: 6,521.
- Missing details: 7,303.
- Near-full but missing details: 797.
- Near-full but missing image: 237.
- Near-full but missing price: 43.
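The three "missing" counts are consistent with total minus verified for each signal in the live snapshot. A minimal sketch of that arithmetic follows; the object shape and function name are illustrative assumptions, not code from the repository.

```typescript
// Figures copied from the Live TIP Snapshot in this handoff.
// The object shape is illustrative only.
const snapshot = {
  total: 13546,
  priceVerified: 7250,
  imageVerified: 7025,
  detailsVerified: 6243,
};

// A row is "missing" a signal when it is not verified for that signal.
function missingCounts(s: typeof snapshot) {
  return {
    price: s.total - s.priceVerified,
    image: s.total - s.imageVerified,
    details: s.total - s.detailsVerified,
  };
}

const missing = missingCounts(snapshot);
console.log(missing); // price 6296, image 6521, details 7303
```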
Top vendor blockers included:
- Juniper Networks: 464 not fully verified, mostly images.
- GAO Tek: 414, mostly details.
- FS.COM: 378, details/images.
- Cisco Systems: 330, all signals missing.
- Ascent Optics: 305, all signals missing.
- Eoptolink: 287, all signals missing.
- ATGBICS: about 250 not fully verified.
- Flexoptix: about 119 details.
- FiberMall: about 72 details.
Recommended verification strategy:
- Details fast lane first, because near-full missing-details rows convert fastest.
- Then targeted image backfill for large OEMs.
- Treat OEM price verification separately; many OEM catalog products may not have direct prices.
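The lane order above follows from the near-full counts: each near-full row needs only one more signal to become fully verified, so the largest near-full group converts the most rows per pass. A small sketch, with a hypothetical `Lane` type:

```typescript
// Near-full counts copied from the Verification Blockers list.
// The Lane type is a hypothetical illustration.
type Lane = { signal: "details" | "image" | "price"; nearFullMissing: number };

const lanes: Lane[] = [
  { signal: "price", nearFullMissing: 43 },
  { signal: "details", nearFullMissing: 797 },
  { signal: "image", nearFullMissing: 237 },
];

// Largest near-full group first.
const ordered = [...lanes].sort((a, b) => b.nearFullMissing - a.nearFullMissing);
const laneOrder = ordered.map((l) => l.signal).join(" > ");
console.log(laneOrder); // details > image > price
```

Price sorts last here on count alone; the separate handling of OEM prices is a policy choice on top of that.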
Product Verification Work Completed
Implemented verification pipeline changes:
- Product image crawl writes image_verified, image_verified_url, image_verified_at.
- Product detail scrape writes details_verified, details_source_url, details_verified_at.
- Scraped product pages now preserve/backfill product_page_url.
- Maintenance reconcile promotes old data into verification flags.
- CLI exposes --backfill-images.
- Migration added: sql/102-product-verification-reconcile.sql
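As a rough illustration of how these flags could combine, the sketch below models one product row. The image and details field names come from the list above; `price_verified` and the rule that "fully verified" means all three signals are set are assumptions, not confirmed by the migration.

```typescript
interface VerificationFlags {
  price_verified: boolean; // assumed column name, not stated in this handoff
  image_verified: boolean;
  image_verified_url: string | null;
  image_verified_at: string | null; // ISO timestamp
  details_verified: boolean;
  details_source_url: string | null;
  details_verified_at: string | null; // ISO timestamp
}

// Names the signals that still block full verification.
function missingSignals(f: VerificationFlags): string[] {
  const missing: string[] = [];
  if (!f.price_verified) missing.push("price");
  if (!f.image_verified) missing.push("image");
  if (!f.details_verified) missing.push("details");
  return missing;
}

// Assumption: "fully verified" means all three signals are set.
function isFullyVerified(f: VerificationFlags): boolean {
  return missingSignals(f).length === 0;
}
```

A "near-full" row in the blocker counts would then be one where `missingSignals` returns exactly one entry.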
Important touched paths:
- packages/scraper/src/utils/db.ts
- packages/scraper/src/utils/backfill-images.ts
- packages/scraper/src/utils/image-downloader.ts
- packages/scraper/src/utils/spec-updater.ts
- packages/scraper/src/index.ts
- packages/scraper/src/scheduler.ts
- packages/scraper/src/scrapers/atgbics.ts
- packages/scraper/src/scrapers/fiber24.ts
- packages/scraper/src/scrapers/fibermall.ts
- sql/102-product-verification-reconcile.sql
Migration result on Erik:
- Total: 13,084 at that earlier time.
- Image verified: 6,423.
- Details verified: 6,231.
- Fully verified: 5,704.
Then image backfill ran:
- GAO Tek: 313 updated, 6 no-image, 95 errors/404s.
- Other vendors: 289 / 309 updated.
- Total new images: 602.
- Backfill elapsed: about 1369.1s.
After restart at that time:
- Image verified: 7,025.
- Fully verified: 5,812.
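The backfill figures reconcile with the before and after counts. A quick check, using only numbers from this handoff (variable names are illustrative):

```typescript
// Numbers copied from the migration result and backfill run above.
const gaoTek = { updated: 313, noImage: 6, errors: 95 };
const otherVendors = { updated: 289, attempted: 309 };
const imageVerifiedBefore = 6423;

const newImages = gaoTek.updated + otherVendors.updated;
const imageVerifiedAfter = imageVerifiedBefore + newImages;
const gaoTekAttempted = gaoTek.updated + gaoTek.noImage + gaoTek.errors;

console.log(newImages);          // 602
console.log(imageVerifiedAfter); // 7025
console.log(gaoTekAttempted);    // 414
```

The GAO Tek attempt total of 414 equals the GAO Tek blocker count listed earlier.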
Blog Engine Hot Topics Work Completed
User reported:
- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.
Root causes found:
- Hot Topics API effectively sorted by urgency only, so static hot/breaking topics dominated.
- Rotating research/evergreen topics existed but were lower priority and often invisible.
- Dashboard sent customTitle/customAngle, but API expected custom_title/additional_context.
- blog_title_created badge existed in UI but API did not populate it.
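The field-name mismatch can be illustrated with a small mapper from the dashboard's camelCase fields to the snake_case fields the API expects. The interface and function names are hypothetical, and mapping `customAngle` to `additional_context` is inferred from the pairing above rather than confirmed.

```typescript
// Field names come from this handoff; the types and function are hypothetical.
interface DashboardForm {
  customTitle?: string;
  customAngle?: string;
}

interface BlogRequestBody {
  custom_title?: string;
  additional_context?: string;
}

// Assumption: customAngle carries what the API calls additional_context.
function toApiBody(form: DashboardForm): BlogRequestBody {
  const body: BlogRequestBody = {};
  if (form.customTitle) body.custom_title = form.customTitle;
  if (form.customAngle) body.additional_context = form.customAngle;
  return body;
}
```

Sending the camelCase keys unchanged would leave both API fields undefined, which matches the symptom that custom input was ignored.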
Implemented:
- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
- Refresh shuffle via query seed.
- Already-created topics demoted via recent blog_drafts.
- API returns:
  - blog_title_created
  - last_blog_created_at
  - rank_score
  - llm_context
- Dashboard passes:
  - custom_title
  - additional_context
- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.
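The ranking ideas listed above (urgency, source score, freshness, deterministic jitter, source caps, seed-based shuffle, demotion of already-created topics) can be sketched as follows. All weights, field names, the hash, and the cap value are illustrative assumptions, not the values used in packages/api/src/routes/hot-topics.ts.

```typescript
interface Topic {
  id: string;
  source: string;
  urgency: number;     // assumed range 0..1
  sourceScore: number; // assumed range 0..1
  ageHours: number;
  blogCreated: boolean;
}

// FNV-1a style hash mapped to [0, 1): the same seed and id always give the same jitter.
function jitter(seed: string, id: string): number {
  const s = seed + ":" + id;
  let h = 2166136261;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) / 4294967296;
}

function rankScore(t: Topic, seed: string): number {
  const freshness = 1 / (1 + t.ageHours / 24); // decays with age
  const base =
    0.4 * t.urgency + 0.25 * t.sourceScore + 0.2 * freshness + 0.15 * jitter(seed, t.id);
  return t.blogCreated ? base * 0.5 : base; // demote topics that already have a draft
}

function rankTopics(topics: Topic[], seed: string, limit: number, maxPerSource = 2): Topic[] {
  const sorted = [...topics].sort((a, b) => rankScore(b, seed) - rankScore(a, seed));
  const perSource = new Map<string, number>();
  const out: Topic[] = [];
  for (const t of sorted) {
    if (out.length >= limit) break;
    const used = perSource.get(t.source) ?? 0;
    if (used >= maxPerSource) continue; // source cap keeps one feed from dominating
    perSource.set(t.source, used + 1);
    out.push(t);
  }
  return out;
}
```

Because the jitter depends only on the seed and topic id, a given shuffle value reproduces the same order, while a new seed reorders topics with similar scores.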
Important paths:
- packages/api/src/routes/hot-topics.ts
- packages/dashboard/hot-topics.js
- packages/api/src/routes/blog.ts
Verified live:
- /api/hot-topics?limit=...&shuffle=... returns varied ordering.
- llm_context is present.
- API remained healthy after restart.
TIPLLM Robot Policy
User explicitly requested:
- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If TIPLLM training pool does not exist, create it.
Implemented local code:
- packages/scraper/src/robots/verification-robots.ts
  - --status
  - --tipllm-plan --limit=N
  - --enqueue=details-fast-lane|priority-vendors|all
  - --profile=erik-safe|pi-fetch|proxmox-heavy
  - --dry-run
  - --max-queues=N
- packages/scraper/src/crawler-llm/training-data-writer.ts
  - added writeRobotExperience.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed hardcoded Gitea token fallback.
  - uses existing git remote when no GITEA_TOKEN env var is set.
- scripts/tip-learning-pool-build.ts
  - imports TIP_TRAINING_REPO/qa-pairs/**/*.jsonl into tip_llm.
- docs/TIP_SELFLEARNING_WORKFLOW.md
  - documented robot experience pool and safety defaults.
- packages/scraper/package.json
  - added robots:verification.
Safety defaults:
- Default profile: erik-safe.
- erik-safe max queues: 3.
- erik-safe excludes heavy Playwright/discovery queues.
- pi-fetch excludes heavy/discovery queues.
- proxmox-heavy is explicit and intended for heavy crawler work.
No crawler jobs were started while building this. No queue waves were enqueued while building this.
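The safety defaults above could be expressed as a small profile table. This is a sketch under stated assumptions: the profile names and the erik-safe cap of 3 come from this handoff, while the queue classification, type names, and the absence of a cap for pi-fetch and proxmox-heavy are placeholders, not the contents of verification-robots.ts.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";
type QueueKind = "light" | "heavy" | "discovery"; // hypothetical classification

interface Queue {
  name: string;
  kind: QueueKind;
}

interface ProfileRule {
  excluded: QueueKind[];
  maxQueues?: number; // undefined: no cap stated in the handoff, left uncapped in this sketch
}

const PROFILE_RULES: Record<Profile, ProfileRule> = {
  "erik-safe": { excluded: ["heavy", "discovery"], maxQueues: 3 },
  "pi-fetch": { excluded: ["heavy", "discovery"] },
  "proxmox-heavy": { excluded: [] },
};

// Defaults to erik-safe so an omitted profile can never schedule heavy work.
function selectQueues(queues: Queue[], profile: Profile = "erik-safe"): Queue[] {
  const rule = PROFILE_RULES[profile];
  const allowed = queues.filter((q) => rule.excluded.indexOf(q.kind) === -1);
  return rule.maxQueues === undefined ? allowed : allowed.slice(0, rule.maxQueues);
}
```

Making the safest profile the default means heavy queues only run when someone explicitly asks for proxmox-heavy.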
Gitea TIPLLM Training Pool
Found local clone:
- /tmp/tip-training-data
- remote: rene/tip-training-data
Erik did not have /tmp/tip-training-data/.git at the time of check.
Wrote first robot experience record locally and pushed to Gitea:
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
Files in Gitea training pool:
- qa-pairs/robot-control-high.jsonl
- robot-experiences/2026-04-29.jsonl
This record encodes:
- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.
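For orientation, a robot experience record of this kind could be serialized as one JSON object per line. The record fields below are hypothetical and do not reflect the actual schema written by writeRobotExperience; only the robot-experiences/YYYY-MM-DD.jsonl path pattern is taken from the file listed above, and deriving the date from the record timestamp is an assumption.

```typescript
// Hypothetical record shape for illustration only.
interface RobotExperience {
  timestamp: string; // ISO 8601, UTC
  planner: "TIPLLM";
  controllerHost: string;
  heavyWorkers: string[];
  crawlerJobsStarted: number;
  notes: string;
}

// JSONL: exactly one JSON object per line.
function toJsonlLine(record: RobotExperience): string {
  return JSON.stringify(record) + "\n";
}

// Assumption: the daily file is keyed by the UTC date of the record.
function poolPathFor(timestamp: string): string {
  return "robot-experiences/" + timestamp.slice(0, 10) + ".jsonl";
}

const example: RobotExperience = {
  timestamp: "2026-04-29T20:11:24.091Z",
  planner: "TIPLLM",
  controllerHost: "erik",
  heavyWorkers: ["proxmox", "pi"],
  crawlerJobsStarted: 0,
  notes: "status check only",
};

console.log(poolPathFor(example.timestamp)); // robot-experiences/2026-04-29.jsonl
```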
Erik Notes
Synced robot/training code to /opt/tip.
Did not:
- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.
Remote scraper TypeScript build initially failed because of stale misplaced remote-only duplicate files:
- /opt/tip/packages/scraper/src/scrapers/scheduler.ts
- /opt/tip/packages/scraper/src/vendor-discovery-crawler.ts
These did not exist locally and had wrong relative imports. Removed only these duplicates. Remote scraper build passed afterward.
PM2 status after this:
- tip-api: online.
- tip-scraper-daemon: online.
Cross-Repo Sync
Claude Code created a similar sync handoff in rene/llm-gateway.
From user screenshot:
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
Gitea path shown:
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
Rule:
When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:
- transceiver-db/sync/CURRENT.md
- llm-gateway/sync/CURRENT.md
Sync Folder Work
Created in this repo:
- sync/README.md
- sync/CURRENT.md
- sync/history/2026-04-29-tipllm-robot-learning.md
- sync/history/2026-04-29-cross-repo-sync.md
- this file.
Already pushed earlier:
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
Current Dirty Worktree
As of this handoff, many non-sync files remain modified/untracked:
- CHANGELOG_PENDING.md
- docs/TIP_SELFLEARNING_WORKFLOW.md
- packages/api/src/routes/hot-topics.ts
- packages/dashboard/hot-topics.js
- packages/mcp-server/src/index.ts
- packages/scraper/package.json
- packages/scraper/src/crawler-llm/core.ts
- packages/scraper/src/crawler-llm/training-data-writer.ts
- packages/scraper/src/scrapers/atgbics.ts
- packages/scraper/src/scrapers/fiber24.ts
- packages/scraper/src/scrapers/fibermall.ts
- packages/scraper/src/utils/backfill-images.ts
- packages/scraper/src/utils/db.ts
- packages/scraper/src/utils/image-downloader.ts
- packages/scraper/src/utils/spec-updater.ts
- scripts/tip-learning-pool-build.ts
- packages/scraper/src/robots/
- packages/scraper/src/scrapers/audiocodes-oem.ts
- packages/scraper/src/seed-batch35.ts
- packages/scraper/src/seed-batch36.ts
- packages/scraper/src/seed-batch37.ts
- sql/102-product-verification-reconcile.sql
Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.
Safe Commands
Read-only/status:
npm run robots:verification -w packages/scraper -- --status
TIPLLM planning only, no crawl jobs:
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
Dry-run queue plan only:
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
Build checks:
npm run build -w packages/scraper
npm run build -w packages/api
Next Recommended Steps
- Pull both sync folders from Gitea:
  - rene/transceiver-db
  - rene/llm-gateway
- Review dirty worktree before committing code.
- Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
- If running robots, start with TIPLLM planning only.
- If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.