transceiver-db/sync/history/2026-04-29-codex-full-session-handoff.md
2026-04-29 22:52:56 +02:00

2026-04-29 Codex Full Session Handoff

This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.

Scope

Covered here:

  • TIP product verification and crawler status.
  • Product image/details verification fixes.
  • Blog Engine Hot Topics fix.
  • TIPLLM-only robot planning policy.
  • Gitea-backed TIPLLM training pool experience logging.
  • Erik operational safety constraints.
  • Cross-repo sync with rene/llm-gateway.

Not covered:

  • Chats or history that are not present in this Codex thread and not already written into this repository or sibling sync/ folders.

User Intent

Rene wants TIP to move toward completion:

  • Product photos must be crawled and verified.
  • Product details must be verified.
  • Overall product verification must be trustworthy.
  • Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
  • Crawler/robot orchestration should use available Proxmox/Pi capacity.
  • Erik must not be overloaded by heavy crawlers.
  • TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
  • Robot/crawler experiences must become TIPLLM training data in Gitea.
  • All agent handoffs should live in sync/ folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.

Live TIP Snapshot

Last checked live on 2026-04-29:

  • API: healthy.
  • tip-api: online.
  • tip-scraper-daemon: online.
  • Total transceivers: 13,546.
  • Price verified: 7,250.
  • Image verified: 7,025.
  • Details verified: 6,243.
  • Fully verified: 5,812.
  • Last price observation: 2026-04-29 19:15:53 UTC.
  • Last stock observation: 2026-04-29 19:15:56 UTC.
  • Transceiver updates in last 24h at time of check: 5,175.
  • New transceivers in last 24h at time of check: 462.

Verification Blockers

At the DB blocker check:

  • Missing price: 6,296.
  • Missing image: 6,521.
  • Missing details: 7,303.
  • Near-full but missing details: 797.
  • Near-full but missing image: 237.
  • Near-full but missing price: 43.
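
The blocker counts can be cross-checked against the live snapshot. A minimal sketch, assuming "missing" simply means "not yet verified" for that signal (the variable and function names are illustrative, not taken from the repo):

```typescript
// Figures copied from the live snapshot and the DB blocker check above.
const total = 13546;
const verified = { price: 7250, image: 7025, details: 6243 };
const reportedMissing = { price: 6296, image: 6521, details: 7303 };

type Signal = keyof typeof verified;

// Assumption: missing = total - verified for each signal.
function missingCount(signal: Signal): number {
  return total - verified[signal];
}

const signals: Signal[] = ["price", "image", "details"];
const consistent = signals.every((s) => missingCount(s) === reportedMissing[s]);
console.log(consistent); // true
```

All three reported blocker counts match the snapshot under that assumption, so the snapshot and the blocker check appear to describe the same table state.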

Top vendor blockers included:

  • Juniper Networks: 464 not fully verified, mostly images.
  • GAO Tek: 414, mostly details.
  • FS.COM: 378, details/images.
  • Cisco Systems: 330, all signals missing.
  • Ascent Optics: 305, all signals missing.
  • Eoptolink: 287, all signals missing.
  • ATGBICS: about 250 not fully verified.
  • Flexoptix: about 119, details.
  • FiberMall: about 72, details.

Recommended verification strategy:

  1. Details fast lane first, because near-full missing-details rows convert fastest.
  2. Then targeted image backfill for large OEMs.
  3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.
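
The ordering above could be expressed as a per-row lane choice. This is a sketch only: it assumes "near-full" means exactly one of the three signals is missing, and every lane name except `details-fast-lane` (which appears as an `--enqueue` value later in this handoff) is hypothetical.

```typescript
interface VerificationFlags {
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
}

// Only "details-fast-lane" is a name from this handoff; the rest are placeholders.
type Lane = "details-fast-lane" | "image-backfill" | "price-review" | "full-crawl" | "done";

// Assumption: a row is "near-full" when exactly one signal is missing.
function pickLane(f: VerificationFlags): Lane {
  const missing: string[] = [];
  if (!f.detailsVerified) missing.push("details");
  if (!f.imageVerified) missing.push("image");
  if (!f.priceVerified) missing.push("price");

  if (missing.length === 0) return "done";
  if (missing.length > 1) return "full-crawl";
  if (missing[0] === "details") return "details-fast-lane";
  if (missing[0] === "image") return "image-backfill";
  return "price-review"; // kept separate: many OEM products may have no direct price
}

console.log(pickLane({ priceVerified: true, imageVerified: true, detailsVerified: false }));
// details-fast-lane
```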

Product Verification Work Completed

Implemented verification pipeline changes:

  • Product image crawl writes image_verified, image_verified_url, image_verified_at.
  • Product detail scrape writes details_verified, details_source_url, details_verified_at.
  • Scraped product pages now preserve/backfill product_page_url.
  • Maintenance reconcile promotes old data into verification flags.
  • CLI exposes --backfill-images.
  • Migration added:
    • sql/102-product-verification-reconcile.sql
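
For orientation, the verification fields listed above can be pictured as small update patches. The column names come from this handoff; the TypeScript types, the ISO timestamp format, and the helper names are assumptions and may not match `packages/scraper/src/utils/db.ts`.

```typescript
// Column names are from the handoff; the value types are assumed.
interface ImageVerificationPatch {
  image_verified: boolean;
  image_verified_url: string;
  image_verified_at: string; // ISO 8601 timestamp (assumed format)
}

interface DetailsVerificationPatch {
  details_verified: boolean;
  details_source_url: string;
  details_verified_at: string; // ISO 8601 timestamp (assumed format)
}

// Hypothetical helper: the patch an image crawl would write on success.
function imageVerifiedPatch(url: string, now: Date = new Date()): ImageVerificationPatch {
  return { image_verified: true, image_verified_url: url, image_verified_at: now.toISOString() };
}

// Hypothetical helper: the patch a detail scrape would write on success.
function detailsVerifiedPatch(sourceUrl: string, now: Date = new Date()): DetailsVerificationPatch {
  return { details_verified: true, details_source_url: sourceUrl, details_verified_at: now.toISOString() };
}

const patch = imageVerifiedPatch("https://example.com/sfp.jpg", new Date("2026-04-29T19:15:53Z"));
console.log(patch.image_verified_at); // 2026-04-29T19:15:53.000Z
```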

Important touched paths:

  • packages/scraper/src/utils/db.ts
  • packages/scraper/src/utils/backfill-images.ts
  • packages/scraper/src/utils/image-downloader.ts
  • packages/scraper/src/utils/spec-updater.ts
  • packages/scraper/src/index.ts
  • packages/scraper/src/scheduler.ts
  • packages/scraper/src/scrapers/atgbics.ts
  • packages/scraper/src/scrapers/fiber24.ts
  • packages/scraper/src/scrapers/fibermall.ts
  • sql/102-product-verification-reconcile.sql

Migration result on Erik:

  • Total: 13,084 at that earlier time.
  • Image verified: 6,423.
  • Details verified: 6,231.
  • Fully verified: 5,704.

Then image backfill ran:

  • GAO Tek: 313 updated, 6 no-image, 95 errors/404s.
  • Other vendors: 289 / 309 updated.
  • Total new images: 602.
  • Backfill elapsed: 1369.1s (about 23 minutes).

After restart at that time:

  • Image verified: 7,025.
  • Fully verified: 5,812.
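
The backfill numbers above add up as follows. This is a worked tally, not repo code, and it assumes the three GAO Tek outcomes (updated, no-image, errors) are exhaustive:

```typescript
// Figures copied from the migration result and backfill summary above.
const gaoTek = { updated: 313, noImage: 6, errors: 95 };
const otherVendors = { updated: 289, attempted: 309 };

const newImages = gaoTek.updated + otherVendors.updated;            // 602
const gaoTekRows = gaoTek.updated + gaoTek.noImage + gaoTek.errors; // 414
const imageVerifiedAfter = 6423 + newImages;                        // 7025
const fullyVerifiedGain = 5812 - 5704;                              // 108

console.log(newImages, gaoTekRows, imageVerifiedAfter, fullyVerifiedGain);
```

602 new images produced 108 more fully verified rows over the same span, which is consistent with most of the backfilled rows still missing another signal.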

Blog Engine Hot Topics Work Completed

User reported:

  • Blog Engine Hot Topics always showed the same topics.
  • These topics are used to create blog posts.
  • More content/context for BlogLLM would help.

Root causes found:

  • Hot Topics API effectively sorted by urgency only, so static hot/breaking topics dominated.
  • Rotating research/evergreen topics existed but were lower priority and often invisible.
  • Dashboard sent customTitle / customAngle, but API expected custom_title / additional_context.
  • blog_title_created badge existed in UI but API did not populate it.
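
The field-name mismatch in the third root cause can be illustrated with a small mapping. The four field names are from this handoff. Pairing `customAngle` with `additional_context` is inferred from the order in which they are listed, and `toApiPayload` is a hypothetical helper, not the dashboard's actual code.

```typescript
// Shape the dashboard originally sent (camelCase).
interface DashboardTopicForm {
  customTitle?: string;
  customAngle?: string;
}

// Shape the API expected (snake_case).
interface BlogTopicRequest {
  custom_title?: string;
  additional_context?: string;
}

// Hypothetical mapper: translate dashboard fields into the names the API reads.
function toApiPayload(form: DashboardTopicForm): BlogTopicRequest {
  const payload: BlogTopicRequest = {};
  if (form.customTitle) payload.custom_title = form.customTitle;
  if (form.customAngle) payload.additional_context = form.customAngle;
  return payload;
}

console.log(JSON.stringify(toApiPayload({ customTitle: "800G optics", customAngle: "pricing trends" })));
// {"custom_title":"800G optics","additional_context":"pricing trends"}
```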

Implemented:

  • Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
  • Refresh shuffle via query seed.
  • Already-created topics demoted via recent blog_drafts.
  • API returns:
    • blog_title_created
    • last_blog_created_at
    • rank_score
    • llm_context
  • Dashboard passes:
    • custom_title
    • additional_context
  • Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.
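
The diversified ranking described above could look roughly like this. It is a sketch under stated assumptions, not the code in `packages/api/src/routes/hot-topics.ts`: the weights, the freshness curve, the demotion penalty, the per-source cap of 2, and all field names except `blog_title_created` are hypothetical.

```typescript
interface HotTopic {
  id: string;
  source: string;
  urgency: number;     // assumed normalized to 0..1
  sourceScore: number; // assumed normalized to 0..1
  ageHours: number;
  blog_title_created: boolean;
}

// FNV-1a (32-bit) over "seed:id", mapped to [0, 1). Same seed and id give the same jitter.
function jitter(seed: string, id: string): number {
  const s = `${seed}:${id}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts, keeping the result in 32 bits.
    h = (h + ((h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24))) >>> 0;
  }
  return h / 0x100000000;
}

function rankScore(t: HotTopic, seed: string): number {
  const freshness = 1 / (1 + t.ageHours / 24);
  let score = 0.4 * t.urgency + 0.25 * t.sourceScore + 0.2 * freshness + 0.15 * jitter(seed, t.id);
  if (t.blog_title_created) score -= 0.5; // demote topics that already produced a draft
  return score;
}

function rankTopics(topics: HotTopic[], seed: string, limit: number, maxPerSource = 2): HotTopic[] {
  const scored = topics.map((t) => ({ t, score: rankScore(t, seed) }));
  scored.sort((a, b) => b.score - a.score);

  const perSource: Record<string, number> = {};
  const out: HotTopic[] = [];
  for (const { t } of scored) {
    if (out.length >= limit) break;
    const used = perSource[t.source] ?? 0;
    if (used >= maxPerSource) continue; // source cap: skip once a source is saturated
    perSource[t.source] = used + 1;
    out.push(t);
  }
  return out;
}
```

Passing a different `seed` on refresh reorders topics with similar scores, while a fixed seed keeps the ordering reproducible. The source cap is what lets lower-urgency research or evergreen topics surface next to static breaking ones.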

Important paths:

  • packages/api/src/routes/hot-topics.ts
  • packages/dashboard/hot-topics.js
  • packages/api/src/routes/blog.ts

Verified live:

  • /api/hot-topics?limit=...&shuffle=... returns varied ordering.
  • llm_context is present.
  • API remained healthy after restart.

TIPLLM Robot Policy

User explicitly requested:

  • Use TIPLLM only.
  • No other AI for this crawler/robot planning lane.
  • Write experiences into a Gitea training pool.
  • If TIPLLM training pool does not exist, create it.

Implemented local code:

  • packages/scraper/src/robots/verification-robots.ts
    • --status
    • --tipllm-plan --limit=N
    • --enqueue=details-fast-lane|priority-vendors|all
    • --profile=erik-safe|pi-fetch|proxmox-heavy
    • --dry-run
    • --max-queues=N
  • packages/scraper/src/crawler-llm/training-data-writer.ts
    • added writeRobotExperience.
    • writes raw robot audit rows.
    • writes SFT records for TIPLLM.
    • removed hardcoded Gitea token fallback.
    • uses existing git remote when no GITEA_TOKEN env var is set.
  • scripts/tip-learning-pool-build.ts
    • imports TIP_TRAINING_REPO/qa-pairs/**/*.jsonl into tip_llm.
  • docs/TIP_SELFLEARNING_WORKFLOW.md
    • documented robot experience pool and safety defaults.
  • packages/scraper/package.json
    • added robots:verification.
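
To make the two outputs of `writeRobotExperience` concrete, here is a hypothetical picture of a raw audit row and its SFT counterpart as JSONL lines. The real signature and record schema are not shown in this handoff, so every field name below is an assumption. The dated path pattern is inferred from `robot-experiences/2026-04-29.jsonl`.

```typescript
// Hypothetical raw audit row; the real schema may differ.
interface RobotExperience {
  timestamp: string; // ISO 8601
  robot: string;
  action: string;
  profile: string;
  jobsStarted: number;
  notes: string;
}

// Hypothetical SFT record shape (prompt/completion pair).
interface SftRecord {
  prompt: string;
  completion: string;
}

// One file per UTC day, inferred from the single file name in this handoff.
function experiencePath(timestamp: string): string {
  return `robot-experiences/${timestamp.slice(0, 10)}.jsonl`;
}

// JSONL: one JSON object per line.
function toJsonl(record: object): string {
  return JSON.stringify(record) + "\n";
}

function toSft(e: RobotExperience): SftRecord {
  return {
    prompt: `Robot ${e.robot} ran "${e.action}" with profile ${e.profile}. What policy applies?`,
    completion: e.notes,
  };
}

const example: RobotExperience = {
  timestamp: "2026-04-29T20:11:24.091Z",
  robot: "verification-robots",
  action: "status",
  profile: "erik-safe",
  jobsStarted: 0,
  notes: "TIPLLM only. Erik is controller-only. Heavy work goes to Proxmox/Pi.",
};

console.log(experiencePath(example.timestamp)); // robot-experiences/2026-04-29.jsonl
```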

Safety defaults:

  • Default profile: erik-safe.
  • erik-safe max queues: 3.
  • erik-safe excludes heavy Playwright/discovery queues.
  • pi-fetch excludes heavy/discovery queues.
  • proxmox-heavy is explicit and intended for heavy crawler work.
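
These defaults could be modelled as a small profile table. This is a sketch, not the contents of `verification-robots.ts`: the queue kinds are hypothetical labels, and the handoff states a queue cap only for `erik-safe`, so the other caps are left as `null` (unknown, treated here as uncapped).

```typescript
type ProfileName = "erik-safe" | "pi-fetch" | "proxmox-heavy";
type QueueKind = "fetch" | "playwright" | "discovery"; // hypothetical labels

interface Profile {
  maxQueues: number | null; // null: no cap stated in this handoff
  excluded: QueueKind[];
}

const PROFILES: Record<ProfileName, Profile> = {
  "erik-safe": { maxQueues: 3, excluded: ["playwright", "discovery"] },
  "pi-fetch": { maxQueues: null, excluded: ["playwright", "discovery"] },
  "proxmox-heavy": { maxQueues: null, excluded: [] },
};

interface Queue {
  name: string;
  kind: QueueKind;
}

// Defaults to erik-safe, so a heavy run has to be requested explicitly.
function planQueues(queues: Queue[], profileName: ProfileName = "erik-safe"): Queue[] {
  const profile = PROFILES[profileName];
  const allowed = queues.filter((q) => profile.excluded.indexOf(q.kind) === -1);
  return profile.maxQueues === null ? allowed : allowed.slice(0, profile.maxQueues);
}

const candidates: Queue[] = [
  { name: "q1", kind: "fetch" },
  { name: "q2", kind: "playwright" },
  { name: "q3", kind: "fetch" },
  { name: "q4", kind: "discovery" },
  { name: "q5", kind: "fetch" },
  { name: "q6", kind: "fetch" },
];

console.log(planQueues(candidates).map((q) => q.name).join(",")); // q1,q3,q5
```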

No crawler jobs were started and no queue waves were enqueued while building this.

Gitea TIPLLM Training Pool

Found local clone:

  • /tmp/tip-training-data
  • remote: rene/tip-training-data

Erik did not have /tmp/tip-training-data/.git at the time of check.

Wrote first robot experience record locally and pushed to Gitea:

f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]

Files in Gitea training pool:

  • qa-pairs/robot-control-high.jsonl
  • robot-experiences/2026-04-29.jsonl

This record encodes:

  • TIPLLM-only policy.
  • Erik controller-only policy.
  • Proxmox/Pi heavy worker policy.
  • No crawler jobs started.

Erik Notes

Synced robot/training code to /opt/tip.

Did not:

  • start crawler jobs.
  • enqueue robot waves.
  • restart PM2 services.

The remote scraper TypeScript build initially failed because of stale, misplaced duplicate files that existed only on the remote host:

  • /opt/tip/packages/scraper/src/scrapers/scheduler.ts
  • /opt/tip/packages/scraper/src/vendor-discovery-crawler.ts

Neither file existed locally, and both had wrong relative imports. Only these two duplicates were removed; the remote scraper build passed afterward.

PM2 status after this:

  • tip-api: online.
  • tip-scraper-daemon: online.

Cross-Repo Sync

Claude Code created a similar sync handoff in rene/llm-gateway.

From user screenshot:

e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)

Gitea path shown:

http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/

Rule:

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

  • transceiver-db/sync/CURRENT.md
  • llm-gateway/sync/CURRENT.md

Sync Folder Work

Created in this repo:

  • sync/README.md
  • sync/CURRENT.md
  • sync/history/2026-04-29-tipllm-robot-learning.md
  • sync/history/2026-04-29-cross-repo-sync.md
  • this file.

Already pushed earlier:

6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff

Current Dirty Worktree

As of this handoff, many non-sync files remain modified/untracked:

  • CHANGELOG_PENDING.md
  • docs/TIP_SELFLEARNING_WORKFLOW.md
  • packages/api/src/routes/hot-topics.ts
  • packages/dashboard/hot-topics.js
  • packages/mcp-server/src/index.ts
  • packages/scraper/package.json
  • packages/scraper/src/crawler-llm/core.ts
  • packages/scraper/src/crawler-llm/training-data-writer.ts
  • packages/scraper/src/scrapers/atgbics.ts
  • packages/scraper/src/scrapers/fiber24.ts
  • packages/scraper/src/scrapers/fibermall.ts
  • packages/scraper/src/utils/backfill-images.ts
  • packages/scraper/src/utils/db.ts
  • packages/scraper/src/utils/image-downloader.ts
  • packages/scraper/src/utils/spec-updater.ts
  • scripts/tip-learning-pool-build.ts
  • packages/scraper/src/robots/
  • packages/scraper/src/scrapers/audiocodes-oem.ts
  • packages/scraper/src/seed-batch35.ts
  • packages/scraper/src/seed-batch36.ts
  • packages/scraper/src/seed-batch37.ts
  • sql/102-product-verification-reconcile.sql

Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.

Safe Commands

Read-only/status:

npm run robots:verification -w packages/scraper -- --status

TIPLLM planning only, no crawl jobs:

npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3

Dry-run queue plan only:

npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run

Build checks:

npm run build -w packages/scraper
npm run build -w packages/api

Recommended next steps:

  1. Pull both sync folders from Gitea:
    • rene/transceiver-db
    • rene/llm-gateway
  2. Review dirty worktree before committing code.
  3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
  4. If running robots, start with TIPLLM planning only.
  5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.