transceiver-db/sync/history/2026-04-29-codex-full-session-handoff.md

# 2026-04-29 Codex Full Session Handoff

This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.

## Scope

Covered here:

- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with `rene/llm-gateway`.

Not covered:

- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.

## User Intent

Rene wants TIP to move toward completion:

- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.

## Live TIP Snapshot

Last checked live on 2026-04-29:

- API: healthy.
- `tip-api`: online.
- `tip-scraper-daemon`: online.
- Total transceivers: `13,546`.
- Price verified: `7,250`.
- Image verified: `7,025`.
- Details verified: `6,243`.
- Fully verified: `5,812`.
- Last price observation: `2026-04-29 19:15:53 UTC`.
- Last stock observation: `2026-04-29 19:15:56 UTC`.
- Transceiver updates in last 24h at time of check: `5,175`.
- New transceivers in last 24h at time of check: `462`.

## Verification Blockers

Counts at the time of the DB blocker check:

- Missing price: `6,296`.
- Missing image: `6,521`.
- Missing details: `7,303`.
- Near-full but missing details: `797`.
- Near-full but missing image: `237`.
- Near-full but missing price: `43`.
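
The blocker counts above can be derived from three per-product flags. The following is a minimal sketch, not the actual query: `image_verified` and `details_verified` are column names from this handoff, while `price_verified` and the reading of "near-full" as "exactly one signal missing" are assumptions.

```typescript
// Hypothetical row shape: image_verified and details_verified are column names
// from this handoff; price_verified is an assumed name for the price signal.
interface VerificationFlags {
  price_verified: boolean;
  image_verified: boolean;
  details_verified: boolean;
}

type Signal = "price" | "image" | "details";

// Returns the signals a product still lacks.
function missingSignals(row: VerificationFlags): Signal[] {
  const missing: Signal[] = [];
  if (!row.price_verified) missing.push("price");
  if (!row.image_verified) missing.push("image");
  if (!row.details_verified) missing.push("details");
  return missing;
}

// Assumption: "near-full" means exactly one signal is missing.
function nearFullBlocker(row: VerificationFlags): Signal | null {
  const [only, ...rest] = missingSignals(row);
  return only !== undefined && rest.length === 0 ? only : null;
}

// The "missing" totals are the complement of the verified totals.
function missingCount(total: number, verified: number): number {
  return total - verified;
}
```

With the live snapshot numbers, `missingCount(13546, 7250)` gives `6296`, which matches the missing-price count above; the image and details counts check out the same way.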

Top vendor blockers included:

- Juniper Networks: `464` not fully verified, mostly images.
- GAO Tek: `414`, mostly details.
- FS.COM: `378`, details/images.
- Cisco Systems: `330`, all signals missing.
- Ascent Optics: `305`, all signals missing.
- Eoptolink: `287`, all signals missing.
- ATGBICS: about `250` not fully verified.
- Flexoptix: about `119`, missing details.
- FiberMall: about `72`, missing details.

Recommended verification strategy:

1. Details fast lane first, because near-full missing-details rows convert fastest.
2. Then targeted image backfill for large OEMs.
3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.

## Product Verification Work Completed

Implemented verification pipeline changes:

- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
- Scraped product pages now preserve/backfill `product_page_url`.
- Maintenance reconcile promotes old data into verification flags.
- CLI exposes `--backfill-images`.
- Migration added:
  - `sql/102-product-verification-reconcile.sql`
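
As an illustration of the write pattern described in the list above, here is a minimal sketch in which each flag is set together with its evidence URL and timestamp. The column names are taken from this handoff; the row type, nullability, helper names, and the interpretation of "preserve/backfill" (an existing URL wins, the scraped URL only fills a gap) are assumptions, not the code in `db.ts`.

```typescript
// Hypothetical in-memory view of one product row. The column names come from
// this handoff; the types and nullability are assumptions.
interface ProductRow {
  product_page_url: string | null;
  image_verified: boolean;
  image_verified_url: string | null;
  image_verified_at: string | null;
  details_verified: boolean;
  details_source_url: string | null;
  details_verified_at: string | null;
}

// Image crawl: set the flag together with its evidence (URL and timestamp).
function applyImageCrawl(row: ProductRow, imageUrl: string, at: Date): ProductRow {
  return {
    ...row,
    image_verified: true,
    image_verified_url: imageUrl,
    image_verified_at: at.toISOString(),
  };
}

// Detail scrape: same pattern, plus preserve/backfill of product_page_url.
// Assumption: an existing non-empty URL wins; the scraped URL only fills a gap.
function applyDetailScrape(row: ProductRow, pageUrl: string, at: Date): ProductRow {
  return {
    ...row,
    details_verified: true,
    details_source_url: pageUrl,
    details_verified_at: at.toISOString(),
    product_page_url: row.product_page_url || pageUrl,
  };
}
```

Both helpers return a new object rather than mutating the input, which keeps the sketch easy to test without a database.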

Important touched paths:

- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `packages/scraper/src/index.ts`
- `packages/scraper/src/scheduler.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `sql/102-product-verification-reconcile.sql`

Migration result on Erik:

- Total: `13,084` at that earlier time.
- Image verified: `6,423`.
- Details verified: `6,231`.
- Fully verified: `5,704`.

Then image backfill ran:

- GAO Tek: `313` updated, `6` no-image, `95` errors/404s.
- Other vendors: `289 / 309` updated.
- Total new images: `602`.
- Backfill elapsed: about `1369.1s` (roughly 23 minutes).

After the restart at that time:

- Image verified: `7,025`.
- Fully verified: `5,812`.

## Blog Engine Hot Topics Work Completed

User reported:

- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.

Root causes found:

- Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
- Rotating research/evergreen topics existed but were lower priority and often invisible.
- Dashboard sent `customTitle` / `customAngle`, but API expected `custom_title` / `additional_context`.
- `blog_title_created` badge existed in UI but API did not populate it.
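
The field-name mismatch in the third bullet can be illustrated with a small mapping helper. This is a sketch only: the four key names come from this handoff, while the helper name, the pairing of `customAngle` with `additional_context`, and the trimming of empty values are assumptions.

```typescript
// Keys the dashboard originally sent (per the root-cause list above).
interface DashboardTopicForm {
  customTitle?: string;
  customAngle?: string;
}

// Keys the API expects (per the root-cause list above).
interface BlogRequestBody {
  custom_title?: string;
  additional_context?: string;
}

// Hypothetical helper: translate dashboard form state into the API body,
// dropping empty values so the API does not receive blank overrides.
function toBlogRequestBody(form: DashboardTopicForm): BlogRequestBody {
  const body: BlogRequestBody = {};
  const title = form.customTitle?.trim();
  const angle = form.customAngle?.trim();
  if (title) body.custom_title = title;
  if (angle) body.additional_context = angle;
  return body;
}
```

Typing both sides makes a camelCase key on the API body a compile-time error, which is the class of bug described above.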

Implemented:

- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
- Refresh shuffle via query seed.
- Already-created topics demoted via recent `blog_drafts`.
- API returns:
  - `blog_title_created`
  - `last_blog_created_at`
  - `rank_score`
  - `llm_context`
- Dashboard passes:
  - `custom_title`
  - `additional_context`
- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.
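
The diversified ranking described above can be sketched as follows. This is not the code in `hot-topics.ts`: the field names, weights, decay constant, hash choice (FNV-1a), and cap value are all illustrative assumptions. It shows how a query seed yields a reproducible shuffle, how already-used topics are demoted, and how a per-source cap keeps one source from filling the list.

```typescript
// All field names, weights, and the cap value below are illustrative assumptions.
type Urgency = "breaking" | "hot" | "research" | "evergreen";

interface Topic {
  id: string;
  source: string;
  urgency: Urgency;
  sourceScore: number;        // 0..1, higher is better
  ageHours: number;           // hours since the topic was last refreshed
  blogTitleCreated: boolean;  // a recent blog draft already used this topic
}

const URGENCY_WEIGHT: Record<Urgency, number> = {
  breaking: 1.0,
  hot: 0.8,
  research: 0.5,
  evergreen: 0.4,
};

// FNV-1a hash mapped to [0, 1): the same seed and id always give the same jitter.
function jitter(seed: string, id: string): number {
  let hash = 0x811c9dc5;
  const text = `${seed}:${id}`;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash / 4294967296;
}

function rankScore(t: Topic, seed: string): number {
  const freshness = Math.exp(-t.ageHours / 72); // decays over a few days
  const demotion = t.blogTitleCreated ? 1.0 : 0; // push already-used topics down
  return (
    URGENCY_WEIGHT[t.urgency] +
    0.5 * t.sourceScore +
    0.5 * freshness +
    0.6 * jitter(seed, t.id) -
    demotion
  );
}

function rankTopics(topics: Topic[], seed: string, limit: number, maxPerSource = 2): Topic[] {
  const scored = topics
    .map((topic) => ({ topic, score: rankScore(topic, seed) }))
    .sort((a, b) => b.score - a.score || a.topic.id.localeCompare(b.topic.id));
  const perSource = new Map<string, number>();
  const picked: Topic[] = [];
  for (const { topic } of scored) {
    if (picked.length >= limit) break;
    const used = perSource.get(topic.source) ?? 0;
    if (used >= maxPerSource) continue; // source cap
    perSource.set(topic.source, used + 1);
    picked.push(topic);
  }
  return picked;
}
```

Because the jitter is a pure function of seed and topic id, a given `shuffle` value always produces the same order, while a new value reshuffles topics whose other scores are close.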

Important paths:

- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/api/src/routes/blog.ts`

Verified live:

- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
- `llm_context` is present.
- API remained healthy after restart.

## TIPLLM Robot Policy

User explicitly requested:

- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If TIPLLM training pool does not exist, create it.

Implemented local code:

- `packages/scraper/src/robots/verification-robots.ts`
  - `--status`
  - `--tipllm-plan --limit=N`
  - `--enqueue=details-fast-lane|priority-vendors|all`
  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
  - `--dry-run`
  - `--max-queues=N`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - added `writeRobotExperience`.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed hardcoded Gitea token fallback.
  - uses existing git remote when no `GITEA_TOKEN` env var is set.
- `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
  - documented robot experience pool and safety defaults.
- `packages/scraper/package.json`
  - added `robots:verification`.

Safety defaults:

- Default profile: `erik-safe`.
- `erik-safe` max queues: `3`.
- `erik-safe` excludes heavy Playwright/discovery queues.
- `pi-fetch` excludes heavy/discovery queues.
- `proxmox-heavy` is explicit and intended for heavy crawler work.
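
These defaults can be read as a small rule table. The sketch below is an assumption-laden illustration rather than the logic in `verification-robots.ts`: the profile names and the `erik-safe` limit of `3` come from this handoff, while the `QueueSpec` shape, the helper name, and the choice to let `proxmox-heavy` admit every queue type are hypothetical.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";

// Hypothetical queue descriptor; the real queue model may differ.
interface QueueSpec {
  name: string;
  heavy: boolean;     // e.g. Playwright-driven crawls
  discovery: boolean; // vendor/catalog discovery
}

interface ProfileRule {
  allowHeavy: boolean;
  allowDiscovery: boolean;
  defaultMaxQueues?: number; // only stated for erik-safe in this handoff
}

const PROFILE_RULES: Record<Profile, ProfileRule> = {
  "erik-safe": { allowHeavy: false, allowDiscovery: false, defaultMaxQueues: 3 },
  "pi-fetch": { allowHeavy: false, allowDiscovery: false },
  // Assumption: the explicit heavy profile admits every queue type.
  "proxmox-heavy": { allowHeavy: true, allowDiscovery: true },
};

// Filters out disallowed queues, then applies the cap. The profile defaults to
// erik-safe, so forgetting to pass one can never schedule heavy work.
function selectQueues(
  candidates: QueueSpec[],
  profile: Profile = "erik-safe",
  maxQueues?: number,
): QueueSpec[] {
  const rule = PROFILE_RULES[profile];
  const allowed = candidates.filter(
    (q) => (rule.allowHeavy || !q.heavy) && (rule.allowDiscovery || !q.discovery),
  );
  const cap = maxQueues ?? rule.defaultMaxQueues;
  return cap === undefined ? allowed : allowed.slice(0, cap);
}
```

An explicit `maxQueues` argument, mirroring `--max-queues=N`, overrides the profile default in this sketch.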

No crawler jobs were started and no queue waves were enqueued while building this.

## Gitea TIPLLM Training Pool

Found local clone:

- `/tmp/tip-training-data`
- remote: `rene/tip-training-data`

Erik did not have `/tmp/tip-training-data/.git` at the time of the check.

Wrote the first robot experience record locally and pushed it to Gitea:

```text
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
```

Files in Gitea training pool:

- `qa-pairs/robot-control-high.jsonl`
- `robot-experiences/2026-04-29.jsonl`

This record encodes:

- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.
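
For orientation, a record like this could be serialized as shown below. This is a hypothetical sketch: the actual fields written by `writeRobotExperience` are not shown in this handoff, so the record shape is invented for illustration. Only the `robot-experiences/<date>.jsonl` naming pattern is taken from the file list above, and keying it by UTC date is an assumption.

```typescript
// Hypothetical record shape; the real fields written by writeRobotExperience
// are not shown in this handoff.
interface RobotExperience {
  timestamp: string;        // ISO-8601 UTC
  event: string;            // e.g. "robot-status"
  planner: "TIPLLM";        // TIPLLM-only policy
  crawlJobsStarted: number; // 0 for a status-only run
  notes: string[];
}

// Daily file under robot-experiences/, keyed by the UTC date of the record
// (assumption: the real writer may use a different clock or layout).
function experiencePath(timestamp: string): string {
  const day = new Date(timestamp).toISOString().slice(0, 10);
  return `robot-experiences/${day}.jsonl`;
}

// JSONL: one compact JSON object per line, each line newline-terminated.
function toJsonl(records: RobotExperience[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}
```

Appending newline-terminated lines keeps each day's file mergeable in git, since concurrent writers add lines instead of rewriting one large JSON document.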

## Erik Notes

Synced robot/training code to `/opt/tip`.

Did not:

- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.

The remote scraper TypeScript build initially failed because of stale, misplaced duplicate files that existed only on the remote:

- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`

These did not exist locally and had wrong relative imports. Removed only these duplicates. Remote scraper build passed afterward.

PM2 status after this:

- `tip-api`: online.
- `tip-scraper-daemon`: online.

## Cross-Repo Sync

Claude Code created a similar sync handoff in `rene/llm-gateway`.

From user screenshot:

```text
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
```

Gitea path shown:

```text
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
```

Rule:

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Sync Folder Work

Created in this repo:

- `sync/README.md`
- `sync/CURRENT.md`
- `sync/history/2026-04-29-tipllm-robot-learning.md`
- `sync/history/2026-04-29-cross-repo-sync.md`
- this file.

Already pushed earlier:

```text
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
```

## Current Dirty Worktree

As of this handoff, many non-sync files remain modified/untracked:

- `CHANGELOG_PENDING.md`
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/mcp-server/src/index.ts`
- `packages/scraper/package.json`
- `packages/scraper/src/crawler-llm/core.ts`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `scripts/tip-learning-pool-build.ts`
- `packages/scraper/src/robots/`
- `packages/scraper/src/scrapers/audiocodes-oem.ts`
- `packages/scraper/src/seed-batch35.ts`
- `packages/scraper/src/seed-batch36.ts`
- `packages/scraper/src/seed-batch37.ts`
- `sql/102-product-verification-reconcile.sql`

Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.

## Safe Commands

Read-only/status:

```bash
npm run robots:verification -w packages/scraper -- --status
```

TIPLLM planning only, no crawl jobs:

```bash
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
```

Dry-run queue plan only:

```bash
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

Build checks:

```bash
npm run build -w packages/scraper
npm run build -w packages/api
```

## Next Recommended Steps

1. Pull both sync folders from Gitea:
   - `rene/transceiver-db`
   - `rene/llm-gateway`
2. Review dirty worktree before committing code.
3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
4. If running robots, start with TIPLLM planning only.
5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.