transceiver-db/sync/history/2026-04-29-codex-full-session-handoff.md
2026-04-29 22:52:56 +02:00

# 2026-04-29 Codex Full Session Handoff
This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.
## Scope
Covered here:
- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with `rene/llm-gateway`.
Not covered:
- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.
## User Intent
Rene wants TIP to move toward completion:
- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.
## Live TIP Snapshot
Last checked live on 2026-04-29:
- API: healthy.
- `tip-api`: online.
- `tip-scraper-daemon`: online.
- Total transceivers: `13,546`.
- Price verified: `7,250`.
- Image verified: `7,025`.
- Details verified: `6,243`.
- Fully verified: `5,812`.
- Last price observation: `2026-04-29 19:15:53 UTC`.
- Last stock observation: `2026-04-29 19:15:56 UTC`.
- Transceiver updates in last 24h at time of check: `5,175`.
- New transceivers in last 24h at time of check: `462`.
## Verification Blockers
At the DB blocker check:
- Missing price: `6,296`.
- Missing image: `6,521`.
- Missing details: `7,303`.
- Near-full but missing details: `797`.
- Near-full but missing image: `237`.
- Near-full but missing price: `43`.
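Each missing count is the complement of the matching verified count in the live snapshot (for example 13,546 - 7,250 = 6,296 for price), so the two lists can be cross-checked. Below is a small TypeScript sketch of that check. The `Snapshot` interface is hypothetical and not an existing TIP type. The sketch also assumes each signal is a simple per-row verified/not-verified flag.

```typescript
// Hypothetical shape: counts copied from the live snapshot above.
interface Snapshot {
  total: number;
  priceVerified: number;
  imageVerified: number;
  detailsVerified: number;
}

// Assumes every row is either verified or missing for each signal,
// so missing = total - verified.
function missingCounts(s: Snapshot): { price: number; image: number; details: number } {
  return {
    price: s.total - s.priceVerified,
    image: s.total - s.imageVerified,
    details: s.total - s.detailsVerified,
  };
}

const live: Snapshot = {
  total: 13_546,
  priceVerified: 7_250,
  imageVerified: 7_025,
  detailsVerified: 6_243,
};

console.log(missingCounts(live)); // { price: 6296, image: 6521, details: 7303 }
```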
Top vendor blockers included:
- Juniper Networks: `464` not fully verified, mostly images.
- GAO Tek: `414`, mostly details.
- FS.COM: `378`, details/images.
- Cisco Systems: `330`, all signals missing.
- Ascent Optics: `305`, all signals missing.
- Eoptolink: `287`, all signals missing.
- ATGBICS: about `250` not fully verified.
- Flexoptix: about `119`, missing details.
- FiberMall: about `72`, missing details.
Recommended verification strategy:
1. Details fast lane first, because the `797` near-full rows missing details convert to fully verified fastest.
2. Then targeted image backfill for large OEMs.
3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.
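The ordering above can be sketched as a priority function over per-row verification flags. This is a hypothetical TypeScript illustration, not the robot's real planner. The row shape is assumed; in particular, `price_verified` is not named in this handoff. The vendor targeting for large OEMs in step 2 is left out.

```typescript
// Hypothetical row shape. `image_verified` and `details_verified` are named in
// this handoff; `price_verified` is an assumed name for the price signal.
interface TransceiverRow {
  id: number;
  price_verified: boolean;
  image_verified: boolean;
  details_verified: boolean;
}

// Lower number = work on it sooner, following the three-step strategy.
function priority(r: TransceiverRow): number {
  const missing = [r.price_verified, r.image_verified, r.details_verified].filter((v) => !v).length;
  if (missing === 0) return Infinity; // already fully verified
  if (missing === 1 && !r.details_verified) return 0; // step 1: details fast lane
  if (!r.image_verified) return 1; // step 2: image backfill
  if (!r.details_verified) return 2; // details gaps on rows that also lack a price
  return 3; // step 3: price-only gaps, handled as a separate OEM lane
}

function planQueue(rows: TransceiverRow[]): TransceiverRow[] {
  return rows
    .filter((r) => priority(r) !== Infinity)
    .sort((a, b) => priority(a) - priority(b)); // Array.prototype.sort is stable (ES2019+)
}

const row = (id: number, price: boolean, image: boolean, details: boolean): TransceiverRow => ({
  id,
  price_verified: price,
  image_verified: image,
  details_verified: details,
});

const rows = [
  row(1, true, true, true),
  row(2, false, false, false),
  row(3, true, true, false),
  row(4, false, true, true),
  row(5, true, false, true),
  row(6, false, true, false),
];

console.log(planQueue(rows).map((r) => r.id)); // [ 3, 2, 5, 6, 4 ]
```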
## Product Verification Work Completed
Implemented verification pipeline changes:
- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
- Scraped product pages now preserve/backfill `product_page_url`.
- Maintenance reconcile promotes old data into verification flags.
- CLI exposes `--backfill-images`.
- Migration added:
  - `sql/102-product-verification-reconcile.sql`
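Below is a minimal TypeScript sketch of the write pattern these flags imply: the flag, the evidence URL and the timestamp are set in one statement. Only the column names come from this handoff. The `transceivers` table name, the `id` key and the `$n` placeholder style (as used by node-postgres) are assumptions.

```typescript
// Hypothetical query shape for a `$n`-placeholder SQL client.
interface SqlQuery {
  text: string;
  values: unknown[];
}

type Signal = "image" | "details";

// Column triples per signal, as listed in this handoff.
const COLUMNS: Record<Signal, { flag: string; url: string; at: string }> = {
  image: { flag: "image_verified", url: "image_verified_url", at: "image_verified_at" },
  details: { flag: "details_verified", url: "details_source_url", at: "details_verified_at" },
};

// One UPDATE sets flag, evidence URL and timestamp together, so a row is never
// flagged without the URL that justified it. Column names come from the fixed
// map above (never user input); data values are passed as parameters.
function markVerified(signal: Signal, id: number, evidenceUrl: string, at: Date): SqlQuery {
  const c = COLUMNS[signal];
  return {
    text: `UPDATE transceivers SET ${c.flag} = TRUE, ${c.url} = $1, ${c.at} = $2 WHERE id = $3`,
    values: [evidenceUrl, at.toISOString(), id],
  };
}

const q = markVerified("details", 42, "https://example.com/p/42", new Date("2026-04-29T19:00:00Z"));
console.log(q.text);
// UPDATE transceivers SET details_verified = TRUE, details_source_url = $1, details_verified_at = $2 WHERE id = $3
```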
Important touched paths:
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `packages/scraper/src/index.ts`
- `packages/scraper/src/scheduler.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `sql/102-product-verification-reconcile.sql`
Migration result on Erik:
- Total: `13,084` at that earlier time.
- Image verified: `6,423`.
- Details verified: `6,231`.
- Fully verified: `5,704`.
Then image backfill ran:
- GAO Tek: `313` updated, `6` no-image, `95` errors/404s.
- Other vendors: `289 / 309` updated.
- Total new images: `602`.
- Backfill elapsed: about `1369.1s` (roughly 23 minutes).
After restart at that time:
- Image verified: `7,025`.
- Fully verified: `5,812`.
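The image figures are internally consistent. The two backfill batches sum to the new-image total, and adding that total to the migration count gives the post-restart figure:

```math
\begin{aligned}
313 + 289 &= 602 \quad \text{(new images: GAO Tek + other vendors)} \\
6{,}423 + 602 &= 7{,}025 \quad \text{(image verified after restart)} \\
5{,}812 - 5{,}704 &= 108 \quad \text{(fully verified gain)}
\end{aligned}
```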
## Blog Engine Hot Topics Work Completed
User reported:
- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.
Root causes found:
- Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
- Rotating research/evergreen topics existed but were lower priority and often invisible.
- Dashboard sent `customTitle` / `customAngle`, but API expected `custom_title` / `additional_context`.
- `blog_title_created` badge existed in UI but API did not populate it.
Implemented:
- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
- Refresh shuffle via query seed.
- Already-created topics demoted via recent `blog_drafts`.
- API returns:
  - `blog_title_created`
  - `last_blog_created_at`
  - `rank_score`
  - `llm_context`
- Dashboard passes:
  - `custom_title`
  - `additional_context`
- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.
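The diversified ranking can be sketched as a weighted score plus a seeded jitter term and a per-source cap. This TypeScript is a hypothetical illustration. The weights, the 48h freshness decay, the 0.5 demotion factor and the field names are assumptions, not the values in `packages/api/src/routes/hot-topics.ts`. Passing the `shuffle` query value as `seed` would reproduce the refresh behavior described in the list.

```typescript
// Hypothetical topic shape; the real API fields may differ.
interface Topic {
  id: string;
  source: string;
  urgency: number; // 0..1, high for static hot/breaking topics
  sourceScore: number; // 0..1
  ageHours: number;
  blogTitleCreated: boolean; // true if a recent blog_drafts row exists
}

// FNV-1a 32-bit hash mapped to [0, 1): the same (seed, id) always yields the same jitter.
function jitter(seed: string, id: string): number {
  const key = `${seed}:${id}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 2 ** 32;
}

function rankScore(t: Topic, seed: string): number {
  const freshness = Math.exp(-t.ageHours / 48); // assumed 48h decay constant
  let score = 0.4 * t.urgency + 0.25 * t.sourceScore + 0.2 * freshness + 0.15 * jitter(seed, t.id);
  if (t.blogTitleCreated) score *= 0.5; // demote topics that already have a draft
  return score;
}

// Rank by score, then cap how many topics a single source may contribute.
function rankTopics(topics: Topic[], seed: string, perSourceCap: number, limit: number): Topic[] {
  const scored = topics.map((t) => ({ t, s: rankScore(t, seed) }));
  scored.sort((a, b) => b.s - a.s);
  const perSource = new Map<string, number>();
  const out: Topic[] = [];
  for (const { t } of scored) {
    const n = perSource.get(t.source) ?? 0;
    if (n >= perSourceCap) continue;
    perSource.set(t.source, n + 1);
    out.push(t);
    if (out.length >= limit) break;
  }
  return out;
}
```

The jitter term carries at most 0.15 of the score. A new seed can therefore only reorder non-demoted topics whose other terms are within 0.15 of each other, so a clearly more urgent topic keeps its lead.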
Important paths:
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/api/src/routes/blog.ts`
Verified live:
- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
- `llm_context` is present.
- API remained healthy after restart.
## TIPLLM Robot Policy
User explicitly requested:
- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If TIPLLM training pool does not exist, create it.
Implemented local code:
- `packages/scraper/src/robots/verification-robots.ts`
  - `--status`
  - `--tipllm-plan --limit=N`
  - `--enqueue=details-fast-lane|priority-vendors|all`
  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
  - `--dry-run`
  - `--max-queues=N`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - added `writeRobotExperience`.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed the hardcoded Gitea token fallback.
  - falls back to the existing git remote when the `GITEA_TOKEN` env var is not set.
- `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
  - documents the robot experience pool and safety defaults.
- `packages/scraper/package.json`
  - added the `robots:verification` script.
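Below is a hypothetical TypeScript sketch of the token handling described for `training-data-writer.ts`. It embeds `GITEA_TOKEN` only when the variable is set; otherwise it pushes through the existing remote. The function name, the `user` parameter and the token-as-password convention are assumptions. Check how the real writer builds its push URL before relying on this.

```typescript
// Hypothetical helper, not the actual training-data-writer.ts code.
// Decide which URL `git push` should target for the training pool.
function resolvePushTarget(
  remoteUrl: string,
  env: Record<string, string | undefined>,
  user: string,
): string {
  const token = env.GITEA_TOKEN?.trim();
  // No token: keep the existing remote and let git's own auth (SSH key or
  // credential helper) handle it. There is no hardcoded fallback token.
  if (!token) return remoteUrl;
  // SSH-style remotes (git@host:owner/repo.git) authenticate with keys, not tokens.
  if (!/^https?:\/\//.test(remoteUrl)) return remoteUrl;
  const url = new URL(remoteUrl);
  url.username = user; // assumption: Gitea accepts an access token as the HTTPS password
  url.password = token;
  return url.toString();
}

const remote = "https://gitea.example/rene/tip-training-data.git"; // placeholder host
console.log(resolvePushTarget(remote, {}, "rene")); // unchanged remote
```

Never log the returned URL when a token was embedded, because it contains the credential.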
Safety defaults:
- Default profile: `erik-safe`.
- `erik-safe` max queues: `3`.
- `erik-safe` excludes heavy Playwright/discovery queues.
- `pi-fetch` excludes heavy/discovery queues.
- `proxmox-heavy` is explicit and intended for heavy crawler work.
No crawler jobs were started while building this.
No queue waves were enqueued while building this.
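Below is a minimal TypeScript sketch of how these profile defaults could gate queue selection. Only the `erik-safe` cap of 3 and the Playwright/discovery exclusions come from this handoff. The queue kinds, the queue names, the `pi-fetch` and `proxmox-heavy` caps, and the assumption that `--max-queues` can only lower a profile's cap are all hypothetical.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";
// Hypothetical queue categories; real TIP queue names are not listed here.
type QueueKind = "http-fetch" | "details" | "image" | "playwright" | "discovery";

interface ProfilePolicy {
  maxQueues: number;
  excluded: ReadonlySet<QueueKind>;
}

const POLICIES: Record<Profile, ProfilePolicy> = {
  // From the handoff: at most 3 queues, no heavy Playwright/discovery work on Erik.
  "erik-safe": { maxQueues: 3, excluded: new Set<QueueKind>(["playwright", "discovery"]) },
  // From the handoff: no heavy/discovery queues; the cap of 6 is a placeholder.
  "pi-fetch": { maxQueues: 6, excluded: new Set<QueueKind>(["playwright", "discovery"]) },
  // Explicit opt-in for heavy crawling; the cap of 12 is a placeholder.
  "proxmox-heavy": { maxQueues: 12, excluded: new Set<QueueKind>() },
};

interface QueueRequest {
  name: string;
  kind: QueueKind;
}

// Drop queues the profile forbids, then truncate to the effective cap.
// Assumption: --max-queues may lower the profile cap but never raise it.
function selectQueues(
  requested: QueueRequest[],
  profile: Profile = "erik-safe",
  maxQueues?: number,
): QueueRequest[] {
  const policy = POLICIES[profile];
  const cap = Math.min(policy.maxQueues, maxQueues ?? policy.maxQueues);
  return requested.filter((q) => !policy.excluded.has(q.kind)).slice(0, cap);
}

const requested: QueueRequest[] = [
  { name: "vendor-discovery", kind: "discovery" },
  { name: "details-gaotek", kind: "details" },
  { name: "images-juniper", kind: "image" },
  { name: "playwright-cisco", kind: "playwright" },
  { name: "details-flexoptix", kind: "details" },
];

console.log(selectQueues(requested).map((q) => q.name));
// [ 'details-gaotek', 'images-juniper', 'details-flexoptix' ]
```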
## Gitea TIPLLM Training Pool
Found local clone:
- `/tmp/tip-training-data`
- remote: `rene/tip-training-data`
Erik did not have `/tmp/tip-training-data/.git` at the time of check.
Wrote first robot experience record locally and pushed to Gitea:
```text
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
```
Files in Gitea training pool:
- `qa-pairs/robot-control-high.jsonl`
- `robot-experiences/2026-04-29.jsonl`
This record encodes:
- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.
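The record schema is not spelled out in this handoff. With that caveat, the TypeScript sketch below shows one way a daily `robot-experiences/YYYY-MM-DD.jsonl` file could be written and read back. The field names are assumptions. So is the idea that an importer such as `scripts/tip-learning-pool-build.ts` would skip malformed lines.

```typescript
// Hypothetical record shape: the real writeRobotExperience schema is not
// documented in this handoff, so every field name here is illustrative.
interface RobotExperience {
  ts: string; // ISO-8601 UTC timestamp
  robot: string; // e.g. "robot-status"
  planner: "TIPLLM"; // TIPLLM-only policy
  controller: "erik"; // Erik coordinates; heavy work goes elsewhere
  heavyWorkers: string[]; // e.g. ["proxmox", "pi"]
  crawlJobsStarted: number;
}

// Daily file, matching robot-experiences/2026-04-29.jsonl.
function experiencePath(ts: string): string {
  return `robot-experiences/${ts.slice(0, 10)}.jsonl`;
}

// JSONL: one object per line. JSON.stringify without an indent argument
// escapes newlines inside strings, so each record stays on a single line.
function toJsonlLine(rec: RobotExperience): string {
  return JSON.stringify(rec) + "\n";
}

// Reader side (for example an importer): skip blank lines, count bad ones.
function parseJsonl<T>(text: string): { records: T[]; bad: number } {
  const records: T[] = [];
  let bad = 0;
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line) as T);
    } catch {
      bad += 1;
    }
  }
  return { records, bad };
}

const rec: RobotExperience = {
  ts: "2026-04-29T20:11:24.091Z",
  robot: "robot-status",
  planner: "TIPLLM",
  controller: "erik",
  heavyWorkers: ["proxmox", "pi"],
  crawlJobsStarted: 0,
};

console.log(experiencePath(rec.ts)); // robot-experiences/2026-04-29.jsonl
```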
## Erik Notes
Synced robot/training code to `/opt/tip`.
Did not:
- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.
The remote scraper TypeScript build initially failed because of two stale, misplaced duplicate files that existed only on the remote:
- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
These files did not exist locally and had wrong relative imports. Only these two duplicates were removed; the remote scraper build passed afterward.
PM2 status after this:
- `tip-api`: online.
- `tip-scraper-daemon`: online.
## Cross-Repo Sync
Claude Code created a similar sync handoff in `rene/llm-gateway`.
From the user's screenshot:
```text
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
```
Gitea path shown:
```text
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
```
Rule:
When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:
- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`
## Sync Folder Work
Created in this repo:
- `sync/README.md`
- `sync/CURRENT.md`
- `sync/history/2026-04-29-tipllm-robot-learning.md`
- `sync/history/2026-04-29-cross-repo-sync.md`
- this file.
Already pushed earlier:
```text
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
```
## Current Dirty Worktree
As of this handoff, many non-sync files remain modified/untracked:
- `CHANGELOG_PENDING.md`
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/mcp-server/src/index.ts`
- `packages/scraper/package.json`
- `packages/scraper/src/crawler-llm/core.ts`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `scripts/tip-learning-pool-build.ts`
- `packages/scraper/src/robots/`
- `packages/scraper/src/scrapers/audiocodes-oem.ts`
- `packages/scraper/src/seed-batch35.ts`
- `packages/scraper/src/seed-batch36.ts`
- `packages/scraper/src/seed-batch37.ts`
- `sql/102-product-verification-reconcile.sql`
Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.
## Safe Commands
Read-only/status:
```bash
npm run robots:verification -w packages/scraper -- --status
```
TIPLLM planning only, no crawl jobs:
```bash
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
```
Dry-run queue plan only:
```bash
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```
Build checks:
```bash
npm run build -w packages/scraper
npm run build -w packages/api
```
## Next Recommended Steps
1. Pull both sync folders from Gitea:
   - `rene/transceiver-db`
   - `rene/llm-gateway`
2. Review dirty worktree before committing code.
3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
4. If running robots, start with TIPLLM planning only.
5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.