docs: add full codex session sync handoff
Parent: `8e7c5aa6fd`, commit: `e4d4a0c43e`

@@ -1,6 +1,6 @@
# Current TIP Sync State

Updated: 2026-04-29 20:25 UTC (removed)
Updated: 2026-04-29 20:40 UTC (added)

## Active Policy

@@ -27,6 +27,9 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr

## Latest Work

- Full Codex session handoff was added:
  - `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
@@ -95,3 +98,9 @@ npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane -
## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.

## Latest Sync Commits

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- Pending after this update: full Codex session handoff in `sync/history/`.

`sync/history/2026-04-29-codex-full-session-handoff.md` (new file, 363 lines)
@@ -0,0 +1,363 @@
# 2026-04-29 Codex Full Session Handoff

This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.

## Scope

Covered here:

- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with `rene/llm-gateway`.

Not covered:

- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.

## User Intent

Rene wants TIP to move toward completion:

- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate and surface meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.

## Live TIP Snapshot

Last checked live on 2026-04-29:

- API: healthy.
- `tip-api`: online.
- `tip-scraper-daemon`: online.
- Total transceivers: `13,546`.
- Price verified: `7,250`.
- Image verified: `7,025`.
- Details verified: `6,243`.
- Fully verified: `5,812`.
- Last price observation: `2026-04-29 19:15:53 UTC`.
- Last stock observation: `2026-04-29 19:15:56 UTC`.
- Transceiver updates in the last 24h (at time of check): `5,175`.
- New transceivers in the last 24h (at time of check): `462`.

## Verification Blockers

At the time of the DB blocker check:

- Missing price: `6,296`.
- Missing image: `6,521`.
- Missing details: `7,303`.
- Near-full but missing details: `797`.
- Near-full but missing image: `237`.
- Near-full but missing price: `43`.

Top vendor blockers included:

- Juniper Networks: `464` not fully verified, mostly missing images.
- GAO Tek: `414`, mostly missing details.
- FS.COM: `378`, missing details/images.
- Cisco Systems: `330`, all signals missing.
- Ascent Optics: `305`, all signals missing.
- Eoptolink: `287`, all signals missing.
- ATGBICS: about `250` not fully verified.
- Flexoptix: about `119`, missing details.
- FiberMall: about `72`, missing details.

Recommended verification strategy:

1. Run the details fast lane first, because the `797` near-full rows missing details convert fastest.
2. Then run a targeted image backfill for the large OEMs.
3. Treat OEM price verification separately; many OEM catalog products may not list direct prices.

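The blocker counts match the live snapshot. For example, 13,546 - 6,296 = 7,250 price-verified rows, and the same subtraction gives the image (7,025) and details (6,243) figures. This suggests a row counts as fully verified only when price, image and details are all set.

Under that assumption, the sketch below shows one way to bucket rows into work lanes in the recommended order. Everything in it is hypothetical and not TIP code: `VerificationRow`, `classify`, `planLanes`, the lane names and the sample rows are all made up for illustration. "Near-full" is also assumed to mean exactly one signal missing.

```typescript
// Hypothetical, simplified view of one transceiver row's verification flags.
interface VerificationRow {
  id: number;
  vendor: string;
  priceVerified: boolean;
  imageVerified: boolean;
  detailsVerified: boolean;
}

type Signal = "price" | "image" | "details";
type Lane = "verified" | "details-fast-lane" | "image-backfill" | "price-review" | "full-crawl";

// Assumed ordering following the recommended strategy: details first, then
// image backfill; rows missing several signals and OEM price review come later.
const LANE_ORDER: Exclude<Lane, "verified">[] = [
  "details-fast-lane",
  "image-backfill",
  "full-crawl",
  "price-review",
];

function missingSignals(row: VerificationRow): Signal[] {
  const missing: Signal[] = [];
  if (!row.priceVerified) missing.push("price");
  if (!row.imageVerified) missing.push("image");
  if (!row.detailsVerified) missing.push("details");
  return missing;
}

// Assumption: "fully verified" = all three signals present,
// "near-full" = exactly one signal missing.
function classify(row: VerificationRow): Lane {
  const missing = missingSignals(row);
  if (missing.length === 0) return "verified";
  if (missing.length > 1) return "full-crawl";
  switch (missing[0]) {
    case "details":
      return "details-fast-lane";
    case "image":
      return "image-backfill";
    default:
      return "price-review";
  }
}

// Count not-yet-verified rows per lane, returned in LANE_ORDER.
function planLanes(rows: VerificationRow[]): Array<[Lane, number]> {
  const counts = new Map<Lane, number>();
  for (const row of rows) {
    const lane = classify(row);
    if (lane !== "verified") counts.set(lane, (counts.get(lane) ?? 0) + 1);
  }
  return LANE_ORDER.filter((lane) => counts.has(lane)).map(
    (lane): [Lane, number] => [lane, counts.get(lane) ?? 0],
  );
}

// Illustrative rows only (vendor names from the list above, flags invented).
const rows: VerificationRow[] = [
  { id: 1, vendor: "FS.COM", priceVerified: true, imageVerified: true, detailsVerified: false },
  { id: 2, vendor: "Juniper Networks", priceVerified: true, imageVerified: false, detailsVerified: true },
  { id: 3, vendor: "Cisco Systems", priceVerified: false, imageVerified: false, detailsVerified: false },
  { id: 4, vendor: "Flexoptix", priceVerified: true, imageVerified: true, detailsVerified: true },
];
console.log(planLanes(rows));
```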
## Product Verification Work Completed

Implemented verification pipeline changes:

- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
- Scraped product pages now preserve/backfill `product_page_url`.
- Maintenance reconcile promotes previously collected data into the verification flags.
- CLI exposes `--backfill-images`.
- Migration added:
  - `sql/102-product-verification-reconcile.sql`

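Each of these writes pairs a boolean flag with an evidence URL and a timestamp. The sketch below builds a parameterized `UPDATE` for either kind. It is a minimal example resting on several assumptions:

- Postgres-style `$n` placeholders.
- A `transceivers` table keyed by `id`.
- A hypothetical `buildVerificationUpdate` helper.

Only the six column names come from this handoff, and the real `db.ts` may look quite different.

```typescript
type VerificationKind = "image" | "details";

// Column names come from the handoff; table name and key column are assumptions.
const VERIFICATION_COLUMNS: Record<VerificationKind, { flag: string; url: string; at: string }> = {
  image: { flag: "image_verified", url: "image_verified_url", at: "image_verified_at" },
  details: { flag: "details_verified", url: "details_source_url", at: "details_verified_at" },
};

// The { text, values } shape that node-postgres accepts for parameterized queries.
interface ParamQuery {
  text: string;
  values: unknown[];
}

// Hypothetical helper: mark one row verified and record where/when the evidence came from.
// Column names are interpolated only from the constant map above, never from user input;
// the URL, timestamp and id travel as bound parameters.
function buildVerificationUpdate(
  kind: VerificationKind,
  id: number,
  sourceUrl: string,
  verifiedAt: Date,
): ParamQuery {
  const c = VERIFICATION_COLUMNS[kind];
  return {
    text: `UPDATE transceivers SET ${c.flag} = true, ${c.url} = $1, ${c.at} = $2 WHERE id = $3`,
    values: [sourceUrl, verifiedAt.toISOString(), id],
  };
}

const q = buildVerificationUpdate("image", 42, "https://example.com/img/42.jpg", new Date("2026-04-29T19:15:53Z"));
console.log(q.text);
console.log(q.values);
```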
Important touched paths:

- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `packages/scraper/src/index.ts`
- `packages/scraper/src/scheduler.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `sql/102-product-verification-reconcile.sql`

Migration result on Erik:

- Total transceivers: `13,084` (at that earlier time).
- Image verified: `6,423`.
- Details verified: `6,231`.
- Fully verified: `5,704`.

Then the image backfill ran:

- GAO Tek: `313` updated, `6` with no image, `95` errors/404s.
- Other vendors: `289 / 309` updated.
- Total new images: `602`.
- Backfill elapsed time: about `1369.1s`.

After the subsequent restart:

- Image verified: `7,025` (`6,423` + `602` backfilled).
- Fully verified: `5,812`.

## Blog Engine Hot Topics Work Completed

User reported:

- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.

Root causes found:

- The Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
- Rotating research/evergreen topics existed but had lower priority and were often invisible.
- The dashboard sent `customTitle` / `customAngle`, but the API expected `custom_title` / `additional_context`.
- The `blog_title_created` badge existed in the UI, but the API did not populate it.

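A field-name mismatch like this typically fails silently, because keys the API does not know are likely just ignored. The sketch below is a tolerant normalizer that accepts either shape, assuming a JSON request body. `normalizeBlogRequest` and the interface names are hypothetical. Mapping `customAngle` onto `additional_context` mirrors the dashboard fix; it is not confirmed server code.

```typescript
// Shape the dashboard used to send (camelCase) before the fix.
interface LegacyDashboardBody {
  customTitle?: string;
  customAngle?: string;
}

// Shape the blog endpoint expects (snake_case).
interface BlogRequestBody {
  custom_title?: string;
  additional_context?: string;
}

// Hypothetical helper: prefer the snake_case keys, fall back to the legacy
// camelCase ones, trim whitespace, and drop empty values entirely.
function normalizeBlogRequest(body: LegacyDashboardBody & BlogRequestBody): BlogRequestBody {
  const out: BlogRequestBody = {};
  const title = (body.custom_title ?? body.customTitle)?.trim();
  const context = (body.additional_context ?? body.customAngle)?.trim();
  if (title) out.custom_title = title;
  if (context) out.additional_context = context;
  return out;
}

console.log(normalizeBlogRequest({ customTitle: " Example title ", customAngle: "Focus on power budgets" }));
```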
Implemented:

- Diversified ranking combining urgency, source score, freshness, deterministic jitter, and source caps.
- Refresh shuffle via a query seed (`shuffle` parameter).
- Topics that already appear in recent `blog_drafts` are demoted.
- API returns:
  - `blog_title_created`
  - `last_blog_created_at`
  - `rank_score`
  - `llm_context`
- Dashboard passes:
  - `custom_title`
  - `additional_context`
- The blog route injects the Hot Topic briefing into the master-draft context as well as into topic expansion.

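The diversified ranking can be approximated as a weighted score plus seeded jitter, followed by a source cap. The sketch below is illustrative only:

- All the numbers are assumptions: the weights, the 72-hour freshness half-life, the 0.5 demotion factor and the cap of 2 per source.
- The urgency levels other than `hot`/`breaking` are hypothetical, as are all the field names.
- The jitter uses an FNV-1a-style hash, so a given seed always reproduces the same order.

```typescript
type Urgency = "breaking" | "hot" | "rising" | "evergreen"; // hypothetical levels

interface HotTopic {
  id: string;
  source: string;
  urgency: Urgency;
  sourceScore: number; // 0..1, hypothetical source quality score
  publishedAt: Date;
  blogTitleCreated: boolean; // e.g. a recent blog_drafts row exists
}

type RankedTopic = HotTopic & { rankScore: number };

const URGENCY_WEIGHT: Record<Urgency, number> = { breaking: 1.0, hot: 0.8, rising: 0.5, evergreen: 0.3 };
const FRESHNESS_HALF_LIFE_HOURS = 72; // assumption

// FNV-1a-style 32-bit hash mapped to [0, 1): the same (seed, id) pair always
// yields the same jitter, so the order only changes when the seed changes.
function seededJitter(seed: string, id: string): number {
  const key = `${seed}:${id}`;
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 0x100000000;
}

function rankTopics(topics: HotTopic[], seed: string, now: Date, maxPerSource = 2): RankedTopic[] {
  const scored = topics.map((t): RankedTopic => {
    const ageHours = Math.max(0, (now.getTime() - t.publishedAt.getTime()) / 3_600_000);
    const freshness = Math.pow(0.5, ageHours / FRESHNESS_HALF_LIFE_HOURS);
    let score =
      0.45 * URGENCY_WEIGHT[t.urgency] +
      0.25 * t.sourceScore +
      0.2 * freshness +
      0.1 * seededJitter(seed, t.id);
    if (t.blogTitleCreated) score *= 0.5; // demote topics that already produced a draft
    return { ...t, rankScore: score };
  });
  scored.sort((a, b) => b.rankScore - a.rankScore);

  // Source cap: keep at most maxPerSource topics from any single source.
  const perSource = new Map<string, number>();
  return scored.filter((t) => {
    const n = perSource.get(t.source) ?? 0;
    if (n >= maxPerSource) return false;
    perSource.set(t.source, n + 1);
    return true;
  });
}

const now = new Date("2026-04-29T20:00:00Z");
const topics: HotTopic[] = [
  { id: "t1", source: "A", urgency: "breaking", sourceScore: 0.9, publishedAt: now, blogTitleCreated: false },
  { id: "t2", source: "A", urgency: "breaking", sourceScore: 0.9, publishedAt: now, blogTitleCreated: false },
  { id: "t3", source: "A", urgency: "hot", sourceScore: 0.9, publishedAt: now, blogTitleCreated: false },
  { id: "t4", source: "B", urgency: "evergreen", sourceScore: 0.5, publishedAt: now, blogTitleCreated: false },
];
console.log(rankTopics(topics, "refresh-1", now).map((t) => t.id));
```

Using a deterministic hash instead of `Math.random()` keeps a given `shuffle` seed reproducible, which makes ranking complaints easy to replay.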
Important paths:

- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/api/src/routes/blog.ts`

Verified live:

- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
- `llm_context` is present.
- API remained healthy after restart.

## TIPLLM Robot Policy

User explicitly requested:

- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If the TIPLLM training pool does not exist, create it.

Implemented local code:

- `packages/scraper/src/robots/verification-robots.ts`
  - `--status`
  - `--tipllm-plan --limit=N`
  - `--enqueue=details-fast-lane|priority-vendors|all`
  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
  - `--dry-run`
  - `--max-queues=N`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - added `writeRobotExperience`.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed the hardcoded Gitea token fallback.
  - uses the existing git remote when no `GITEA_TOKEN` env var is set.
- `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
  - documented the robot experience pool and safety defaults.
- `packages/scraper/package.json`
  - added `robots:verification`.

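The flag set above can be parsed without a dependency. The sketch below is minimal and assumes each flag arrives as `--name` or `--name=value` after npm's `--` separator; in a real script the input would be `process.argv.slice(2)`. `parseRobotArgs` and `RobotArgs` are hypothetical, and `verification-robots.ts` may parse differently. The `erik-safe` default comes from the safety defaults in this handoff.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";
type EnqueueTarget = "details-fast-lane" | "priority-vendors" | "all";

interface RobotArgs {
  status: boolean;
  tipllmPlan: boolean;
  limit?: number;
  enqueue?: EnqueueTarget;
  profile: Profile;
  dryRun: boolean;
  maxQueues?: number;
}

const PROFILES: readonly Profile[] = ["erik-safe", "pi-fetch", "proxmox-heavy"];
const TARGETS: readonly EnqueueTarget[] = ["details-fast-lane", "priority-vendors", "all"];

function parsePositiveInt(name: string, raw: string | undefined): number {
  const n = Number(raw); // Number(undefined) is NaN, which fails the check below
  if (!Number.isInteger(n) || n <= 0) throw new Error(`--${name} expects a positive integer, got "${raw}"`);
  return n;
}

// Hypothetical parser: unknown flags and bad values fail loudly instead of
// silently falling back to a heavier profile.
function parseRobotArgs(argv: string[]): RobotArgs {
  const args: RobotArgs = { status: false, tipllmPlan: false, profile: "erik-safe", dryRun: false };
  for (const token of argv) {
    if (!token.startsWith("--")) throw new Error(`unexpected argument "${token}"`);
    const eq = token.indexOf("=");
    const name = eq === -1 ? token.slice(2) : token.slice(2, eq);
    const value = eq === -1 ? undefined : token.slice(eq + 1);
    switch (name) {
      case "status":
        args.status = true;
        break;
      case "tipllm-plan":
        args.tipllmPlan = true;
        break;
      case "dry-run":
        args.dryRun = true;
        break;
      case "limit":
        args.limit = parsePositiveInt(name, value);
        break;
      case "max-queues":
        args.maxQueues = parsePositiveInt(name, value);
        break;
      case "profile":
        if (!PROFILES.includes(value as Profile)) throw new Error(`unknown profile "${value}"`);
        args.profile = value as Profile;
        break;
      case "enqueue":
        if (!TARGETS.includes(value as EnqueueTarget)) throw new Error(`unknown enqueue target "${value}"`);
        args.enqueue = value as EnqueueTarget;
        break;
      default:
        throw new Error(`unknown flag --${name}`);
    }
  }
  return args;
}

const parsed = parseRobotArgs(["--enqueue=details-fast-lane", "--profile=erik-safe", "--dry-run"]);
console.log(parsed);
```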
Safety defaults:

- Default profile: `erik-safe`.
- `erik-safe` max queues: `3`.
- `erik-safe` excludes heavy Playwright/discovery queues.
- `pi-fetch` excludes heavy/discovery queues.
- `proxmox-heavy` must be selected explicitly and is intended for heavy crawler work.

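One way to enforce these defaults is to tag each queue with a class and filter by profile before enqueueing. The following are hypothetical:

- The queue names and the `QueueClass` tags.
- Treating Playwright as the heavy class.
- The caps for `pi-fetch` and `proxmox-heavy`.

Only the `erik-safe` cap of `3` and the exclusion rules come from the list above.

```typescript
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";
// Hypothetical queue classes; Playwright queues are assumed to be the "heavy" ones.
type QueueClass = "fetch" | "playwright" | "discovery";

interface QueueSpec {
  name: string;
  cls: QueueClass;
}

interface ProfileRule {
  excluded: readonly QueueClass[];
  maxQueues: number;
}

const PROFILE_RULES: Record<Profile, ProfileRule> = {
  // From the handoff: at most 3 queues, no heavy Playwright/discovery queues.
  "erik-safe": { excluded: ["playwright", "discovery"], maxQueues: 3 },
  // From the handoff: no heavy/discovery queues. The cap of 6 is an assumption.
  "pi-fetch": { excluded: ["playwright", "discovery"], maxQueues: 6 },
  // Heavy crawler work, only when chosen explicitly. The cap of 12 is an assumption.
  "proxmox-heavy": { excluded: [], maxQueues: 12 },
};

// Filter candidate queues for a profile. An explicit maxQueues may lower the
// cap but never raise it above the profile's own limit.
function selectQueues(profile: Profile, candidates: QueueSpec[], maxQueues?: number): QueueSpec[] {
  const rule = PROFILE_RULES[profile];
  const cap = Math.min(rule.maxQueues, maxQueues ?? rule.maxQueues);
  return candidates.filter((q) => !rule.excluded.includes(q.cls)).slice(0, cap);
}

// Hypothetical queue names for illustration.
const candidates: QueueSpec[] = [
  { name: "details-fs", cls: "fetch" },
  { name: "images-juniper", cls: "playwright" },
  { name: "details-gaotek", cls: "fetch" },
  { name: "vendor-discovery", cls: "discovery" },
  { name: "details-flexoptix", cls: "fetch" },
  { name: "details-fibermall", cls: "fetch" },
];
console.log(selectQueues("erik-safe", candidates).map((q) => q.name));
```

Letting the override only lower the cap means a mistyped `--max-queues` cannot push Erik past its safe limit.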
No crawler jobs were started while building this, and no queue waves were enqueued.

## Gitea TIPLLM Training Pool

Found a local clone:

- `/tmp/tip-training-data`
- remote: `rene/tip-training-data`

Erik did not have `/tmp/tip-training-data/.git` at the time of the check.

Wrote the first robot experience record locally and pushed it to Gitea:

```text
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
```

Files in the Gitea training pool:

- `qa-pairs/robot-control-high.jsonl`
- `robot-experiences/2026-04-29.jsonl`

This record encodes:

- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.

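The pool holds two kinds of files:

- Raw audit rows in `robot-experiences/2026-04-29.jsonl`, apparently one file per day.
- SFT pairs in `qa-pairs/robot-control-high.jsonl`.

The sketch below turns one robot experience into both JSONL lines, assuming a chat-style `messages` SFT format. `RobotExperience`, `toSftRecord`, `toJsonlLines`, every field name and the example values are hypothetical. None of them are taken from `training-data-writer.ts`.

```typescript
interface RobotExperience {
  timestamp: string; // ISO-8601
  robot: string;
  action: string;
  profile: "erik-safe" | "pi-fetch" | "proxmox-heavy";
  crawlJobsStarted: number;
  observation: string; // what the robot saw
  decision: string; // what TIPLLM recommended
}

interface SftRecord {
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
  meta: { source: "robot-experience"; timestamp: string; priority: "high" | "normal" };
}

// Hypothetical mapping of one experience onto a system/user/assistant triple.
function toSftRecord(exp: RobotExperience): SftRecord {
  return {
    messages: [
      { role: "system", content: "You are TIPLLM, the only model used for TIP crawler/robot planning." },
      {
        role: "user",
        content: `Robot ${exp.robot} (${exp.action}, profile ${exp.profile}, crawl jobs started: ${exp.crawlJobsStarted}) observed: ${exp.observation}`,
      },
      { role: "assistant", content: exp.decision },
    ],
    // "high" mirrors the robot-control-high.jsonl file name; an assumption.
    meta: { source: "robot-experience", timestamp: exp.timestamp, priority: "high" },
  };
}

// JSONL = one JSON document per line. JSON.stringify without an indent argument
// never emits a raw "\n" (newlines inside strings are escaped), so each record
// stays on exactly one line.
function toJsonlLines(exp: RobotExperience): { day: string; experienceLine: string; sftLine: string } {
  return {
    day: exp.timestamp.slice(0, 10), // e.g. robot-experiences/2026-04-29.jsonl
    experienceLine: JSON.stringify(exp) + "\n",
    sftLine: JSON.stringify(toSftRecord(exp)) + "\n",
  };
}

// Illustrative values only.
const example: RobotExperience = {
  timestamp: "2026-04-29T20:11:24.091Z",
  robot: "verification-robots",
  action: "status",
  profile: "erik-safe",
  crawlJobsStarted: 0,
  observation: "status check only;\nno queue waves enqueued",
  decision: "Keep Erik controller-only; route heavy crawls to Proxmox/Pi workers.",
};
const lines = toJsonlLines(example);
console.log(lines.day, lines.sftLine);
```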
## Erik Notes

Synced robot/training code to `/opt/tip`.

Did not:

- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.

The remote scraper TypeScript build initially failed because of stale, misplaced duplicate files that existed only on the remote:

- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`

These did not exist locally and had incorrect relative imports. Only these two duplicates were removed; the remote scraper build passed afterward.

PM2 status after the cleanup:

- `tip-api`: online.
- `tip-scraper-daemon`: online.

## Cross-Repo Sync

Claude Code created a similar sync handoff in `rene/llm-gateway`.

From the user's screenshot:

```text
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
```

Gitea path shown:

```text
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
```

Rule:

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Sync Folder Work

Created in this repo:

- `sync/README.md`
- `sync/CURRENT.md`
- `sync/history/2026-04-29-tipllm-robot-learning.md`
- `sync/history/2026-04-29-cross-repo-sync.md`
- this file.

Already pushed earlier:

```text
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
```

## Current Dirty Worktree

As of this handoff, many non-sync files remain modified/untracked:

- `CHANGELOG_PENDING.md`
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/mcp-server/src/index.ts`
- `packages/scraper/package.json`
- `packages/scraper/src/crawler-llm/core.ts`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `scripts/tip-learning-pool-build.ts`
- `packages/scraper/src/robots/`
- `packages/scraper/src/scrapers/audiocodes-oem.ts`
- `packages/scraper/src/seed-batch35.ts`
- `packages/scraper/src/seed-batch36.ts`
- `packages/scraper/src/seed-batch37.ts`
- `sql/102-product-verification-reconcile.sql`

Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.

## Safe Commands

Read-only/status:

```bash
npm run robots:verification -w packages/scraper -- --status
```

TIPLLM planning only, no crawl jobs:

```bash
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
```

Dry-run queue plan only:

```bash
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

Build checks:

```bash
npm run build -w packages/scraper
npm run build -w packages/api
```

## Next Recommended Steps

1. Pull both sync folders from Gitea:
   - `rene/transceiver-db`
   - `rene/llm-gateway`
2. Review the dirty worktree before committing code.
3. Decide whether to commit the TIP verification, Hot Topics, and robot learning code as one commit or several.
4. If running robots, start with TIPLLM planning only.
5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.