diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 0cb42de..a6390a2 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
 
-Updated: 2026-04-29 20:25 UTC
+Updated: 2026-04-29 20:40 UTC
 
 ## Active Policy
 
@@ -27,6 +27,9 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 
 ## Latest Work
 
+- Full Codex session handoff was added:
+  - `sync/history/2026-04-29-codex-full-session-handoff.md`
+  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
 - Added a verification robot controller:
   - `packages/scraper/src/robots/verification-robots.ts`
   - command: `npm run robots:verification -w packages/scraper -- --status`
@@ -95,3 +98,9 @@ npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane -
 ## Dirty Worktree Note
 
 There are existing uncommitted changes outside `sync/`. Some are Codex work from this session, some appear pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.
+
+## Latest Sync Commits
+
+- `6c42ca7 docs: add shared agent sync handoff`
+- `8e7c5aa docs: link llm-gateway sync handoff`
+- Pending after this update: full Codex session handoff in `sync/history/`.
diff --git a/sync/history/2026-04-29-codex-full-session-handoff.md b/sync/history/2026-04-29-codex-full-session-handoff.md
new file mode 100644
index 0000000..b65c418
--- /dev/null
+++ b/sync/history/2026-04-29-codex-full-session-handoff.md
@@ -0,0 +1,363 @@
+# 2026-04-29 Codex Full Session Handoff
+
+This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.
+
+## Scope
+
+Covered here:
+
+- TIP product verification and crawler status.
+- Product image/details verification fixes.
+- Blog Engine Hot Topics fix.
+- TIPLLM-only robot planning policy.
+- Gitea-backed TIPLLM training pool experience logging.
+- Erik operational safety constraints.
+- Cross-repo sync with `rene/llm-gateway`.
+
+Not covered:
+
+- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.
+
+## User Intent
+
+Rene wants TIP to move toward completion:
+
+- Product photos must be crawled and verified.
+- Product details must be verified.
+- Overall product verification must be trustworthy.
+- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
+- Crawler/robot orchestration should use available Proxmox/Pi capacity.
+- Erik must not be overloaded by heavy crawlers.
+- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
+- Robot/crawler experiences must become TIPLLM training data in Gitea.
+- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.
+
+## Live TIP Snapshot
+
+Last checked live on 2026-04-29:
+
+- API: healthy.
+- `tip-api`: online.
+- `tip-scraper-daemon`: online.
+- Total transceivers: `13,546`.
+- Price verified: `7,250`.
+- Image verified: `7,025`.
+- Details verified: `6,243`.
+- Fully verified: `5,812`.
+- Last price observation: `2026-04-29 19:15:53 UTC`.
+- Last stock observation: `2026-04-29 19:15:56 UTC`.
+- Transceiver updates in last 24h at time of check: `5,175`.
+- New transceivers in last 24h at time of check: `462`.
+
+## Verification Blockers
+
+At the DB blocker check:
+
+- Missing price: `6,296`.
+- Missing image: `6,521`.
+- Missing details: `7,303`.
+- Near-full but missing details: `797`.
+- Near-full but missing image: `237`.
+- Near-full but missing price: `43`.
+
+Top vendor blockers included:
+
+- Juniper Networks: `464` not fully verified, mostly images.
+- GAO Tek: `414`, mostly details.
+- FS.COM: `378`, details/images.
+- Cisco Systems: `330`, all signals missing.
+- Ascent Optics: `305`, all signals missing.
+- Eoptolink: `287`, all signals missing.
+- ATGBICS: about `250` not fully verified.
+- Flexoptix: about `119` details.
+- FiberMall: about `72` details.
+
+Recommended verification strategy:
+
+1. Details fast lane first, because near-full missing-details rows convert fastest.
+2. Then targeted image backfill for large OEMs.
+3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.
+
+## Product Verification Work Completed
+
+Implemented verification pipeline changes:
+
+- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
+- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
+- Scraped product pages now preserve/backfill `product_page_url`.
+- Maintenance reconcile promotes old data into verification flags.
+- CLI exposes `--backfill-images`.
+- Migration added:
+  - `sql/102-product-verification-reconcile.sql`
+
+Important touched paths:
+
+- `packages/scraper/src/utils/db.ts`
+- `packages/scraper/src/utils/backfill-images.ts`
+- `packages/scraper/src/utils/image-downloader.ts`
+- `packages/scraper/src/utils/spec-updater.ts`
+- `packages/scraper/src/index.ts`
+- `packages/scraper/src/scheduler.ts`
+- `packages/scraper/src/scrapers/atgbics.ts`
+- `packages/scraper/src/scrapers/fiber24.ts`
+- `packages/scraper/src/scrapers/fibermall.ts`
+- `sql/102-product-verification-reconcile.sql`
+
+Migration result on Erik:
+
+- Total: `13,084` at that earlier time.
+- Image verified: `6,423`.
+- Details verified: `6,231`.
+- Fully verified: `5,704`.
+
+Then image backfill ran:
+
+- GAO Tek: `313` updated, `6` no-image, `95` errors/404s.
+- Other vendors: `289 / 309` updated.
+- Total new images: `602`.
+- Backfill elapsed: about `1369.1s`.
+
+After restart at that time:
+
+- Image verified: `7,025`.
+- Fully verified: `5,812`.
+
+## Blog Engine Hot Topics Work Completed
+
+User reported:
+
+- Blog Engine Hot Topics always showed the same topics.
+- These topics are used to create blog posts.
+- More content/context for BlogLLM would help.
+
+Root causes found:
+
+- Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
+- Rotating research/evergreen topics existed but were lower priority and often invisible.
+- Dashboard sent `customTitle` / `customAngle`, but API expected `custom_title` / `additional_context`.
+- `blog_title_created` badge existed in UI but API did not populate it.
+
+Implemented:
+
+- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
+- Refresh shuffle via query seed.
+- Already-created topics demoted via recent `blog_drafts`.
+- API returns:
+  - `blog_title_created`
+  - `last_blog_created_at`
+  - `rank_score`
+  - `llm_context`
+- Dashboard passes:
+  - `custom_title`
+  - `additional_context`
+- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.
+
+Important paths:
+
+- `packages/api/src/routes/hot-topics.ts`
+- `packages/dashboard/hot-topics.js`
+- `packages/api/src/routes/blog.ts`
+
+Verified live:
+
+- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
+- `llm_context` is present.
+- API remained healthy after restart.
+
+## TIPLLM Robot Policy
+
+User explicitly requested:
+
+- Use TIPLLM only.
+- No other AI for this crawler/robot planning lane.
+- Write experiences into a Gitea training pool.
+- If TIPLLM training pool does not exist, create it.
+
+Implemented local code:
+
+- `packages/scraper/src/robots/verification-robots.ts`
+  - `--status`
+  - `--tipllm-plan --limit=N`
+  - `--enqueue=details-fast-lane|priority-vendors|all`
+  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
+  - `--dry-run`
+  - `--max-queues=N`
+- `packages/scraper/src/crawler-llm/training-data-writer.ts`
+  - added `writeRobotExperience`.
+  - writes raw robot audit rows.
+  - writes SFT records for TIPLLM.
+  - removed hardcoded Gitea token fallback.
+  - uses existing git remote when no `GITEA_TOKEN` env var is set.
+- `scripts/tip-learning-pool-build.ts`
+  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
+- `docs/TIP_SELFLEARNING_WORKFLOW.md`
+  - documented robot experience pool and safety defaults.
+- `packages/scraper/package.json`
+  - added `robots:verification`.
+
+Safety defaults:
+
+- Default profile: `erik-safe`.
+- `erik-safe` max queues: `3`.
+- `erik-safe` excludes heavy Playwright/discovery queues.
+- `pi-fetch` excludes heavy/discovery queues.
+- `proxmox-heavy` is explicit and intended for heavy crawler work.
+
+No crawler jobs were started while building this.
+No queue waves were enqueued while building this.
+
+## Gitea TIPLLM Training Pool
+
+Found local clone:
+
+- `/tmp/tip-training-data`
+- remote: `rene/tip-training-data`
+
+Erik did not have `/tmp/tip-training-data/.git` at the time of check.
+
+Wrote first robot experience record locally and pushed to Gitea:
+
+```text
+f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
+```
+
+Files in Gitea training pool:
+
+- `qa-pairs/robot-control-high.jsonl`
+- `robot-experiences/2026-04-29.jsonl`
+
+This record encodes:
+
+- TIPLLM-only policy.
+- Erik controller-only policy.
+- Proxmox/Pi heavy worker policy.
+- No crawler jobs started.
+
+## Erik Notes
+
+Synced robot/training code to `/opt/tip`.
+
+Did not:
+
+- start crawler jobs.
+- enqueue robot waves.
+- restart PM2 services.
+
+Remote scraper TypeScript build initially failed because of stale misplaced remote-only duplicate files:
+
+- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
+- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
+
+These did not exist locally and had wrong relative imports. Removed only these duplicates. Remote scraper build passed afterward.
+
+PM2 status after this:
+
+- `tip-api`: online.
+- `tip-scraper-daemon`: online.
+
+## Cross-Repo Sync
+
+Claude Code created a similar sync handoff in `rene/llm-gateway`.
+
+From user screenshot:
+
+```text
+e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
+```
+
+Gitea path shown:
+
+```text
+http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
+```
+
+Rule:
+
+When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:
+
+- `transceiver-db/sync/CURRENT.md`
+- `llm-gateway/sync/CURRENT.md`
+
+## Sync Folder Work
+
+Created in this repo:
+
+- `sync/README.md`
+- `sync/CURRENT.md`
+- `sync/history/2026-04-29-tipllm-robot-learning.md`
+- `sync/history/2026-04-29-cross-repo-sync.md`
+- this file.
+
+Already pushed earlier:
+
+```text
+6c42ca7 docs: add shared agent sync handoff
+8e7c5aa docs: link llm-gateway sync handoff
+```
+
+## Current Dirty Worktree
+
+As of this handoff, many non-sync files remain modified/untracked:
+
+- `CHANGELOG_PENDING.md`
+- `docs/TIP_SELFLEARNING_WORKFLOW.md`
+- `packages/api/src/routes/hot-topics.ts`
+- `packages/dashboard/hot-topics.js`
+- `packages/mcp-server/src/index.ts`
+- `packages/scraper/package.json`
+- `packages/scraper/src/crawler-llm/core.ts`
+- `packages/scraper/src/crawler-llm/training-data-writer.ts`
+- `packages/scraper/src/scrapers/atgbics.ts`
+- `packages/scraper/src/scrapers/fiber24.ts`
+- `packages/scraper/src/scrapers/fibermall.ts`
+- `packages/scraper/src/utils/backfill-images.ts`
+- `packages/scraper/src/utils/db.ts`
+- `packages/scraper/src/utils/image-downloader.ts`
+- `packages/scraper/src/utils/spec-updater.ts`
+- `scripts/tip-learning-pool-build.ts`
+- `packages/scraper/src/robots/`
+- `packages/scraper/src/scrapers/audiocodes-oem.ts`
+- `packages/scraper/src/seed-batch35.ts`
+- `packages/scraper/src/seed-batch36.ts`
+- `packages/scraper/src/seed-batch37.ts`
+- `sql/102-product-verification-reconcile.sql`
+
+Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.
+
+## Safe Commands
+
+Read-only/status:
+
+```bash
+npm run robots:verification -w packages/scraper -- --status
+```
+
+TIPLLM planning only, no crawl jobs:
+
+```bash
+npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
+```
+
+Dry-run queue plan only:
+
+```bash
+npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
+```
+
+Build checks:
+
+```bash
+npm run build -w packages/scraper
+npm run build -w packages/api
+```
+
+## Next Recommended Steps
+
+1. Pull both sync folders from Gitea:
+   - `rene/transceiver-db`
+   - `rene/llm-gateway`
+2. Review dirty worktree before committing code.
+3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
+4. If running robots, start with TIPLLM planning only.
+5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.
+
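Reviewer sketch for the safety defaults and step 5 above: the handoff states that `erik-safe` caps at 3 queues and excludes heavy Playwright/discovery queues, that `pi-fetch` excludes heavy/discovery queues, and that `proxmox-heavy` is the explicit heavy lane. The TypeScript below is only an illustration of how such a guard might look. It is not the code in `verification-robots.ts`. The `QueueCandidate` and `ProfileRule` shapes, the `planQueues` helper, the assumption that `proxmox-heavy` also admits discovery queues, and the choice that a `--max-queues`-style override can only lower a profile cap are all hypothetical.

```typescript
// Hypothetical sketch: names and shapes below are assumptions, not the repo's API.
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";

interface QueueCandidate {
  name: string;       // illustrative queue name
  heavy: boolean;     // e.g. Playwright-based crawl work
  discovery: boolean; // vendor/product discovery work
}

interface ProfileRule {
  maxQueues?: number; // only the erik-safe cap (3) is stated in the handoff
  allowHeavy: boolean;
  allowDiscovery: boolean;
}

const PROFILE_RULES: Record<Profile, ProfileRule> = {
  "erik-safe": { maxQueues: 3, allowHeavy: false, allowDiscovery: false },
  "pi-fetch": { allowHeavy: false, allowDiscovery: false },
  // Assumption: the heavy lane admits discovery queues as well.
  "proxmox-heavy": { allowHeavy: true, allowDiscovery: true },
};

// Filter candidates by profile, then apply the tighter of the profile cap
// and an optional override (assumed to only ever lower the cap).
function planQueues(
  profile: Profile,
  candidates: QueueCandidate[],
  maxQueuesOverride?: number,
): QueueCandidate[] {
  const rule = PROFILE_RULES[profile];
  const allowed = candidates.filter(
    (q) => (rule.allowHeavy || !q.heavy) && (rule.allowDiscovery || !q.discovery),
  );
  const caps = [rule.maxQueues, maxQueuesOverride].filter(
    (c): c is number => c !== undefined,
  );
  return caps.length > 0 ? allowed.slice(0, Math.min(...caps)) : allowed;
}
```

Letting the override only lower the cap is a deliberately conservative choice for this sketch, so that a mistyped flag cannot push more than 3 queues onto Erik; the real controller may behave differently.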