# 2026-04-29 Codex Full Session Handoff

This is the complete Codex-side handoff for the recent TIP work visible in this thread. It is intended for Claude Code, Codex, and laptop sync workflows via Gitea.

## Scope

Covered here:

- TIP product verification and crawler status.
- Product image/details verification fixes.
- Blog Engine Hot Topics fix.
- TIPLLM-only robot planning policy.
- Gitea-backed TIPLLM training pool experience logging.
- Erik operational safety constraints.
- Cross-repo sync with `rene/llm-gateway`.

Not covered:

- Chats or history that are not present in this Codex thread and not already written into this repository or sibling `sync/` folders.

## User Intent

Rene wants TIP to move toward completion:

- Product photos must be crawled and verified.
- Product details must be verified.
- Overall product verification must be trustworthy.
- Blog Engine Hot Topics should rotate/use meaningful topics for blog creation.
- Crawler/robot orchestration should use available Proxmox/Pi capacity.
- Erik must not be overloaded by heavy crawlers.
- TIPLLM must be the only AI used for crawler/robot planning and extraction feedback.
- Robot/crawler experiences must become TIPLLM training data in Gitea.
- All agent handoffs should live in `sync/` folders in Gitea so Claude Code/Codex/laptop workflows can continue cleanly.

## Live TIP Snapshot

Last checked live on 2026-04-29:

- API: healthy.
- `tip-api`: online.
- `tip-scraper-daemon`: online.
- Total transceivers: `13,546`.
- Price verified: `7,250`.
- Image verified: `7,025`.
- Details verified: `6,243`.
- Fully verified: `5,812`.
- Last price observation: `2026-04-29 19:15:53 UTC`.
- Last stock observation: `2026-04-29 19:15:56 UTC`.
- Transceiver updates in last 24h at time of check: `5,175`.
- New transceivers in last 24h at time of check: `462`.

## Verification Blockers

At the DB blocker check:

- Missing price: `6,296`.
- Missing image: `6,521`.
- Missing details: `7,303`.
- Near-full but missing details: `797`.
- Near-full but missing image: `237`.
- Near-full but missing price: `43`.

Top vendor blockers included:

- Juniper Networks: `464` not fully verified, mostly images.
- GAO Tek: `414`, mostly details.
- FS.COM: `378`, details/images.
- Cisco Systems: `330`, all signals missing.
- Ascent Optics: `305`, all signals missing.
- Eoptolink: `287`, all signals missing.
- ATGBICS: about `250` not fully verified.
- Flexoptix: about `119` missing details.
- FiberMall: about `72` missing details.

Recommended verification strategy:

1. Details fast lane first, because near-full missing-details rows convert fastest.
2. Then targeted image backfill for large OEMs.
3. Treat OEM price verification separately; many OEM catalog products may not have direct prices.

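The blocker counts and the lane ordering can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the snapshot and blocker check; it assumes that "near-full" means a row is missing exactly one signal, which this handoff does not state explicitly. The type and function names are illustrative and are not taken from the repository.

```typescript
// Counts per verification signal.
interface SignalCounts {
  price: number;
  image: number;
  details: number;
}

// Figures copied from the live snapshot and the DB blocker check.
const totalTransceivers = 13546;
const verified: SignalCounts = { price: 7250, image: 7025, details: 6243 };
// Assumption: "near-full" rows are missing exactly this one signal.
const nearFull: SignalCounts = { price: 43, image: 237, details: 797 };

// Missing rows per signal are total minus verified.
function missingCounts(total: number, v: SignalCounts): SignalCounts {
  return {
    price: total - v.price,
    image: total - v.image,
    details: total - v.details,
  };
}

// Order lanes by how many near-full rows each one could convert, largest first.
function rankLanes(n: SignalCounts): Array<keyof SignalCounts> {
  const lanes: Array<keyof SignalCounts> = ["price", "image", "details"];
  return lanes.sort((a, b) => n[b] - n[a]);
}

const missing = missingCounts(totalTransceivers, verified);
const laneOrder = rankLanes(nearFull);
```

Under these inputs the missing counts reproduce the blocker figures (`6,296`, `6,521`, `7,303`), and the lane order comes out as details, image, price, which matches the recommended strategy.
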
## Product Verification Work Completed

Implemented verification pipeline changes:

- Product image crawl writes `image_verified`, `image_verified_url`, `image_verified_at`.
- Product detail scrape writes `details_verified`, `details_source_url`, `details_verified_at`.
- Scraped product pages now preserve/backfill `product_page_url`.
- Maintenance reconcile promotes old data into verification flags.
- CLI exposes `--backfill-images`.
- Migration added:
  - `sql/102-product-verification-reconcile.sql`

Important touched paths:

- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `packages/scraper/src/index.ts`
- `packages/scraper/src/scheduler.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `sql/102-product-verification-reconcile.sql`

Migration result on Erik:

- Total: `13,084` at that earlier time.
- Image verified: `6,423`.
- Details verified: `6,231`.
- Fully verified: `5,704`.

Then image backfill ran:

- GAO Tek: `313` updated, `6` no-image, `95` errors/404s.
- Other vendors: `289 / 309` updated.
- Total new images: `602`.
- Backfill elapsed: about `1369.1s`.

After restart at that time:

- Image verified: `7,025`.
- Fully verified: `5,812`.

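The backfill figures are internally consistent, which the short tally below checks. It assumes that `289 / 309` means updated out of attempted and that the elapsed time covers both batches; neither is stated explicitly in this handoff. The `BackfillBatch` shape is illustrative only.

```typescript
// One backfill batch: rows attempted versus rows that gained a verified image.
interface BackfillBatch {
  vendor: string;
  attempted: number;
  updated: number;
}

const batches: BackfillBatch[] = [
  // GAO Tek attempted = updated + no-image + errors/404s.
  { vendor: "GAO Tek", attempted: 313 + 6 + 95, updated: 313 },
  // Assumption: "289 / 309" reads as updated / attempted.
  { vendor: "Other vendors", attempted: 309, updated: 289 },
];

const totalUpdated = batches.reduce((sum, b) => sum + b.updated, 0);
const totalAttempted = batches.reduce((sum, b) => sum + b.attempted, 0);

// Image-verified count after the backfill, starting from the migration result.
const imageVerifiedAfter = 6423 + totalUpdated;

// Rough throughput, assuming the 1369.1s elapsed time spans both batches.
const secondsPerRow = 1369.1 / totalAttempted;
```

The tally gives `602` new images and `7,025` image-verified rows, matching the reported totals, and the GAO Tek attempt count of `414` matches that vendor's blocker count. The throughput works out to roughly 1.9 seconds per attempted row under the stated assumption.
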
## Blog Engine Hot Topics Work Completed

User reported:

- Blog Engine Hot Topics always showed the same topics.
- These topics are used to create blog posts.
- More content/context for BlogLLM would help.

Root causes found:

- Hot Topics API effectively sorted by `urgency` only, so static `hot/breaking` topics dominated.
- Rotating research/evergreen topics existed but were lower priority and often invisible.
- Dashboard sent `customTitle` / `customAngle`, but API expected `custom_title` / `additional_context`.
- `blog_title_created` badge existed in UI but API did not populate it.

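The field-name mismatch in the root causes is easy to reproduce: a camelCase payload is silently ignored by a route that reads snake_case keys. The following is a minimal sketch of a translation step, assuming `customTitle` maps to `custom_title` and `customAngle` maps to `additional_context` as the pairing above suggests. The interface names and `toBlogTopicRequest` are hypothetical and do not come from `hot-topics.js`.

```typescript
// Form state as the dashboard named it before the fix (camelCase).
interface DashboardTopicForm {
  customTitle?: string;
  customAngle?: string;
}

// Request body keys the API expects, per the root-cause notes (snake_case).
interface BlogTopicRequest {
  custom_title?: string;
  additional_context?: string;
}

// Hypothetical helper: translate form state into the API payload,
// dropping fields that are empty or whitespace-only.
function toBlogTopicRequest(form: DashboardTopicForm): BlogTopicRequest {
  const payload: BlogTopicRequest = {};
  const title = (form.customTitle || "").trim();
  const angle = (form.customAngle || "").trim();
  if (title) payload.custom_title = title;
  if (angle) payload.additional_context = angle;
  return payload;
}
```

Keeping the translation in one place means a future rename on either side only needs to change this mapping.
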
Implemented:

- Diversified ranking with urgency, source score, freshness, deterministic jitter and source caps.
- Refresh shuffle via query seed.
- Already-created topics demoted via recent `blog_drafts`.
- API returns:
  - `blog_title_created`
  - `last_blog_created_at`
  - `rank_score`
  - `llm_context`
- Dashboard passes:
  - `custom_title`
  - `additional_context`
- Blog route injects Hot Topic briefing into master-draft context as well as topic expansion.

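To make the ranking ingredients concrete, here is one way such a score could be assembled. This is a sketch under stated assumptions, not the code in `hot-topics.ts`: the weights, the urgency levels, the freshness curve, the demotion factor and every name below are hypothetical. The jitter uses an FNV-1a hash so that the same seed always yields the same order, while a new seed reshuffles near-ties.

```typescript
interface HotTopic {
  id: string;
  source: string;
  urgency: "breaking" | "hot" | "research" | "evergreen";
  sourceScore: number; // assumed to be normalised to 0..1
  ageHours: number;
  blogTitleCreated: boolean; // a recent draft already used this topic
}

// Assumed weights; the real values are not given in this handoff.
const URGENCY_WEIGHT: Record<HotTopic["urgency"], number> = {
  breaking: 1.0,
  hot: 0.8,
  research: 0.6,
  evergreen: 0.5,
};

// FNV-1a hash mapped to [0, 1): deterministic for a given seed and topic id.
function jitter(seed: string, id: string): number {
  const key = seed + ":" + id;
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h / 0x100000000;
}

function rankScore(t: HotTopic, seed: string): number {
  const freshness = 1 / (1 + t.ageHours / 24); // decays as the topic ages
  const base =
    0.4 * URGENCY_WEIGHT[t.urgency] +
    0.25 * t.sourceScore +
    0.2 * freshness +
    0.15 * jitter(seed, t.id);
  return t.blogTitleCreated ? base * 0.5 : base; // demote already-used topics
}

// Sort by score, then enforce a per-source cap while filling up to `limit`.
function rankTopics(
  topics: HotTopic[],
  seed: string,
  limit: number,
  maxPerSource: number
): HotTopic[] {
  const perSource: Record<string, number> = {};
  const ranked = [...topics].sort((a, b) => rankScore(b, seed) - rankScore(a, seed));
  const picked: HotTopic[] = [];
  for (const t of ranked) {
    if (picked.length >= limit) break;
    const used = perSource[t.source] || 0;
    if (used >= maxPerSource) continue; // source cap reached
    perSource[t.source] = used + 1;
    picked.push(t);
  }
  return picked;
}
```

Because urgency is only one weighted term, research and evergreen topics can surface when they are fresh or well sourced, and the source cap keeps a single feed from filling the whole list.
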
Important paths:

- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/api/src/routes/blog.ts`

Verified live:

- `/api/hot-topics?limit=...&shuffle=...` returns varied ordering.
- `llm_context` is present.
- API remained healthy after restart.

## TIPLLM Robot Policy

User explicitly requested:

- Use TIPLLM only.
- No other AI for this crawler/robot planning lane.
- Write experiences into a Gitea training pool.
- If TIPLLM training pool does not exist, create it.

Implemented local code:

- `packages/scraper/src/robots/verification-robots.ts`
  - `--status`
  - `--tipllm-plan --limit=N`
  - `--enqueue=details-fast-lane|priority-vendors|all`
  - `--profile=erik-safe|pi-fetch|proxmox-heavy`
  - `--dry-run`
  - `--max-queues=N`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - added `writeRobotExperience`.
  - writes raw robot audit rows.
  - writes SFT records for TIPLLM.
  - removed hardcoded Gitea token fallback.
  - uses existing git remote when no `GITEA_TOKEN` env var is set.
- `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` into `tip_llm`.
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
  - documented robot experience pool and safety defaults.
- `packages/scraper/package.json`
  - added `robots:verification`.

Safety defaults:

- Default profile: `erik-safe`.
- `erik-safe` max queues: `3`.
- `erik-safe` excludes heavy Playwright/discovery queues.
- `pi-fetch` excludes heavy/discovery queues.
- `proxmox-heavy` is explicit and intended for heavy crawler work.

No crawler jobs were started while building this.
No queue waves were enqueued while building this.

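The safety defaults above amount to a small filter over candidate queues. The sketch below shows one way to express them, assuming each queue carries `heavy` and `discovery` flags and that `--max-queues` can lower but not raise a profile's cap. Only the profile names, the `erik-safe` default and its cap of `3` come from this handoff; the other caps, the queue shape and `selectQueues` are placeholders.

```typescript
type RobotProfile = "erik-safe" | "pi-fetch" | "proxmox-heavy";

// Hypothetical queue descriptor; the real queue model may differ.
interface QueueSpec {
  name: string;
  heavy: boolean; // e.g. Playwright-driven crawls
  discovery: boolean; // vendor discovery crawls
}

interface ProfileRule {
  allowHeavy: boolean;
  allowDiscovery: boolean;
  maxQueues: number;
}

// Only the erik-safe cap of 3 is documented; the other caps are placeholders.
const PROFILE_RULES: Record<RobotProfile, ProfileRule> = {
  "erik-safe": { allowHeavy: false, allowDiscovery: false, maxQueues: 3 },
  "pi-fetch": { allowHeavy: false, allowDiscovery: false, maxQueues: 6 },
  "proxmox-heavy": { allowHeavy: true, allowDiscovery: true, maxQueues: 12 },
};

// Filter out queues the profile forbids, then apply the queue cap.
// Assumption: a caller-supplied maxQueues is clamped to the profile's cap.
function selectQueues(
  queues: QueueSpec[],
  profile: RobotProfile = "erik-safe",
  maxQueues?: number
): QueueSpec[] {
  const rule = PROFILE_RULES[profile];
  const requested = maxQueues === undefined ? rule.maxQueues : maxQueues;
  const cap = Math.max(0, Math.min(requested, rule.maxQueues));
  return queues
    .filter((q) => (rule.allowHeavy || !q.heavy) && (rule.allowDiscovery || !q.discovery))
    .slice(0, cap);
}
```

Defaulting the `profile` parameter to `erik-safe` means a caller has to opt in to heavy work explicitly, which mirrors the intent that `proxmox-heavy` is never selected by accident.
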
## Gitea TIPLLM Training Pool

Found local clone:

- `/tmp/tip-training-data`
- remote: `rene/tip-training-data`

Erik did not have `/tmp/tip-training-data/.git` at the time of check.

Wrote first robot experience record locally and pushed to Gitea:

```text
f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
```

Files in Gitea training pool:

- `qa-pairs/robot-control-high.jsonl`
- `robot-experiences/2026-04-29.jsonl`

This record encodes:

- TIPLLM-only policy.
- Erik controller-only policy.
- Proxmox/Pi heavy worker policy.
- No crawler jobs started.

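For readers who have not opened the pool, the following sketch shows how a robot experience might be turned into a raw audit row plus an SFT record and serialised as JSONL. The record fields and the chat-style `messages` layout are assumptions for illustration; the actual schema written by `writeRobotExperience` is not described in this handoff. Only the `robot-experiences/<date>.jsonl` naming is taken from the file list above.

```typescript
// Hypothetical raw audit row for one robot run.
interface RobotExperience {
  timestamp: string; // ISO 8601, UTC
  profile: string;
  action: string;
  jobsStarted: number;
  policies: string[];
}

// Hypothetical SFT record in a chat-message layout.
interface SftRecord {
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}

// Daily audit file, keyed by the date part of the ISO timestamp.
function experiencePath(timestamp: string): string {
  return "robot-experiences/" + timestamp.slice(0, 10) + ".jsonl";
}

// Turn an audit row into a question/answer pair for fine-tuning.
function toSftRecord(exp: RobotExperience): SftRecord {
  return {
    messages: [
      {
        role: "user",
        content:
          "Robot action '" + exp.action + "' under profile '" + exp.profile +
          "'. Which policies apply?",
      },
      { role: "assistant", content: exp.policies.join(" ") },
    ],
  };
}

// JSONL: one JSON object per line, each line newline-terminated.
function toJsonl(records: object[]): string {
  return records.map((r) => JSON.stringify(r) + "\n").join("");
}
```

Appending newline-terminated lines keeps each day's file safe to extend with a plain append, and a glob such as `qa-pairs/**/*.jsonl` can then pick records up without any per-file index.
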
## Erik Notes

Synced robot/training code to `/opt/tip`.

Did not:

- start crawler jobs.
- enqueue robot waves.
- restart PM2 services.

Remote scraper TypeScript build initially failed because of stale misplaced remote-only duplicate files:

- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`

These did not exist locally and had wrong relative imports. Removed only these duplicates. Remote scraper build passed afterward.

PM2 status after this:

- `tip-api`: online.
- `tip-scraper-daemon`: online.

## Cross-Repo Sync

Claude Code created a similar sync handoff in `rene/llm-gateway`.

From user screenshot:

```text
e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
```

Gitea path shown:

```text
http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
```

Rule:

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Sync Folder Work

Created in this repo:

- `sync/README.md`
- `sync/CURRENT.md`
- `sync/history/2026-04-29-tipllm-robot-learning.md`
- `sync/history/2026-04-29-cross-repo-sync.md`
- this file.

Already pushed earlier:

```text
6c42ca7 docs: add shared agent sync handoff
8e7c5aa docs: link llm-gateway sync handoff
```

## Current Dirty Worktree

As of this handoff, many non-sync files remain modified/untracked:

- `CHANGELOG_PENDING.md`
- `docs/TIP_SELFLEARNING_WORKFLOW.md`
- `packages/api/src/routes/hot-topics.ts`
- `packages/dashboard/hot-topics.js`
- `packages/mcp-server/src/index.ts`
- `packages/scraper/package.json`
- `packages/scraper/src/crawler-llm/core.ts`
- `packages/scraper/src/crawler-llm/training-data-writer.ts`
- `packages/scraper/src/scrapers/atgbics.ts`
- `packages/scraper/src/scrapers/fiber24.ts`
- `packages/scraper/src/scrapers/fibermall.ts`
- `packages/scraper/src/utils/backfill-images.ts`
- `packages/scraper/src/utils/db.ts`
- `packages/scraper/src/utils/image-downloader.ts`
- `packages/scraper/src/utils/spec-updater.ts`
- `scripts/tip-learning-pool-build.ts`
- `packages/scraper/src/robots/`
- `packages/scraper/src/scrapers/audiocodes-oem.ts`
- `packages/scraper/src/seed-batch35.ts`
- `packages/scraper/src/seed-batch36.ts`
- `packages/scraper/src/seed-batch37.ts`
- `sql/102-product-verification-reconcile.sql`

Do not revert blindly. Some are Codex changes from this session; some appear to be pre-existing Claude/Codex work.

## Safe Commands

Read-only/status:

```bash
npm run robots:verification -w packages/scraper -- --status
```

TIPLLM planning only, no crawl jobs:

```bash
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
```

Dry-run queue plan only:

```bash
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

Build checks:

```bash
npm run build -w packages/scraper
npm run build -w packages/api
```

## Next Recommended Steps

1. Pull both sync folders from Gitea:
   - `rene/transceiver-db`
   - `rene/llm-gateway`
2. Review dirty worktree before committing code.
3. Decide whether to commit TIP verification + Hot Topics + robot learning code as one or several commits.
4. If running robots, start with TIPLLM planning only.
5. If dispatching crawl work, send heavy profiles to Proxmox/Pi, not Erik.