diff --git a/sync/CURRENT.md b/sync/CURRENT.md
new file mode 100644
index 0000000..6c6d3e9
--- /dev/null
+++ b/sync/CURRENT.md
@@ -0,0 +1,83 @@
+# Current TIP Sync State
+
+Updated: 2026-04-29 20:25 UTC
+
+## Active Policy
+
+- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
+- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
+- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
+- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
+- Use Proxmox/Pi workers for crawl load.
+
+## Latest Work
+
+- Added a verification robot controller:
+  - `packages/scraper/src/robots/verification-robots.ts`
+  - command: `npm run robots:verification -w packages/scraper -- --status`
+- Added TIPLLM robot experience writing:
+  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
+  - writes raw robot audit rows and SFT records.
+- Added Gitea training pool import to TIP learning-pool build:
+  - `scripts/tip-learning-pool-build.ts`
+  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
+- Added docs:
+  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
+- Added package script:
+  - `packages/scraper/package.json`
+  - `robots:verification`
+
+## Gitea Training Pool
+
+- Existing local clone: `/tmp/tip-training-data`
+- Gitea repo: `rene/tip-training-data`
+- Latest pushed training commit:
+  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
+- First robot experience record was written to:
+  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
+  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`
+
+## Erik Status
+
+- Synced TIPLLM robot/training code to `/opt/tip`.
+- Did not start crawler jobs.
+- Did not enqueue robot waves.
+- Did not restart PM2 services.
+- The remote scraper TypeScript build passes after removing two stale, misplaced duplicate files that existed only on Erik:
+  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
+  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
+- `tip-api` and `tip-scraper-daemon` are online.
+
+## Last Live Verification Snapshot
+
+From 2026-04-29:
+
+- Total transceivers: `13,546`
+- Price verified: `7,250`
+- Image verified: `7,025`
+- Details verified: `6,243`
+- Fully verified: `5,812`
+- Last price observation: `2026-04-29 19:15:53 UTC`
+- Last stock observation: `2026-04-29 19:15:56 UTC`
+
+## Safe Next Steps
+
+1. Clone or pull Gitea `origin` on laptop/Claude Code.
+2. Read this folder first.
+3. If testing robots, start with dry runs only:
+
+```bash
+npm run robots:verification -w packages/scraper -- --status
+npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
+npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
+```
+
+4. Only dispatch real crawl work after deciding the target host:
+   - Erik: `erik-safe`, tiny batches only.
+   - Pi: `pi-fetch`.
+   - Proxmox: `proxmox-heavy`.
+
+## Dirty Worktree Note
+
+There are existing uncommitted changes outside `sync/`. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.
+
diff --git a/sync/README.md b/sync/README.md
new file mode 100644
index 0000000..272f2ad
--- /dev/null
+++ b/sync/README.md
@@ -0,0 +1,26 @@
+# TIP Agent Sync
+
+This folder is the shared handoff area for Codex, Claude Code, and laptop sync workflows.
+
+Rules:
+
+- Always write current work status here before ending a session.
+- Keep entries practical: what changed, what was deployed, what was not deployed, what is risky, and what should happen next.
+- Do not store secrets, tokens, passwords, private keys, cookies, or bearer tokens here.
+- Use Gitea `origin` as the common sync target.
+- For crawler work, Erik is a controller only. Heavy crawling belongs on Proxmox or Pi workers.
+- TIPLLM is the only AI used for TIP crawler/robot planning and extraction feedback.
+- Robot/crawler experiences must also be written to the TIPLLM training pool in Gitea.
+
+Suggested session closeout:
+
+```bash
+git status --short
+date -u
+```
+
+Then update:
+
+- `sync/CURRENT.md` for the latest operational state.
+- A dated file under `sync/history/` for anything substantial.
+
diff --git a/sync/history/2026-04-29-tipllm-robot-learning.md b/sync/history/2026-04-29-tipllm-robot-learning.md
new file mode 100644
index 0000000..915e4d6
--- /dev/null
+++ b/sync/history/2026-04-29-tipllm-robot-learning.md
@@ -0,0 +1,63 @@
+# 2026-04-29 TIPLLM Robot Learning Handoff
+
+## Summary
+
+Rene requested that TIP crawler/robot planning use only TIPLLM, with no external AI layer, and that crawler/robot experience be written into a Gitea training pool so TIPLLM improves over time.
+
+Implemented locally and synced to Erik:
+
+- Verification robot controller with safe profiles:
+  - `erik-safe`
+  - `pi-fetch`
+  - `proxmox-heavy`
+- TIPLLM planning mode:
+  - `--tipllm-plan --limit=N`
+- Robot experience writer:
+  - status snapshots
+  - TIPLLM plans
+  - queue dry-runs
+  - queue enqueues
+  - rejected queue plans
+  - future crawler results
+- Learning pool import:
+  - `TIP_TRAINING_REPO/qa-pairs/**/*.jsonl` is imported into the `tip_llm` lane by `learning-pool:build`.
+
+## Safety Choices
+
+- Default robot profile is `erik-safe`.
+- `erik-safe` is capped to 3 lightweight queues by default.
+- Playwright and discovery queues are excluded from `erik-safe`.
+- No crawler jobs were started during this work.
+- No queue waves were enqueued during this work.
+
+## Training Pool
+
+The local Gitea training pool clone already existed at `/tmp/tip-training-data`.
+
+A first robot experience was written and pushed:
+
+```text
+f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
+```
+
+## Erik
+
+Synced code/docs to `/opt/tip` and ran the scraper TypeScript build.
+
+The build initially failed because two stale, misplaced duplicate files existed only on Erik:
+
+- `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
+- `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
+
+They were not present locally and had wrong relative imports. Only those two duplicates were removed, and the build passed afterwards.
+
+PM2 services were left running and were not restarted.
+
+## Commands To Start Safely
+
+```bash
+npm run robots:verification -w packages/scraper -- --status
+npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
+npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
+```
+
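The `erik-safe` rules listed under "Safety Choices" (default profile, at most 3 lightweight queues, no Playwright or discovery queues) can be read as a simple admission filter. The TypeScript that follows is a hypothetical sketch and is not taken from `verification-robots.ts`: the `QueueSpec` shape, the `kind` values, every queue name except `details-fast-lane`, and the `pi-fetch`/`proxmox-heavy` limits are assumptions made for illustration.

```typescript
// Hypothetical sketch of profile-based queue admission.
// NOT the real verification-robots.ts: apart from `erik-safe`, its 3-queue cap,
// and `details-fast-lane`, all names and limits here are assumptions.

type ProfileName = "erik-safe" | "pi-fetch" | "proxmox-heavy";
type QueueKind = "lightweight" | "playwright" | "discovery";

// Assumed queue descriptor: `kind` marks whether a queue needs Playwright or discovery crawling.
interface QueueSpec {
  name: string;
  kind: QueueKind;
}

interface ProfileRules {
  maxQueues: number; // cap on how many queues a single dispatch may touch
  allowedKinds: ReadonlySet<QueueKind>;
}

const PROFILES: Record<ProfileName, ProfileRules> = {
  // From the handoff notes: default profile, 3 lightweight queues, no Playwright/discovery.
  "erik-safe": { maxQueues: 3, allowedKinds: new Set<QueueKind>(["lightweight"]) },
  // Assumed limits for the worker profiles; the notes do not specify them.
  "pi-fetch": { maxQueues: 6, allowedKinds: new Set<QueueKind>(["lightweight"]) },
  "proxmox-heavy": {
    maxQueues: 20,
    allowedKinds: new Set<QueueKind>(["lightweight", "playwright", "discovery"]),
  },
};

// Split requested queues into those the profile accepts and those it rejects.
// Rejected queues are kept so they can be logged instead of silently dropped.
function planForProfile(
  requested: QueueSpec[],
  profile: ProfileName = "erik-safe",
): { accepted: QueueSpec[]; rejected: QueueSpec[] } {
  const rules = PROFILES[profile];
  const accepted: QueueSpec[] = [];
  const rejected: QueueSpec[] = [];
  for (const q of requested) {
    if (rules.allowedKinds.has(q.kind) && accepted.length < rules.maxQueues) {
      accepted.push(q);
    } else {
      rejected.push(q);
    }
  }
  return { accepted, rejected };
}

const requested: QueueSpec[] = [
  { name: "details-fast-lane", kind: "lightweight" },
  { name: "price-refresh", kind: "lightweight" }, // hypothetical queue name
  { name: "image-check", kind: "lightweight" }, // hypothetical queue name
  { name: "stock-refresh", kind: "lightweight" }, // hypothetical: over the 3-queue cap
  { name: "vendor-discovery", kind: "discovery" }, // hypothetical: excluded kind
  { name: "js-rendered-details", kind: "playwright" }, // hypothetical: excluded kind
];

// Uses the default profile, erik-safe.
const plan = planForProfile(requested);
// accepted: details-fast-lane, price-refresh, image-check; the other three are rejected
console.log(plan.accepted.map((q) => q.name));
```

Returning the rejected queues rather than discarding them fits the notes' mention of recording "rejected queue plans" as robot experience; whether the real writer receives them in this form is not confirmed by the handoff.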