# Current TIP Sync State

Updated: 2026-04-29 20:25 UTC

## Active Policy

- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.

## Latest Work

- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing:
  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to TIP learning-pool build:
  - `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
- Added docs:
  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script:
  - `packages/scraper/package.json`
  - `robots:verification`

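As a rough illustration of the `qa-pairs/*.jsonl` import described above, the sketch below parses one JSONL file's text into lane-tagged records. The `LaneRecord` shape, the function name `parseJsonlIntoLane`, and the skip-on-error behavior are assumptions made for illustration; the actual logic lives in `scripts/tip-learning-pool-build.ts` and may differ.

```typescript
// Hypothetical record shape; the real qa-pairs schema is not shown in this note.
interface LaneRecord {
  lane: string;
  source: string;
  payload: unknown;
}

// Parse the text of one JSONL file into lane-tagged records.
// Blank lines are ignored; malformed lines are counted instead of thrown,
// so a single bad row does not abort the whole import.
function parseJsonlIntoLane(
  text: string,
  source: string,
  lane = "tip_llm",
): { records: LaneRecord[]; skipped: number } {
  const records: LaneRecord[] = [];
  let skipped = 0;
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      records.push({ lane, source, payload: JSON.parse(trimmed) });
    } catch {
      skipped += 1;
    }
  }
  return { records, skipped };
}
```

Counting skipped rows rather than failing keeps a partially corrupt training file from blocking the rest of the pool build, at the cost of needing to check the `skipped` count afterwards.
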
## Gitea Training Pool

- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit:
  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`

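The per-day file under `robot-experiences/` suggests one JSONL file per calendar date. A minimal sketch of how such a path and line could be built follows; the function names are hypothetical, and the use of the UTC date is an assumption based on the UTC timestamps in this note, not a confirmed detail of `training-data-writer.ts`.

```typescript
// Build the per-day JSONL path, e.g. <root>/robot-experiences/2026-04-29.jsonl.
// Assumption: the file is named after the UTC calendar date.
function experienceFilePath(repoRoot: string, when: Date): string {
  const day = when.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const root = repoRoot.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/robot-experiences/${day}.jsonl`;
}

// Serialize one record as a single JSONL line: one JSON object plus a newline.
function toJsonlLine(record: Record<string, unknown>): string {
  return JSON.stringify(record) + "\n";
}
```

For example, `experienceFilePath("/tmp/tip-training-data", new Date("2026-04-29T20:11:24.091Z"))` yields `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`, matching the second path listed above.
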
## Erik Status

- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale misplaced remote-only duplicate files:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.

## Last Live Verification Snapshot

From 2026-04-29:

- Total transceivers: `13,546`
- Price verified: `7,250`
- Image verified: `7,025`
- Details verified: `6,243`
- Fully verified: `5,812`
- Last price observation: `2026-04-29 19:15:53 UTC`
- Last stock observation: `2026-04-29 19:15:56 UTC`

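To put the snapshot counts in proportion, the sketch below converts them to coverage percentages of the total. The helper name `coveragePercent` and the one-decimal rounding are illustrative choices, not part of the robot controller.

```typescript
// Counts copied from the 2026-04-29 snapshot above.
const totalTransceivers = 13546;
const verified = { price: 7250, image: 7025, details: 6243, full: 5812 };

// Percentage of the total, rounded to one decimal place.
function coveragePercent(count: number, total: number): number {
  if (total <= 0) return 0;
  return Math.round((count / total) * 1000) / 10;
}

const coverage = {
  price: coveragePercent(verified.price, totalTransceivers), // 53.5
  image: coveragePercent(verified.image, totalTransceivers), // 51.9
  details: coveragePercent(verified.details, totalTransceivers), // 46.1
  full: coveragePercent(verified.full, totalTransceivers), // 42.9
};

// Transceivers still missing at least one verification: 13546 - 5812 = 7734.
const notFullyVerified = totalTransceivers - verified.full;
```

On these numbers, details verification is the lowest of the three individual checks, which is consistent with the dry-run example targeting `details-fast-lane`.
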
## Safe Next Steps

1. Clone or pull Gitea `origin` on laptop/Claude Code.
2. Read this folder first.
3. If testing robots, start with dry runs only:

```bash
npm run robots:verification -w packages/scraper -- --status
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```

4. Only dispatch real crawl work after deciding the target host:
   - Erik: `erik-safe`, tiny batches only.
   - Pi: `pi-fetch`.
   - Proxmox: `proxmox-heavy`.

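The host-to-profile choice in step 4 can be made explicit in a small helper, sketched below. The `Host` type, `PROFILE_BY_HOST` table, and `buildEnqueueArgs` function are hypothetical; only the profile names and the `--enqueue`, `--profile`, and `--dry-run` flags come from the commands above. The sketch also assumes that omitting `--dry-run` is what triggers a real dispatch.

```typescript
type Host = "erik" | "pi" | "proxmox";

// Profile names as listed in step 4.
const PROFILE_BY_HOST: Record<Host, string> = {
  erik: "erik-safe",
  pi: "pi-fetch",
  proxmox: "proxmox-heavy",
};

// Build the npm argument list for the robots:verification script.
// dryRun defaults to true so a real dispatch has to be requested explicitly.
function buildEnqueueArgs(host: Host, lane: string, dryRun = true): string[] {
  const args = [
    "run", "robots:verification", "-w", "packages/scraper", "--",
    `--enqueue=${lane}`,
    `--profile=${PROFILE_BY_HOST[host]}`,
  ];
  if (dryRun) args.push("--dry-run");
  return args;
}
```

Calling `buildEnqueueArgs("erik", "details-fast-lane")` reproduces the arguments of the third dry-run command above. Defaulting to a dry run keeps an accidental call from putting load on Erik.
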
## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.