# TIP Selflearning Workflow
TIP now has two separate learning lanes:

- `TIP_LLM`: research, crawler planning, vendor/market intelligence and data preparation.
- `Blog_LLM`: FO_BlogLLM/founder content and practical technical blog generation.

Commands:

```bash
npm run learning-pool:build
npm run learning-pool:publish-hf
```
Dashboard/API:
- `GET /api/selflearning/status`
- `POST /api/selflearning/build`
- `POST /api/selflearning/publish-hf`
- `POST /api/selflearning/train` with a JSON body `{ "lane": ..., "provider": ... }`, where `lane` is `tip_llm` or `blog_llm` and `provider` is `runpod` or `local`

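
A minimal sketch of triggering a training run from a shell with `curl`. It assumes the dashboard API is reachable at `http://localhost:3000`; that default and the `TIP_API_BASE` variable are hypothetical, as are the helper names. The payload builder rejects any `lane` or `provider` value not listed above before anything is sent.

```bash
# Sketch only: TIP_API_BASE and its default URL are hypothetical.
# The endpoint path and body fields come from the API list above.

# Build the JSON body, rejecting values the API does not document.
train_payload() {
  lane="$1"
  provider="$2"
  case "$lane" in tip_llm|blog_llm) ;; *) echo "unknown lane: $lane" >&2; return 1 ;; esac
  case "$provider" in runpod|local) ;; *) echo "unknown provider: $provider" >&2; return 1 ;; esac
  printf '{ "lane": "%s", "provider": "%s" }' "$lane" "$provider"
}

# POST the validated body to /api/selflearning/train.
start_training() {
  local body
  body="$(train_payload "$1" "$2")" || return 1
  curl -sS -X POST "${TIP_API_BASE:-http://localhost:3000}/api/selflearning/train" \
    -H 'Content-Type: application/json' \
    -d "$body"
}
```

For example, `start_training blog_llm local` requests a local training run for the blog lane.
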

Secrets are read from environment variables or the macOS Keychain, never from committed files:

- RunPod: `RUNPOD_API_KEY` / `TIP_RUNPOD_API_KEY`, Keychain `magatama.runpod.api` / `tip.runpod.api`
- Hugging Face: `HF_TOKEN` / `HUGGINGFACE_TOKEN`, Keychain `magatama.huggingface.token` / `tip.huggingface.token`
- RunPod endpoint ID: `TIP_RUNPOD_ENDPOINT_ID` or `RUNPOD_ENDPOINT_ID`

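
The lookup order above can be sketched in shell, for example to check which credential a session would resolve. This assumes environment variables take precedence over Keychain entries, that names are tried in the order listed, and that each Keychain entry is a generic password whose service name matches the documented label. The helper names are hypothetical, and this is not the application's own resolver.

```bash
# Sketch only: precedence and Keychain item type are assumptions.

# Print the first non-empty environment variable among the given names.
first_env() {
  local name value
  for name in "$@"; do
    value="$(printenv "$name" 2>/dev/null)" || continue
    if [ -n "$value" ]; then
      printf '%s' "$value"
      return 0
    fi
  done
  return 1
}

# Print the first Keychain generic password found for the given services.
# Skipped when the macOS `security` CLI is not available.
first_keychain() {
  command -v security >/dev/null 2>&1 || return 1
  local service
  for service in "$@"; do
    if security find-generic-password -s "$service" -w 2>/dev/null; then
      return 0
    fi
  done
  return 1
}

runpod_api_key() {
  first_env RUNPOD_API_KEY TIP_RUNPOD_API_KEY ||
    first_keychain magatama.runpod.api tip.runpod.api
}

huggingface_token() {
  first_env HF_TOKEN HUGGINGFACE_TOKEN ||
    first_keychain magatama.huggingface.token tip.huggingface.token
}

# Endpoint IDs are documented as environment variables only.
runpod_endpoint_id() {
  first_env TIP_RUNPOD_ENDPOINT_ID RUNPOD_ENDPOINT_ID
}
```

To confirm a token is available without printing it, test the result instead of echoing it, e.g. `[ -n "$(huggingface_token)" ] && echo "HF token found"`.
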

Default private Hugging Face datasets:

- `renefichtmueller/tip-llm-sft`
- `renefichtmueller/blog-llm-sft`

Local training is enabled by setting `TIP_LOCAL_TRAIN_COMMAND`; the API appends the lane name to that command automatically.

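
To make the append behavior concrete, here is a sketch of the command line the API would end up running, assuming the lane name is added as a single trailing argument. `local_train_command` is a hypothetical helper that only prints the command; it does not run it.

```bash
# Sketch only: assumes the lane is appended as one trailing argument.
local_train_command() {
  lane="$1"
  if [ -z "${TIP_LOCAL_TRAIN_COMMAND:-}" ]; then
    echo "TIP_LOCAL_TRAIN_COMMAND is not set; local training is disabled" >&2
    return 1
  fi
  printf '%s %s\n' "$TIP_LOCAL_TRAIN_COMMAND" "$lane"
}
```

For example, with `TIP_LOCAL_TRAIN_COMMAND="python3 scripts/train_sft.py"` (a hypothetical script), `local_train_command blog_llm` prints `python3 scripts/train_sft.py blog_llm`, so the training script would read the lane from its last argument.
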
## TIPLLM Robot Experience Pool
Crawler and verification robots must use TIPLLM only for planning/extraction feedback. Operational experience is written to the Gitea-backed TIP training pool:
- Default local clone: `/tmp/tip-training-data`
- Override: `TIP_TRAINING_REPO=/path/to/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- SFT records: `qa-pairs/robot-control-high.jsonl`
- Raw audit records: `robot-experiences/YYYY-MM-DD.jsonl`
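
The pool layout above can be sketched as a small helper that resolves the local clone (honoring `TIP_TRAINING_REPO`) and appends one raw audit record to the current day's file. The record fields, the use of the UTC date, and the helper names are assumptions. Committing and pushing to `rene/tip-training-data` on Gitea is not shown.

```bash
# Sketch only: record schema and UTC date are assumptions; paths follow the list above.

# Resolve the training pool clone, defaulting to /tmp/tip-training-data.
training_repo() {
  printf '%s' "${TIP_TRAINING_REPO:-/tmp/tip-training-data}"
}

# Append one JSON line to robot-experiences/YYYY-MM-DD.jsonl and print the file path.
append_robot_experience() {
  json_line="$1"
  dir="$(training_repo)/robot-experiences"
  file="$dir/$(date -u +%F).jsonl"
  mkdir -p "$dir"
  printf '%s\n' "$json_line" >> "$file"
  printf '%s' "$file"
}
```

For example, `append_robot_experience '{"robot":"verification","event":"dry-run-plan"}'` (hypothetical fields) adds one line to today's audit file. Each record must be a single line of JSON.
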

Useful commands:
```bash
npm run robots:verification -w packages/scraper -- --status
npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=5
npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
```
Safety defaults:
- `erik-safe` is the default profile and caps work at 3 lightweight queues.
- Playwright/discovery work belongs on Proxmox or Pi workers, not on Erik.
- Every status snapshot, TIPLLM plan, dry-run plan, enqueue result and crawler result should become a TIPLLM training example.
- `learning-pool:build` automatically imports Gitea pool SFT rows from `qa-pairs/` into the `tip_llm` lane.
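
Before running `npm run learning-pool:build`, it can help to see how many pool rows are waiting in `qa-pairs/`. This sketch assumes every non-empty line of each `*.jsonl` file there is one SFT example and reuses the documented clone path and override; `count_pool_sft_rows` is a hypothetical helper and does not reproduce the actual import logic.

```bash
# Sketch only: assumes one SFT example per non-empty line in qa-pairs/*.jsonl.
count_pool_sft_rows() {
  repo="${TIP_TRAINING_REPO:-/tmp/tip-training-data}"
  total=0
  for f in "$repo"/qa-pairs/*.jsonl; do
    [ -e "$f" ] || continue   # no matches: the glob stays literal, so skip it
    n="$(grep -c -v '^[[:space:]]*$' "$f" || true)"
    total=$((total + ${n:-0}))
  done
  printf '%s\n' "$total"
}
```
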