sync: record magatama runpod adoption and lane truth

2026-05-06 20:23:53 +02:00 · 2026-05-06 20:23:53 +02:00 · e6f98c89bd
commit e6f98c89bd
parent b9a45f9f23
2 changed files with 242 additions and 0 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -343,6 +343,87 @@ From 2026-04-29:
 - Last price observation: `2026-04-29 19:15:53 UTC`
 - Last stock observation: `2026-04-29 19:15:56 UTC`

+## Latest MAGATAMA Training / RunPod Truth
+
+Confirmed on `2026-05-06`:
+
+- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
+- Live Erik dashboard API now reports:
+  - `magatamallm`
+    - `1367 train`
+    - `152 eval`
+    - `1519 total`
+    - `newSinceLastTraining = 1367`
+  - `fo_blogllm`
+    - `17353 train`
+    - `1929 eval`
+    - `19282 total`
+    - `newSinceLastTraining = 17353`
+    - active local model resolves to `fo-blog-v7`
+  - `tip_llm`
+    - `6482 train`
+    - `721 eval`
+    - `7203 total`
+    - `newSinceLastTraining = 6482`
+    - target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
+- Result:
+  - the previous `1097` shown for every lane was stale and wrong.
+  - the selected lane now controls its own manifest, model label, and training counts.
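The reported counts can be sanity-checked mechanically. The sketch below (`LANE_COUNTS` and `check_lane_counts` are illustrative names, not part of the MAGATAMA code) verifies that train + eval equals total for each lane and reports the eval fraction. All three lanes come out at roughly 10% eval; the source does not state the split rule, so treat that ratio as an observation, not a spec.

```python
# Lane counts as reported by the Erik dashboard API on 2026-05-06.
LANE_COUNTS = {
    "magatamallm": {"train": 1367, "eval": 152, "total": 1519},
    "fo_blogllm": {"train": 17353, "eval": 1929, "total": 19282},
    "tip_llm": {"train": 6482, "eval": 721, "total": 7203},
}


def check_lane_counts(counts: dict) -> dict:
    """Return the eval fraction per lane, raising if train + eval != total."""
    fractions = {}
    for lane, c in counts.items():
        if c["train"] + c["eval"] != c["total"]:
            raise ValueError(f"{lane}: train + eval does not match total")
        fractions[lane] = c["eval"] / c["total"]
    return fractions


if __name__ == "__main__":
    for lane, frac in check_lane_counts(LANE_COUNTS).items():
        print(f"{lane}: eval fraction {frac:.4f}")
```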
+
+### Gitea-backed Pool Materialization
+
+- `magatamallm` Gitea pool remains canonical and populated.
+- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
+- Lane manifests and JSONL exports now exist under:
+  - `training-data/gitea-learning-pool/fo_blogllm/`
+  - `training-data/gitea-learning-pool/tip_llm/`
+
+### RunPod Completion Hardening
+
+- MAGATAMA dashboard code now treats RunPod `COMPLETED` as success only after:
+  1. target model artifact is referenced
+  2. local Mac training API adopts/imports the artifact
+  3. lane-specific smoke tests pass
+  4. active Ollama alias is updated
+- New local adoption endpoint is:
+  - `POST /adopt-runpod-model`
+
+### Mac Training API State
+
+- The old LaunchAgent on Mac Studio was still serving the legacy training API from:
+  - `~/magatama-llm/service/training_api.py`
+- It has now been upgraded in place so Erik sees the new adoption-capable API.
+- Verified from Erik:
+  - `http://192.168.178.213:3214/health` returns the new service
+  - it now exposes `register_script` pointing into the MAGATAMA repo
+  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live
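The 401 check can be scripted. A minimal sketch, assuming the service answers unknown paths with 404 and only enforces auth on routes it actually has; if auth were applied globally before routing, a 401 would prove nothing. `probe_route` and `interpret_probe_status` are hypothetical helpers, and the empty JSON body is an assumption, not the endpoint's real payload.

```python
import urllib.error
import urllib.request


def interpret_probe_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated probe to a route verdict.

    Assumes unknown paths return 404 and auth is checked only on real routes.
    """
    if status in (401, 403):
        return "live (auth required)"
    if status == 404:
        return "missing"
    if 200 <= status < 300:
        return "live (auth NOT enforced)"
    return f"unknown ({status})"


def probe_route(url: str, timeout: float = 5.0) -> str:
    """POST an empty JSON body without credentials and interpret the status."""
    req = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return interpret_probe_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is on err.code.
        return interpret_probe_status(err.code)
```

For example, `probe_route("http://192.168.178.213:3214/adopt-runpod-model")` should yield `"live (auth required)"` while the upgraded service is running.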
+
+### Still Outstanding
+
+- A fully successful end-to-end RunPod fine-tune has not yet been re-verified since the new adoption pipeline was wired in. Still unproven in a single run:
+  - real worker success
+  - real artifact
+  - successful local Ollama import
+  - active alias switch
+  - smoke-test proof
+- `tip-llm-v1` is still not installed locally in Ollama.
+
+### Pulso AI Recommendation
+
+- Keep a shared network/transceiver/switch core corpus with TIP.
+- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
+- Recommended split:
+  - `TIP_LLM`
+    - research
+    - crawler / scraper / robot planning
+    - vendor / firmware / issue extraction
+  - `Pulso AI`
+    - product responses
+    - support
+    - diagnostics
+    - operator explanation layer
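One way to express the recommended split is a shared corpus core with separate task routing per lane. A sketch only: the corpus names and task-type labels below are hypothetical placeholders for whatever the real lane configs use.

```python
from dataclasses import dataclass

# Shared knowledge core both lanes train on (name is hypothetical).
SHARED_CORE = "network-transceiver-switch-core"


@dataclass(frozen=True)
class Lane:
    name: str
    corpora: tuple        # shared core plus lane-specific instruction data
    task_types: frozenset  # behaviors this lane is allowed to serve


TIP_LLM = Lane(
    "TIP_LLM",
    (SHARED_CORE, "tip-research-instructions"),
    frozenset({"research", "crawler_planning", "vendor_firmware_extraction"}),
)
PULSO_AI = Lane(
    "Pulso AI",
    (SHARED_CORE, "pulso-support-instructions"),
    frozenset({"product_response", "support", "diagnostics", "operator_explanation"}),
)


def route(task_type: str) -> Lane:
    """Pick the lane whose behavior set covers the task; never both."""
    for lane in (TIP_LLM, PULSO_AI):
        if task_type in lane.task_types:
            return lane
    raise ValueError(f"no lane handles task type {task_type!r}")
```

The knowledge is shared through `SHARED_CORE`, while the disjoint `task_types` sets keep the instruction behavior of the two lanes from collapsing into one.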
+
 ## Safe Next Steps

 1. Clone or pull Gitea `origin` on laptop/Claude Code.
--- a/sync/history/2026-05-06-magatama-runpod-adoption-and-lane-truth.md
+++ b/sync/history/2026-05-06-magatama-runpod-adoption-and-lane-truth.md
@@ -0,0 +1,161 @@
+# 2026-05-06 — MAGATAMA RunPod Adoption + Lane Truth
+
+## Scope
+
+Finalize the MAGATAMA training path so that:
+
+1. lane-specific pools are real and visible
+2. RunPod `COMPLETED` is not treated as success without a real adoptable artifact
+3. Mac Studio exposes a live adoption endpoint for post-RunPod import + smoke tests
+4. `fo_blogllm` / `tip_llm` stop inheriting stale `magatamallm` counts
+
+## What Changed
+
+### 1. Gitea-backed lane pools are now materialized
+
+The sync/build chain was extended so `fo_blogllm` and `tip_llm` are not “README-only” placeholders anymore.
+
+Current local lane export truth:
+
+- `magatamallm`
+  - `1367 train`
+  - `152 eval`
+  - `1519 total`
+- `fo_blogllm`
+  - `17353 train`
+  - `1929 eval`
+  - `19282 total`
+- `tip_llm`
+  - `6482 train`
+  - `721 eval`
+  - `7203 total`
+
+`sync_gitea_training_pool.ts` now writes lane-specific manifests and JSONL exports back into the Gitea-backed learning-pool tree.
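A minimal Python sketch of the per-lane materialization step, assuming a `train.jsonl` / `eval.jsonl` / `manifest.json` layout. The real work is done by `sync_gitea_training_pool.ts`; these file names, manifest keys, and the `materialize_lane` helper are assumptions. The point is that each lane directory carries its own counts, so a lane cannot silently borrow `magatamallm` numbers.

```python
import json
from pathlib import Path


def materialize_lane(pool_root: Path, lane: str, train: list, eval_rows: list) -> Path:
    """Write JSONL exports plus a manifest for one lane under pool_root/lane/.

    File names and manifest keys are hypothetical stand-ins for the layout
    produced by sync_gitea_training_pool.ts.
    """
    lane_dir = pool_root / lane
    lane_dir.mkdir(parents=True, exist_ok=True)
    # One JSON object per line, one file per split.
    for name, rows in (("train.jsonl", train), ("eval.jsonl", eval_rows)):
        with (lane_dir / name).open("w", encoding="utf-8") as fh:
            for row in rows:
                fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    # The manifest records this lane's own counts, never another lane's.
    manifest = {
        "lane": lane,
        "train": len(train),
        "eval": len(eval_rows),
        "total": len(train) + len(eval_rows),
    }
    (lane_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return lane_dir
```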
+
+### 2. RunPod completion gating was hardened
+
+The dashboard/server path was updated so RunPod `COMPLETED` is no longer enough by itself.
+
+The intended success chain is now:
+
+1. RunPod reports terminal state
+2. target model artifact is identified
+3. Mac Studio `/adopt-runpod-model` is called
+4. local candidate model is imported into Ollama
+5. lane-specific smoke suite passes
+6. active alias is switched
+7. only then is the run treated as truly successful
+
+Registry status extensions added:
+
+- `completed_and_adopted`
+- `completed_seed_preparation`
+- `completed_not_adopted`
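The gating rule can be expressed as a small classifier. A sketch under stated assumptions: the `RunOutcome` fields and `registry_status` helper are hypothetical, non-`COMPLETED` states are passed through lowercased, and `seed_preparation_only` is a guessed trigger for `completed_seed_preparation`, which this note does not define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunOutcome:
    """Facts gathered after a RunPod job ends (field names are hypothetical)."""
    runpod_state: str                      # e.g. "COMPLETED", "FAILED"
    artifact_ref: Optional[str]            # model artifact the worker produced, if any
    imported: bool = False                 # local Ollama import via /adopt-runpod-model
    smoke_passed: bool = False             # lane-specific smoke suite result
    alias_switched: bool = False           # active Ollama alias updated
    seed_preparation_only: bool = False    # assumption: data/seed prep run, no model expected


def registry_status(run: RunOutcome) -> str:
    """Classify a finished run; COMPLETED alone never counts as adopted."""
    if run.runpod_state != "COMPLETED":
        return run.runpod_state.lower()
    if run.seed_preparation_only:
        return "completed_seed_preparation"
    adopted = (
        run.artifact_ref is not None
        and run.imported
        and run.smoke_passed
        and run.alias_switched
    )
    return "completed_and_adopted" if adopted else "completed_not_adopted"
```

Every step after the terminal state must hold before `completed_and_adopted` is returned; any gap lands in `completed_not_adopted`.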
+
+### 3. Mac Studio training API was upgraded in place
+
+Critical discovery:
+
+- Erik already used `http://192.168.178.213:3214`
+- but the Mac LaunchAgent still served the **old** training API from:
+  - `~/magatama-llm/service/training_api.py`
+
+This old service had no `/adopt-runpod-model`.
+
+Action taken:
+
+- upgraded the LaunchAgent-targeted file in place
+- made the training API portable enough to find `register_runpod_ollama_model.py` from either:
+  - the MAGATAMA repo
+  - or fallback candidate paths
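The lookup amounts to "repo first, then fallbacks". A sketch, assuming the repo layout `<repo>/scripts/register_runpod_ollama_model.py` seen in the verified `/health` output; `find_register_script` is a hypothetical helper and the fallback directories are placeholders for the API's real candidate list.

```python
from pathlib import Path
from typing import Iterable, Optional

SCRIPT_NAME = "register_runpod_ollama_model.py"


def find_register_script(repo_root: Optional[Path], fallbacks: Iterable[Path]) -> Optional[Path]:
    """Return the first existing copy of the register script, or None.

    The MAGATAMA repo is checked first; fallback directories (hypothetical)
    are tried in the order given.
    """
    candidates = []
    if repo_root is not None:
        candidates.append(repo_root / "scripts" / SCRIPT_NAME)
    candidates.extend(Path(d) / SCRIPT_NAME for d in fallbacks)
    for path in candidates:
        if path.is_file():
            return path.resolve()
    return None
```

For example, `find_register_script(Path.home() / "Desktop/Claude Code/magatama", [Path.home() / "magatama-llm" / "scripts"])`, where the second directory is only an illustrative fallback.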
+
+Verified from Erik:
+
+- `GET /health` works
+- response now contains:
+  - `register_script: /Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/register_runpod_ollama_model.py`
+- `POST /adopt-runpod-model` exists and returns `401` without auth, which proves the new route is live
+
+### 4. Lane status is now honest
+
+The live Erik dashboard API now reports lane-specific values instead of silently reusing `magatamallm`.
+
+Also fixed:
+
+- `fo_blogllm` and `tip_llm` no longer inherit a false “last successful run” from the global Mac training state
+- lane-specific active model labels are now used:
+  - `fo_blogllm` -> `fo-blog-v7`
+  - `tip_llm` -> `tip-llm-v1`
+
+## Verified Live State on Erik
+
+### `magatamallm`
+
+- available: `true`
+- activeProvider: `ollama:magatama-coder:latest`
+- `newSinceLastTraining = 1367`
+
+### `fo_blogllm`
+
+- available: `true`
+- activeProvider: `ollama:fo-blog-v7`
+- `newSinceLastTraining = 17353`
+- `lastTrainingAt = null`
+- `neverTrained = true`
+
+### `tip_llm`
+
+- available: `false`
+- activeProvider falls back to `claude-bridge`
+- target model shown as `tip-llm-v1`
+- `newSinceLastTraining = 6482`
+- `lastTrainingAt = null`
+- `neverTrained = true`
+
+Interpretation:
+
+- `tip_llm` corpus is real, but the active Ollama alias is not installed locally yet.
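The status fields above fit a simple rule: use the lane's own target model if Ollama has it, otherwise fall back to `claude-bridge`, and derive training recency from the lane's own timestamps rather than the global Mac state. A sketch with hypothetical helper names; exact tag matching (for example whether `:latest` is appended) is an assumption. When a lane was never trained, `newSinceLastTraining` equals its full train count, which matches the `17353` and `6482` reported for `fo_blogllm` and `tip_llm`.

```python
from datetime import datetime
from typing import Optional, Sequence, Set

# Lane -> target active Ollama model, per the 2026-05-06 dashboard state.
LANE_MODELS = {
    "magatamallm": "magatama-coder:latest",
    "fo_blogllm": "fo-blog-v7",
    "tip_llm": "tip-llm-v1",
}
FALLBACK_PROVIDER = "claude-bridge"


def lane_status(
    lane: str,
    row_times: Sequence[datetime],
    last_training_at: Optional[datetime],
    installed_models: Set[str],
) -> dict:
    """Build a per-lane status dict using only this lane's own state."""
    model = LANE_MODELS[lane]
    available = model in installed_models
    if last_training_at is None:
        new_since = len(row_times)  # never trained: every train row is new
    else:
        new_since = sum(1 for t in row_times if t > last_training_at)
    return {
        "available": available,
        "activeProvider": f"ollama:{model}" if available else FALLBACK_PROVIDER,
        "targetModel": model,
        "newSinceLastTraining": new_since,
        "lastTrainingAt": last_training_at,
        "neverTrained": last_training_at is None,
    }
```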
+
+## Pulso AI Decision
+
+Recommended architecture:
+
+- shared network / transceiver / switch knowledge core with TIP
+- separate behavior lane
+
+Meaning:
+
+- `TIP_LLM`
+  - research
+  - crawler planning
+  - issue extraction
+  - vendor / firmware / compatibility search
+- `Pulso AI`
+  - support
+  - diagnostics
+  - operational explanation
+  - customer/product answer layer
+
+Do **not** blindly reuse the exact same instruction lane for both.
+
+## Still Open
+
+1. A fresh real RunPod run still needs full end-to-end proof after the new adoption path:
+   - successful worker execution
+   - artifact exists
+   - local import succeeds
+   - smoke suite passes
+   - alias switches
+2. `tip-llm-v1` still needs local Ollama adoption/installation.
+3. Further corpus enrichment is still desirable from:
+   - local transceiver/TIP/blog pools
+   - Susan/Fearghas if accessible
+   - GitHub / universities / security research sources
+
+## Operator Notes
+
+- TIP policy remains:
+  - `TIP_LLM` only for robot/crawler planning
+  - Erik is light controller only
+  - heavy crawling runs on Proxmox / Pis
+- Push only `sync/` to Gitea from this handoff update.