sync: record magatamallm adoption closure

2026-05-09 22:28:49 +02:00 · 2026-05-09 22:28:49 +02:00 · de2943ea79
commit de2943ea79
parent 1af4f090f7
2 changed files with 111 additions and 0 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -4,6 +4,48 @@ Updated: 2026-05-09 20:12 UTC

 ## Newest Work

+- MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
+  - operator requirement:
+    - a RunPod run only counts as successful once the artifact exists, the local Ollama import works, smoke tests pass, the aliases and version are switched, the remote registry is updated, and live MAGATAMA reports no stale active run
+    - do not pay for another RunPod run when the paid training has already completed; recover the adoption instead
+  - RunPod job completed:
+    - endpoint `0rmkf28w2g5gip`
+    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
+    - run id `magatamallm-2026-05-09T19-22-53`
+    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
+    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
+  - adoption recovered:
+    - initial local adoption failed because the Mac Studio had too little free disk space for GGUF conversion after the merged model was written
+    - removed only blockers that were temporary or safe to delete after import:
+      - failed MagatamaLLM merged `model.safetensors`
+      - already imported FO_BlogLLM and TIP_LLM source GGUF files
+      - old non-active Ollama test model `test-qwen32b:latest`
+    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
+  - adoption completed:
+    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
+    - release alias `magatama-coder-r1`
+    - active alias `magatama-coder:latest`
+    - candidate smoke test passed `4/5`, meeting the required threshold of `4`
+    - direct local smoke returned exact `MAGATAMA-R1-READY`
+  - dashboard/server correction:
+    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, with `completed_at/adopted_at/created_at` fallbacks
+    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
+    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
+    - restarted `magatama-dashboard`
+  - live verification:
+    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
+    - `modelVersion=magatama-coder-r1`
+    - `lastRegistryRunStatus=completed_and_adopted`
+    - `activeRun=null`
+    - `hasTrustedTrainingBaseline=true`
+    - `newSinceLastTraining=0`
+    - lane export shows `1367` train, `152` eval, `1519` total
+    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
+    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
+  - open:
+    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior; the new MagatamaLLM met the required smoke threshold but still answered that eval too passively
+    - complete dual-Gitea mirroring as a separate infrastructure closure item
+
 - TIP verification artifact cleanup and vendor completion on 2026-05-09:
  - operator requirement:
    - continue until all source-backed verification work is exhausted
--- a/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
+++ b/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
@@ -0,0 +1,69 @@
+# MagatamaLLM RunPod Adoption Closure
+
+Date: 2026-05-09 20:25 UTC
+
+## What Changed
+
+- Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
+- Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
+- Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
+- Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.
+
+## Run Details
+
+- Lane: `magatamallm`
+- Endpoint: `0rmkf28w2g5gip`
+- Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
+- Run id: `magatamallm-2026-05-09T19-22-53`
+- HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
+- Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
+- Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
+- Release alias: `magatama-coder-r1`
+- Active alias: `magatama-coder:latest`
+- Candidate smoke: `4/5` passed, meeting the required threshold of `4`
+- Direct local smoke: exact `MAGATAMA-R1-READY`
+
+## Failure Recovery
+
+- The first adoption attempt failed because the Mac Studio had too little free disk space for GGUF conversion after writing the merged model.
+- Removed only blockers that were temporary or safe to delete after import:
+  - failed MagatamaLLM merged `model.safetensors`
+  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
+  - old non-active Ollama test model `test-qwen32b:latest`
+- Active aliases remained intact:
+  - `magatama-coder:latest`
+  - `fo-blog-v7`
+  - `tip-llm-v1`
+
+## Dashboard Fix
+
+- Registry ordering now uses `recorded_at` with fallback to `completed_at`, `adopted_at`, and `created_at`.
+- Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
+- Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
+- Deployed rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.
+
+## Live Verification
+
+- MAGATAMA `magatamallm` status:
+  - `activeProvider=ollama:magatama-coder:latest`
+  - `modelVersion=magatama-coder-r1`
+  - `lastRegistryRunStatus=completed_and_adopted`
+  - `activeRun=null`
+  - `hasTrustedTrainingBaseline=true`
+  - `newSinceLastTraining=0`
+  - `collectedExamples=1367`
+  - `evalExamples=152`
+  - `totalExamples=1519`
+- FO_BlogLLM stayed healthy:
+  - `modelVersion=fo-blog-v7-r1`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+- TIP_LLM stayed healthy:
+  - `modelVersion=tip-llm-v1-r1`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+
+## Open
+
+- Add more explicit MagatamaLLM examples for the rule: insufficient evidence means escalate/manual review rather than passive monitoring.
+- Complete dual-Gitea mirroring separately.
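
The registry ordering rule recorded under "Dashboard Fix" (prefer `recorded_at`, then fall back to `completed_at`, `adopted_at`, and `created_at`) can be illustrated with a minimal sketch. This is not the dashboard's actual code, which ships as `packages/dashboard/dist/server.js`; the function names `sort_key` and `latest_entry`, the `run_id` key, and the assumption that timestamps are ISO 8601 strings are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

# Field names come from the note above; the tuple order is the fallback order.
TIMESTAMP_FIELDS = ("recorded_at", "completed_at", "adopted_at", "created_at")


def sort_key(entry: dict) -> datetime:
    """Return the first usable timestamp of a registry entry (hypothetical helper)."""
    for field in TIMESTAMP_FIELDS:
        value = entry.get(field)
        if value:
            parsed = datetime.fromisoformat(value)
            # Assumption: timestamps without an offset are treated as UTC.
            if parsed.tzinfo is None:
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    # Entries with no timestamp at all sort before everything else.
    return datetime.min.replace(tzinfo=timezone.utc)


def latest_entry(entries: list) -> "dict | None":
    """Pick the most recent registry entry, or None for an empty registry."""
    return max(entries, key=sort_key) if entries else None


if __name__ == "__main__":
    registry = [
        {"run_id": "older", "created_at": "2026-05-09T19:22:53+00:00"},
        {"run_id": "newer", "recorded_at": "2026-05-09T20:25:00+00:00"},
    ]
    print(latest_entry(registry)["run_id"])
```

Ordering by a single preferred field with explicit fallbacks means an adoption event that lacks `recorded_at` still takes part in the ordering instead of being dropped from it.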