transceiver-db/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md

# MagatamaLLM RunPod Adoption Closure

Date: 2026-05-09 20:25 UTC

## What Changed

- Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
- Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
- Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
- Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.

## Run Details

- Lane: `magatamallm`
- Endpoint: `0rmkf28w2g5gip`
- Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
- Run id: `magatamallm-2026-05-09T19-22-53`
- HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
- Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
- Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
- Release alias: `magatama-coder-r1`
- Active alias: `magatama-coder:latest`
- Candidate smoke: `4/5` passed (required threshold: `4`)
- Direct local smoke: exact match on `MAGATAMA-R1-READY`
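
The candidate smoke gate above comes down to a threshold comparison. A minimal Python sketch, assuming the score is reported as a `passed/total` string; `passes_smoke` is a hypothetical helper for illustration, not the adoption script's actual code:

```python
def passes_smoke(score: str, threshold: int) -> bool:
    """Return True when a 'passed/total' smoke score meets the threshold.

    Hypothetical helper: the 'passed/total' string format is an assumption.
    """
    passed_text, total_text = score.split("/")
    passed, total = int(passed_text), int(total_text)
    if not 0 <= passed <= total:
        raise ValueError(f"invalid smoke score: {score!r}")
    return passed >= threshold


# Values from this run: 4 of 5 checks passed, threshold 4.
print(passes_smoke("4/5", 4))
```

With this run's values the candidate sits exactly on the threshold, so one more failed check would have blocked adoption.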

## Failure Recovery

- The first adoption attempt failed because the Mac Studio had too little free disk space for GGUF conversion after writing the merged model.
- Freed space by removing only files that were safe to delete (temporary files and import leftovers):
  - the merged `model.safetensors` left behind by the failed MagatamaLLM adoption
  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
  - the old, inactive Ollama test model `test-qwen32b:latest`
- Active aliases remained intact:
  - `magatama-coder:latest`
  - `fo-blog-v7`
  - `tip-llm-v1`
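
A free-space pre-check before the merge and GGUF conversion could catch this failure mode earlier. The following is a sketch under stated assumptions, not the adoption script's actual logic: `has_room_for_conversion` and the `REQUIRED_BYTES` budget are hypothetical, and the real size requirement is not recorded in this note.

```python
import shutil


def has_room_for_conversion(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes


# Hypothetical budget covering the merged model plus the GGUF output.
# The actual sizes for this run are not recorded here.
REQUIRED_BYTES = 40 * 1024**3

if not has_room_for_conversion(".", REQUIRED_BYTES):
    print("Not enough free disk for GGUF conversion; free space before retrying.")
```

Running the check before writing the merged model, rather than after, avoids leaving a large partial artifact on disk when the conversion cannot complete.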

## Dashboard Fix

- Registry ordering now sorts by `recorded_at`, falling back to `completed_at`, `adopted_at`, and `created_at` when it is missing.
- Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
- Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
- Deployed the rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.
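
The ordering and version-selection changes can be illustrated roughly as follows. This is a Python sketch for brevity (the dashboard server itself is JavaScript) and not the deployed code. Assumptions: timestamps are ISO-8601 UTC strings in one consistent format, so string comparison orders them correctly; the fallback order matches the order listed above; and `release_alias` is preferred over `candidate_model`. The function names and the sample timestamps are hypothetical.

```python
from typing import Any, Optional

# Fallback order as listed in this note (assumed to be the precedence order).
TIMESTAMP_FIELDS = ("recorded_at", "completed_at", "adopted_at", "created_at")


def sort_key(entry: dict[str, Any]) -> str:
    """Return the first non-empty timestamp among the fallback fields."""
    for field in TIMESTAMP_FIELDS:
        value = entry.get(field)
        if value:
            return value
    return ""  # entries without any timestamp sort first


def latest_entry(entries: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Pick the most recent registry entry, or None for an empty registry."""
    return max(entries, key=sort_key) if entries else None


def adopted_version(entry: dict[str, Any]) -> Optional[str]:
    """Accept top-level fields as well as the nested `adoption.*` payload."""
    nested = entry.get("adoption") or {}
    return (
        entry.get("release_alias")
        or nested.get("release_alias")
        or entry.get("candidate_model")
        or nested.get("candidate_model")
    )


# Illustrative entries; the timestamps are made up for the example.
registry = [
    {"created_at": "2026-05-09T19:22:53Z", "status": "completed_not_adopted"},
    {
        "recorded_at": "2026-05-09T20:25:00Z",
        "status": "completed_and_adopted",
        "release_alias": "magatama-coder-r1",
        "candidate_model": "magatamallm-runpod-magatamallm-2026-05-09t19-22-53",
    },
]

newest = latest_entry(registry)
print(newest["status"], adopted_version(newest))
```

In this sketch the stale `completed_not_adopted` entry loses to the newer adoption record even though the two entries carry different timestamp fields, which is the behavior the fix is meant to restore.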

## Live Verification

- MAGATAMA `magatamallm` status:
  - `activeProvider=ollama:magatama-coder:latest`
  - `modelVersion=magatama-coder-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `hasTrustedTrainingBaseline=true`
  - `newSinceLastTraining=0`
  - `collectedExamples=1367`
  - `evalExamples=152`
  - `totalExamples=1519`
- FO_BlogLLM stayed healthy:
  - `modelVersion=fo-blog-v7-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
- TIP_LLM stayed healthy:
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
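
The MagatamaLLM checks above can be expressed as a small consistency test over the status payload. This is a sketch under assumptions: the payload is treated as a flat dictionary with the field names reported above, `check_lane_status` is a hypothetical helper, and the rule that `totalExamples` equals `collectedExamples` plus `evalExamples` is inferred from the reported numbers (1367 + 152 = 1519) rather than from the dashboard code.

```python
def check_lane_status(status: dict) -> list[str]:
    """Return a list of problems found in a lane status payload (empty if OK)."""
    problems = []
    if status.get("activeRun") is not None:
        problems.append("a run is still active")
    if status.get("lastRegistryRunStatus") != "completed_and_adopted":
        problems.append("last registry run was not adopted")
    if not status.get("hasTrustedTrainingBaseline"):
        problems.append("no trusted training baseline")
    # Inferred invariant: total = collected + eval.
    expected_total = status.get("collectedExamples", 0) + status.get("evalExamples", 0)
    if status.get("totalExamples") != expected_total:
        problems.append("example counts do not add up")
    return problems


# Values reported for the `magatamallm` lane in this note.
magatamallm_status = {
    "activeProvider": "ollama:magatama-coder:latest",
    "modelVersion": "magatama-coder-r1",
    "lastRegistryRunStatus": "completed_and_adopted",
    "activeRun": None,
    "hasTrustedTrainingBaseline": True,
    "newSinceLastTraining": 0,
    "collectedExamples": 1367,
    "evalExamples": 152,
    "totalExamples": 1519,
}

print(check_lane_status(magatamallm_status))  # prints []
```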

## Open

- Add more explicit MagatamaLLM examples for the rule that insufficient evidence calls for escalation or manual review, not passive monitoring.
- Complete dual-Gitea mirroring separately.