# MagatamaLLM RunPod Adoption Closure

Date: 2026-05-09 20:25 UTC

## What Changed

- Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
- Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
- Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
- Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.

## Run Details

- Lane: `magatamallm`
- Endpoint: `0rmkf28w2g5gip`
- Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
- Run id: `magatamallm-2026-05-09T19-22-53`
- HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
- Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
- Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
- Release alias: `magatama-coder-r1`
- Active alias: `magatama-coder:latest`
- Candidate smoke: `4/5` with required threshold `4`
- Direct local smoke: exact `MAGATAMA-R1-READY`

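
The two smoke results above amount to simple gates. A minimal sketch, assuming a pass-count threshold and a whitespace-trimmed exact match; the type and function names here are hypothetical and not taken from the MAGATAMA codebase:

```typescript
// Hypothetical shape for a smoke-suite result (field names are assumptions).
interface SmokeResult {
  passed: number; // prompts that produced an acceptable answer
  total: number;  // prompts in the smoke suite
}

// True when the candidate reaches the required number of passing prompts.
function meetsSmokeThreshold(result: SmokeResult, required: number): boolean {
  if (result.passed < 0 || result.passed > result.total) {
    throw new RangeError("passed must be between 0 and total");
  }
  return result.passed >= required;
}

// Exact-match gate: the model output, trimmed of surrounding whitespace,
// must equal the expected marker. Trimming is an assumption of this sketch.
function isExactReply(output: string, expected: string): boolean {
  return output.trim() === expected;
}

// Values from the run details: 4 of 5 passed against a threshold of 4.
const candidateGate = meetsSmokeThreshold({ passed: 4, total: 5 }, 4);
const directGate = isExactReply("MAGATAMA-R1-READY\n", "MAGATAMA-R1-READY");
```

With the reported numbers both gates pass; a `3/5` result would fail the first gate.
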
## Failure Recovery

- First adoption failed because Mac Studio had too little free disk for GGUF conversion after writing the merged model.
- Removed only safe temporary/import blockers:
  - failed MagatamaLLM merged `model.safetensors`
  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
  - old non-active Ollama test model `test-qwen32b:latest`
- Active aliases remained intact:
  - `magatama-coder:latest`
  - `fo-blog-v7`
  - `tip-llm-v1`

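
The "active aliases remained intact" constraint can be expressed as a guard that runs before any cleanup. The sketch below is illustrative only: the function names are hypothetical, and it shows just the protection check, not how cleanup candidates were actually chosen. It assumes the Ollama convention that a model name without a tag refers to `:latest`.

```typescript
// Aliases the report lists as active; these must never be removed.
const ACTIVE_ALIASES: ReadonlySet<string> = new Set<string>([
  "magatama-coder:latest",
  "fo-blog-v7",
  "tip-llm-v1",
]);

// Compare on a normalized name so "fo-blog-v7" and "fo-blog-v7:latest" match.
function normalizeModelName(name: string): string {
  return name.includes(":") ? name : `${name}:latest`;
}

// Returns the installed models that are NOT protected by an active alias.
// Being unprotected is necessary for removal, not sufficient.
function unprotectedModels(
  installed: readonly string[],
  active: ReadonlySet<string>,
): string[] {
  const protectedNames = new Set<string>();
  active.forEach((alias) => protectedNames.add(normalizeModelName(alias)));
  return installed.filter((name) => !protectedNames.has(normalizeModelName(name)));
}

// Example listing (hypothetical): only the old test model is unprotected.
const cleanupCandidates = unprotectedModels(
  ["magatama-coder:latest", "fo-blog-v7:latest", "tip-llm-v1", "test-qwen32b:latest"],
  ACTIVE_ALIASES,
);
```
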
## Dashboard Fix

- Registry ordering now uses `recorded_at` with fallback to `completed_at`, `adopted_at`, and `created_at`.
- Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
- Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
- Deployed rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.

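
The first two fixes can be illustrated with a short sketch. This is not the deployed `server.js` code: the `RegistryEntry` shape, the function names, and the sample timestamps are assumptions, and the precedence among `release_alias`, `candidate_model`, and the nested `adoption.*` fields is a guess consistent with the reported `modelVersion`. Only the field names and status strings come from this report.

```typescript
// Hypothetical registry entry; only the field names come from the report.
interface RegistryEntry {
  status: string;
  recorded_at?: string;
  completed_at?: string;
  adopted_at?: string;
  created_at?: string;
  release_alias?: string;
  candidate_model?: string;
  adoption?: { release_alias?: string; candidate_model?: string };
}

// First parseable timestamp in the documented fallback order, as epoch ms.
// Entries with no usable timestamp sort last (key 0).
function registrySortKey(entry: RegistryEntry): number {
  const stamps = [entry.recorded_at, entry.completed_at, entry.adopted_at, entry.created_at];
  for (let i = 0; i < stamps.length; i++) {
    const value = stamps[i];
    if (value === undefined) continue;
    const ms = Date.parse(value);
    if (!Number.isNaN(ms)) return ms;
  }
  return 0;
}

// Newest entry first; the input array is left untouched.
function orderRegistry(entries: readonly RegistryEntry[]): RegistryEntry[] {
  return entries.slice().sort((a, b) => registrySortKey(b) - registrySortKey(a));
}

// Accept top-level fields as well as the nested adoption.* payload.
function adoptedVersion(entry: RegistryEntry): string | undefined {
  return (
    entry.release_alias ??
    entry.candidate_model ??
    entry.adoption?.release_alias ??
    entry.adoption?.candidate_model
  );
}

// Illustrative entries; the timestamps are made up for the example.
const staleRun: RegistryEntry = {
  status: "completed_not_adopted",
  created_at: "2026-05-09T19:22:53Z",
};
const adoptedRun: RegistryEntry = {
  status: "completed_and_adopted",
  recorded_at: "2026-05-09T20:25:00Z",
  release_alias: "magatama-coder-r1",
  candidate_model: "magatamallm-runpod-magatamallm-2026-05-09t19-22-53",
};
const orderedRuns = orderRegistry([staleRun, adoptedRun]);
```

Under these assumptions `orderedRuns` starts with the adopted run, and `adoptedVersion` resolves it to `magatama-coder-r1` even though no nested `adoption` object is present.
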
## Live Verification

- MAGATAMA `magatamallm` status:
  - `activeProvider=ollama:magatama-coder:latest`
  - `modelVersion=magatama-coder-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `hasTrustedTrainingBaseline=true`
  - `newSinceLastTraining=0`
  - `collectedExamples=1367`
  - `evalExamples=152`
  - `totalExamples=1519`
- FO_BlogLLM stayed healthy:
  - `modelVersion=fo-blog-v7-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
- TIP_LLM stayed healthy:
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`

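
The `magatamallm` checks above could be captured as a small validation routine. A sketch under assumptions: the `LaneStatus` interface and `closureProblems` function are hypothetical, and the rule that `totalExamples` equals `collectedExamples + evalExamples` is inferred from the reported numbers (1367 + 152 = 1519), not from a documented contract.

```typescript
// Field names copied from the status values above; the type name is hypothetical.
interface LaneStatus {
  activeProvider: string;
  modelVersion: string;
  lastRegistryRunStatus: string;
  activeRun: string | null;
  hasTrustedTrainingBaseline: boolean;
  newSinceLastTraining: number;
  collectedExamples: number;
  evalExamples: number;
  totalExamples: number;
}

// Returns a list of human-readable problems; an empty list means the lane looks closed.
function closureProblems(status: LaneStatus, expectedVersion: string): string[] {
  const problems: string[] = [];
  if (status.modelVersion !== expectedVersion) {
    problems.push(`modelVersion is ${status.modelVersion}, expected ${expectedVersion}`);
  }
  if (status.lastRegistryRunStatus !== "completed_and_adopted") {
    problems.push(`last run status is ${status.lastRegistryRunStatus}`);
  }
  if (status.activeRun !== null) problems.push("a training run is still active");
  if (!status.hasTrustedTrainingBaseline) problems.push("no trusted training baseline");
  if (status.newSinceLastTraining !== 0) problems.push("untrained examples are pending");
  // Inferred consistency check: collected + eval should add up to total.
  if (status.collectedExamples + status.evalExamples !== status.totalExamples) {
    problems.push("example counts do not add up");
  }
  return problems;
}

// The values reported for the magatamallm lane.
const reportedStatus: LaneStatus = {
  activeProvider: "ollama:magatama-coder:latest",
  modelVersion: "magatama-coder-r1",
  lastRegistryRunStatus: "completed_and_adopted",
  activeRun: null,
  hasTrustedTrainingBaseline: true,
  newSinceLastTraining: 0,
  collectedExamples: 1367,
  evalExamples: 152,
  totalExamples: 1519,
};
const reportedProblems = closureProblems(reportedStatus, "magatama-coder-r1");
```
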
## Open

- Add more explicit MagatamaLLM examples for the rule: insufficient evidence means escalate/manual review rather than passive monitoring.
- Complete dual-Gitea mirroring separately.