diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 2d16979..670a3d6 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -4,6 +4,48 @@ Updated: 2026-05-09 20:12 UTC
 
 ## Newest Work
 
+- MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
+  - operator requirement:
+    - RunPod success only counts after artifact exists, local Ollama import works, smoke tests pass, aliases/version switch, remote registry is updated, and live MAGATAMA reports no stale active run
+    - do not spend another RunPod run when the paid training already completed; recover adoption instead
+  - RunPod job completed:
+    - endpoint `0rmkf28w2g5gip`
+    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
+    - run id `magatamallm-2026-05-09T19-22-53`
+    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
+    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
+  - adoption recovered:
+    - initial local adoption failed because Mac Studio had too little free disk for GGUF conversion after the merged model was written
+    - removed only temporary/import-safe blockers:
+      - failed MagatamaLLM merged `model.safetensors`
+      - already imported FO_BlogLLM and TIP_LLM source GGUF files
+      - old non-active Ollama test model `test-qwen32b:latest`
+    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
+  - adoption completed:
+    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
+    - release alias `magatama-coder-r1`
+    - active alias `magatama-coder:latest`
+    - candidate smoke `4/5` passed with the required threshold `4`
+    - direct local smoke returned exact `MAGATAMA-R1-READY`
+  - dashboard/server correction:
+    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, with `completed_at/adopted_at/created_at` fallbacks
+    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
+    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
+    - restarted `magatama-dashboard`
+  - live verification:
+    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
+    - `modelVersion=magatama-coder-r1`
+    - `lastRegistryRunStatus=completed_and_adopted`
+    - `activeRun=null`
+    - `hasTrustedTrainingBaseline=true`
+    - `newSinceLastTraining=0`
+    - lane export shows `1367` train, `152` eval, `1519` total
+    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
+    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
+  - open:
+    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior because the new MagatamaLLM passed the required smoke threshold but still answered that one eval too passively
+    - complete dual-Gitea mirroring as a separate infrastructure closure item
+
 - TIP verification artifact cleanup and vendor completion on 2026-05-09:
   - operator requirement:
     - continue until all source-backed verification work is exhausted
diff --git a/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md b/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
new file mode 100644
index 0000000..e7b1f88
--- /dev/null
+++ b/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
@@ -0,0 +1,69 @@
+# MagatamaLLM RunPod Adoption Closure
+
+Date: 2026-05-09 20:25 UTC
+
+## What Changed
+
+- Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
+- Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
+- Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
+- Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.
+
+## Run Details
+
+- Lane: `magatamallm`
+- Endpoint: `0rmkf28w2g5gip`
+- Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
+- Run id: `magatamallm-2026-05-09T19-22-53`
+- HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
+- Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
+- Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
+- Release alias: `magatama-coder-r1`
+- Active alias: `magatama-coder:latest`
+- Candidate smoke: `4/5` with required threshold `4`
+- Direct local smoke: exact `MAGATAMA-R1-READY`
+
+## Failure Recovery
+
+- First adoption failed because Mac Studio had too little free disk for GGUF conversion after writing the merged model.
+- Removed only safe temporary/import blockers:
+  - failed MagatamaLLM merged `model.safetensors`
+  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
+  - old non-active Ollama test model `test-qwen32b:latest`
+- Active aliases remained intact:
+  - `magatama-coder:latest`
+  - `fo-blog-v7`
+  - `tip-llm-v1`
+
+## Dashboard Fix
+
+- Registry ordering now uses `recorded_at` with fallback to `completed_at`, `adopted_at`, and `created_at`.
+- Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
+- Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
+- Deployed rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.
+
+## Live Verification
+
+- MAGATAMA `magatamallm` status:
+  - `activeProvider=ollama:magatama-coder:latest`
+  - `modelVersion=magatama-coder-r1`
+  - `lastRegistryRunStatus=completed_and_adopted`
+  - `activeRun=null`
+  - `hasTrustedTrainingBaseline=true`
+  - `newSinceLastTraining=0`
+  - `collectedExamples=1367`
+  - `evalExamples=152`
+  - `totalExamples=1519`
+- FO_BlogLLM stayed healthy:
+  - `modelVersion=fo-blog-v7-r1`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+- TIP_LLM stayed healthy:
+  - `modelVersion=tip-llm-v1-r1`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+
+## Open
+
+- Add more explicit MagatamaLLM examples for the rule: insufficient evidence means escalate/manual review rather than passive monitoring.
+- Complete dual-Gitea mirroring separately.
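The registry-ordering and version-selection behavior described in the Dashboard Fix notes can be illustrated with a minimal TypeScript sketch. It is not the deployed `packages/dashboard/dist/server.js` code, which is not shown here. Only the timestamp fields (`recorded_at`, `completed_at`, `adopted_at`, `created_at`), `release_alias`, `candidate_model`, the nested `adoption.*` payload, and the `completed_and_adopted` status come from the notes. The `RegistryEvent` shape, the function names, newest-first ordering, and the choice to prefer nested `adoption.*` fields over top-level ones are hypothetical assumptions.

```typescript
// Hypothetical registry event shape. Field names beyond those quoted in the
// notes (run_id, status) are assumptions for illustration only.
interface RegistryEvent {
  run_id: string;
  status?: string;
  recorded_at?: string;
  completed_at?: string;
  adopted_at?: string;
  created_at?: string;
  release_alias?: string;
  candidate_model?: string;
  adoption?: { release_alias?: string; candidate_model?: string };
}

// Use recorded_at first, then fall back to completed_at, adopted_at, created_at.
// Missing or unparseable timestamps map to 0 (epoch), so they sort last.
function eventTime(e: RegistryEvent): number {
  const raw = e.recorded_at ?? e.completed_at ?? e.adopted_at ?? e.created_at;
  const t = raw ? Date.parse(raw) : NaN;
  return Number.isNaN(t) ? 0 : t;
}

// Newest first (assumed ordering). Copies the array instead of sorting in place.
function orderRegistry(events: RegistryEvent[]): RegistryEvent[] {
  return [...events].sort((a, b) => eventTime(b) - eventTime(a));
}

// Accept nested adoption.* fields and, failing that, top-level fields.
// Preferring the nested form is an assumption, not confirmed by the notes.
function adoptedVersion(e: RegistryEvent): { releaseAlias?: string; candidateModel?: string } {
  return {
    releaseAlias: e.adoption?.release_alias ?? e.release_alias,
    candidateModel: e.adoption?.candidate_model ?? e.candidate_model,
  };
}

// Model version = release alias of the newest completed_and_adopted event.
function currentModelVersion(events: RegistryEvent[]): string | undefined {
  for (const e of orderRegistry(events)) {
    const v = adoptedVersion(e);
    if (e.status === "completed_and_adopted" && v.releaseAlias) {
      return v.releaseAlias;
    }
  }
  return undefined;
}
```

Mapping a missing timestamp to 0 places such an event at the end of the newest-first order. An older `completed_not_adopted` entry that carries only `created_at` therefore cannot shadow a later adoption event that records `recorded_at`.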