sync: record magatamallm adoption closure
parent 1af4f090f7
commit de2943ea79
@@ -4,6 +4,48 @@ Updated: 2026-05-09 20:12 UTC

## Newest Work

- MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - RunPod success only counts after artifact exists, local Ollama import works, smoke tests pass, aliases/version switch, remote registry is updated, and live MAGATAMA reports no stale active run
    - do not spend another RunPod run when the paid training already completed; recover adoption instead
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
    - run id `magatamallm-2026-05-09T19-22-53`
    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
  - adoption recovered:
    - initial local adoption failed because Mac Studio had too little free disk for GGUF conversion after the merged model was written
    - removed only temporary/import-safe blockers:
      - failed MagatamaLLM merged `model.safetensors`
      - already imported FO_BlogLLM and TIP_LLM source GGUF files
      - old non-active Ollama test model `test-qwen32b:latest`
    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
  - adoption completed:
    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
    - release alias `magatama-coder-r1`
    - active alias `magatama-coder:latest`
    - candidate smoke `4/5` passed with the required threshold `4`
    - direct local smoke returned exact `MAGATAMA-R1-READY`
  - dashboard/server correction:
    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, with `completed_at/adopted_at/created_at` fallbacks
    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
    - restarted `magatama-dashboard`
  - live verification:
    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
    - `modelVersion=magatama-coder-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `hasTrustedTrainingBaseline=true`
    - `newSinceLastTraining=0`
    - lane export shows `1367` train, `152` eval, `1519` total
    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior because the new MagatamaLLM passed the required smoke threshold but still answered that one eval too passively
    - complete dual-Gitea mirroring as a separate infrastructure closure item

- TIP verification artifact cleanup and vendor completion on 2026-05-09:
  - operator requirement:
    - continue until all source-backed verification work is exhausted

@@ -0,0 +1,69 @@
# MagatamaLLM RunPod Adoption Closure

Date: 2026-05-09 20:25 UTC

## What Changed

- Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
- Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
- Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
- Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.

## Run Details

- Lane: `magatamallm`
- Endpoint: `0rmkf28w2g5gip`
- Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
- Run id: `magatamallm-2026-05-09T19-22-53`
- HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
- Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
- Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
- Release alias: `magatama-coder-r1`
- Active alias: `magatama-coder:latest`
- Candidate smoke: `4/5` with required threshold `4`
- Direct local smoke: exact `MAGATAMA-R1-READY`

## Failure Recovery

- First adoption failed because Mac Studio had too little free disk for GGUF conversion after writing the merged model.
- Removed only safe temporary/import blockers:
  - failed MagatamaLLM merged `model.safetensors`
  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
  - old non-active Ollama test model `test-qwen32b:latest`
- Active aliases remained intact:
  - `magatama-coder:latest`
  - `fo-blog-v7`
  - `tip-llm-v1`

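The disk-space failure described in Failure Recovery could be caught before the merge and GGUF conversion start. The following is a minimal sketch of such a preflight check, not code from the MAGATAMA repository: the function name `has_room_for_conversion` and the `required_bytes` parameter are hypothetical, and how much space a conversion actually needs is left to the caller to estimate.

```python
import shutil


def has_room_for_conversion(work_dir: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding work_dir has at least
    required_bytes free.

    Hypothetical helper: required_bytes is an estimate supplied by the
    caller (for example, merged model size plus the expected GGUF size).
    """
    free_bytes = shutil.disk_usage(work_dir).free
    return free_bytes >= required_bytes
```

Running a check like this before writing the merged `model.safetensors` would let the adoption step stop early instead of failing halfway through conversion.
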
## Dashboard Fix

- Registry ordering now uses `recorded_at` with fallback to `completed_at`, `adopted_at`, and `created_at`.
- Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
- Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
- Deployed rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.

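The ordering and version-selection rules in the Dashboard Fix list can be illustrated with a short Python sketch. This is not the dashboard's actual code (which ships as `packages/dashboard/dist/server.js`). The function names are hypothetical, and three things are assumptions: the event shape, that timestamps are same-format ISO-8601 strings which sort lexicographically, and that `release_alias` is preferred over `candidate_model`.

```python
from typing import Any, Optional

# Fallback order taken from the fix description: recorded_at first.
TIMESTAMP_KEYS = ("recorded_at", "completed_at", "adopted_at", "created_at")


def registry_sort_key(event: dict[str, Any]) -> str:
    """Return the first non-empty timestamp in fallback order.

    Assumes same-format ISO-8601 strings, so string comparison matches
    chronological order. Events with no timestamp sort first.
    """
    for key in TIMESTAMP_KEYS:
        value = event.get(key)
        if value:
            return str(value)
    return ""


def latest_event(events: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the most recent registry event, or None for an empty list."""
    return max(events, key=registry_sort_key) if events else None


def pick_version(event: dict[str, Any]) -> Optional[str]:
    """Read the version from top-level fields or a nested `adoption` payload.

    Assumed precedence: release_alias before candidate_model, and
    top-level before nested.
    """
    nested = event.get("adoption") or {}
    for field in ("release_alias", "candidate_model"):
        for source in (event, nested):
            if source.get(field):
                return source[field]
    return None
```

With this ordering, an adoption event that carries only `recorded_at` can still outrank an older `completed_not_adopted` entry that carries only `completed_at`, which is the stale-status case the fix addresses.
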
## Live Verification

- MAGATAMA `magatamallm` status:
  - `activeProvider=ollama:magatama-coder:latest`
  - `modelVersion=magatama-coder-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `hasTrustedTrainingBaseline=true`
  - `newSinceLastTraining=0`
  - `collectedExamples=1367`
  - `evalExamples=152`
  - `totalExamples=1519`
- FO_BlogLLM stayed healthy:
  - `modelVersion=fo-blog-v7-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
- TIP_LLM stayed healthy:
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`

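The field checks listed under Live Verification could be repeated mechanically after future adoptions. The sketch below compares a decoded status payload against the expected values. The field names and values come from the list above, but the flat dictionary shape and the helper `status_mismatches` are assumptions for illustration, not the real status endpoint contract.

```python
from typing import Any

# Expected `magatamallm` values, copied from the verification list.
EXPECTED_MAGATAMALLM = {
    "activeProvider": "ollama:magatama-coder:latest",
    "modelVersion": "magatama-coder-r1",
    "lastRegistryRunStatus": "completed_and_adopted",
    "activeRun": None,
    "hasTrustedTrainingBaseline": True,
    "newSinceLastTraining": 0,
}

_MISSING = object()  # distinguishes an absent field from an explicit null


def status_mismatches(
    status: dict[str, Any], expected: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (expected, actual)} for every field that differs.

    A field that is absent from `status` counts as a mismatch, even when
    the expected value is None.
    """
    mismatches = {}
    for field, want in expected.items():
        actual = status.get(field, _MISSING)
        if actual is _MISSING or actual != want:
            mismatches[field] = (want, None if actual is _MISSING else actual)
    return mismatches
```

An empty result means the lane matches the recorded state. A stale `completed_not_adopted` status would show up as a single entry keyed by `lastRegistryRunStatus`.
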
## Open

- Add more explicit MagatamaLLM examples for the rule: insufficient evidence means escalate/manual review rather than passive monitoring.
- Complete dual-Gitea mirroring separately.