sync: record magatamallm adoption closure

2026-05-09 22:28:49 +02:00 · 2026-05-09 22:28:49 +02:00 · de2943ea79
commit de2943ea79
parent 1af4f090f7
2 changed files with 111 additions and 0 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -4,6 +4,48 @@ Updated: 2026-05-09 20:12 UTC
 ## Newest Work
 - MAGATAMA MagatamaLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - RunPod success only counts once the artifact exists, the local Ollama import works, smoke tests pass, aliases and version have switched, the remote registry is updated, and live MAGATAMA reports no stale active run
    - do not launch another paid RunPod run when training has already completed; recover the adoption instead
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
    - run id `magatamallm-2026-05-09T19-22-53`
    - target artifact `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
    - worker summary `RunPod QLoRA complete · train=605 · valid=114`
  - adoption recovered:
    - initial local adoption failed because the Mac Studio had too little free disk space for GGUF conversion after the merged model was written
    - removed only temporary blockers or files that were safe to delete after import:
      - failed MagatamaLLM merged `model.safetensors`
      - FO_BlogLLM and TIP_LLM source GGUF files that were already imported into Ollama
      - old non-active Ollama test model `test-qwen32b:latest`
    - kept active Ollama aliases intact: `magatama-coder:latest`, `fo-blog-v7`, `tip-llm-v1`
  - adoption completed:
    - local candidate `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
    - release alias `magatama-coder-r1`
    - active alias `magatama-coder:latest`
    - candidate smoke test passed `4/5` against the required threshold of `4`
    - direct local smoke test returned exactly `MAGATAMA-R1-READY`
  - dashboard/server correction:
    - deployed a MAGATAMA dashboard server fix so training registry ordering uses `recorded_at`, falling back to `completed_at`, `adopted_at`, and `created_at`
    - release/version selection now accepts top-level `release_alias` and `candidate_model` on adoption events
    - legacy MagatamaLLM baseline mismatch guard no longer invalidates the new RunPod lane export
    - restarted `magatama-dashboard`
  - live verification:
    - `magatamallm` reports `activeProvider=ollama:magatama-coder:latest`
    - `modelVersion=magatama-coder-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `hasTrustedTrainingBaseline=true`
    - `newSinceLastTraining=0`
    - lane export shows `1367` train, `152` eval, `1519` total
    - `fo_blogllm` remains `fo-blog-v7-r1`, `activeRun=null`, `newSinceLastTraining=0`
    - `tip_llm` remains `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - add more explicit training pairs for the “insufficient evidence => escalate/manual review” behavior: the new MagatamaLLM met the required smoke threshold but still answered that one eval case too passively
    - complete dual-Gitea mirroring as a separate infrastructure closure item
 - TIP verification artifact cleanup and vendor completion on 2026-05-09:
  - operator requirement:
    - continue until all source-backed verification work is exhausted
--- a/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
+++ b/sync/history/2026-05-09-magatamallm-runpod-adoption-closure.md
@ -0,0 +1,69 @@
 # MagatamaLLM RunPod Adoption Closure
 Date: 2026-05-09 20:25 UTC
 ## What Changed
 - Completed the MagatamaLLM RunPod training closure without launching a new paid RunPod job.
 - Recovered the local adoption path after the RunPod worker had already trained and uploaded the adapter successfully.
 - Deployed a MAGATAMA dashboard server fix so the live training status reflects the final adopted model instead of stale `completed_not_adopted` metadata.
 - Synced the adoption metadata back to Erik and verified the public MAGATAMA status endpoint.
 ## Run Details
 - Lane: `magatamallm`
 - Endpoint: `0rmkf28w2g5gip`
 - Job: `a46de2ef-96e0-4adf-bbf8-d7a890e06c6f-e2`
 - Run id: `magatamallm-2026-05-09T19-22-53`
 - HF artifact: `renefichtmueller/magatama-magatamallm-magatamallm-2026-05-09t19-22-53`
 - Worker summary: `RunPod QLoRA complete · train=605 · valid=114`
 - Local candidate: `magatamallm-runpod-magatamallm-2026-05-09t19-22-53`
 - Release alias: `magatama-coder-r1`
 - Active alias: `magatama-coder:latest`
 - Candidate smoke: passed `4/5` (required threshold: `4`)
 - Direct local smoke: returned exactly `MAGATAMA-R1-READY`
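The two smoke checks recorded above reduce to a count-against-threshold gate plus an exact-match probe. A minimal TypeScript sketch, assuming a hypothetical `SmokeResult` record and hypothetical helpers `passesSmokeGate` and `passesDirectProbe`; the actual smoke harness and its output format are not described in this log:

```typescript
// Hypothetical smoke-result record; the real harness output format is an assumption.
interface SmokeResult {
  name: string;
  passed: boolean;
}

// Candidate gate: adopt only when at least `threshold` smoke cases passed.
function passesSmokeGate(results: SmokeResult[], threshold: number): boolean {
  const passed = results.filter((r) => r.passed).length;
  return passed >= threshold;
}

// Direct probe: the reply must equal the expected token exactly
// (assumption: no trimming or case folding is applied).
function passesDirectProbe(reply: string, expected = "MAGATAMA-R1-READY"): boolean {
  return reply === expected;
}

// Made-up case names mirroring the recorded 4/5 outcome.
const results: SmokeResult[] = [
  { name: "case-1", passed: true },
  { name: "case-2", passed: true },
  { name: "case-3", passed: true },
  { name: "case-4", passed: true },
  { name: "insufficient-evidence", passed: false },
];

console.log(passesSmokeGate(results, 4)); // true: 4 passed >= threshold 4
console.log(passesDirectProbe("MAGATAMA-R1-READY")); // true
```

Keeping the probe strict means a reply with extra whitespace or wording fails, which matches the "exact" wording of the recorded result.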
 ## Failure Recovery
 - First adoption failed because the Mac Studio had too little free disk space for GGUF conversion after writing the merged model.
 - Removed only temporary or import-related blockers that were safe to delete:
  - failed MagatamaLLM merged `model.safetensors`
  - FO_BlogLLM/TIP_LLM source GGUF import files that were already registered in Ollama
  - old non-active Ollama test model `test-qwen32b:latest`
 - Active aliases remained intact:
  - `magatama-coder:latest`
  - `fo-blog-v7`
  - `tip-llm-v1`
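The cleanup rule (delete only temporary or already-imported material, never anything behind an active alias) can be sketched as a guard. The `CleanupItem` type, its `kind` labels, and `planCleanup` are hypothetical; this illustrates the rule and is not the script that was actually run:

```typescript
// Hypothetical cleanup candidate; the kind labels mirror the three categories removed above.
interface CleanupItem {
  name: string;
  kind: "failed-merge" | "imported-source" | "inactive-test-model" | "other";
}

const SAFE_KINDS = new Set<CleanupItem["kind"]>([
  "failed-merge",
  "imported-source",
  "inactive-test-model",
]);

// An item is removable only if its kind is known-safe AND it is not an active alias.
function planCleanup(items: CleanupItem[], activeAliases: Set<string>): string[] {
  return items
    .filter((item) => SAFE_KINDS.has(item.kind))
    .filter((item) => !activeAliases.has(item.name))
    .map((item) => item.name);
}

const activeAliases = new Set(["magatama-coder:latest", "fo-blog-v7", "tip-llm-v1"]);
const items: CleanupItem[] = [
  { name: "MagatamaLLM merged model.safetensors", kind: "failed-merge" },
  { name: "FO_BlogLLM source GGUF", kind: "imported-source" },
  { name: "TIP_LLM source GGUF", kind: "imported-source" },
  { name: "test-qwen32b:latest", kind: "inactive-test-model" },
  // Deliberately mislabeled: the alias check still keeps it.
  { name: "magatama-coder:latest", kind: "inactive-test-model" },
  { name: "fo-blog-v7", kind: "other" },
];

console.log(planCleanup(items, activeAliases));
```

Because both conditions must hold, a mislabeled entry still cannot remove a live alias.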
 ## Dashboard Fix
 - Registry ordering now uses `recorded_at` with fallback to `completed_at`, `adopted_at`, and `created_at`.
 - Successful adoption version selection now accepts top-level `release_alias` and `candidate_model`, not only nested `adoption.*` payloads.
 - Legacy MagatamaLLM baseline mismatch protection no longer invalidates the RunPod lane export.
 - Deployed rebuilt `packages/dashboard/dist/server.js` to Erik and restarted `magatama-dashboard`.
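The ordering and version-selection fixes can be illustrated with a small sketch. The `RegistryEvent` shape, the helper names, and the precedence between top-level and nested fields are assumptions; only the field names (`recorded_at`, `completed_at`, `adopted_at`, `created_at`, `release_alias`, `candidate_model`, `adoption.*`) and the status values come from this log. The real logic lives in the dashboard server source, which is not shown here:

```typescript
// Hypothetical registry entry; only the field names are taken from the fix description.
interface AdoptionPayload {
  release_alias?: string;
  candidate_model?: string;
}

interface RegistryEvent {
  run_id: string;
  status: string;
  recorded_at?: string;
  completed_at?: string;
  adopted_at?: string;
  created_at?: string;
  release_alias?: string;
  candidate_model?: string;
  adoption?: AdoptionPayload;
}

// Sort key: the first timestamp present in the order recorded_at, completed_at,
// adopted_at, created_at. Entries without a parseable timestamp sort as oldest.
function eventTime(e: RegistryEvent): number {
  const ts = e.recorded_at ?? e.completed_at ?? e.adopted_at ?? e.created_at;
  const ms = ts === undefined ? NaN : Date.parse(ts);
  return isNaN(ms) ? 0 : ms;
}

// Newest entry first; copies the array instead of sorting in place.
function orderRegistry(events: RegistryEvent[]): RegistryEvent[] {
  return [...events].sort((a, b) => eventTime(b) - eventTime(a));
}

// Version of the newest successful adoption. Top-level fields win over nested
// adoption.*, and the release alias wins over the candidate model (both assumed).
function selectVersion(events: RegistryEvent[]): string | undefined {
  const adopted = orderRegistry(events).find((e) => e.status === "completed_and_adopted");
  if (adopted === undefined) return undefined;
  return (
    adopted.release_alias ??
    adopted.adoption?.release_alias ??
    adopted.candidate_model ??
    adopted.adoption?.candidate_model
  );
}

// Made-up timestamps: a stale "completed_not_adopted" entry and a later adoption entry.
const events: RegistryEvent[] = [
  {
    run_id: "magatamallm-2026-05-09T19-22-53",
    status: "completed_not_adopted",
    completed_at: "2026-05-09T19:00:00Z",
  },
  {
    run_id: "magatamallm-2026-05-09T19-22-53",
    status: "completed_and_adopted",
    recorded_at: "2026-05-09T20:00:00Z",
    release_alias: "magatama-coder-r1",
    candidate_model: "magatamallm-runpod-magatamallm-2026-05-09t19-22-53",
  },
];

console.log(orderRegistry(events)[0].status); // completed_and_adopted
console.log(selectVersion(events)); // magatama-coder-r1
```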
 ## Live Verification
 - MAGATAMA `magatamallm` status:
  - `activeProvider=ollama:magatama-coder:latest`
  - `modelVersion=magatama-coder-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `hasTrustedTrainingBaseline=true`
  - `newSinceLastTraining=0`
  - `collectedExamples=1367`
  - `evalExamples=152`
  - `totalExamples=1519`
 - FO_BlogLLM stayed healthy:
  - `modelVersion=fo-blog-v7-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
 - TIP_LLM stayed healthy:
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
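The live checks above amount to a per-lane closure test. A minimal sketch, assuming a hypothetical `LaneStatus` shape whose field names mirror the reported values; `closureProblems` is a hypothetical helper and does not query the real status endpoint:

```typescript
// Hypothetical lane status record; field names mirror the values reported above.
interface LaneStatus {
  modelVersion: string;
  activeRun: unknown; // expected to be null when no run is in flight
  newSinceLastTraining: number;
  lastRegistryRunStatus?: string;
  collectedExamples?: number;
  evalExamples?: number;
  totalExamples?: number;
}

// Returns human-readable problems; an empty array means the lane looks closed.
function closureProblems(lane: string, s: LaneStatus, expectedVersion: string): string[] {
  const problems: string[] = [];
  if (s.modelVersion !== expectedVersion) {
    problems.push(`${lane}: modelVersion=${s.modelVersion}, expected ${expectedVersion}`);
  }
  // Anything other than null (including a missing field) counts as a stale run.
  if (s.activeRun !== null) {
    problems.push(`${lane}: stale activeRun ${JSON.stringify(s.activeRun)}`);
  }
  if (s.newSinceLastTraining !== 0) {
    problems.push(`${lane}: ${s.newSinceLastTraining} new examples since last training`);
  }
  if (s.lastRegistryRunStatus !== undefined && s.lastRegistryRunStatus !== "completed_and_adopted") {
    problems.push(`${lane}: lastRegistryRunStatus=${s.lastRegistryRunStatus}`);
  }
  // Where counts are reported, train + eval must equal the total (1367 + 152 = 1519 here).
  const { collectedExamples: train, evalExamples: evalCount, totalExamples: total } = s;
  if (train !== undefined && evalCount !== undefined && total !== undefined && train + evalCount !== total) {
    problems.push(`${lane}: ${train} + ${evalCount} != ${total}`);
  }
  return problems;
}

const magatamallm: LaneStatus = {
  modelVersion: "magatama-coder-r1",
  activeRun: null,
  newSinceLastTraining: 0,
  lastRegistryRunStatus: "completed_and_adopted",
  collectedExamples: 1367,
  evalExamples: 152,
  totalExamples: 1519,
};
const foBlog: LaneStatus = { modelVersion: "fo-blog-v7-r1", activeRun: null, newSinceLastTraining: 0 };
const tip: LaneStatus = { modelVersion: "tip-llm-v1-r1", activeRun: null, newSinceLastTraining: 0 };

console.log(closureProblems("magatamallm", magatamallm, "magatama-coder-r1")); // []
```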
 ## Open
 - Add more explicit MagatamaLLM examples for the rule that insufficient evidence means escalating or requesting manual review rather than passive monitoring.
 - Complete dual-Gitea mirroring separately.