From 3779de5b885f1d68b3f04cc97e3e2ec041e8da1b Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sat, 9 May 2026 20:10:27 +0200
Subject: [PATCH] sync: record fo blogllm adoption closure

---
 sync/CURRENT.md                               | 46 ++++++++++++++-
 ...5-09-fo-blogllm-runpod-adoption-closure.md | 57 +++++++++++++++++++
 2 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 sync/history/2026-05-09-fo-blogllm-runpod-adoption-closure.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index d43bac2..bbd95f8 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,53 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 16:20 UTC
+Updated: 2026-05-09 18:07 UTC
 
 ## Newest Work
 
+- MAGATAMA FO_BlogLLM RunPod training and adoption closure on 2026-05-09:
+  - operator requirement:
+    - training success must only count after artifact exists, local import works, smoke tests pass, Ollama alias/version switches, remote MAGATAMA registry is updated, and the live UI reports no active stale job
+    - no repeat of failed "COMPLETED but nothing adopted" serverless runs
+    - local Mac Studio training remains throttled by default to avoid saturating the workstation
+  - RunPod job completed:
+    - endpoint `0rmkf28w2g5gip`
+    - job `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
+    - run id `fo_blogllm-2026-05-09T17-14-16`
+    - target artifact `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
+    - worker summary `RunPod QLoRA complete · train=11473 · valid=1281`
+  - failure recovered:
+    - first local adoption failed because Mac Studio disk filled during F16 GGUF conversion
+    - removed stale partial F16 GGUF and obsolete merged safetensors to restore free space
+    - hardened importer to:
+      - require minimum free disk before conversion
+      - delete stale partial F16 before retry
+      - reuse existing GGUF when present
+      - delete temporary F16 in all cases
+      - remove merged safetensors/bin after successful Ollama registration unless `.keep-merged` exists
+  - adoption completed:
+    - local candidate `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
+    - release alias `fo-blog-v7-r1`
+    - active alias `fo-blog-v7`
+    - candidate smoke `5/5` passed
+    - direct local smoke returned exact `FO-BLOG-V7-READY`
+  - dashboard/server hardening:
+    - old baseline smoke is now non-blocking when the active alias does not exist yet; candidate smoke remains mandatory
+    - deployed updated dashboard bundle, fine-tuner API template, and RunPod-Ollama importer to Erik
+    - restarted `magatama-dashboard`
+    - copied `fo_blogllm-last_run.json` and adoption report to Erik
+    - appended remote training registry event `completed_and_adopted`
+  - live verification:
+    - `fo_blogllm` reports `activeProvider=ollama:fo-blog-v7`
+    - `modelVersion=fo-blog-v7-r1`
+    - `lastRegistryRunStatus=completed_and_adopted`
+    - `activeRun=null`
+    - `collectedExamples=17322`, `evalExamples=1926`, `totalExamples=19267`
+    - `newSinceLastTraining=0`
+    - `tip_llm` remains healthy with `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
+  - open:
+    - run the same end-to-end custom-worker/adoption path for `magatamallm`
+    - complete dual-Gitea mirroring as separate infrastructure closure item
+
 - Near-complete detail queue closed with lightweight vendor detail verifiers on 2026-05-09:
   - operator requirement:
     - keep Erik safe; no heavy browser crawler or Playwright wave
diff --git a/sync/history/2026-05-09-fo-blogllm-runpod-adoption-closure.md b/sync/history/2026-05-09-fo-blogllm-runpod-adoption-closure.md
new file mode 100644
index 0000000..17d27cb
--- /dev/null
+++ b/sync/history/2026-05-09-fo-blogllm-runpod-adoption-closure.md
@@ -0,0 +1,57 @@
+# FO_BlogLLM RunPod Adoption Closure
+
+Date: 2026-05-09 18:07 UTC
+
+## What Changed
+
+- Completed the FO_BlogLLM RunPod full training and local adoption path.
+- Recovered from the first adoption failure caused by Mac Studio disk exhaustion during F16 GGUF conversion.
+- Hardened the local importer and Train API so future RunPod jobs fail only on real candidate/adoption problems, not on missing previous baseline aliases.
+- Deployed the updated MAGATAMA dashboard bundle and training helper files to Erik.
+- Synced the successful adoption metadata back into MAGATAMA's remote training registry.
+
+## Run Details
+
+- Lane: `fo_blogllm`
+- Endpoint: `0rmkf28w2g5gip`
+- Job: `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
+- Run id: `fo_blogllm-2026-05-09T17-14-16`
+- HF artifact: `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
+- Local candidate: `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
+- Release alias: `fo-blog-v7-r1`
+- Active alias: `fo-blog-v7`
+- Candidate smoke: `5/5`
+- Direct local smoke: exact `FO-BLOG-V7-READY`
+
+## Live Verification
+
+- MAGATAMA FO_BlogLLM status:
+  - `activeProvider=ollama:fo-blog-v7`
+  - `modelVersion=fo-blog-v7-r1`
+  - `lastRegistryRunStatus=completed_and_adopted`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+  - `collectedExamples=17322`
+  - `evalExamples=1926`
+  - `totalExamples=19267`
+- TIP_LLM stayed healthy:
+  - `activeProvider=ollama:tip-llm-v1`
+  - `modelVersion=tip-llm-v1-r1`
+  - `activeRun=null`
+  - `newSinceLastTraining=0`
+
+## Decisions
+
+- Baseline smoke is comparison-only and must not block first adoption if the old active alias is missing.
+- Candidate smoke remains mandatory and blocks adoption if it fails.
+- Importer must keep Mac Studio safe:
+  - verify enough free disk before conversion
+  - delete stale partial F16 files before retry
+  - delete F16 temp files in all cases
+  - clean merged safetensors/bin after successful registration unless `.keep-merged` exists
+- A RunPod run is not considered successful in MAGATAMA until the active Ollama alias and remote registry both reflect the new release.
+
+## Open
+
+- Repeat the same custom-worker/adoption closure for `magatamallm`.
+- Complete Gitea-to-Proxmox Gitea mirroring separately.
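The importer hardening recorded in this patch (free-disk check, stale partial F16 cleanup, GGUF reuse, F16 cleanup in all cases, merged-weight cleanup unless `.keep-merged` exists) can be sketched as one function. This is a minimal Python sketch under stated assumptions, not the actual MAGATAMA importer: the file names, the 60 GiB default threshold, the F16-then-quantize pipeline, and the `convert_to_f16`, `quantize`, and `register_with_ollama` callables are all hypothetical stand-ins.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical default; the sync note does not state the real minimum.
MIN_FREE_BYTES = 60 * 1024**3


def import_candidate(
    work_dir: Path,
    convert_to_f16: Callable[[Path, Path], None],  # hypothetical converter
    quantize: Callable[[Path, Path], None],  # hypothetical: F16 GGUF -> final GGUF
    register_with_ollama: Callable[[Path], None],  # hypothetical registration step
    min_free_bytes: int = MIN_FREE_BYTES,
) -> Path:
    """Disk-safe import order: reuse GGUF, check space, always drop the F16 temp file."""
    final_gguf = work_dir / "model.gguf"  # hypothetical file names
    f16_gguf = work_dir / "model-f16.gguf"
    merged_dir = work_dir / "merged"

    if not final_gguf.exists():
        # A partial F16 left by an earlier failed run only wastes space: remove it first.
        f16_gguf.unlink(missing_ok=True)
        # Refuse to start the large conversion without enough headroom.
        free = shutil.disk_usage(work_dir).free
        if free < min_free_bytes:
            raise RuntimeError(f"only {free} bytes free, need {min_free_bytes}")
        try:
            convert_to_f16(merged_dir, f16_gguf)
            quantize(f16_gguf, final_gguf)
        finally:
            # The F16 file is an intermediate: delete it on success and on failure.
            f16_gguf.unlink(missing_ok=True)

    register_with_ollama(final_gguf)

    # Only after successful registration: drop merged weights unless opted out.
    if merged_dir.is_dir() and not (work_dir / ".keep-merged").exists():
        for pattern in ("*.safetensors", "*.bin"):
            for weight in list(merged_dir.glob(pattern)):
                weight.unlink()
    return final_gguf
```

Putting the conversion inside `try`/`finally` is what guarantees the F16 temp file is removed even when the disk fills mid-conversion, the failure mode behind the first adoption attempt. Registration happens before any merged weights are deleted, so a failed registration leaves the merged model available for a retry.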
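The adoption rule in this patch (success counts only after the artifact exists, local import works, candidate smoke passes, the Ollama alias serves the new release, the remote registry is updated, and no stale job is active, while baseline smoke is comparison-only) can be expressed as a checklist that returns every reason a run must not yet count as adopted. A minimal Python sketch; the `AdoptionEvidence` fields and the rule that every candidate smoke case must pass are assumptions, not the MAGATAMA Train API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdoptionEvidence:
    # Hypothetical field names mirroring the checks in the operator requirement.
    artifact_exists: bool
    local_import_ok: bool
    candidate_smoke_passed: int
    candidate_smoke_total: int
    baseline_smoke_passed: Optional[int]  # None if the old active alias did not exist
    active_alias_version: Optional[str]  # release the active Ollama alias now serves
    expected_release: str  # e.g. "fo-blog-v7-r1"
    registry_status: Optional[str]  # last status in the remote training registry
    active_run: Optional[str]  # job the live UI still reports as running, if any


def adoption_blockers(ev: AdoptionEvidence) -> list[str]:
    """Return every reason the run must not count as adopted (empty list = adopted)."""
    blockers: list[str] = []
    if not ev.artifact_exists:
        blockers.append("trained artifact not found")
    if not ev.local_import_ok:
        blockers.append("local import failed")
    # Candidate smoke is mandatory; this sketch assumes every case must pass.
    if ev.candidate_smoke_total == 0 or ev.candidate_smoke_passed < ev.candidate_smoke_total:
        blockers.append(
            f"candidate smoke {ev.candidate_smoke_passed}/{ev.candidate_smoke_total}"
        )
    # baseline_smoke_passed is recorded for comparison only and is never a blocker,
    # including on first adoption when no previous active alias exists.
    if ev.active_alias_version != ev.expected_release:
        blockers.append(f"active alias serves {ev.active_alias_version!r}")
    if ev.registry_status != "completed_and_adopted":
        blockers.append(f"registry status is {ev.registry_status!r}")
    if ev.active_run is not None:
        blockers.append(f"stale active run {ev.active_run!r}")
    return blockers
```

Returning the full list of blockers rather than a single boolean makes a "COMPLETED but nothing adopted" run explain itself: the caller can log exactly which of the closure conditions is still missing.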