sync: record fo blogllm adoption closure

Rene Fichtmueller 2026-05-09 20:10:27 +02:00
parent 56ed88ac8c
commit 3779de5b88
2 changed files with 102 additions and 1 deletions


@@ -1,9 +1,53 @@
# Current TIP Sync State
Updated: 2026-05-09 16:20 UTC
Updated: 2026-05-09 18:07 UTC
## Newest Work
- MAGATAMA FO_BlogLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - training success must only count after artifact exists, local import works, smoke tests pass, Ollama alias/version switches, remote MAGATAMA registry is updated, and the live UI reports no active stale job
    - no repeat of failed "COMPLETED but nothing adopted" serverless runs
    - local Mac Studio training remains throttled by default to avoid saturating the workstation
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
    - run id `fo_blogllm-2026-05-09T17-14-16`
    - target artifact `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
    - worker summary `RunPod QLoRA complete · train=11473 · valid=1281`
  - failure recovered:
    - first local adoption failed because Mac Studio disk filled during F16 GGUF conversion
    - removed stale partial F16 GGUF and obsolete merged safetensors to restore free space
    - hardened importer to:
      - require minimum free disk before conversion
      - delete stale partial F16 before retry
      - reuse existing GGUF when present
      - delete temporary F16 in all cases
      - remove merged safetensors/bin after successful Ollama registration unless `.keep-merged` exists
  - adoption completed:
    - local candidate `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
    - release alias `fo-blog-v7-r1`
    - active alias `fo-blog-v7`
    - candidate smoke `5/5` passed
    - direct local smoke returned exact `FO-BLOG-V7-READY`
  - dashboard/server hardening:
    - old baseline smoke is now non-blocking when the active alias does not exist yet; candidate smoke remains mandatory
    - deployed updated dashboard bundle, fine-tuner API template, and RunPod-Ollama importer to Erik
    - restarted `magatama-dashboard`
    - copied `fo_blogllm-last_run.json` and adoption report to Erik
    - appended remote training registry event `completed_and_adopted`
  - live verification:
    - `fo_blogllm` reports `activeProvider=ollama:fo-blog-v7`
    - `modelVersion=fo-blog-v7-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `collectedExamples=17322`, `evalExamples=1926`, `totalExamples=19267`
    - `newSinceLastTraining=0`
    - `tip_llm` remains healthy with `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - run the same end-to-end custom-worker/adoption path for `magatamallm`
    - complete dual-Gitea mirroring as separate infrastructure closure item
- Near-complete detail queue closed with lightweight vendor detail verifiers on 2026-05-09:
  - operator requirement:
    - keep Erik safe; no heavy browser crawler or Playwright wave


@@ -0,0 +1,57 @@
# FO_BlogLLM RunPod Adoption Closure
Date: 2026-05-09 18:07 UTC
## What Changed
- Completed the FO_BlogLLM RunPod full training and local adoption path.
- Recovered from the first adoption failure caused by Mac Studio disk exhaustion during F16 GGUF conversion.
- Hardened the local importer and Train API so future RunPod jobs fail only on real candidate/adoption problems, not on missing previous baseline aliases.
- Deployed the updated MAGATAMA dashboard bundle and training helper files to Erik.
- Synced the successful adoption metadata back into MAGATAMA's remote training registry.
## Run Details
- Lane: `fo_blogllm`
- Endpoint: `0rmkf28w2g5gip`
- Job: `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
- Run id: `fo_blogllm-2026-05-09T17-14-16`
- HF artifact: `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
- Local candidate: `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
- Release alias: `fo-blog-v7-r1`
- Active alias: `fo-blog-v7`
- Candidate smoke: `5/5`
- Direct local smoke: exact `FO-BLOG-V7-READY`
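The two smoke results above act as an adoption gate. A minimal sketch of that gate follows, assuming the smoke runner reports pass/total counts and the direct smoke reply is available as a string. `SmokeResult` and `adoption_allowed` are hypothetical names for illustration, not the actual Train API, and ignoring surrounding whitespace on the reply is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokeResult:
    """Pass/total counts from one smoke run (hypothetical shape)."""
    passed: int
    total: int

    @property
    def ok(self) -> bool:
        # A run with zero cases proves nothing, so it does not count as a pass.
        return self.total > 0 and self.passed == self.total


def adoption_allowed(candidate: SmokeResult,
                     direct_reply: str,
                     expected_reply: str,
                     baseline: Optional[SmokeResult] = None) -> bool:
    """Return True when the candidate may be promoted to the active alias."""
    # Candidate smoke is mandatory: every case has to pass.
    if not candidate.ok:
        return False
    # Direct smoke must match the expected token exactly.
    # Assumption: leading/trailing whitespace in the reply is ignored.
    if direct_reply.strip() != expected_reply:
        return False
    # Baseline smoke is comparison-only. A missing or failing baseline is
    # accepted here on purpose and never blocks adoption.
    _ = baseline
    return True
```

With the values from this run, `adoption_allowed(SmokeResult(5, 5), "FO-BLOG-V7-READY", "FO-BLOG-V7-READY")` allows adoption even though no baseline result is passed.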
## Live Verification
- MAGATAMA FO_BlogLLM status:
  - `activeProvider=ollama:fo-blog-v7`
  - `modelVersion=fo-blog-v7-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `newSinceLastTraining=0`
  - `collectedExamples=17322`
  - `evalExamples=1926`
  - `totalExamples=19267`
- TIP_LLM stayed healthy:
  - `activeProvider=ollama:tip-llm-v1`
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
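The status fields above could be checked mechanically instead of by eye. The sketch below assumes the lane status is available as a JSON-like dict using the field names shown; the function name `closure_problems` and the payload shape are assumptions, not the dashboard's real API.

```python
from typing import Any, Dict, List


def closure_problems(status: Dict[str, Any], alias: str, release: str) -> List[str]:
    """Return reasons the lane is not fully adopted; an empty list means closed."""
    problems: List[str] = []

    expected_provider = f"ollama:{alias}"
    if status.get("activeProvider") != expected_provider:
        problems.append(f"activeProvider is not {expected_provider}")

    if status.get("modelVersion") != release:
        problems.append(f"modelVersion is not {release}")

    if status.get("lastRegistryRunStatus") != "completed_and_adopted":
        problems.append("registry does not report completed_and_adopted")

    # JSON null maps to None; a missing key is treated as a problem too,
    # so an incomplete payload cannot pass as "no active run".
    if "activeRun" not in status or status["activeRun"] is not None:
        problems.append("activeRun is missing or not null")

    return problems
```

For the FO_BlogLLM values listed above, `closure_problems(status, "fo-blog-v7", "fo-blog-v7-r1")` would return an empty list. Returning reasons rather than a bare boolean keeps a failed closure easy to diagnose.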
## Decisions
- Baseline smoke is comparison-only and must not block first adoption if the old active alias is missing.
- Candidate smoke remains mandatory and blocks adoption if it fails.
- Importer must keep Mac Studio safe:
  - verify enough free disk before conversion
  - delete stale partial F16 files before retry
  - delete F16 temp files in all cases
  - clean merged safetensors/bin after successful registration unless `.keep-merged` exists
- A RunPod run is not considered successful in MAGATAMA until the active Ollama alias and remote registry both reflect the new release.
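The importer rules above can be illustrated with a short sketch. This is not the actual importer: the file name `model-f16.gguf`, the 50 GiB default threshold, and the `convert_f16` and `quantize` callables are hypothetical stand-ins, and the split into an F16 step followed by a final GGUF step is an assumption based on the F16 file being temporary.

```python
import shutil
from pathlib import Path
from typing import Callable

# Placeholder threshold for illustration only, not a value from the real importer.
DEFAULT_MIN_FREE_BYTES = 50 * 1024**3


def convert_with_guards(work_dir: Path,
                        gguf_path: Path,
                        convert_f16: Callable[[Path], None],
                        quantize: Callable[[Path, Path], None],
                        min_free_bytes: int = DEFAULT_MIN_FREE_BYTES) -> Path:
    """Produce gguf_path while keeping disk usage on the workstation bounded."""
    f16_path = work_dir / "model-f16.gguf"  # hypothetical temp file name

    # Delete any stale partial F16 left behind by an earlier failed attempt.
    f16_path.unlink(missing_ok=True)

    # Reuse an existing final GGUF instead of converting again.
    if gguf_path.exists():
        return gguf_path

    # Refuse to start when the volume lacks the required free space.
    free = shutil.disk_usage(work_dir).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free, need {min_free_bytes}")

    try:
        convert_f16(f16_path)
        quantize(f16_path, gguf_path)
    finally:
        # The temporary F16 is removed whether conversion succeeded or failed.
        f16_path.unlink(missing_ok=True)
    return gguf_path


def cleanup_merged(merged_dir: Path) -> None:
    """Remove merged weights after registration unless .keep-merged exists."""
    if (merged_dir / ".keep-merged").exists():
        return
    for pattern in ("*.safetensors", "*.bin"):
        for weight_file in merged_dir.glob(pattern):
            weight_file.unlink()
```

Putting the F16 removal in a `finally` block is what makes the "in all cases" rule hold: a crash mid-conversion cannot leave a multi-gigabyte partial file for the next retry to trip over. `cleanup_merged` is meant to be called only after Ollama registration has succeeded.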
## Open
- Repeat the same custom-worker/adoption closure for `magatamallm`.
- Complete Gitea-to-Proxmox Gitea mirroring separately.