sync: record fo blogllm adoption closure
parent 56ed88ac8c
commit 3779de5b88
@@ -1,9 +1,53 @@
# Current TIP Sync State

Updated: 2026-05-09 16:20 UTC
Updated: 2026-05-09 18:07 UTC

## Newest Work

- MAGATAMA FO_BlogLLM RunPod training and adoption closure on 2026-05-09:
  - operator requirement:
    - training success must only count after artifact exists, local import works, smoke tests pass, Ollama alias/version switches, remote MAGATAMA registry is updated, and the live UI reports no active stale job
    - no repeat of failed "COMPLETED but nothing adopted" serverless runs
    - local Mac Studio training remains throttled by default to avoid saturating the workstation
  - RunPod job completed:
    - endpoint `0rmkf28w2g5gip`
    - job `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
    - run id `fo_blogllm-2026-05-09T17-14-16`
    - target artifact `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
    - worker summary `RunPod QLoRA complete · train=11473 · valid=1281`
  - failure recovered:
    - first local adoption failed because Mac Studio disk filled during F16 GGUF conversion
    - removed stale partial F16 GGUF and obsolete merged safetensors to restore free space
    - hardened importer to:
      - require minimum free disk before conversion
      - delete stale partial F16 before retry
      - reuse existing GGUF when present
      - delete temporary F16 in all cases
      - remove merged safetensors/bin after successful Ollama registration unless `.keep-merged` exists
  - adoption completed:
    - local candidate `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
    - release alias `fo-blog-v7-r1`
    - active alias `fo-blog-v7`
    - candidate smoke `5/5` passed
    - direct local smoke returned exact `FO-BLOG-V7-READY`
  - dashboard/server hardening:
    - old baseline smoke is now non-blocking when the active alias does not exist yet; candidate smoke remains mandatory
    - deployed updated dashboard bundle, fine-tuner API template, and RunPod-Ollama importer to Erik
    - restarted `magatama-dashboard`
    - copied `fo_blogllm-last_run.json` and adoption report to Erik
    - appended remote training registry event `completed_and_adopted`
  - live verification:
    - `fo_blogllm` reports `activeProvider=ollama:fo-blog-v7`
    - `modelVersion=fo-blog-v7-r1`
    - `lastRegistryRunStatus=completed_and_adopted`
    - `activeRun=null`
    - `collectedExamples=17322`, `evalExamples=1926`, `totalExamples=19267`
    - `newSinceLastTraining=0`
    - `tip_llm` remains healthy with `tip-llm-v1-r1`, `activeRun=null`, `newSinceLastTraining=0`
  - open:
    - run the same end-to-end custom-worker/adoption path for `magatamallm`
    - complete dual-Gitea mirroring as separate infrastructure closure item

- Near-complete detail queue closed with lightweight vendor detail verifiers on 2026-05-09:
  - operator requirement:
    - keep Erik safe; no heavy browser crawler or Playwright wave

@@ -0,0 +1,57 @@
# FO_BlogLLM RunPod Adoption Closure

Date: 2026-05-09 18:07 UTC

## What Changed

- Completed the FO_BlogLLM RunPod full training and local adoption path.
- Recovered from the first adoption failure caused by Mac Studio disk exhaustion during F16 GGUF conversion.
- Hardened the local importer and Train API so future RunPod jobs fail only on real candidate/adoption problems, not on missing previous baseline aliases.
- Deployed the updated MAGATAMA dashboard bundle and training helper files to Erik.
- Synced the successful adoption metadata back into MAGATAMA's remote training registry.
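
The smoke-gating change described above (candidate smoke blocks adoption, a missing baseline alias does not) can be sketched roughly as follows. This is an illustration only, not the Train API's actual code: `adoption_gate`, its parameters, and the note strings are hypothetical names. Treating a failing baseline on an existing alias as a recorded note rather than a blocker is an assumption based on the baseline being described as comparison-only.

```python
from typing import Callable


def adoption_gate(
    candidate_smoke: Callable[[], bool],
    baseline_smoke: Callable[[], bool],
    active_alias_exists: bool,
) -> tuple[bool, list[str]]:
    """Decide whether adoption may proceed and collect notes for the report.

    All names here are hypothetical; only the policy comes from the notes.
    """
    notes: list[str] = []

    # Baseline smoke is comparison-only: it is skipped when no active alias
    # exists yet, and its result never blocks adoption (assumption for the
    # case where the alias exists but the baseline fails).
    if active_alias_exists:
        baseline_ok = baseline_smoke()
        notes.append(
            f"baseline smoke {'passed' if baseline_ok else 'failed'} (comparison only)"
        )
    else:
        notes.append("baseline smoke skipped: no active alias yet")

    # Candidate smoke is mandatory: any failure blocks adoption.
    if not candidate_smoke():
        notes.append("candidate smoke failed: adoption blocked")
        return False, notes

    notes.append("candidate smoke passed")
    return True, notes
```

For a first adoption, `adoption_gate(run_candidate, run_baseline, active_alias_exists=False)` never calls the baseline at all, so a missing alias cannot fail the run.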

## Run Details

- Lane: `fo_blogllm`
- Endpoint: `0rmkf28w2g5gip`
- Job: `99d08ef2-9016-4488-ac69-3585c8a09f38-e2`
- Run id: `fo_blogllm-2026-05-09T17-14-16`
- HF artifact: `renefichtmueller/magatama-fo-blogllm-fo-blogllm-2026-05-09t17-14-16`
- Local candidate: `fo-blogllm-runpod-fo_blogllm-2026-05-09t17-14-16`
- Release alias: `fo-blog-v7-r1`
- Active alias: `fo-blog-v7`
- Candidate smoke: `5/5`
- Direct local smoke: exact `FO-BLOG-V7-READY`

## Live Verification

- MAGATAMA FO_BlogLLM status:
  - `activeProvider=ollama:fo-blog-v7`
  - `modelVersion=fo-blog-v7-r1`
  - `lastRegistryRunStatus=completed_and_adopted`
  - `activeRun=null`
  - `newSinceLastTraining=0`
  - `collectedExamples=17322`
  - `evalExamples=1926`
  - `totalExamples=19267`
- TIP_LLM stayed healthy:
  - `activeProvider=ollama:tip-llm-v1`
  - `modelVersion=tip-llm-v1-r1`
  - `activeRun=null`
  - `newSinceLastTraining=0`
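
The status checks listed above could be automated with a small helper along these lines. It is a sketch under assumptions: the status is taken to be available as a plain mapping with the field names shown above, and how the dashboard exposes it (endpoint, authentication) is not stated here, so the example builds the mapping by hand. `adoption_problems` is a hypothetical name.

```python
from typing import Any, Mapping

_MISSING = object()  # distinguishes "field absent" from an explicit null


def adoption_problems(
    status: Mapping[str, Any], expected_alias: str, expected_release: str
) -> list[str]:
    """Return reasons a lane does not look adopted; an empty list means all checks pass."""
    expected = {
        "activeProvider": f"ollama:{expected_alias}",
        "modelVersion": expected_release,
        "lastRegistryRunStatus": "completed_and_adopted",
        "activeRun": None,  # no active (possibly stale) job
    }
    problems = []
    for field, want in expected.items():
        got = status.get(field, _MISSING)
        if got is _MISSING:
            problems.append(f"{field}: missing from status")
        elif got != want:
            problems.append(f"{field}: expected {want!r}, got {got!r}")
    return problems


# Values copied from the FO_BlogLLM status above (JSON null becomes None).
fo_blogllm_status = {
    "activeProvider": "ollama:fo-blog-v7",
    "modelVersion": "fo-blog-v7-r1",
    "lastRegistryRunStatus": "completed_and_adopted",
    "activeRun": None,
    "newSinceLastTraining": 0,
}
print(adoption_problems(fo_blogllm_status, "fo-blog-v7", "fo-blog-v7-r1"))  # prints []
```

Returning a list of problems rather than a boolean keeps the report readable when several fields disagree at once, for example a switched alias with a stale `activeRun`.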

## Decisions

- Baseline smoke is comparison-only and must not block first adoption if the old active alias is missing.
- Candidate smoke remains mandatory and blocks adoption if it fails.
- Importer must keep Mac Studio safe:
  - verify enough free disk before conversion
  - delete stale partial F16 files before retry
  - delete F16 temp files in all cases
  - clean merged safetensors/bin after successful registration unless `.keep-merged` exists
- A RunPod run is not considered successful in MAGATAMA until the active Ollama alias and remote registry both reflect the new release.
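
The importer safety rules above could look roughly like the sketch below. It is not the actual importer: `convert_with_disk_guard`, `cleanup_merged`, `InsufficientDiskError`, the `convert` callable, and the `min_free_bytes` threshold are all hypothetical, and the real conversion command is not shown in these notes. Removing a partially written final GGUF on failure is an added assumption, so that a later retry cannot reuse a broken file.

```python
import shutil
from pathlib import Path
from typing import Callable


class InsufficientDiskError(RuntimeError):
    """Raised when the free-space check fails before conversion starts."""


def convert_with_disk_guard(
    work_dir: Path,
    f16_path: Path,
    gguf_path: Path,
    min_free_bytes: int,
    convert: Callable[[Path, Path], None],
) -> Path:
    # Reuse an existing final GGUF instead of converting again.
    if gguf_path.exists():
        return gguf_path

    # A leftover F16 from a failed attempt is treated as stale and removed.
    f16_path.unlink(missing_ok=True)

    # Refuse to start if the volume holding work_dir is too full.
    free = shutil.disk_usage(work_dir).free
    if free < min_free_bytes:
        raise InsufficientDiskError(
            f"{free} bytes free, need at least {min_free_bytes}"
        )

    try:
        # Hypothetical callable: writes the F16 temp file, then the final GGUF.
        convert(f16_path, gguf_path)
    except BaseException:
        # Assumption: never leave a partial final GGUF behind for reuse.
        gguf_path.unlink(missing_ok=True)
        raise
    finally:
        # The temporary F16 is deleted in all cases, success or failure.
        f16_path.unlink(missing_ok=True)
    return gguf_path


def cleanup_merged(merged_dir: Path) -> list[Path]:
    """Remove merged weights after successful registration, unless opted out."""
    if (merged_dir / ".keep-merged").exists():
        return []
    removed = []
    for pattern in ("*.safetensors", "*.bin"):
        for path in sorted(merged_dir.glob(pattern)):
            path.unlink()
            removed.append(path)
    return removed
```

`cleanup_merged` is meant to be called only after Ollama registration has succeeded, so a failed registration still leaves the merged weights available for a retry.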

## Open

- Repeat the same custom-worker/adoption closure for `magatamallm`.
- Complete Gitea-to-Proxmox Gitea mirroring separately.