# MAGATAMA All-Lane RunPod Training Complete

Date: 2026-05-10 02:58 UTC

## Result

All five MAGATAMA trainable LLM lanes completed a real RunPod training/adoption cycle and are now visible as adopted in the public MAGATAMA status API.

## Verified Lanes

- `magatamallm`: active `magatama-coder:latest`, model version `magatama-coder-r2`, `1375 train / 153 eval / 1528 total`
- `fo_blogllm`: active `fo-blog-v8`, model version `fo-blog-v8-r2`, `17342 train / 1929 eval / 19271 total`
- `tip_llm`: active `tip-llm-v2`, model version `tip-llm-v2-r2`, `276 train / 31 eval / 307 total`
- `pulso_llm`: active `pulso-llm-v1`, model version `pulso-llm-v1-r1`, `28 train / 5 eval / 33 total`
- `contact_llm`: active `contact-llm-v1`, model version `contact-llm-v1-r1`, `18 train / 4 eval / 22 total`

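The dataset counts in the lane list can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the MAGATAMA tooling: the counts are copied from the list, while `LANE_COUNTS`, `check_split` and `eval_fractions` are hypothetical names introduced only for illustration.

```python
# Dataset counts copied from the Verified Lanes list: (train, eval, total).
LANE_COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def check_split(train: int, eval_: int, total: int) -> float:
    """Confirm train + eval == total and return the eval fraction."""
    if train + eval_ != total:
        raise ValueError(f"split mismatch: {train} + {eval_} != {total}")
    return eval_ / total


def eval_fractions(counts=LANE_COUNTS) -> dict:
    """Map each lane to its eval fraction, rounded to three decimals."""
    return {lane: round(check_split(*c), 3) for lane, c in counts.items()}


if __name__ == "__main__":
    for lane, frac in eval_fractions().items():
        print(f"{lane}: eval fraction {frac:.3f}")
```

All five splits add up. The three larger lanes hold out roughly 10% for eval, while `pulso_llm` and `contact_llm` hold out about 15% and 18% of very small datasets.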
## Fixes Made

- Added/verified first-class local adoption support for `pulso_llm` and `contact_llm`.
- Added authenticated adoption-report recovery endpoint on the Mac training/adoption service.
- Hardened dashboard adoption flow so transient network/fetch errors can recover from local adoption reports.
- Hardened RunPod reconciler so completed jobs can be adopted after a failed live SSE/browser path.
- Registry success events now include explicit active model, release alias, model version, version counter and candidate model.
- Rebuilt the MAGATAMA model registry and restarted `magatama-dashboard` after successful TIP and Contact adoption.

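The recovery behaviour described in the fixes above (fall back to a local adoption report when the live path fails) can be sketched roughly as follows. This is an illustrative sketch only: `TransientFetchError`, `resolve_adoption`, `fetch_live` and `load_local_report` are hypothetical names, and the real dashboard and reconciler code may be structured quite differently.

```python
from typing import Callable, Optional


class TransientFetchError(Exception):
    """Hypothetical marker for network/SSE failures worth recovering from."""


def resolve_adoption(
    fetch_live: Callable[[], dict],
    load_local_report: Callable[[], Optional[dict]],
) -> dict:
    """Prefer the live adoption result; fall back to the local report."""
    try:
        return {**fetch_live(), "source": "live"}
    except TransientFetchError:
        report = load_local_report()
        if report is None:
            # Nothing to recover from, so surface the original failure.
            raise
        return {**report, "source": "local_report"}
```

The point of the pattern is that a dropped browser or SSE connection does not by itself decide the outcome: the locally recorded adoption report is consulted before the run is treated as failed.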
## Issues Resolved

- `pulso_llm` showed `unknown lane: pulso_llm` after RunPod finished; this was a local adoption mapping issue, not a training failure. Pulso is now active.
- `tip_llm` failed local adoption because Mac disk space dropped below the GGUF conversion threshold. Obsolete non-active Ollama versions and already imported intermediate GGUFs were removed, then TIP was reconciled successfully.
- `contact_llm` had never been trained before this block. It now has a first adopted version.

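The `tip_llm` failure suggests checking free disk space before starting GGUF conversion. A minimal preflight sketch, assuming a threshold expressed in GiB: the value `MIN_FREE_GIB = 20.0` is a placeholder (this handoff does not state the real threshold), and `has_headroom` and `preflight` are hypothetical helper names.

```python
import shutil

# Placeholder value: the actual GGUF conversion threshold is not stated here.
MIN_FREE_GIB = 20.0


def has_headroom(free_bytes: int, min_free_gib: float = MIN_FREE_GIB) -> bool:
    """Return True when free_bytes meets the minimum, given in GiB."""
    return free_bytes >= min_free_gib * 2**30


def preflight(path: str = ".", min_free_gib: float = MIN_FREE_GIB) -> bool:
    """Check the volume that holds `path` before starting a conversion."""
    return has_headroom(shutil.disk_usage(path).free, min_free_gib)


if __name__ == "__main__":
    print("enough space for conversion:", preflight("."))
```

Running a check like this before adoption would turn a mid-conversion failure into an early, explicit "free up space first" signal.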
## Evaluation Notes

- ContactLLM smoke test passed `4/5`.
- Open improvement: ContactLLM should consistently return provenance fields for public business contacts: source URL, timestamp, confidence and contact type.

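The open improvement above could be enforced with a small validation step on each ContactLLM answer. The sketch below assumes the answer is available as a dictionary; the key names `source_url`, `timestamp`, `confidence` and `contact_type` are assumed spellings of the four provenance fields, not a confirmed schema.

```python
# Assumed key names for the four provenance fields named in the note above.
REQUIRED_PROVENANCE = ("source_url", "timestamp", "confidence", "contact_type")


def missing_provenance(record: dict) -> list[str]:
    """List the provenance fields that are absent, None, or empty strings."""
    return [key for key in REQUIRED_PROVENANCE if record.get(key) in (None, "")]


if __name__ == "__main__":
    answer = {"source_url": "https://example.com/contact", "confidence": 0.9}
    print("missing:", missing_provenance(answer))
```

A smoke test could then fail any answer for which `missing_provenance` returns a non-empty list, which makes "consistently return provenance fields" measurable.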
## Operating Rule

Do not mark RunPod training successful on `COMPLETED` alone. A successful lane run must have:

- uploaded adapter artifact
- successful local Mac adoption
- Ollama candidate + release alias + active alias
- smoke tests meeting threshold
- registry entry with `completed_and_adopted`
- public MAGATAMA `/api/llm/status?lane=...` showing the new active model/version

No secrets, tokens or credentials are recorded in this handoff.
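The operating rule can be expressed as a single gate that lists every unmet condition. This is a sketch under assumptions: the `LaneRun` fields and the `failed_checks` and `is_successful` helpers are hypothetical, and the smoke-test threshold is passed in as a parameter because this handoff does not state its value.

```python
from dataclasses import dataclass


@dataclass
class LaneRun:
    """Hypothetical snapshot of one lane run; field names are illustrative."""
    runpod_status: str
    adapter_uploaded: bool
    local_adoption_ok: bool
    ollama_candidate: bool
    release_alias: bool
    active_alias: bool
    smoke_passed: int
    smoke_total: int
    registry_status: str
    public_status_shows_new_version: bool


def failed_checks(run: LaneRun, smoke_threshold: float) -> list[str]:
    """Return the names of all operating-rule conditions that are not met."""
    smoke_ok = (
        run.smoke_total > 0
        and run.smoke_passed / run.smoke_total >= smoke_threshold
    )
    checks = {
        "runpod_completed": run.runpod_status == "COMPLETED",
        "adapter_uploaded": run.adapter_uploaded,
        "local_adoption": run.local_adoption_ok,
        "ollama_aliases": run.ollama_candidate
        and run.release_alias
        and run.active_alias,
        "smoke_tests": smoke_ok,
        "registry_entry": run.registry_status == "completed_and_adopted",
        "public_status": run.public_status_shows_new_version,
    }
    return [name for name, ok in checks.items() if not ok]


def is_successful(run: LaneRun, smoke_threshold: float) -> bool:
    """A run counts as successful only when every condition holds."""
    return not failed_checks(run, smoke_threshold)
```

Returning the list of failed checks, instead of a bare boolean, makes it easier to see why a `COMPLETED` RunPod job was not counted as a successful lane run.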