transceiver-db/sync/history/2026-05-10-magatama-all-lane-runpod-training-complete.md
2026-05-10 04:59:46 +02:00

MAGATAMA All-Lane RunPod Training Complete

Date: 2026-05-10 02:58 UTC

Result

All five MAGATAMA trainable LLM lanes completed a real RunPod training/adoption cycle and are now visible as adopted in the public MAGATAMA status API.

Verified Lanes

  • magatamallm: active magatama-coder:latest, model version magatama-coder-r2, 1375 train / 153 eval / 1528 total
  • fo_blogllm: active fo-blog-v8, model version fo-blog-v8-r2, 17342 train / 1929 eval / 19271 total
  • tip_llm: active tip-llm-v2, model version tip-llm-v2-r2, 276 train / 31 eval / 307 total
  • pulso_llm: active pulso-llm-v1, model version pulso-llm-v1-r1, 28 train / 5 eval / 33 total
  • contact_llm: active contact-llm-v1, model version contact-llm-v1-r1, 18 train / 4 eval / 22 total
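
The counts above can be cross-checked with a short script. This is a minimal sketch: the numbers are copied from the list, while the helper names `check_counts` and `eval_share` are illustrative and not part of any MAGATAMA tooling.

```python
# (train, eval, total) per lane, copied from the Verified Lanes list.
LANE_COUNTS = {
    "magatamallm": (1375, 153, 1528),
    "fo_blogllm": (17342, 1929, 19271),
    "tip_llm": (276, 31, 307),
    "pulso_llm": (28, 5, 33),
    "contact_llm": (18, 4, 22),
}


def check_counts(counts):
    """Return the lanes whose train + eval does not equal the reported total."""
    return [
        lane
        for lane, (train, evaluation, total) in counts.items()
        if train + evaluation != total
    ]


def eval_share(counts):
    """Return the eval fraction of each lane's dataset, rounded to 3 places."""
    return {
        lane: round(evaluation / total, 3)
        for lane, (_train, evaluation, total) in counts.items()
    }


if __name__ == "__main__":
    print("inconsistent lanes:", check_counts(LANE_COUNTS))
    for lane, share in eval_share(LANE_COUNTS).items():
        print(f"{lane}: eval share {share}")
```

All five lanes are internally consistent. The two smallest lanes (pulso_llm and contact_llm) hold out a larger eval share than the roughly 10% used by the other three.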

Fixes Made

  • Added/verified first-class local adoption support for pulso_llm and contact_llm.
  • Added authenticated adoption-report recovery endpoint on the Mac training/adoption service.
  • Hardened dashboard adoption flow so transient network/fetch errors can recover from local adoption reports.
  • Hardened RunPod reconciler so completed jobs can be adopted after a failed live SSE/browser path.
  • Registry success events now include explicit active model, release alias, model version, version counter and candidate model.
  • Rebuilt the MAGATAMA model registry and restarted magatama-dashboard after successful TIP and Contact adoption.
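
The recovery behavior described in the dashboard and reconciler fixes can be sketched roughly as follows. This is an illustration only: `live_fetch` and `recover_report` are hypothetical callables standing in for the live SSE/browser path and the adoption-report recovery endpoint, and the choice of exception types treated as transient is an assumption.

```python
from typing import Callable, Optional

# Assumption: these exception types stand in for "transient network/fetch errors".
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def adoption_result(
    live_fetch: Callable[[], dict],
    recover_report: Callable[[], Optional[dict]],
) -> dict:
    """Prefer the live result; fall back to a stored local adoption report."""
    try:
        return {**live_fetch(), "source": "live"}
    except TRANSIENT_ERRORS as exc:
        report = recover_report()
        if report is None:
            # Nothing to recover from, so surface the failure instead of guessing.
            raise RuntimeError(
                "live path failed and no local adoption report was found"
            ) from exc
        return {**report, "source": "recovered"}
```

The point of the design is that a dropped connection changes only where the result is read from, not whether the adoption counts as done.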

Issues Resolved

  • pulso_llm reported "unknown lane: pulso_llm" after RunPod finished. This was a local adoption mapping issue, not a training failure. Pulso is now active.
  • tip_llm failed local adoption because Mac disk space dropped below the GGUF conversion threshold. Obsolete non-active Ollama versions and already imported intermediate GGUFs were removed, then TIP was reconciled successfully.
  • contact_llm had never been trained before this block. It now has a first adopted version.
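
The TIP failure suggests a free-space preflight before GGUF conversion starts. A minimal sketch, assuming the threshold is configurable: the actual threshold is not recorded in this handoff, so `min_free_bytes` is a placeholder parameter and `has_space_for_conversion` is a hypothetical helper name.

```python
import shutil


def has_space_for_conversion(path: str, min_free_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= min_free_bytes


if __name__ == "__main__":
    # Placeholder value for illustration only, not the real threshold.
    example_threshold = 20 * 1024**3
    if not has_space_for_conversion(".", example_threshold):
        print("Not enough free disk space; clean up before adopting.")
```

Running a check like this before adoption turns a mid-conversion failure into an early, explicit message.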

Evaluation Notes

  • ContactLLM smoke test passed 4/5.
  • Open improvement: ContactLLM should consistently return provenance fields for public business contacts: source URL, timestamp, confidence and contact type.
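
One way to turn the open improvement into a smoke-test check is to validate each returned contact for the four provenance fields. This is a sketch only: the key names below are assumed spellings of the fields named above, not a confirmed ContactLLM output schema.

```python
# Assumed key names for: source URL, timestamp, confidence, contact type.
REQUIRED_PROVENANCE = ("source_url", "timestamp", "confidence", "contact_type")


def missing_provenance(record: dict) -> list[str]:
    """Return the provenance fields that are absent or empty in `record`."""
    missing = []
    for field in REQUIRED_PROVENANCE:
        value = record.get(field)
        # A confidence of 0.0 is a real value, so only None and "" count as missing.
        if value is None or value == "":
            missing.append(field)
    return missing
```

A smoke test could then fail any response for which `missing_provenance` returns a non-empty list.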

Operating Rule

Do not mark a RunPod training run as successful on the COMPLETED job status alone. A successful lane run must have all of the following:

  • uploaded adapter artifact
  • successful local Mac adoption
  • Ollama candidate + release alias + active alias
  • smoke tests meeting threshold
  • registry entry with completed_and_adopted
  • public MAGATAMA /api/llm/status?lane=... showing the new active model/version
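
The rule above can be expressed as a single gate. A minimal sketch under stated assumptions: the `LaneRun` fields are hypothetical names for the six criteria, `completed_and_adopted` is treated as a registry status value, and `smoke_threshold` is a parameter because the actual threshold is not recorded here.

```python
from dataclasses import dataclass


@dataclass
class LaneRun:
    # Hypothetical field names; one group per criterion in the operating rule.
    runpod_status: str
    adapter_uploaded: bool
    local_adoption_ok: bool
    ollama_candidate: bool
    release_alias: bool
    active_alias: bool
    smoke_passed: int
    smoke_total: int
    registry_status: str
    public_status_shows_new_version: bool


def unmet_criteria(run: LaneRun, smoke_threshold: float) -> list[str]:
    """Return the criteria the run does not meet; an empty list means success."""
    smoke_ok = (
        run.smoke_total > 0
        and run.smoke_passed / run.smoke_total >= smoke_threshold
    )
    checks = {
        "uploaded adapter artifact": run.adapter_uploaded,
        "successful local Mac adoption": run.local_adoption_ok,
        "Ollama candidate + release alias + active alias": (
            run.ollama_candidate and run.release_alias and run.active_alias
        ),
        "smoke tests meeting threshold": smoke_ok,
        "registry entry with completed_and_adopted": (
            run.registry_status == "completed_and_adopted"
        ),
        "public status API shows new active model/version": (
            run.public_status_shows_new_version
        ),
    }
    return [name for name, ok in checks.items() if not ok]
```

`runpod_status` is deliberately not part of the checks: a run that reports COMPLETED but has anything left in the returned list is not yet successful.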

No secrets, tokens or credentials are recorded in this handoff.