MAGATAMA Lane-Specific Training Pools + URL RunPod Dataset Mode
Date: 2026-05-06
Author: Codex
Problem
The MAGATAMA training modal still showed the magatamallm pool even when the operator selected a different lane:
- FO_BlogLLM
- TIP_LLM
As a result, the UI implied that all training lanes reused the same pool and counts.
At the same time, RunPod launches still depended on Hugging Face dataset publication unless explicitly changed.
Root Cause
- The training modal fetched /api/llm/status without a lane parameter.
- The backend status route therefore always returned the default magatamallm training corpus/lane.
- The dashboard environment on Erik was still effectively using the Hugging Face dataset path as the RunPod dataset source.
Fix
Lane-aware status
/api/llm/status now accepts the selected lane and returns lane-specific training metadata.
The training modal was updated to:
- fetch /api/llm/status?lane=<selected lane>
- update title and runtime text per lane
- show lane-specific:
  - manifest path
  - train/eval/total counts
  - dataset source
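The lane-aware fetch can be illustrated with a minimal sketch. Only the /api/llm/status route and the lane query parameter come from this note. The helper name build_status_url and the behavior of omitting the parameter when no lane is selected are assumptions for illustration, not the dashboard's actual client code.

```python
from typing import Optional
from urllib.parse import urlencode

STATUS_PATH = "/api/llm/status"  # route named in this note


def build_status_url(base_url: str, lane: Optional[str] = None) -> str:
    """Build the status URL, adding ?lane=... only when a lane is selected.

    Hypothetical helper: the real modal code is not shown in this note.
    """
    url = base_url.rstrip("/") + STATUS_PATH
    if lane:
        # urlencode escapes the value so unusual lane names stay URL-safe.
        url += "?" + urlencode({"lane": lane})
    return url


print(build_status_url("http://127.0.0.1:3211", "fo_blogllm"))
```

Leaving the parameter out reproduces the old request shape, which is the case where the backend fell back to the default magatamallm lane.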
URL dataset mode
The live dashboard environment on Erik was updated through ecosystem.config.cjs:
RUNPOD_DATASET_SOURCE=url
RUNPOD_DATASET_SOURCE_MAGATAMALLM=url
RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url
RUNPOD_DATASET_SOURCE_TIP_LLM=url
MAGATAMA_PUBLIC_BASE_URL=https://magatama.fichtmueller.org
Then magatama-dashboard was restarted with --update-env.
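How a per-lane variable might interact with the global one can be sketched as follows. The variable names are taken from this note. The precedence (lane-specific first, then global), the helper name resolve_dataset_source, and the fallback value "huggingface" are assumptions for illustration, since the note does not show how the dashboard actually reads these settings.

```python
from typing import Mapping

GLOBAL_KEY = "RUNPOD_DATASET_SOURCE"  # variable name from this note
FALLBACK_SOURCE = "huggingface"       # hypothetical default value, not confirmed


def resolve_dataset_source(lane: str, env: Mapping[str, str]) -> str:
    """Return the dataset source for a lane.

    Assumed precedence: RUNPOD_DATASET_SOURCE_<LANE>, then
    RUNPOD_DATASET_SOURCE, then the hypothetical fallback.
    """
    lane_key = f"{GLOBAL_KEY}_{lane.upper()}"
    return env.get(lane_key) or env.get(GLOBAL_KEY) or FALLBACK_SOURCE


example_env = {
    "RUNPOD_DATASET_SOURCE": "url",
    "RUNPOD_DATASET_SOURCE_TIP_LLM": "url",
}
print(resolve_dataset_source("tip_llm", example_env))
```

Under that assumed precedence, setting both the global and the per-lane variables, as done on Erik, leaves no lane on the old path regardless of which one the code consults first.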
Live Verification
Verified directly on Erik through:
http://127.0.0.1:3211/api/llm/status?lane=fo_blogllm
http://127.0.0.1:3211/api/llm/status?lane=tip_llm
fo_blogllm
datasetSource = url
collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json
trainFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-train.jsonl
validFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-eval.jsonl
collectedExamples = 28
evalExamples = 4
totalExamples = 32
tip_llm
datasetSource = url
collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json
trainFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-train.jsonl
validFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-eval.jsonl
collectedExamples = 36
evalExamples = 4
totalExamples = 40
magatamallm
Still correctly shows the larger lane export:
collectedExamples = 15620
evalExamples = 1736
totalExamples = 17356
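The reported counts can be sanity-checked with a short sketch. The numbers are copied from the verification output above. The relation totalExamples = collectedExamples + evalExamples is inferred from those numbers rather than stated by the API, and the helper name counts_are_consistent is hypothetical.

```python
# Counts as reported in this note for each lane.
REPORTED = {
    "fo_blogllm": {"collectedExamples": 28, "evalExamples": 4, "totalExamples": 32},
    "tip_llm": {"collectedExamples": 36, "evalExamples": 4, "totalExamples": 40},
    "magatamallm": {"collectedExamples": 15620, "evalExamples": 1736, "totalExamples": 17356},
}


def counts_are_consistent(status: dict) -> bool:
    """Check the assumed relation: total = collected (train) + eval."""
    return status["collectedExamples"] + status["evalExamples"] == status["totalExamples"]


for lane, status in REPORTED.items():
    print(lane, counts_are_consistent(status))
```

Distinct totals per lane (32, 40, 17356) are also the quickest visible sign that the status route no longer returns the default pool for every lane.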
Operational Meaning
MAGATAMA training is now materially closer to the intended fully automated flow:
- each LLM lane shows and uses its own pool
- RunPod dataset preparation no longer requires Hugging Face dataset publication
- the dataset is fetched from the MAGATAMA URL bundle / lane export
This removes one major manual/external blocker from the RunPod training path.
Remaining Truth
This fix does not by itself prove that every RunPod worker run succeeds end-to-end.
What is fixed:
- lane-specific training pool selection
- lane-specific UI/status
- URL dataset source activation
What still depends on RunPod worker behavior:
- real successful training execution
- durable model artifact production
- artifact adoption after completion