# MAGATAMA Lane-Specific Training Pools + URL RunPod Dataset Mode

Date: 2026-05-06

Author: Codex

## Problem

The MAGATAMA training modal still showed the `magatamallm` pool even when the operator selected:

- `FO_BlogLLM`
- `TIP_LLM`

As a result, the UI implied that all training lanes reused the same pool and counts.

At the same time, RunPod launches still depended on Hugging Face dataset publication unless explicitly changed.

## Root Cause

1. The training modal fetched `/api/llm/status` without a lane parameter.
2. The backend status route therefore always returned the default `magatamallm` training corpus/lane.
3. Dashboard env on Erik was still effectively using the Hugging Face dataset path for RunPod dataset source.

## Fix

### Lane-aware status

`/api/llm/status` now accepts the selected lane and returns lane-specific training metadata.

The training modal was updated to:

- fetch `/api/llm/status?lane=`
- update title and runtime text per lane
- show lane-specific:
  - manifest path
  - train/eval/total counts
  - dataset source

### URL dataset mode

The live dashboard environment on Erik was updated through `ecosystem.config.cjs`:

- `RUNPOD_DATASET_SOURCE=url`
- `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
- `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
- `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
- `MAGATAMA_PUBLIC_BASE_URL=https://magatama.fichtmueller.org`

Then `magatama-dashboard` was restarted with `--update-env`.

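The lane-to-variable naming above can be sketched in a few lines of Python. This is an illustration only, not the dashboard's actual code: the helper names (`lane_env_key`, `resolve_dataset_source`, `status_url`) are hypothetical, and the lookup order (per-lane variable first, then the global `RUNPOD_DATASET_SOURCE`) is an assumption inferred from the variable names rather than something the note confirms.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def lane_env_key(lane: str) -> str:
    """Map a lane id such as 'fo_blogllm' to its per-lane variable name."""
    return "RUNPOD_DATASET_SOURCE_" + lane.upper()


def resolve_dataset_source(
    lane: str, env: Mapping[str, str], default: Optional[str] = None
) -> Optional[str]:
    """Return the dataset source for a lane.

    Assumed precedence (not confirmed by the note): per-lane variable,
    then the global RUNPOD_DATASET_SOURCE, then the caller's default.
    """
    return env.get(lane_env_key(lane)) or env.get("RUNPOD_DATASET_SOURCE") or default


def status_url(base_url: str, lane: str) -> str:
    """Build the lane-aware status URL used by the training modal."""
    return f"{base_url}/api/llm/status?{urlencode({'lane': lane})}"


if __name__ == "__main__":
    # Example environment mirroring the values listed in the note.
    env = {
        "RUNPOD_DATASET_SOURCE": "url",
        "RUNPOD_DATASET_SOURCE_FO_BLOGLLM": "url",
    }
    for lane in ("magatamallm", "fo_blogllm", "tip_llm"):
        print(lane, resolve_dataset_source(lane, env), status_url("http://127.0.0.1:3211", lane))
```

Setting both the global and the per-lane variables, as done on Erik, makes the result independent of which precedence the dashboard really applies.
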
## Live Verification

Verified directly on Erik through:

- `http://127.0.0.1:3211/api/llm/status?lane=fo_blogllm`
- `http://127.0.0.1:3211/api/llm/status?lane=tip_llm`

### `fo_blogllm`

- `datasetSource = url`
- `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
- `trainFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-train.jsonl`
- `validFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-eval.jsonl`
- `collectedExamples = 28`
- `evalExamples = 4`
- `totalExamples = 32`

### `tip_llm`

- `datasetSource = url`
- `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
- `trainFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-train.jsonl`
- `validFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-eval.jsonl`
- `collectedExamples = 36`
- `evalExamples = 4`
- `totalExamples = 40`

### `magatamallm`

Still correctly shows the larger lane export:

- `collectedExamples = 15620`
- `evalExamples = 1736`
- `totalExamples = 17356`

## Operational Meaning

MAGATAMA training is now materially closer to the intended fully automated flow:

- each LLM lane shows and uses its own pool
- RunPod dataset preparation no longer requires Hugging Face dataset publication
- dataset fetch comes from MAGATAMA URL-bundle / lane export

This removes one major manual/external blocker from the RunPod training path.

## Remaining Truth

This fix does **not** automatically prove that every RunPod worker run itself succeeds end-to-end.

What is fixed:

- lane-specific training pool selection
- lane-specific UI/status
- URL dataset source activation

What still depends on RunPod worker behavior:

- real successful training execution
- durable model artifact production
- artifact adoption after completion
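
To re-run the status checks from the Live Verification section, a small script along the following lines may help. It is a sketch under stated assumptions: the status route is assumed to return a flat JSON object carrying the field names listed above, and `totalExamples` is assumed to equal `collectedExamples + evalExamples` (which holds for all three lanes reported here). The function names `check_status` and `fetch_status` are hypothetical.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Field names taken from the verification output in this note.
REQUIRED_FIELDS = ("datasetSource", "collectedExamples", "evalExamples", "totalExamples")


def check_status(payload: Dict[str, Any], expected_source: str = "url") -> List[str]:
    """Return a list of human-readable problems; an empty list means the payload looks sane."""
    missing = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return missing

    problems = []
    if payload["datasetSource"] != expected_source:
        problems.append(f"datasetSource is {payload['datasetSource']!r}, expected {expected_source!r}")
    # Assumed invariant: total = train + eval.
    expected_total = payload["collectedExamples"] + payload["evalExamples"]
    if payload["totalExamples"] != expected_total:
        problems.append(f"totalExamples is {payload['totalExamples']}, expected {expected_total}")
    return problems


def fetch_status(base_url: str, lane: str) -> Dict[str, Any]:
    """Fetch the lane-specific status (assumes a flat JSON response body)."""
    url = f"{base_url}/api/llm/status?{urlencode({'lane': lane})}"
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Values reported in this note for the fo_blogllm lane.
reported = {"datasetSource": "url", "collectedExamples": 28, "evalExamples": 4, "totalExamples": 32}
print(check_status(reported))
```

On Erik, `check_status(fetch_status("http://127.0.0.1:3211", "tip_llm"))` would perform the same check against the live route. Like the manual verification, this only validates status metadata and says nothing about whether a RunPod worker run succeeds.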