
# MAGATAMA Lane-Specific Training Pools + URL RunPod Dataset Mode

Date: 2026-05-06
Author: Codex

## Problem

The MAGATAMA training modal still showed the `magatamallm` pool even when the operator selected:

- `FO_BlogLLM`
- `TIP_LLM`

As a result, the UI implied that all training lanes reused the same pool and counts.

At the same time, RunPod launches still depended on Hugging Face dataset publication unless the dataset source was explicitly overridden.

## Root Cause

1. The training modal fetched `/api/llm/status` without a lane parameter.

2. The backend status route therefore always returned the default `magatamallm` training corpus/lane.

3. The dashboard environment on Erik was still effectively using the Hugging Face dataset path as the RunPod dataset source.

## Fix

### Lane-aware status

`/api/llm/status` now accepts the selected lane and returns lane-specific training metadata.

The training modal was updated to:

- fetch `/api/llm/status?lane=<selected lane>`
- update title and runtime text per lane
- show lane-specific:
  - manifest path
  - train/eval/total counts
  - dataset source
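The lane-aware request can be sketched as follows. This is a minimal illustration in Python, not the dashboard's actual client code: the helper name `status_url` is hypothetical, and only the route `/api/llm/status` and the `lane` query parameter come from this note.

```python
from typing import Optional
from urllib.parse import urlencode

STATUS_PATH = "/api/llm/status"  # route named in this note


def status_url(base_url: str, lane: Optional[str] = None) -> str:
    """Build the status URL for a lane (hypothetical helper).

    Without a lane, the URL has no query string, which is the case
    where the backend returned the default `magatamallm` lane.
    """
    url = base_url.rstrip("/") + STATUS_PATH
    if lane:
        # urlencode escapes the value, so unusual lane names stay safe.
        url += "?" + urlencode({"lane": lane})
    return url


if __name__ == "__main__":
    for lane in ("fo_blogllm", "tip_llm"):
        print(status_url("http://127.0.0.1:3211", lane))
```

Keeping the lane optional mirrors the old behavior (no parameter) and the new behavior (explicit lane) in one place.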

### URL dataset mode

The live dashboard environment on Erik was updated through `ecosystem.config.cjs`:

- `RUNPOD_DATASET_SOURCE=url`
- `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
- `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
- `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
- `MAGATAMA_PUBLIC_BASE_URL=https://magatama.fichtmueller.org`

Then `magatama-dashboard` was restarted with `--update-env`.
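One way the variables above might be resolved per lane is sketched below. This is an assumption, not the dashboard's confirmed logic: the precedence (lane-specific variable first, then the global `RUNPOD_DATASET_SOURCE`, then a default), the helper name `resolve_dataset_source`, and the fallback value `"huggingface"` are all hypothetical. Only the variable names and the value `url` come from this note.

```python
import os
from typing import Mapping, Optional


def resolve_dataset_source(
    lane: str,
    env: Optional[Mapping[str, str]] = None,
    default: str = "huggingface",  # hypothetical name for the old default
) -> str:
    """Pick the dataset source for a lane (assumed precedence).

    1. RUNPOD_DATASET_SOURCE_<LANE>  (lane-specific override)
    2. RUNPOD_DATASET_SOURCE         (global setting)
    3. default                       (assumed Hugging Face fallback)
    """
    env = os.environ if env is None else env
    lane_key = "RUNPOD_DATASET_SOURCE_" + lane.upper()
    return env.get(lane_key) or env.get("RUNPOD_DATASET_SOURCE") or default


if __name__ == "__main__":
    live_env = {
        "RUNPOD_DATASET_SOURCE": "url",
        "RUNPOD_DATASET_SOURCE_MAGATAMALLM": "url",
        "RUNPOD_DATASET_SOURCE_FO_BLOGLLM": "url",
        "RUNPOD_DATASET_SOURCE_TIP_LLM": "url",
    }
    for lane in ("magatamallm", "fo_blogllm", "tip_llm"):
        print(lane, resolve_dataset_source(lane, live_env))
```

Under this assumed precedence, setting both the global and the lane-specific variables to `url` means every lane resolves to `url` regardless of which lookup wins.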

## Live Verification

Verified directly on Erik through:

- `http://127.0.0.1:3211/api/llm/status?lane=fo_blogllm`
- `http://127.0.0.1:3211/api/llm/status?lane=tip_llm`

### `fo_blogllm`

- `datasetSource = url`
- `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
- `trainFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-train.jsonl`
- `validFile = /opt/magatama/training-data/runpod/fo_blogllm/fo_blogllm-sft-eval.jsonl`
- `collectedExamples = 28`
- `evalExamples = 4`
- `totalExamples = 32`

### `tip_llm`

- `datasetSource = url`
- `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
- `trainFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-train.jsonl`
- `validFile = /opt/magatama/training-data/runpod/tip_llm/tip_llm-sft-eval.jsonl`
- `collectedExamples = 36`
- `evalExamples = 4`
- `totalExamples = 40`

### `magatamallm`

The default lane still correctly shows its larger export:

- `collectedExamples = 15620`
- `evalExamples = 1736`
- `totalExamples = 17356`
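The reported counts can be sanity-checked with a short script. This sketch assumes that `collectedExamples` is the training split and that `totalExamples` is the sum of the training and eval counts. The note does not state this definition, but all three reported lanes fit it. The helper name `counts_consistent` is hypothetical.

```python
# Counts copied from the verification results in this note.
REPORTED = {
    "fo_blogllm": {"collectedExamples": 28, "evalExamples": 4, "totalExamples": 32},
    "tip_llm": {"collectedExamples": 36, "evalExamples": 4, "totalExamples": 40},
    "magatamallm": {"collectedExamples": 15620, "evalExamples": 1736, "totalExamples": 17356},
}


def counts_consistent(status: dict) -> bool:
    """Check the assumed invariant: train + eval == total."""
    return (
        status["collectedExamples"] + status["evalExamples"]
        == status["totalExamples"]
    )


if __name__ == "__main__":
    for lane, status in REPORTED.items():
        print(lane, "ok" if counts_consistent(status) else "MISMATCH")
```

A check like this also makes it easy to spot the original bug: if two lanes return identical counts, the status route is probably still ignoring the lane parameter.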

## Operational Meaning

MAGATAMA training is now materially closer to the intended fully automated flow:

- each LLM lane shows and uses its own pool
- RunPod dataset preparation no longer requires Hugging Face dataset publication
- datasets are fetched from the MAGATAMA URL bundle / lane export

This removes one major manual/external blocker from the RunPod training path.

## Remaining Truth

This fix does **not** prove that every RunPod worker run succeeds end-to-end.

What is fixed:

- lane-specific training pool selection
- lane-specific UI/status
- URL dataset source activation

What still depends on RunPod worker behavior:

- real successful training execution
- durable model artifact production
- artifact adoption after completion