# 2026-05-06 — MAGATAMA RunPod Status Truthfulness

## Why this was needed

After the script/registry repair, MAGATAMA could refresh the local RunPod datasets again, but the operator-facing status flow was still too coarse. Three different conditions were too easy to confuse:

- failures in local dataset preparation
- failures in the optional Hugging Face publish
- actual RunPod availability

This produced the impression that “RunPod is broken” even when the real problem was just dataset preparation on Erik.

## Changes

Patched:

- `magatama/packages/dashboard/src/server.ts`

Behavior now:

- dataset source is normalized to either:
  - `huggingface`
  - `url`
- local dataset refresh (`training:refresh-all`) is wrapped with a dedicated error:
  - `Dataset-Refresh fehlgeschlagen: ...` (German for "dataset refresh failed")
- Hugging Face publish is wrapped with a dedicated error:
  - `HuggingFace-Publish fehlgeschlagen: ...` (German for "Hugging Face publish failed")
- if Hugging Face mode is selected but `HF_TOKEN` is missing, this is reported directly
- after successful preparation, the SSE stream now explicitly states either:
  - Hugging Face dataset source in use
  - or URL-bundle dataset source in use, with no external publish required
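
The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `server.ts`: the function names (`normalizeDatasetSource`, `runStep`, `prepareDatasets`), the `PrepareDeps` shape, the fallback to `url` for unknown input, the order of the checks, and the exact wording of the status messages are all assumptions. Only the two error prefixes, the `HF_TOKEN` check, and the two source values come from this note.

```typescript
// Hypothetical sketch; names and message wording are assumptions, not the real server.ts.
type DatasetSource = "huggingface" | "url";

// Normalize arbitrary input to one of the two supported sources.
// Assumption: anything other than "huggingface" falls back to "url".
function normalizeDatasetSource(raw: unknown): DatasetSource {
  const value = String(raw ?? "").trim().toLowerCase();
  return value === "huggingface" ? "huggingface" : "url";
}

// Run one preparation step and prefix any failure with a step-specific label,
// so the operator can tell which stage broke.
async function runStep<T>(label: string, step: () => Promise<T>): Promise<T> {
  try {
    return await step();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new Error(`${label} fehlgeschlagen: ${detail}`);
  }
}

interface PrepareDeps {
  refreshDatasets: () => Promise<void>; // stands in for `training:refresh-all`
  publishToHuggingFace: () => Promise<void>; // stands in for the optional publish
  send: (message: string) => void; // stands in for writing one SSE event
  env: Record<string, string | undefined>; // e.g. the process environment
}

async function prepareDatasets(
  rawSource: unknown,
  deps: PrepareDeps,
): Promise<DatasetSource> {
  const source = normalizeDatasetSource(rawSource);

  // Report a missing token directly instead of letting the publish fail later.
  if (source === "huggingface" && !deps.env.HF_TOKEN) {
    throw new Error("Hugging Face mode selected, but HF_TOKEN is missing");
  }

  await runStep("Dataset-Refresh", deps.refreshDatasets);

  if (source === "huggingface") {
    await runStep("HuggingFace-Publish", deps.publishToHuggingFace);
    deps.send("Dataset source in use: Hugging Face");
  } else {
    deps.send("Dataset source in use: URL bundle (no external publish required)");
  }
  return source;
}
```

Passing the environment and the step functions in as parameters keeps the sketch independent of any particular runtime and makes each failure path easy to exercise in isolation.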

## Live effect

The dashboard process was rebuilt and restarted on Erik after this change.

Result:

- RunPod preparation status is more honest
- operators can distinguish:
  - data refresh problem
  - optional external publish problem
  - actual RunPod training job submission/polling problem
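
To illustrate how the distinction could be used on the consuming side, the following sketch sorts a failure message into one of these categories by its prefix. The function `classifyFailure` and the `FailureKind` type are hypothetical and not part of the dashboard. The sketch also assumes that the two prefixes named in this note are the only preparation-stage markers, so anything else is only labeled `other` (for example a RunPod submission/polling problem or a configuration issue such as a missing `HF_TOKEN`).

```typescript
// Hypothetical helper; not taken from the MAGATAMA code base.
type FailureKind = "dataset-refresh" | "huggingface-publish" | "other";

// Classify a status/error message by the dedicated prefixes described above.
// Assumption: messages without a known prefix did not come from dataset
// preparation or publish, so they are reported as "other".
function classifyFailure(message: string): FailureKind {
  if (message.startsWith("Dataset-Refresh fehlgeschlagen:")) {
    return "dataset-refresh";
  }
  if (message.startsWith("HuggingFace-Publish fehlgeschlagen:")) {
    return "huggingface-publish";
  }
  return "other";
}
```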

## Notes

- This change does not itself force a Hugging Face publish.
- It only makes the control plane truthful about which step is running and what actually failed.