transceiver-db/sync/history/2026-05-06-magatama-runpod-status-truthfulness.md
2026-05-06 12:18:17 +02:00

2026-05-06 — MAGATAMA RunPod Status Truthfulness

Why this was needed

After the script/registry repair, MAGATAMA could refresh the local RunPod datasets again, but the operator-facing status flow was still too coarse. Three different kinds of failure were too easy to confuse:

  • failures in local dataset preparation
  • failures in the optional Hugging Face publish
  • actual RunPod availability problems

This produced the impression that “RunPod is broken” even when the real problem was just dataset preparation on Erik.

Changes

Patched:

  • magatama/packages/dashboard/src/server.ts

Behavior now:

  • dataset source is normalized to either:
    • huggingface
    • url
  • failures of the local dataset refresh (training:refresh-all) are wrapped in a dedicated error:
    • Dataset-Refresh fehlgeschlagen: ... (German for “dataset refresh failed”)
  • failures of the Hugging Face publish are wrapped in a dedicated error:
    • HuggingFace-Publish fehlgeschlagen: ... (German for “Hugging Face publish failed”)
  • if Hugging Face mode is selected but HF_TOKEN is missing, this is reported directly as its own error
  • after successful preparation, the SSE stream now states explicitly which dataset source is in use:
    • the Hugging Face dataset source
    • or the URL-bundle dataset source, with no external publish required
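The preparation flow above can be sketched roughly as follows. This is a minimal illustration, not the actual server.ts code: normalizeDatasetSource, StageError, runStage, toSseFrame and prepareDatasets are hypothetical names, and the refresh/publish/emit callbacks stand in for running training:refresh-all, uploading to Hugging Face and writing to the SSE response. Three details are assumptions: only the value "huggingface" (case-insensitive) selects Hugging Face mode, the HF_TOKEN check runs before the refresh starts, and the publish step only runs in Hugging Face mode.

```typescript
// Sketch only: every name here is hypothetical, not MAGATAMA's real code.

type DatasetSource = "huggingface" | "url";
type StageName = "config" | "refresh" | "publish";

// Assumption: "huggingface" (any case) selects HF mode; anything else means URL bundle.
function normalizeDatasetSource(raw: string | undefined): DatasetSource {
  return (raw ?? "").trim().toLowerCase() === "huggingface" ? "huggingface" : "url";
}

// Error that records which preparation stage failed.
class StageError extends Error {
  readonly stage: StageName;
  constructor(stage: StageName, message: string) {
    super(message);
    this.name = "StageError";
    this.stage = stage;
    Object.setPrototypeOf(this, new.target.prototype); // keeps instanceof working on ES5 targets
  }
}

// Run one stage and re-throw any failure under a dedicated, stage-specific prefix.
async function runStage(stage: StageName, prefix: string, fn: () => Promise<void>): Promise<void> {
  try {
    await fn();
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    throw new StageError(stage, `${prefix}: ${detail}`);
  }
}

// Format a message as a Server-Sent Events frame: one "data:" line per text line.
function toSseFrame(message: string): string {
  return message.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}

interface PrepareOptions {
  rawSource: string | undefined;
  hfToken: string | undefined;
  refresh: () => Promise<void>; // hypothetical: would run training:refresh-all
  publish: () => Promise<void>; // hypothetical: optional Hugging Face upload
  emit: (frame: string) => void; // hypothetical: writes to the SSE response
}

async function prepareDatasets(opts: PrepareOptions): Promise<DatasetSource> {
  const source = normalizeDatasetSource(opts.rawSource);
  // Assumption: the missing-token check happens first; the message text is illustrative.
  if (source === "huggingface" && !opts.hfToken) {
    throw new StageError("config", "HF_TOKEN is missing, but the Hugging Face dataset source is selected");
  }
  await runStage("refresh", "Dataset-Refresh fehlgeschlagen", opts.refresh);
  if (source === "huggingface") {
    await runStage("publish", "HuggingFace-Publish fehlgeschlagen", opts.publish);
    opts.emit(toSseFrame("Hugging Face dataset source in use"));
  } else {
    opts.emit(toSseFrame("URL-bundle dataset source in use, no external publish required"));
  }
  return source;
}
```

Keeping each stage behind its own prefix means the message that reaches the operator already names the failing stage, instead of surfacing as a generic RunPod error.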

Live effect

The dashboard process was rebuilt and restarted on Erik after this change.

Result:

  • the RunPod preparation status now reports which step actually failed
  • operators can distinguish:
    • data refresh problems
    • optional external publish problems
    • actual RunPod training job submission/polling problems
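Because the two preparation failures carry fixed prefixes, an operator-side script could sort reported messages into these categories. The helper below is a hypothetical sketch: classifyFailure and the category names are illustrative, not part of MAGATAMA. Anything without one of the two prefixes, for example a RunPod job submission or polling error, is returned as "other" and has to be read directly.

```typescript
// Hypothetical operator-side helper, not MAGATAMA code.

type FailureCategory = "data-refresh" | "hf-publish" | "other";

// The dedicated prefixes used by the preparation stages.
const PREFIXES: ReadonlyArray<[string, FailureCategory]> = [
  ["Dataset-Refresh fehlgeschlagen:", "data-refresh"],
  ["HuggingFace-Publish fehlgeschlagen:", "hf-publish"],
];

// Map an error message to the stage it came from, based only on its prefix.
function classifyFailure(message: string): FailureCategory {
  const text = message.trim();
  for (const [prefix, category] of PREFIXES) {
    if (text.startsWith(prefix)) return category;
  }
  // No preparation prefix: e.g. a RunPod job submission/polling error.
  return "other";
}
```

Matching on the prefix rather than on the free-form detail keeps the classification stable even when the underlying error text changes.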

Notes

  • This change does not by itself force a Hugging Face publish.
  • It only makes the control plane report truthfully which step is running and what actually failed.