MAGATAMA Custom RunPod Worker Build/Publish Prep
Date: 2026-05-07
What Changed
- committed and pushed the previously pending RunPod root-cause sync handoff:
2a35761 sync: record runpod managed endpoint root cause
- added a real custom-worker build/publish helper to MAGATAMA:
magatama/scripts/runpod_worker_publish.sh
- added package entrypoint:
pnpm runpod:worker:publish
- extended:
magatama/packages/fine-tuner/RUNPOD.md so the target end-to-end automation path is documented from lane pool through alias switch
Erik Reality Check
- docker exists on Erik: /usr/bin/docker
- docker buildx exists: github.com/docker/buildx v0.33.0
- no preexisting docker registry login/config found:
~/.docker/config.json absent
Interpretation:
- Erik can act as a builder
- but cannot yet publish a worker image to GHCR/Docker Hub without credentials or a registry login
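The checks above could be repeated with a small preflight sketch like the one below. It is illustrative only and is not the contents of runpod_worker_publish.sh; the function names are hypothetical, ghcr.io is assumed as the default registry, and the login check is a heuristic that only looks for the registry host inside the Docker config file rather than validating credentials.

```shell
# Preflight sketch (illustrative; function names are hypothetical).

# Prints "yes" if the named command is on PATH, otherwise "no".
have_cmd() {
  if command -v "$1" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Prints "yes" if the Docker config file exists and names the registry host.
# Heuristic only: it checks for the quoted host string, not for valid credentials.
has_registry_entry() {
  config_file="$1"
  registry="$2"
  if [ -f "$config_file" ] && grep -qF "\"$registry\"" "$config_file"; then
    echo yes
  else
    echo no
  fi
}

# Summarizes builder and publisher readiness for one registry host.
preflight() {
  registry="${1:-ghcr.io}"   # assumed default; pass another host if needed
  config_file="${DOCKER_CONFIG:-$HOME/.docker}/config.json"
  echo "docker: $(have_cmd docker)"
  if docker buildx version >/dev/null 2>&1; then echo "buildx: yes"; else echo "buildx: no"; fi
  echo "registry entry for $registry: $(has_registry_entry "$config_file" "$registry")"
}
```

Running `preflight` on Erik in its current state would be expected to report docker and buildx as present and the registry entry as missing, which matches the builder-but-not-publisher reading above.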
Live Remote Worker Build Attempt
Synced to Erik:
- /opt/magatama/packages/fine-tuner/Dockerfile.runpod
- /opt/magatama/packages/fine-tuner/RUNPOD.md
Then attempted:
- build image tag:
magatama-runpod-worker:test
Observed build truth:
- base runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 pulled successfully
- worker dependencies installed successfully
- build progressed through:
COPY train_cuda.py runpod_handler.py ./
exporting to image
But:
- the image was not yet visible afterward in docker images
- therefore the build still needs one more clean verification pass
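A clean verification pass could end with an explicit check that the tag landed in the local image store. The sketch below is one way to do that; the function names are hypothetical, and it only relies on `docker image inspect` returning a non-zero status for an unknown image.

```shell
# Verification sketch (illustrative; function names are hypothetical).

# Returns 0 if the tag is present in the local Docker image store.
image_exists() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Prints a one-line verdict and returns non-zero when the tag is missing.
verify_build() {
  tag="$1"
  if image_exists "$tag"; then
    echo "ok: $tag is in the local image store"
  else
    echo "missing: $tag not found; rerun the build and check its output mode" >&2
    return 1
  fi
}

# Example (not run here):
#   verify_build magatama-runpod-worker:test
```

One possible cause worth ruling out, not confirmed for Erik: when buildx uses a docker-container builder, the result stays in the build cache unless `--load` or `--push` is passed, so a successful build can leave nothing in `docker images`.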
Current Bottleneck
The remaining blocker is no longer MAGATAMA lane logic or adoption code.
It is now:
- publish the custom worker image to a registry RunPod can consume
- create/switch the endpoint to that image
- set on Erik:
RUNPOD_WORKER_KIND=custom-magatama
RUNPOD_ENDPOINT_ID=<custom endpoint id>
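The publish step above could look roughly like the sketch below once credentials exist. It assumes GHCR as the registry and uses hypothetical names (`image_ref`, `publish_image`, `GHCR_USER`, `GHCR_TOKEN`); it does not describe what runpod_worker_publish.sh actually does, and the endpoint id stays a placeholder until the custom endpoint is created.

```shell
# Publish sketch (illustrative; names and registry choice are assumptions).

# Prints a fully qualified image reference.
# Arguments: registry host, namespace, image name, tag.
image_ref() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Tags a local image with the remote reference and pushes it.
publish_image() {
  local_tag="$1"
  remote_ref="$2"
  docker tag "$local_tag" "$remote_ref" && docker push "$remote_ref"
}

# Example (not run here; GHCR_USER and GHCR_TOKEN are hypothetical variables):
#   echo "$GHCR_TOKEN" | docker login ghcr.io -u "$GHCR_USER" --password-stdin
#   publish_image magatama-runpod-worker:test \
#     "$(image_ref ghcr.io "$GHCR_USER" magatama-runpod-worker test)"
#   export RUNPOD_WORKER_KIND=custom-magatama
#   export RUNPOD_ENDPOINT_ID="<custom endpoint id>"   # placeholder, fill in after endpoint creation
```

Reading the token from stdin keeps it out of the shell history and the process list.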
Only then can MAGATAMA complete the intended full automation:
- training pool refresh
- lane-specific dataset build
- RunPod fine-tune
- returned artifact reference
- local adoption/import
- smoke tests
- new release alias
- active alias switch
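The ordering of the stages above matters: the alias switch should only happen if every earlier stage succeeded. A minimal sketch of that stop-on-first-failure sequencing follows. The stage names mirror the list, but every `stage_*` function is a hypothetical stub that would have to be supplied; none of them is an existing MAGATAMA command.

```shell
# Pipeline ordering sketch (illustrative; each stage_* function is a
# hypothetical stub that must be defined before run_pipeline is called).

STAGES="training_pool_refresh lane_dataset_build runpod_fine_tune artifact_reference local_adoption smoke_tests release_alias active_alias_switch"

# Runs each stage in order and stops at the first one that fails,
# so a failed smoke test never reaches the alias switch.
run_pipeline() {
  for stage in $STAGES; do
    echo "==> $stage"
    if ! "stage_$stage"; then
      echo "stopped at $stage" >&2
      return 1
    fi
  done
  echo "pipeline complete"
}
```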