Rene Fichtmueller 72d61add47 sync: record custom runpod worker build prep

2026-05-07 11:04:22 +02:00

MAGATAMA Custom RunPod Worker Build/Publish Prep

Date: 2026-05-07

What Changed

committed and pushed the previously pending RunPod root-cause sync handoff:
- 2a35761 sync: record runpod managed endpoint root cause
added a real custom-worker build/publish helper to MAGATAMA:
- magatama/scripts/runpod_worker_publish.sh
added package entrypoint:
- pnpm runpod:worker:publish
extended:
- magatama/packages/fine-tuner/RUNPOD.md so the target end-to-end automation path is documented from lane pool through alias switch

Erik Reality Check

docker exists on Erik:
- /usr/bin/docker
docker buildx exists:
- github.com/docker/buildx v0.33.0
no preexisting docker registry login/config found:
- ~/.docker/config.json absent

Interpretation:

Erik can act as a builder, but it cannot yet publish a worker image to GHCR or Docker Hub until registry credentials or a registry login are in place
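
The checks above can be repeated as a small preflight script. This is a minimal sketch, not the helper in magatama/scripts/runpod_worker_publish.sh: the function name `check_builder` is hypothetical, and it assumes registry credentials, if present, live in the default Docker client config location (or the directory named by `DOCKER_CONFIG`). A config file existing does not by itself prove a valid login.

```shell
#!/usr/bin/env bash
# Preflight sketch: report whether this host can build and whether a registry
# config exists. check_builder is a hypothetical name, not a MAGATAMA command.

check_builder() {
  # Prints one status line per capability; returns 0 only if the host can build.
  local config="${DOCKER_CONFIG:-$HOME/.docker}/config.json"
  local status=0

  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(command -v docker)"
  else
    echo "docker: missing"
    status=1
  fi

  if docker buildx version >/dev/null 2>&1; then
    echo "buildx: present"
  else
    echo "buildx: missing"
    status=1
  fi

  # Publishing is reported separately: it does not affect the build status.
  if [ -f "$config" ]; then
    echo "registry config: $config"
  else
    echo "registry config: absent (publish needs a docker login first)"
  fi
  return "$status"
}

check_builder || echo "host cannot build yet"
```

Keeping the build status separate from the registry line mirrors the interpretation above: a host can be a valid builder while still being unable to publish.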

Live Remote Worker Build Attempt

Synced to Erik:

/opt/magatama/packages/fine-tuner/Dockerfile.runpod
/opt/magatama/packages/fine-tuner/RUNPOD.md

Then attempted a build with the image tag:

- magatama-runpod-worker:test

Observed build behavior:

base runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 pulled successfully
worker dependencies installed successfully
build progressed through:
- COPY train_cuda.py runpod_handler.py ./
- exporting to image

But:

the image did not appear in the docker images output afterward
therefore the build still needs one more clean verification pass before it can be treated as successful
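
A sketch of what that verification pass could look like follows. It rests on two assumptions not confirmed by the notes: that the build context is the fine-tuner package directory, and that the earlier build went through buildx. One possible cause of the missing image (unverified here) is that a buildx builder using the docker-container driver keeps the result in its build cache unless `--load` or `--push` is given.

```shell
#!/usr/bin/env bash
# Verification sketch. CONTEXT is an assumption; the tag comes from the notes.
IMAGE="magatama-runpod-worker:test"
CONTEXT="/opt/magatama/packages/fine-tuner"

image_present() {
  # Succeeds only if the local Docker image store holds the given tag.
  docker image inspect "$1" >/dev/null 2>&1
}

rebuild_and_verify() {
  # --load asks buildx to import the build result into the local image store.
  docker buildx build --load \
    -f "$CONTEXT/Dockerfile.runpod" -t "$IMAGE" "$CONTEXT" || return 1
  image_present "$IMAGE"
}

# Report the current state without rebuilding.
if image_present "$IMAGE"; then
  echo "present: $IMAGE"
else
  echo "missing: $IMAGE (run rebuild_and_verify, then check again)"
fi
```

Checking with `docker image inspect` rather than scanning the `docker images` listing gives an exit code that a script can act on directly.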

Current Bottleneck

The remaining blocker is no longer MAGATAMA lane logic or adoption code.

It is now:

publish the custom worker image to a registry RunPod can consume
create/switch the endpoint to that image
set on Erik:
- RUNPOD_WORKER_KIND=custom-magatama
- RUNPOD_ENDPOINT_ID=<custom endpoint id>
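
The publish and configuration steps above might be scripted roughly as below. This is a hedged sketch: the function names, the registry host and owner passed to them, and the `REGISTRY_USER` / `REGISTRY_TOKEN` variables are all hypothetical placeholders. Only the local image tag and the two `RUNPOD_*` variable names come from these notes, and how Erik actually loads its environment is not specified here.

```shell
#!/usr/bin/env bash
# Publish sketch. Function names and registry details are assumptions.
LOCAL_IMAGE="magatama-runpod-worker:test"

image_ref() {
  # Compose a registry reference: <registry>/<owner>/<name>:<tag>.
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

publish_image() {
  # Retag the local image under the registry reference and push it.
  # Requires a prior docker login against that registry.
  docker tag "$LOCAL_IMAGE" "$1" && docker push "$1"
}

worker_env() {
  # Print the two settings Erik needs once the custom endpoint exists.
  printf 'RUNPOD_WORKER_KIND=custom-magatama\n'
  printf 'RUNPOD_ENDPOINT_ID=%s\n' "$1"
}

# Example flow (not executed here; all values are placeholders):
#   printf '%s' "$REGISTRY_TOKEN" | docker login ghcr.io -u "$REGISTRY_USER" --password-stdin
#   publish_image "$(image_ref ghcr.io "$REGISTRY_USER" magatama-runpod-worker test)"
#   worker_env "<custom endpoint id>"
```

Passing the token through `--password-stdin` keeps it out of the shell history and the process list.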

Only then can MAGATAMA complete the intended full automation:

training pool refresh
lane-specific dataset build
RunPod fine-tune
returned artifact reference
local adoption/import
smoke tests
new release alias
active alias switch
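
The ordering of these stages can be illustrated with a fail-fast runner. This is an ordering sketch only: the stage identifiers are hypothetical labels for the steps listed above, and `run_stage` is a placeholder that stands in for whatever MAGATAMA actually runs at each step.

```shell
#!/usr/bin/env bash
# Ordering sketch: stage names are hypothetical labels, not real commands.
STAGES="pool_refresh dataset_build runpod_finetune artifact_reference local_adoption smoke_tests release_alias alias_switch"

run_stage() {
  # Placeholder: replace with the real command for stage "$1".
  echo "stage: $1"
}

run_pipeline() {
  # Runs stages in order and stops at the first failure, so the active alias
  # is only switched after every earlier stage, including smoke tests, passed.
  for stage in $STAGES; do
    run_stage "$stage" || { echo "failed at: $stage" >&2; return 1; }
  done
  echo "pipeline complete"
}

run_pipeline
```

Stopping at the first failing stage is the property that matters here: a failed smoke test leaves the currently active alias untouched.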