diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 34f153c..244742b 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -4,6 +4,75 @@ Updated: 2026-05-07 08:05 UTC
 
 ## Newest Work
 
+- MAGATAMA RunPod custom worker preparation continued on 2026-05-07:
+  - the pending sync handoff was committed and **successfully pushed to Gitea**:
+    - commit:
+      - `2a35761 sync: record runpod managed endpoint root cause`
+  - MAGATAMA repo now includes an explicit helper for building/publishing the custom RunPod worker image:
+    - `magatama/scripts/runpod_worker_publish.sh`
+  - new package script:
+    - `pnpm runpod:worker:publish`
+  - helper behavior:
+    - expects:
+      - `RUNPOD_WORKER_IMAGE`
+    - supports:
+      - `GHCR_USERNAME`
+      - `GHCR_TOKEN`
+      - `RUNPOD_WORKER_TAG`
+      - `RUNPOD_WORKER_PUSH_MODE=push|load`
+    - prints the exact next environment variables required on Erik after image publication:
+      - `RUNPOD_WORKER_KIND=custom-magatama`
+      - `RUNPOD_ENDPOINT_ID=`
+  - `magatama/packages/fine-tuner/RUNPOD.md` was extended so the full automation target is now documented end-to-end:
+    - lane pool sync
+    - RunPod dataset URL bundle
+    - custom worker training
+    - adapter upload
+    - local adoption
+    - smoke tests
+    - release alias minting
+    - active alias switch
+  - Erik infrastructure truth was rechecked:
+    - `docker` exists:
+      - `/usr/bin/docker`
+    - `docker buildx` exists:
+      - `github.com/docker/buildx v0.33.0`
+    - **no docker registry login/config** is currently present on Erik:
+      - `~/.docker/config.json` absent
+    - interpretation:
+      - Erik can build images
+      - but cannot yet push a public/private worker image to GHCR/Docker Hub without credentials or a pre-authenticated registry path
+  - the missing custom worker files were synced live to Erik:
+    - `/opt/magatama/packages/fine-tuner/Dockerfile.runpod`
+    - `/opt/magatama/packages/fine-tuner/RUNPOD.md`
+  - a real remote worker image build was then attempted on Erik:
+    - image tag requested:
+      - `magatama-runpod-worker:test`
+    - build truth:
+      - base `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` pulled successfully
+      - Python dependencies for the worker installed successfully
+      - build reached:
+        - `COPY train_cuda.py runpod_handler.py ./`
+        - `exporting to image`
+    - however:
+      - final image was **not yet visible** in `docker images`
+      - therefore the build still needs one more clean verification pass before being treated as green
+  - current operational conclusion:
+    - MAGATAMA training pools, lane separation, signed dataset URL path, and local adoption API are ready
+    - the final blocking step remains infrastructure:
+      - publish the custom worker image to a registry RunPod can consume
+      - create/switch the endpoint
+      - then set on Erik:
+        - `RUNPOD_WORKER_KIND=custom-magatama`
+        - `RUNPOD_ENDPOINT_ID=`
+    - once that is done, MAGATAMA's already-prepared code path can finally perform:
+      - train
+      - verify artifact
+      - adopt locally
+      - smoke-test
+      - bump version
+      - switch alias
+
 - MAGATAMA RunPod training return-path deep dive on 2026-05-07:
   - Attack Paths `Open Fix Guidance` placebo button was fixed live on Erik:
     - `magatama/packages/dashboard/public/index-v2.html`
diff --git a/sync/history/2026-05-07-magatama-custom-worker-build-publish-prep.md b/sync/history/2026-05-07-magatama-custom-worker-build-publish-prep.md
new file mode 100644
index 0000000..0705d75
--- /dev/null
+++ b/sync/history/2026-05-07-magatama-custom-worker-build-publish-prep.md
@@ -0,0 +1,77 @@
+# MAGATAMA Custom RunPod Worker Build/Publish Prep
+
+Date: 2026-05-07
+
+## What Changed
+
+- committed and pushed the previously pending RunPod root-cause sync handoff:
+  - `2a35761 sync: record runpod managed endpoint root cause`
+- added a real custom-worker build/publish helper to MAGATAMA:
+  - `magatama/scripts/runpod_worker_publish.sh`
+- added package entrypoint:
+  - `pnpm runpod:worker:publish`
+- extended:
+  - `magatama/packages/fine-tuner/RUNPOD.md`
+  so the target end-to-end automation path is documented from lane pool through alias switch
+
+## Erik Reality Check
+
+- `docker` exists on Erik:
+  - `/usr/bin/docker`
+- `docker buildx` exists:
+  - `github.com/docker/buildx v0.33.0`
+- no preexisting docker registry login/config found:
+  - `~/.docker/config.json` absent
+
+Interpretation:
+
+- Erik can act as a builder
+- but cannot yet publish a worker image to GHCR/Docker Hub without credentials or a registry login
+
+## Live Remote Worker Build Attempt
+
+Synced to Erik:
+
+- `/opt/magatama/packages/fine-tuner/Dockerfile.runpod`
+- `/opt/magatama/packages/fine-tuner/RUNPOD.md`
+
+Then attempted:
+
+- build image tag:
+  - `magatama-runpod-worker:test`
+
+Observed build truth:
+
+- base `runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04` pulled successfully
+- worker dependencies installed successfully
+- build progressed through:
+  - `COPY train_cuda.py runpod_handler.py ./`
+  - `exporting to image`
+
+But:
+
+- the image was not yet visible afterward in `docker images`
+- therefore the build still needs one more clean verification pass
+
+## Current Bottleneck
+
+The remaining blocker is no longer MAGATAMA lane logic or adoption code.
+
+It is now:
+
+1. publish the custom worker image to a registry RunPod can consume
+2. create/switch the endpoint to that image
+3. set on Erik:
+   - `RUNPOD_WORKER_KIND=custom-magatama`
+   - `RUNPOD_ENDPOINT_ID=`
+
+Only then can MAGATAMA complete the intended full automation:
+
+- training pool refresh
+- lane-specific dataset build
+- RunPod fine-tune
+- returned artifact reference
+- local adoption/import
+- smoke tests
+- new release alias
+- active alias switch