# Cross-Agent Chat Sync Refresh

Date: 2026-05-07

## Purpose

The user explicitly requested that all chats and relevant findings continue to be secured in Gitea and reflected into the shared `sync/` handoff so that Codex, Claude, and the laptop remain aligned.

## Confirmed Current State

- `sync/` is the authoritative cross-agent handoff location
- recent sync commits already pushed to Gitea:
  - `2a35761 sync: record runpod managed endpoint root cause`
  - `72d61ad sync: record custom runpod worker build prep`

## Current MAGATAMA / RunPod Truth

- lane-specific training pools are now separated correctly:
  - `magatamallm`
  - `fo_blogllm`
  - `tip_llm`
- signed MAGATAMA dataset URL bundles are already used
- local adoption and smoke-test logic exists
- version bump + alias switch logic exists

But:

- the active RunPod endpoint still behaves like the managed Axolotl endpoint
- that endpoint does not return a verifiable, adoptable artifact reference to MAGATAMA
- therefore the fully automatic sequence is still blocked on the infrastructure side:
  - train
  - adopt
  - smoke-test
  - version bump
  - alias switch

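The blocking relationship above can be sketched as a small gate: without an artifact reference from the training job, nothing after `train` can run. This is only an illustration, not MAGATAMA's actual code. The function name `reachable_stages` is hypothetical, and "verifiable" is reduced here to "non-empty", which is an assumption.

```python
from typing import Optional

# Stage order copied from the list above.
STAGES = ["train", "adopt", "smoke-test", "version bump", "alias switch"]


def reachable_stages(artifact_ref: Optional[str]) -> list[str]:
    """Return the stages that can run given what the training job returned.

    Assumption: "verifiable" is reduced here to "a non-empty reference";
    a real check would also confirm that the artifact can be fetched.
    """
    if artifact_ref is None or not artifact_ref.strip():
        # Training may have run, but there is nothing to adopt afterwards.
        return STAGES[:1]
    return list(STAGES)
```

With the managed endpoint returning no reference, `reachable_stages(None)` yields only `["train"]`, which matches the blocked state described above.
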
## Current Infrastructure Truth

- Erik has:
  - `docker`
  - `docker buildx`
- Erik currently does **not** have:
  - a Docker registry login/config in `~/.docker/config.json`
- therefore the final missing piece is still:
  - publish the custom worker image to a registry RunPod can consume

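The missing registry login can be re-checked with a short script. The sketch below assumes the standard Docker client config layout, where logins appear under the `auths` key and per-registry helpers under `credHelpers`. The helper name `registry_logins` is hypothetical.

```python
import json
from pathlib import Path


def registry_logins(config_path: Path) -> list[str]:
    """List registries that have a stored login or a credential helper.

    Returns an empty list when the config file does not exist.
    """
    if not config_path.is_file():
        return []
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # "auths" maps registry host -> stored auth entry.
    registries = set(config.get("auths", {}))
    # "credHelpers" maps registry host -> external helper name.
    registries.update(config.get("credHelpers", {}))
    return sorted(registries)
```

Calling `registry_logins(Path.home() / ".docker" / "config.json")` on Erik should currently return an empty list. Once a registry login is configured, its host should appear in the result.
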
## Needed Next Inputs

To fully close the automation loop, the operator must provide one of:

- `GHCR_USERNAME` + `GHCR_TOKEN`
- Docker Hub repo + credentials
- an already approved container image destination

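For the GHCR option, the publish step could look roughly like the sketch below, which builds the `docker login` and `docker buildx build --push` invocations from `GHCR_USERNAME` and `GHCR_TOKEN`. The image name, tag, build context, and the `linux/amd64` target platform are assumptions, not values taken from this handoff. The function names are hypothetical.

```python
import os
import subprocess

REGISTRY = "ghcr.io"


def image_ref(username: str, name: str, tag: str) -> str:
    """Build the image reference; repository paths must be lowercase."""
    return f"{REGISTRY}/{username.lower()}/{name}:{tag}"


def publish_commands(image: str, username: str, context: str = ".") -> list[list[str]]:
    """Return the login and build-and-push commands without running them."""
    login = ["docker", "login", REGISTRY,
             "--username", username, "--password-stdin"]
    build = ["docker", "buildx", "build",
             "--platform", "linux/amd64",  # assumed target platform
             "--tag", image, "--push", context]
    return [login, build]


def publish(name: str, tag: str, context: str = ".") -> str:
    """Log in with the operator-provided credentials and push the image."""
    username = os.environ["GHCR_USERNAME"]
    token = os.environ["GHCR_TOKEN"]
    image = image_ref(username, name, tag)
    login, build = publish_commands(image, username, context)
    # The token goes in on stdin so it never appears in the process list.
    subprocess.run(login, input=token.encode(), check=True)
    subprocess.run(build, check=True)
    return image
```

Keeping command construction separate from execution lets the exact invocations be reviewed, or written into `sync/`, before anything is pushed.
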
## Final Remaining Sequence

1. publish custom worker image
2. create/update RunPod endpoint to use that image
3. set on Erik:
   - `RUNPOD_WORKER_KIND=custom-magatama`
   - `RUNPOD_ENDPOINT_ID=<custom endpoint id>`
4. restart MAGATAMA dashboard
5. run lane-specific canary
6. verify:
   - artifact exists
   - adoption succeeds
   - smoke tests pass
   - release alias increments
   - active alias switches automatically

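Before restarting the dashboard in step 4, the two variables from step 3 can be sanity-checked. This is a minimal sketch: the function name `preflight_problems` is hypothetical, and treating a value that starts with `<` as an unfilled placeholder is an assumption.

```python
from typing import Mapping

EXPECTED_WORKER_KIND = "custom-magatama"


def preflight_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means both are set."""
    problems = []
    kind = env.get("RUNPOD_WORKER_KIND", "")
    if kind != EXPECTED_WORKER_KIND:
        problems.append(
            f"RUNPOD_WORKER_KIND is {kind!r}, expected {EXPECTED_WORKER_KIND!r}"
        )
    endpoint_id = env.get("RUNPOD_ENDPOINT_ID", "").strip()
    # Assumption: a value like "<custom endpoint id>" was never filled in.
    if not endpoint_id or endpoint_id.startswith("<"):
        problems.append("RUNPOD_ENDPOINT_ID is unset or still a placeholder")
    return problems
```

Passing `os.environ` on Erik (after `import os`) checks the live environment. The canary in step 5 is only worth running once the returned list is empty.
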