sync: record magatamallm local training verification

2026-05-07 01:16:25 +02:00 · 2026-05-07 01:16:25 +02:00 · a6278a5041
commit a6278a5041
parent a0ea4ccbae
2 changed files with 143 additions and 1 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State

-Updated: 2026-05-06 22:55 UTC
+Updated: 2026-05-07 01:16 UTC

 ## Active Policy

@@ -27,6 +27,54 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr

 ## Latest Work

+- MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:
+  - result:
+    - the lane export / dataset refresh worked
+    - a new locally adopted MagatamaLLM model did **not** land
+    - the active MAGATAMA provider remains the older alias:
+      - `ollama:magatama-coder:latest`
+  - live/public evidence:
+    - `GET https://magatama.fichtmueller.org/api/llm/status`
+      - `activeProvider = ollama:magatama-coder:latest`
+      - `autoFixProvider = ollama:magatama-coder:latest`
+      - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
+      - `training.modelVersion = magatama-coder:latest`
+      - `training.activeRun = null`
+    - so the UI timestamp reflects the latest dataset/training-state update; it is not proof of a newly adopted local model.
+  - local Mac evidence:
+    - `ollama list` still shows:
+      - `magatama-coder:latest` → modified `3 weeks ago`
+      - `magatama-llm-v2-0:latest` → modified `11 days ago`
+    - no newer Magatama candidate/import alias appeared locally
+  - registry/adoption evidence:
+    - the Erik lane manifest exists and is fresh:
+      - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
+      - `generatedAt = 2026-05-06T22:45:15.944Z`
+      - `train = 15679`
+      - `eval = 1743`
+      - `total = 17422`
+    - but none of these expected local adoption/registry state files on Erik were populated:
+      - `/opt/magatama/training-data/model-registry/models.json`
+      - `/opt/magatama/training-data/model-registry/runs.json`
+      - `/opt/magatama/training-data/model-registry/active.json`
+      - `/opt/magatama/data/llm-status.json`
+    - the local repo had only the historical `training-data/model-registry/training-runs.json`
+  - historical run evidence:
+    - recent `magatamallm` training-run records still show:
+      - `submitted`
+      - then `not_found_after_submit`
+      - or other non-adopted / worker-failure states
+    - there is still no verified `completed_and_adopted` proof for a new local MagatamaLLM model.
+  - operational conclusion:
+    - current truth:
+      - dataset/lane preparation works
+      - local model adoption is still the missing step
+      - MAGATAMA's knowledge has **not** advanced beyond the already active `magatama-coder:latest` alias
+    - next fix block remains:
+      - count a RunPod/local run as complete only when adoption succeeds
+      - persist the adoption report and model registry state
+      - update the active alias and version only after the import passes smoke tests
+
 - MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA had still shown mostly flat port chips.
--- a/sync/history/2026-05-07-magatamallm-local-training-verification.md
+++ b/sync/history/2026-05-07-magatamallm-local-training-verification.md
@ -0,0 +1,94 @@
+# 2026-05-07 – MagatamaLLM Local Training Verification
+
+## Question
+
+Did the recent local / MAGATAMA-side MagatamaLLM training actually succeed and increase the active model’s knowledge?
+
+## Answer
+
+No. The dataset refresh succeeded, but a newer locally adopted MagatamaLLM model was **not** verified.
+
+## Evidence
+
+### 1. Public MAGATAMA status
+
+`GET https://magatama.fichtmueller.org/api/llm/status`
+
+Observed:
+- `activeProvider = ollama:magatama-coder:latest`
+- `autoFixProvider = ollama:magatama-coder:latest`
+- `training.lastTrainingAt = 2026-05-06T22:43:20Z`
+- `training.modelVersion = magatama-coder:latest`
+- `training.activeRun = null`
+
+Interpretation:
+- the dashboard timestamp reflects the latest dataset/training-state update
+- it does **not** prove that a new local model was imported and activated
+
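A minimal Python sketch of the interpretation above follows. It assumes the status response is JSON containing the observed fields (`activeProvider`, `training.modelVersion`, `training.activeRun`). It also assumes the previously adopted version is known to the caller. The function name `adoption_evidenced` and its `previous_version` parameter are hypothetical. The sketch deliberately ignores `training.lastTrainingAt`. It counts a new adoption only when `modelVersion` changed and the active provider points at it.

```python
from typing import Any


def adoption_evidenced(status: dict[str, Any], previous_version: str) -> bool:
    """Return True only if the status payload shows a newly adopted model.

    training.lastTrainingAt is deliberately ignored: it can move when only
    the dataset/training state was refreshed.
    """
    training = status.get("training") or {}
    if training.get("activeRun") is not None:
        return False  # a run is still in flight; nothing is adopted yet
    version = training.get("modelVersion")
    if not version or version == previous_version:
        return False  # unchanged model version: no new adoption
    # Assumption: providers look like "<backend>:<model>", e.g. "ollama:magatama-coder:latest".
    active_model = status.get("activeProvider", "").split(":", 1)[-1]
    return active_model == version


# Values observed from /api/llm/status on 2026-05-07.
observed = {
    "activeProvider": "ollama:magatama-coder:latest",
    "autoFixProvider": "ollama:magatama-coder:latest",
    "training": {
        "lastTrainingAt": "2026-05-06T22:43:20Z",
        "modelVersion": "magatama-coder:latest",
        "activeRun": None,
    },
}
print(adoption_evidenced(observed, previous_version="magatama-coder:latest"))  # False
```

With the observed payload the check returns `False`. The fresh `lastTrainingAt` does not change the outcome.
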
+### 2. Local Ollama state on the Mac
+
+`ollama list`
+
+Relevant entries:
+- `magatama-coder:latest` → modified `3 weeks ago`
+- `magatama-llm-v2-0:latest` → modified `11 days ago`
+
+Interpretation:
+- no newly imported Magatama candidate/adopted model is visible locally
+- the active alias still points to an older model image
+
+### 3. Dataset/lane export did work
+
+Fresh Erik manifest exists:
+- `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
+
+Observed:
+- `generatedAt = 2026-05-06T22:45:15.944Z`
+- `train = 15679`
+- `eval = 1743`
+- `total = 17422`
+
+Interpretation:
+- the lane export / pool sync is healthy
+- training input exists and was rebuilt
+
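The export check above can be sketched in Python as a consistency and freshness test. This is a minimal sketch that assumes the manifest is JSON with top-level `generatedAt`, `train`, `eval`, and `total` keys as observed; the real nesting may differ. The helper name `manifest_is_consistent` and the 24-hour freshness window are hypothetical choices.

```python
import json
from datetime import datetime, timedelta, timezone


def manifest_is_consistent(
    raw: str, now: datetime, max_age: timedelta = timedelta(hours=24)
) -> bool:
    """Check that the split counts add up and the export is recent."""
    manifest = json.loads(raw)
    # Older fromisoformat() versions reject a trailing "Z", so normalise it to +00:00.
    generated = datetime.fromisoformat(manifest["generatedAt"].replace("Z", "+00:00"))
    counts_ok = manifest["train"] + manifest["eval"] == manifest["total"]
    fresh = now - generated <= max_age
    return counts_ok and fresh


raw = '{"generatedAt": "2026-05-06T22:45:15.944Z", "train": 15679, "eval": 1743, "total": 17422}'
now = datetime(2026, 5, 7, 1, 16, tzinfo=timezone.utc)
print(manifest_is_consistent(raw, now))  # True
```

The observed counts are consistent (15679 + 1743 = 17422). A passing manifest only proves the training input exists. It says nothing about adoption.
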
+### 4. Adoption/registry proof is missing
+
+On Erik, these expected local state files were absent:
+- `/opt/magatama/training-data/model-registry/models.json`
+- `/opt/magatama/training-data/model-registry/runs.json`
+- `/opt/magatama/training-data/model-registry/active.json`
+- `/opt/magatama/data/llm-status.json`
+
+Interpretation:
+- there is no trustworthy proof that a new model artifact was imported, registered, and activated
+
+### 5. Historical run records still show failed/non-adopted outcomes
+
+Local `training-data/model-registry/training-runs.json` still contains recent `magatamallm` runs with statuses such as:
+- `submitted`
+- `not_found_after_submit`
+
+There is still no verified `completed_and_adopted` proof for a new local MagatamaLLM model.
+
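A scan for that proof state could look like the following Python sketch. It assumes `training-runs.json` holds a JSON list of records with `lane` and `status` fields; those field names are assumptions. The status strings come from the records above. The function name `has_adopted_run` and the sample records are hypothetical.

```python
import json


def has_adopted_run(raw: str, lane: str = "magatamallm") -> bool:
    """Return True if any run for the lane reached the adopted state."""
    runs = json.loads(raw)
    return any(
        run.get("lane") == lane and run.get("status") == "completed_and_adopted"
        for run in runs
    )


# Hypothetical sample records using the statuses seen in the history file.
sample = json.dumps([
    {"lane": "magatamallm", "status": "submitted"},
    {"lane": "magatamallm", "status": "not_found_after_submit"},
])
print(has_adopted_run(sample))  # False
```
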
+## Conclusion
+
+Current state:
+- pool refresh works
+- lane export works
+- active alias/version switching after training is still not proven
+
+Therefore:
+- MagatamaLLM has **not** yet gained a verified, newer local model from the recent run attempts
+- MAGATAMA is still operating on the older active alias `magatama-coder:latest`
+
+## Next Required Fix
+
+The remaining training-automation gap is still enforcing this sequence:
+
+1. run completes
+2. artifact existence is verified
+3. artifact is adopted/imported locally
+4. smoke tests pass
+5. active alias + model version are updated
+6. only then mark training as successful
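
The six steps above can be sketched in Python as an ordered gate. Each step must pass before the next one runs, and success is reported only after the final step. This is a hypothetical outline, and the step names only mirror the list. The callables stand in for RunPod, Ollama, and registry integrations that this note does not specify.

```python
from typing import Callable

# A named step; the callable returns True on success.
Step = tuple[str, Callable[[], bool]]


def run_adoption_gate(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order and stop at the first failure.

    Returns (succeeded, names of the steps that passed). Training counts as
    successful only if every step, including the alias update, passed.
    """
    passed: list[str] = []
    for name, check in steps:
        if not check():
            return False, passed
        passed.append(name)
    return True, passed


# Hypothetical wiring: the import step fails, so the run is not successful
# and the alias/version update never executes.
steps: list[Step] = [
    ("run_completed", lambda: True),
    ("artifact_verified", lambda: True),
    ("artifact_imported", lambda: False),
    ("smoke_tests_passed", lambda: True),
    ("alias_and_version_updated", lambda: True),
]
ok, passed = run_adoption_gate(steps)
print(ok, passed)  # False ['run_completed', 'artifact_verified']
```

Short-circuiting means a failed import can never be followed by an alias switch. That is the ordering the list asks for.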