sync: record magatamallm local training verification

2026-05-07 01:16:25 +02:00 · a6278a5041
commit a6278a5041
parent a0ea4ccbae
2 changed files with 143 additions and 1 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
-Updated: 2026-05-06 22:55 UTC
+Updated: 2026-05-07 01:16 UTC
 ## Active Policy
@@ -27,6 +27,54 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 ## Latest Work
 - MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:
  - result:
    - the lane export / dataset refresh worked
    - a new locally adopted MagatamaLLM model did **not** land
    - active MAGATAMA provider remains the older alias:
      - `ollama:magatama-coder:latest`
  - live/public evidence:
    - `GET https://magatama.fichtmueller.org/api/llm/status`
      - `activeProvider = ollama:magatama-coder:latest`
      - `autoFixProvider = ollama:magatama-coder:latest`
      - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
      - `training.modelVersion = magatama-coder:latest`
      - `training.activeRun = null`
    - this means the UI timestamp currently reflects the latest dataset/training-state update, not proof of a newly adopted local model.
  - local Mac evidence:
    - `ollama list` still shows:
      - `magatama-coder:latest` → modified `3 weeks ago`
      - `magatama-llm-v2-0:latest` → modified `11 days ago`
    - no newer Magatama candidate/import alias appeared locally
  - registry/adoption evidence:
    - Erik lane manifest exists and is fresh:
      - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
      - `generatedAt = 2026-05-06T22:45:15.944Z`
      - `train = 15679`
      - `eval = 1743`
      - `total = 17422`
    - but Erik had no populated local adoption/registry state files in:
      - `/opt/magatama/training-data/model-registry/models.json`
      - `/opt/magatama/training-data/model-registry/runs.json`
      - `/opt/magatama/training-data/model-registry/active.json`
      - `/opt/magatama/data/llm-status.json`
    - local repo only had historical `training-data/model-registry/training-runs.json`
  - historical run evidence:
    - recent `magatamallm` training-run records still show:
      - `submitted`
      - then `not_found_after_submit`
      - or other non-adopted / worker-failure states
    - there is still no verified `completed_and_adopted` proof for a new MagatamaLLM local model.
  - operational conclusion:
    - current truth:
      - dataset/lane preparation works
      - local model adoption is still the missing step
      - MAGATAMA does **not** currently know more than the already active `magatama-coder:latest` alias
    - next fix block remains:
      - make RunPod/local completion count only when adoption succeeds
      - persist adoption report + model registry state
      - update active alias and version only after smoke-tested import succeeds
 - MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA still showed mostly flat port chips.
--- a/sync/history/2026-05-07-magatamallm-local-training-verification.md
+++ b/sync/history/2026-05-07-magatamallm-local-training-verification.md
@@ -0,0 +1,94 @@
 # 2026-05-07 – MagatamaLLM Local Training Verification
 ## Question
 Did the recent local / MAGATAMA-side MagatamaLLM training actually succeed and increase the active model’s knowledge?
 ## Answer
 No. The dataset refresh succeeded, but a newer locally adopted MagatamaLLM model was **not** verified.
 ## Evidence
 ### 1. Public MAGATAMA status
 `GET https://magatama.fichtmueller.org/api/llm/status`
 Observed:
 - `activeProvider = ollama:magatama-coder:latest`
 - `autoFixProvider = ollama:magatama-coder:latest`
 - `training.lastTrainingAt = 2026-05-06T22:43:20Z`
 - `training.modelVersion = magatama-coder:latest`
 - `training.activeRun = null`
 Interpretation:
 - the dashboard timestamp reflects the latest dataset/training-state update
 - it does **not** prove that a new local model was imported and activated
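The interpretation above can be sketched as a small check. This is a minimal illustration, not MAGATAMA's actual logic: the function name `proves_new_model` is hypothetical, the nested JSON shape is assumed from the dotted field names, and the check assumes that a real adoption changes `training.modelVersion` (a re-pointed `:latest` alias with an unchanged name would not be detected).

```python
def proves_new_model(status: dict, previous_version: str) -> bool:
    """Return True only if the status payload points at a different model.

    A fresh training.lastTrainingAt is deliberately ignored: it only shows
    that the dataset/training state was touched.
    """
    training = status.get("training") or {}
    version = training.get("modelVersion") or ""
    provider = status.get("activeProvider") or ""
    changed = bool(version) and version != previous_version
    serving = bool(version) and provider.endswith(version)
    return changed and serving


# Values observed in this verification (nested layout is an assumption).
observed = {
    "activeProvider": "ollama:magatama-coder:latest",
    "autoFixProvider": "ollama:magatama-coder:latest",
    "training": {
        "lastTrainingAt": "2026-05-06T22:43:20Z",
        "modelVersion": "magatama-coder:latest",
        "activeRun": None,
    },
}

print(proves_new_model(observed, "magatama-coder:latest"))  # False
```

The timestamp is intentionally not an input to the decision, which mirrors the point that the dashboard time alone proves nothing about adoption.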
 ### 2. Local Ollama state on the Mac
 `ollama list`
 Relevant entries:
 - `magatama-coder:latest` → modified `3 weeks ago`
 - `magatama-llm-v2-0:latest` → modified `11 days ago`
 Interpretation:
 - no newly imported Magatama candidate/adopted model is visible locally
 - the active alias still points to an older model image
 ### 3. Dataset/lane export did work
 Fresh Erik manifest exists:
 - `/opt/magatama/training-data/runpod/magatamallm/manifest.json`
 Observed:
 - `generatedAt = 2026-05-06T22:45:15.944Z`
 - `train = 15679`
 - `eval = 1743`
 - `total = 17422`
 Interpretation:
 - the lane export / pool sync is healthy
 - training input exists and was rebuilt
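As a sanity check on the manifest numbers, the split counts should add up to the total and `generatedAt` should be recent. The sketch below assumes the manifest is a flat JSON object with exactly these four keys; the helper names and the reference time (the 01:16 UTC sync stamp) are illustrative choices, not part of the MAGATAMA tooling.

```python
from datetime import datetime, timezone


def manifest_is_consistent(manifest: dict) -> bool:
    """Check that the train and eval counts sum to the reported total."""
    return manifest["train"] + manifest["eval"] == manifest["total"]


def manifest_age_hours(manifest: dict, now: datetime) -> float:
    """Hours elapsed between generatedAt and the given reference time."""
    # Replace the trailing 'Z' so fromisoformat also works on older Pythons.
    generated = datetime.fromisoformat(manifest["generatedAt"].replace("Z", "+00:00"))
    return (now - generated).total_seconds() / 3600


# Values observed in this verification.
manifest = {
    "generatedAt": "2026-05-06T22:45:15.944Z",
    "train": 15679,
    "eval": 1743,
    "total": 17422,
}

reference = datetime(2026, 5, 7, 1, 16, tzinfo=timezone.utc)
print(manifest_is_consistent(manifest))  # True
print(round(manifest_age_hours(manifest, reference), 2))
```

A consistent and fresh manifest only confirms that training input was rebuilt; it says nothing about whether a model was trained or adopted.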
 ### 4. Adoption/registry proof is missing
 On Erik, these expected local state files were absent:
 - `/opt/magatama/training-data/model-registry/models.json`
 - `/opt/magatama/training-data/model-registry/runs.json`
 - `/opt/magatama/training-data/model-registry/active.json`
 - `/opt/magatama/data/llm-status.json`
 Interpretation:
 - no trustworthy proof that a new model artifact was imported, registered, and activated
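The absence check can be made repeatable with a few lines of Python. This is a sketch under assumptions: the helper name `unpopulated_state_files` is hypothetical, and treating an empty file the same as a missing one is a choice made here, not something the registry format specifies.

```python
from pathlib import Path
from typing import Iterable, List

# Paths taken from the list above.
STATE_FILES = [
    "/opt/magatama/training-data/model-registry/models.json",
    "/opt/magatama/training-data/model-registry/runs.json",
    "/opt/magatama/training-data/model-registry/active.json",
    "/opt/magatama/data/llm-status.json",
]


def unpopulated_state_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that are missing or exist but are empty."""
    problems = []
    for raw in paths:
        path = Path(raw)
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(str(path))
    return problems


# An empty result would mean every state file exists and has content.
print(unpopulated_state_files(STATE_FILES))
```

Presence of the files would still need a content check (for example, that `active.json` names the newly imported model) before it counts as adoption proof.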
 ### 5. Historical run records still show failed/non-adopted outcomes
 Local `training-data/model-registry/training-runs.json` still contains recent `magatamallm` runs such as:
 - `submitted`
 - `not_found_after_submit`
 There is still no verified `completed_and_adopted` proof for a new MagatamaLLM local model.
 ## Conclusion
 Current state:
 - pool refresh works
 - lane export works
 - active alias/version switching after training is still not proven
 Therefore:
 - MagatamaLLM has **not** yet verifiably gained newer local knowledge from the recent run attempts
 - MAGATAMA is still operating on the older active alias `magatama-coder:latest`
 ## Next Required Fix
 The remaining training-automation gap is that success is not yet gated on the full sequence:
 1. run completes
 2. artifact existence is verified
 3. artifact is adopted/imported locally
 4. smoke tests pass
 5. active alias + model version are updated
 6. only then mark training as successful
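The sequence above amounts to an ordered gate: the run is marked successful only when every earlier step has been verified. A minimal sketch follows, assuming each step can be expressed as a boolean check. The step names, the `failed_at:` status prefix, and the placeholder lambdas are hypothetical; only `completed_and_adopted` is taken from the records above.

```python
from typing import Callable, List, Tuple

# (step name, check) pairs; each check returns True once the step is verified.
Step = Tuple[str, Callable[[], bool]]


def evaluate_run(steps: List[Step]) -> str:
    """Return 'completed_and_adopted' only if every step passes, in order."""
    for name, check in steps:
        if not check():
            # Stop at the first failure so later steps never run on a bad state.
            return f"failed_at:{name}"
    return "completed_and_adopted"


# Hypothetical wiring: the lambdas stand in for real checks.
steps: List[Step] = [
    ("run_completed", lambda: True),
    ("artifact_exists", lambda: True),
    ("artifact_imported", lambda: False),  # e.g. the local import never happened
    ("smoke_tests_passed", lambda: True),
    ("alias_and_version_updated", lambda: True),
]

print(evaluate_run(steps))  # failed_at:artifact_imported
```

Returning the name of the first failed step keeps the run record specific about where adoption stopped, instead of collapsing every outcome into a generic failure state.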