sync: record magatama training count source fix

Rene Fichtmueller 2026-05-06 16:27:14 +02:00
parent 9bc84a89ee
commit 77a4aab592
2 changed files with 51 additions and 0 deletions


@ -90,6 +90,17 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
- operational rule:
  - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
  - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
- follow-up training count fix on 2026-05-06 corrected the Training UI source of truth:
  - MAGATAMA was still showing `1097` because the dashboard counted the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
  - the dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
  - synced the current lane export to Erik and restarted `magatama-dashboard`.
  - verified that the public API now returns:
    - `collectedExamples = 1367`
    - `effectiveExamples = 1367`
    - `evalExamples = 152`
    - `totalExamples = 1519`
    - `newSinceLastTraining = 1367`
  - if the browser still shows `1097`, treat it as a stale cached page and hard-reload.
- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
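The operational rule about `submitted` and `IN_QUEUE` can be expressed as a small predicate. This is a minimal sketch, not code from the repository: it assumes the terminal states mirror RunPod's serverless job statuses (`COMPLETED`, `FAILED`, `CANCELLED`, `TIMED_OUT`) and uses a hypothetical `hasArtifactEvidence` flag that the caller sets once run artifacts have actually been found. Callers that need a *successful* run should additionally require `COMPLETED`.

```typescript
// Run lifecycle states. `submitted` is the local pre-queue state from the note;
// the remaining values are ASSUMED to mirror RunPod serverless job statuses.
type RunStatus =
  | "submitted"
  | "IN_QUEUE"
  | "IN_PROGRESS"
  | "COMPLETED"
  | "FAILED"
  | "CANCELLED"
  | "TIMED_OUT";

// Assumed set of durable terminal states.
const TERMINAL: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "COMPLETED",
  "FAILED",
  "CANCELLED",
  "TIMED_OUT",
]);

interface RunObservation {
  status: RunStatus;
  // Hypothetical flag: true once output artifacts (checkpoints, logs) were found.
  hasArtifactEvidence: boolean;
}

// True only when the run can be treated as a real run rather than a mere submission.
function isRunTrustworthy(run: RunObservation): boolean {
  // An executing run is accepted on its own.
  if (run.status === "IN_PROGRESS") return true;
  // Terminal states count only when backed by artifact evidence.
  if (TERMINAL.has(run.status)) return run.hasArtifactEvidence;
  // `submitted` and `IN_QUEUE` prove nothing about a usable run.
  return false;
}
```

Keeping the queue states on the `false` path by default means a new or unexpected status is never mistaken for a live run.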


@ -0,0 +1,40 @@
# 2026-05-06 — MAGATAMA training count source fix
## Summary
The MAGATAMA training UI was still showing `1097` because the dashboard counted the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
## Root cause
- The dashboard training summary took its counts from `getTrainingCorpusStats()` over `gitea-learning-pool/magatamallm/fixes.jsonl`.
- The live state on Erik still had a very large raw `fixes.jsonl` and an older, dedupe-derived path for the effective count.
- The actual current training source for RunPod is the lane export under:
  - `training-data/runpod/magatamallm/magatamallm-sft-train.jsonl`
  - `training-data/runpod/magatamallm/magatamallm-sft-eval.jsonl`
  - `training-data/runpod/magatamallm/manifest.json`
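The root cause is two count sources that disagree: a raw JSONL corpus with a dedupe-derived effective count versus the lane's train/eval files. The sketch below shows how raw and deduplicated counts of one JSONL file diverge. It is an illustration only: the real dedupe rule inside `getTrainingCorpusStats()` is not documented here, so this sketch ASSUMES exact-duplicate removal on the re-serialized record, and `countJsonl` is a hypothetical helper.

```typescript
// Summary of one JSONL corpus file.
interface JsonlCounts {
  raw: number; // non-empty lines, one example per line
  unique: number; // distinct examples after exact-duplicate removal
  duplicatesRemoved: number;
}

// Hypothetical helper. The dedupe key is the re-serialized JSON of each record,
// which is an ASSUMPTION; the legacy dedupe rule may differ.
function countJsonl(text: string): JsonlCounts {
  const seen = new Set<string>();
  let raw = 0;
  for (const line of text.split("\n")) {
    // trim() also drops a trailing "\r" from CRLF files.
    const trimmed = line.trim();
    if (trimmed === "") continue;
    raw += 1;
    // Re-serializing makes whitespace differences irrelevant to dedupe
    // (key order is still significant).
    seen.add(JSON.stringify(JSON.parse(trimmed)));
  }
  return { raw, unique: seen.size, duplicatesRemoved: raw - seen.size };
}
```

Running this over `fixes.jsonl` and over the SFT train/eval files would typically give different numbers, which is why the dashboard has to pick one source of truth explicitly.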
## Fix
- `packages/dashboard/src/server.ts` now prefers the lane manifest for `magatamallm` training counts.
- Live summary now uses:
  - `train = 1367`
  - `eval = 152`
  - `totalAfterDedupe = 1519`
  - `duplicatesRemoved = 1368`
- Synced the current local `training-data/runpod/magatamallm/` directory to Erik.
- Restarted `magatama-dashboard`.
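The fix amounts to "prefer the lane manifest, fall back to the legacy corpus stats". A minimal sketch of that selection follows; it is not the actual `server.ts` code. The manifest shape (`train`, `eval` fields) is an ASSUMPTION, `pickTrainingCounts` is a hypothetical name, and `totalAfterDedupe` is assumed to be `train + eval`, which matches the recorded values (1367 + 152 = 1519).

```typescript
// Counts as the dashboard might expose them; field names follow the live summary above.
interface TrainingCounts {
  train: number;
  eval: number;
  totalAfterDedupe: number;
  source: string; // path the counts came from
}

// ASSUMED shape of training-data/runpod/<lane>/manifest.json.
interface LaneManifest {
  train?: number;
  eval?: number;
}

// Hypothetical selector: use the lane manifest when it has integer counts,
// otherwise fall back to the legacy corpus stats.
function pickTrainingCounts(
  manifest: LaneManifest | undefined,
  manifestPath: string,
  legacy: () => TrainingCounts,
): TrainingCounts {
  if (
    manifest !== undefined &&
    Number.isInteger(manifest.train) &&
    Number.isInteger(manifest.eval)
  ) {
    const train = manifest.train as number;
    const evalCount = manifest.eval as number;
    // ASSUMPTION: the visible total is train + eval.
    return { train, eval: evalCount, totalAfterDedupe: train + evalCount, source: manifestPath };
  }
  // Legacy path is only evaluated when the manifest cannot be used.
  return legacy();
}
```

Passing the legacy path as a thunk means the large raw `fixes.jsonl` is never scanned while a usable manifest exists, and recording `source` makes it visible which path produced the numbers (compare `training.collectionsPath` in the verified output).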
## Verified live
Public API now returns:
- `training.collectedExamples = 1367`
- `training.effectiveExamples = 1367`
- `training.evalExamples = 152`
- `training.totalExamples = 1519`
- `training.newSinceLastTraining = 1367`
- `training.collectionsPath = /opt/magatama/training-data/runpod/magatamallm/manifest.json`
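The live verification can be repeated mechanically. The sketch below compares a parsed `training` object against the values recorded above and also checks that `totalExamples` equals `effectiveExamples + evalExamples`. That invariant is an ASSUMPTION inferred from the recorded numbers (1367 + 152 = 1519), not a documented contract, and `checkTrainingSummary` is a hypothetical helper. Fetching the response is left to the caller because the endpoint URL is not given here.

```typescript
// Numeric subset of the `training` object returned by the public API, as listed above.
interface TrainingSummary {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  newSinceLastTraining: number;
}

// Returns human-readable problems; an empty array means the check passed.
function checkTrainingSummary(actual: TrainingSummary, expected: TrainingSummary): string[] {
  const problems: string[] = [];
  // Field-by-field comparison against the expected snapshot.
  for (const key of Object.keys(expected) as (keyof TrainingSummary)[]) {
    if (actual[key] !== expected[key]) {
      problems.push(`${key}: expected ${expected[key]}, got ${actual[key]}`);
    }
  }
  // ASSUMED invariant, consistent with the values recorded in this note.
  if (actual.totalExamples !== actual.effectiveExamples + actual.evalExamples) {
    problems.push("totalExamples != effectiveExamples + evalExamples");
  }
  return problems;
}
```

Used with the recorded snapshot (`collectedExamples = 1367`, `effectiveExamples = 1367`, `evalExamples = 152`, `totalExamples = 1519`, `newSinceLastTraining = 1367`) as `expected`, a response still carrying the old `1097` count would be reported as a mismatch rather than silently accepted.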
## Operator note
If the UI still shows `1097`, it is a browser cache/stale page issue. Hard reload the MAGATAMA dashboard.