From 77a4aab5925bc58d273d7cacb81555f77848181f Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Wed, 6 May 2026 16:27:14 +0200
Subject: [PATCH] sync: record magatama training count source fix

---
 sync/CURRENT.md                               | 11 +++++
 ...5-06-magatama-training-count-source-fix.md | 40 +++++++++++++++++++
 2 files changed, 51 insertions(+)
 create mode 100644 sync/history/2026-05-06-magatama-training-count-source-fix.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index cdb59a6..aabf4e8 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -90,6 +90,17 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
   - operational rule:
     - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
     - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
+  - follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
+    - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
+    - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
+    - synced current lane export to Erik and restarted `magatama-dashboard`.
+    - verified public API now returns:
+      - `collectedExamples = 1367`
+      - `effectiveExamples = 1367`
+      - `evalExamples = 152`
+      - `totalExamples = 1519`
+      - `newSinceLastTraining = 1367`
+    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
 - MAGATAMA was repaired end-to-end to a clean operational baseline:
   - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
diff --git a/sync/history/2026-05-06-magatama-training-count-source-fix.md b/sync/history/2026-05-06-magatama-training-count-source-fix.md
new file mode 100644
index 0000000..058b932
--- /dev/null
+++ b/sync/history/2026-05-06-magatama-training-count-source-fix.md
@@ -0,0 +1,40 @@
+# 2026-05-06 — MAGATAMA training count source fix
+
+## Summary
+
+MAGATAMA training UI was still showing `1097` because the dashboard counted the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
+
+## Root cause
+
+- Dashboard training summary read `getTrainingCorpusStats()` from `gitea-learning-pool/magatamallm/fixes.jsonl`.
+- Live Erik state still had a huge raw `fixes.jsonl` and an old dedupe-derived effective count path.
+- The actual current training source for RunPod is the lane export under:
+  - `training-data/runpod/magatamallm/magatamallm-sft-train.jsonl`
+  - `training-data/runpod/magatamallm/magatamallm-sft-eval.jsonl`
+  - `training-data/runpod/magatamallm/manifest.json`
+
+## Fix
+
+- `packages/dashboard/src/server.ts` now prefers the lane manifest for `magatamallm` training counts.
+- Live summary now uses:
+  - `train = 1367`
+  - `eval = 152`
+  - `totalAfterDedupe = 1519`
+  - `duplicatesRemoved = 1368`
+- Synced the current local `training-data/runpod/magatamallm/` directory to Erik.
+- Restarted `magatama-dashboard`.
+
+## Verified live
+
+Public API now returns:
+
+- `training.collectedExamples = 1367`
+- `training.effectiveExamples = 1367`
+- `training.evalExamples = 152`
+- `training.totalExamples = 1519`
+- `training.newSinceLastTraining = 1367`
+- `training.collectionsPath = /opt/magatama/training-data/runpod/magatamallm/manifest.json`
+
+## Operator note
+
+If the UI still shows `1097`, it is a browser cache/stale page issue. Hard reload the MAGATAMA dashboard.
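
Reviewer sketch (not part of the patch): the source-of-truth switch recorded above could look roughly like the following. This is a minimal illustration under stated assumptions, not the actual change in `packages/dashboard/src/server.ts`, which this patch does not include. The `LaneManifest` and `TrainingSummary` shapes and the `summarizeTraining` helper are hypothetical; only the field names and the numbers are taken from the notes. `newSinceLastTraining` is left out because the notes do not say how it is derived, and every legacy value other than `1097` is a placeholder.

```typescript
// Hypothetical manifest shape: field names mirror the "Live summary now uses"
// list in the notes; the real manifest.json schema is not shown in the patch.
interface LaneManifest {
  train: number;
  eval: number;
  totalAfterDedupe: number;
  duplicatesRemoved: number;
}

// Hypothetical subset of the `training` object returned by the public API.
interface TrainingSummary {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  collectionsPath: string;
}

// Prefer the lane manifest when one was loaded; otherwise keep the legacy stats.
function summarizeTraining(
  manifest: LaneManifest | null,
  manifestPath: string,
  legacy: TrainingSummary,
): TrainingSummary {
  if (manifest === null) {
    return legacy;
  }
  return {
    collectedExamples: manifest.train,
    effectiveExamples: manifest.train,
    evalExamples: manifest.eval,
    totalExamples: manifest.totalAfterDedupe,
    collectionsPath: manifestPath,
  };
}

// Placeholder legacy stats: only the stale `1097` figure comes from the notes.
const legacyStats: TrainingSummary = {
  collectedExamples: 1097,
  effectiveExamples: 1097,
  evalExamples: 0,
  totalExamples: 1097,
  collectionsPath: "gitea-learning-pool/magatamallm/fixes.jsonl",
};

// Manifest values as recorded in the notes.
const laneManifest: LaneManifest = {
  train: 1367,
  eval: 152,
  totalAfterDedupe: 1519,
  duplicatesRemoved: 1368,
};

const summary = summarizeTraining(
  laneManifest,
  "/opt/magatama/training-data/runpod/magatamallm/manifest.json",
  legacyStats,
);
const fallback = summarizeTraining(null, "", legacyStats);
```

With the recorded manifest values, `summary` carries the verified counts (`1367` train, `152` eval, `1519` total, where `1519 = 1367 + 152`), while `fallback` still reports the stale `1097`. Passing the parsed manifest in as an argument keeps the sketch free of file I/O; a real server would presumably read and validate `manifest.json` before calling anything like this.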