transceiver-db/sync/history/2026-05-06-magatama-training-count-source-fix.md
2026-05-06 16:27:14 +02:00

# 2026-05-06 — MAGATAMA training count source fix
## Summary
MAGATAMA training UI was still showing `1097` because the dashboard counted the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
## Root cause
- The dashboard training summary took its counts from `getTrainingCorpusStats()`, backed by `gitea-learning-pool/magatamallm/fixes.jsonl`.
- The live state on Erik still had a huge raw `fixes.jsonl` and an old dedupe-derived path for the effective count.
- The actual current training source for RunPod is the lane export under:
  - `training-data/runpod/magatamallm/magatamallm-sft-train.jsonl`
  - `training-data/runpod/magatamallm/magatamallm-sft-eval.jsonl`
  - `training-data/runpod/magatamallm/manifest.json`
## Fix
- `packages/dashboard/src/server.ts` now prefers the lane manifest for `magatamallm` training counts.
- Live summary now uses:
  - `train = 1367`
  - `eval = 152`
  - `totalAfterDedupe = 1519`
  - `duplicatesRemoved = 1368`
- Synced the current local `training-data/runpod/magatamallm/` directory to Erik.
- Restarted `magatama-dashboard`.
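
The manifest-first behaviour described in this section can be sketched roughly as follows. This is an illustration only, not the actual code in `server.ts`: the `LaneManifest` shape, the `summarizeTraining` helper, and the fallback argument are hypothetical, and the manifest field names are assumed to match the counts listed above.

```typescript
// Hypothetical shape of manifest.json; field names are assumed from the
// counts listed in this note, not taken from the real file.
interface LaneManifest {
  train: number;
  eval: number;
  totalAfterDedupe: number;
  duplicatesRemoved: number;
}

// Subset of the summary fields this note reports from the public API.
interface TrainingSummary {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  collectionsPath: string;
}

// Type guard: accept only objects whose count fields are all finite numbers.
function isLaneManifest(value: unknown): value is LaneManifest {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return ["train", "eval", "totalAfterDedupe", "duplicatesRemoved"].every(
    (key) => typeof m[key] === "number" && isFinite(m[key] as number),
  );
}

// Prefer the lane manifest; fall back to the legacy corpus count only when
// the manifest is missing or malformed.
function summarizeTraining(
  manifest: unknown,
  manifestPath: string,
  legacy: { count: number; path: string },
): TrainingSummary {
  if (isLaneManifest(manifest)) {
    return {
      collectedExamples: manifest.train,
      effectiveExamples: manifest.train,
      evalExamples: manifest.eval,
      totalExamples: manifest.totalAfterDedupe,
      collectionsPath: manifestPath,
    };
  }
  return {
    collectedExamples: legacy.count,
    effectiveExamples: legacy.count,
    evalExamples: 0,
    totalExamples: legacy.count,
    collectionsPath: legacy.path,
  };
}
```

Passing the already-parsed manifest in as `unknown` keeps the sketch free of file I/O; a caller would read and `JSON.parse` the manifest first and pass `null` if that fails, which routes to the legacy fallback.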
## Verified live
Public API now returns:
- `training.collectedExamples = 1367`
- `training.effectiveExamples = 1367`
- `training.evalExamples = 152`
- `training.totalExamples = 1519`
- `training.newSinceLastTraining = 1367`
- `training.collectionsPath = /opt/magatama/training-data/runpod/magatamallm/manifest.json`
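
A quick consistency check over these fields might look like the sketch below. The `TrainingApiFields` interface and `checkTrainingFields` helper are hypothetical; only the field names and values come from the list above, and the rule that train plus eval equals the total is an assumption that happens to hold for the reported numbers (1367 + 152 = 1519).

```typescript
// Hypothetical subset of the `training` object returned by the public API.
interface TrainingApiFields {
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  collectionsPath: string;
}

// Returns human-readable problems; an empty array means the fields look consistent.
function checkTrainingFields(t: TrainingApiFields, legacyCount: number): string[] {
  const problems: string[] = [];
  const expectedTotal = t.effectiveExamples + t.evalExamples;
  if (expectedTotal !== t.totalExamples) {
    problems.push(`train + eval is ${expectedTotal}, but totalExamples is ${t.totalExamples}`);
  }
  if (t.effectiveExamples === legacyCount) {
    problems.push(`effectiveExamples still equals the legacy count ${legacyCount}`);
  }
  const suffix = "manifest.json";
  if (t.collectionsPath.slice(-suffix.length) !== suffix) {
    problems.push(`collectionsPath does not point at a manifest: ${t.collectionsPath}`);
  }
  return problems;
}
```

With the values verified above and a legacy count of `1097`, the function returns an empty array; a response that still reports `1097` from `fixes.jsonl` would be flagged.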
## Operator note
If the UI still shows `1097`, the browser is serving a cached or stale page. Hard-reload the MAGATAMA dashboard.