sync: record magatama training count source fix
parent 9bc84a89ee
commit 77a4aab592
@@ -90,6 +90,17 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
- operational rule:
  - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
  - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
- follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
  - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
  - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
  - synced current lane export to Erik and restarted `magatama-dashboard`.
  - verified public API now returns:
    - `collectedExamples = 1367`
    - `effectiveExamples = 1367`
    - `evalExamples = 152`
    - `totalExamples = 1519`
    - `newSinceLastTraining = 1367`
  - if the browser still shows `1097`, treat it as stale cached UI and hard reload.

- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
@@ -0,0 +1,40 @@
# 2026-05-06 — MAGATAMA training count source fix

## Summary

The MAGATAMA training UI was still showing `1097` because the dashboard counted the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.

## Root cause

- The dashboard training summary read `getTrainingCorpusStats()` from `gitea-learning-pool/magatamallm/fixes.jsonl`.
- The live state on Erik still had a huge raw `fixes.jsonl` and an old dedupe-derived effective-count path.
- The actual current training source for RunPod is the lane export under:
  - `training-data/runpod/magatamallm/magatamallm-sft-train.jsonl`
  - `training-data/runpod/magatamallm/magatamallm-sft-eval.jsonl`
  - `training-data/runpod/magatamallm/manifest.json`

## Fix

- `packages/dashboard/src/server.ts` now prefers the lane manifest for `magatamallm` training counts.
- Live summary now uses:
  - `train = 1367`
  - `eval = 152`
  - `totalAfterDedupe = 1519`
  - `duplicatesRemoved = 1368`
- Synced the current local `training-data/runpod/magatamallm/` directory to Erik.
- Restarted `magatama-dashboard`.
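
The manifest-first lookup described above could look roughly like the sketch below. It is not the actual `server.ts` code. The `LaneManifest` shape, the function names `parseLaneManifest` and `resolveTrainingCounts`, the `source` field, and the mapping from manifest fields to API fields are all assumptions made for illustration. The only values taken from this note are the field names and counts listed under "Live summary now uses".

```typescript
// Hypothetical manifest shape, inferred from the counts listed in this note.
interface LaneManifest {
  train: number;
  eval: number;
  totalAfterDedupe: number;
  duplicatesRemoved: number;
}

// Hypothetical result shape; `source` is an illustrative field, not a real API field.
interface TrainingCounts {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  source: "lane-manifest" | "legacy-corpus";
}

// Returns null when the manifest text is missing or malformed,
// so the caller can fall back to the legacy count.
function parseLaneManifest(text: string | null): LaneManifest | null {
  if (text === null) return null;
  try {
    const raw = JSON.parse(text) as Partial<LaneManifest>;
    const fields = [raw.train, raw.eval, raw.totalAfterDedupe, raw.duplicatesRemoved];
    const valid = fields.every(
      (v) => typeof v === "number" && Number.isInteger(v) && v >= 0,
    );
    return valid ? (raw as LaneManifest) : null;
  } catch {
    return null;
  }
}

// Prefer the lane manifest; only call the legacy counter when no usable manifest exists.
function resolveTrainingCounts(
  manifestText: string | null,
  legacyEffectiveCount: () => number,
): TrainingCounts {
  const manifest = parseLaneManifest(manifestText);
  if (manifest !== null) {
    return {
      collectedExamples: manifest.train,
      effectiveExamples: manifest.train,
      evalExamples: manifest.eval,
      totalExamples: manifest.train + manifest.eval,
      source: "lane-manifest",
    };
  }
  const legacy = legacyEffectiveCount();
  return {
    collectedExamples: legacy,
    effectiveExamples: legacy,
    evalExamples: 0,
    totalExamples: legacy,
    source: "legacy-corpus",
  };
}

const exampleManifest = JSON.stringify({
  train: 1367,
  eval: 152,
  totalAfterDedupe: 1519,
  duplicatesRemoved: 1368,
});
const resolved = resolveTrainingCounts(exampleManifest, () => 1097);
console.log(resolved.effectiveExamples, resolved.totalExamples, resolved.source);
```

Passing the legacy counter as a callback keeps the sketch independent of the real `getTrainingCorpusStats()` signature, which this note does not record.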

## Verified live

Public API now returns:

- `training.collectedExamples = 1367`
- `training.effectiveExamples = 1367`
- `training.evalExamples = 152`
- `training.totalExamples = 1519`
- `training.newSinceLastTraining = 1367`
- `training.collectionsPath = /opt/magatama/training-data/runpod/magatamallm/manifest.json`
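
The values above can be sanity-checked mechanically. The sketch below is a hypothetical helper, not part of the dashboard. The `TrainingSummary` interface and `checkTrainingSummary` are illustrative names, and the invariant `totalExamples = effectiveExamples + evalExamples` is an assumption inferred from the recorded numbers (1367 + 152 = 1519), not a documented contract of the API.

```typescript
// Hypothetical view of the `training` object returned by the public API.
interface TrainingSummary {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  newSinceLastTraining: number;
  collectionsPath: string;
}

// Returns a list of human-readable problems; an empty list means the summary looks consistent.
function checkTrainingSummary(t: TrainingSummary): string[] {
  const problems: string[] = [];
  // Assumed invariant: total is the train (effective) count plus the eval count.
  if (t.totalExamples !== t.effectiveExamples + t.evalExamples) {
    problems.push(
      `totalExamples ${t.totalExamples} != effectiveExamples + evalExamples`,
    );
  }
  // Deduplication can only shrink the collected set.
  if (t.effectiveExamples > t.collectedExamples) {
    problems.push("effectiveExamples exceeds collectedExamples");
  }
  // The summary should be sourced from the lane manifest, not the legacy corpus.
  if (!t.collectionsPath.endsWith("training-data/runpod/magatamallm/manifest.json")) {
    problems.push(`collectionsPath is not the lane manifest: ${t.collectionsPath}`);
  }
  return problems;
}

// Values copied from the list above.
const liveSummary: TrainingSummary = {
  collectedExamples: 1367,
  effectiveExamples: 1367,
  evalExamples: 152,
  totalExamples: 1519,
  newSinceLastTraining: 1367,
  collectionsPath: "/opt/magatama/training-data/runpod/magatamallm/manifest.json",
};
console.log(checkTrainingSummary(liveSummary));
```

A summary still sourced from `gitea-learning-pool/magatamallm/fixes.jsonl` would fail the path check, which is a quick way to tell a server-side regression apart from a stale browser page.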

## Operator note

If the UI still shows `1097`, the browser is serving a stale cached page. Hard reload the MAGATAMA dashboard.