sync: record magatama ui cache runpod tooltip changelog fix

Rene Fichtmueller 2026-05-06 17:24:54 +02:00
parent 77a4aab592
commit 830ab57c3c
2 changed files with 176 additions and 1 deletions

@@ -1,6 +1,6 @@
# Current TIP Sync State
Updated: 2026-05-06 15:24 UTC (previously 2026-05-06 12:21 UTC)
## Active Policy
@@ -27,6 +27,44 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
## Latest Work
- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully:
    - `magatama-dashboard`
    - `magatama`
  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
    - `collectedExamples = 15620`
    - `effectiveExamples = 15620`
    - `evalExamples = 1736`
    - `totalExamples = 17356`
    - `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the count of the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - worker `COMPLETED` is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
  - public findings state is currently empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - stale cached changelog data older than 6h now forces a rebuild from git history
    - verified live via dashboard proxy on Erik:
      - `generatedAt = 2026-05-06T15:18:42.708Z`
      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).

@@ -0,0 +1,137 @@
# MAGATAMA UI / Cache / RunPod / Tooltip / Changelog Fix
Date: 2026-05-06
Author: Codex
## Scope
Addressed the current MAGATAMA operator complaints in one block:
- training UI still showed `1097`
- findings page looked blank
- attack paths looked empty/broken
- Switchblade port hover only showed a help cursor / question mark
- changelog looked stale
## What Was Fixed
### 1. Training truth source
`magatamallm` RunPod launches still logged the legacy deduplicated `fixes.jsonl` count (`1097`) during SSE startup.
RunPod launches now:
- still dedupe the legacy fix corpus where needed
- no longer present that count as the operator-facing training truth
- emit the lane-specific RunPod manifest totals after dataset refresh instead
Live verified via public MAGATAMA API:
- `collectedExamples = 15620`
- `effectiveExamples = 15620`
- `evalExamples = 1736`
- `totalExamples = 17356`
- `newSinceLastTraining = 15620`
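The reported numbers are internally consistent: `effectiveExamples + evalExamples` equals `totalExamples` (15620 + 1736 = 17356). A minimal sketch of that sanity check follows. `LaneTotals` and `totalsAreConsistent` are hypothetical names, and the sum relation is inferred from the figures above rather than taken from the MAGATAMA source.

```typescript
// Illustrative sketch only: the type and function names are assumptions,
// and the "effective + eval = total" relation is inferred from the
// reported numbers, not confirmed by the MAGATAMA code.
interface LaneTotals {
  collectedExamples: number;
  effectiveExamples: number;
  evalExamples: number;
  totalExamples: number;
  newSinceLastTraining: number;
}

// Returns true when the train and eval splits add up to the reported total.
function totalsAreConsistent(t: LaneTotals): boolean {
  return t.effectiveExamples + t.evalExamples === t.totalExamples;
}

// Values as reported by the public status endpoint in this note.
const reported: LaneTotals = {
  collectedExamples: 15620,
  effectiveExamples: 15620,
  evalExamples: 1736,
  totalExamples: 17356,
  newSinceLastTraining: 15620,
};

console.log(totalsAreConsistent(reported)); // true
```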
### 2. RunPod completion truthfulness
RunPod worker jobs could return `COMPLETED` even though the logs contained real training failures.
MAGATAMA now inspects worker logs for markers such as:
- `Traceback`
- `SyntaxError`
- non-zero exit status
- explicit train/fine-tune failure text
If such evidence exists, the run is recorded as `completed_with_worker_failure` instead of being treated as a clean success.
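The log inspection described above can be sketched roughly as follows. This is not the code shipped in MAGATAMA: `FAILURE_MARKERS`, `findFailureMarkers`, `classifyRun`, and the exact regular expressions (in particular the assumed shape of a non-zero exit line) are illustrative assumptions. Only the marker kinds, the `COMPLETED` worker status, and the `completed_with_worker_failure` outcome come from the change notes.

```typescript
// Hypothetical sketch: names and patterns are assumptions, not the
// actual MAGATAMA implementation.
type RunOutcome = "completed" | "completed_with_worker_failure" | "failed";

const FAILURE_MARKERS: { label: string; pattern: RegExp }[] = [
  { label: "traceback", pattern: /\bTraceback\b/ },
  { label: "syntax-error", pattern: /\bSyntaxError\b/ },
  // Assumed log shape for a non-zero exit; real worker output may differ.
  { label: "non-zero-exit", pattern: /exit(?:ed)?(?: with)?(?: code| status)?[ :=]+[1-9]\d*/i },
  { label: "training-failed", pattern: /\b(?:train(?:ing)?|fine-?tun(?:e|ing)) (?:failed|failure)\b/i },
];

// Collects the labels of all failure markers found anywhere in the log.
function findFailureMarkers(logLines: string[]): string[] {
  const hits = new Set<string>();
  for (const line of logLines) {
    for (const marker of FAILURE_MARKERS) {
      if (marker.pattern.test(line)) hits.add(marker.label);
    }
  }
  return Array.from(hits);
}

// A worker-reported COMPLETED only counts as success when the log is clean.
function classifyRun(
  workerStatus: string,
  logLines: string[],
): { outcome: RunOutcome; markers: string[] } {
  const markers = findFailureMarkers(logLines);
  if (workerStatus !== "COMPLETED") return { outcome: "failed", markers };
  const outcome: RunOutcome =
    markers.length > 0 ? "completed_with_worker_failure" : "completed";
  return { outcome, markers };
}
```

Returning the matched marker labels alongside the outcome keeps the evidence available for the operator-facing status instead of reducing it to a boolean.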
### 3. Findings page no longer looks broken when empty
The live findings API currently returns:
- `findings = []`
- `total = 0`
The UI now renders an explicit empty-state row when there are no open findings or when filters hide everything, instead of leaving the table visually blank.
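A small sketch of how the two empty cases could be told apart. `FindingsResponse` mirrors the payload shape shown above; `emptyStateMessage` and both message strings are placeholders, not the texts shipped in the dashboard.

```typescript
// Hypothetical helper: function name and message texts are assumptions.
interface FindingsResponse {
  findings: unknown[];
  total: number;
}

// Returns the empty-state text to render, or null when rows are visible.
function emptyStateMessage(
  response: FindingsResponse,
  visibleCount: number,
): string | null {
  if (visibleCount > 0) return null;
  return response.total === 0
    ? "No open findings."
    : "No findings match the current filters.";
}
```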
### 4. Attack Paths empty-state clarified
Attack Paths previously looked broken when the selected scope had zero assets.
The UI now explicitly states:
- the current scope has `0 assets`
- operators should widen location/datacenter/rack scope
- the graph stays intentionally empty when no correlated multi-step paths exist
### 5. Frontend cache + scope hardening
Frontend cache handling was improved:
- cache version bumped to `2026-05-06b`
- stale legacy `magatama_api_cache:*` entries are cleared
- per-endpoint TTLs were introduced
- invalid scope selections are normalized
- empty scoped selections reset rather than silently trapping the UI in misleading empty views
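The cache part of this hardening might look roughly like the sketch below. Only the version string `2026-05-06b`, the `magatama_api_cache:` prefix, and the two endpoint paths come from these notes. The versioned key layout, the TTL values, and all function names are assumptions for illustration, and scope normalization is not covered here. `KeyValueStore` is a subset of the browser `Storage` interface, so `localStorage` could be passed in; `MemoryStore` is a stand-in for running outside a browser.

```typescript
// Hypothetical sketch: key layout, TTL values, and names are assumptions.
interface KeyValueStore {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const API_CACHE_VERSION = "2026-05-06b";
const LEGACY_PREFIX = "magatama_api_cache:";
const VERSIONED_PREFIX = LEGACY_PREFIX + API_CACHE_VERSION + ":";

const DEFAULT_TTL_MS = 30000; // assumed default
const ENDPOINT_TTL_MS: Record<string, number> = {
  "/api/llm/status": 15000, // assumed value
  "/api/findings": 60000, // assumed value
};

// Removes entries written under the cache prefix by any other version.
function purgeLegacyEntries(store: KeyValueStore): number {
  const stale: string[] = [];
  for (let i = 0; i < store.length; i++) {
    const k = store.key(i);
    if (k !== null && k.startsWith(LEGACY_PREFIX) && !k.startsWith(VERSIONED_PREFIX)) {
      stale.push(k);
    }
  }
  for (const k of stale) store.removeItem(k);
  return stale.length;
}

function cacheSet(store: KeyValueStore, endpoint: string, value: unknown, now: number = Date.now()): void {
  store.setItem(VERSIONED_PREFIX + endpoint, JSON.stringify({ storedAt: now, value }));
}

// Returns the cached value, or undefined when missing, expired, or corrupt.
function cacheGet(store: KeyValueStore, endpoint: string, now: number = Date.now()): unknown {
  const key = VERSIONED_PREFIX + endpoint;
  const raw = store.getItem(key);
  if (raw === null) return undefined;
  try {
    const entry = JSON.parse(raw) as { storedAt: number; value: unknown };
    const ttl = ENDPOINT_TTL_MS[endpoint] ?? DEFAULT_TTL_MS;
    if (now - entry.storedAt <= ttl) return entry.value;
  } catch {
    // Corrupt entry: fall through and drop it.
  }
  store.removeItem(key);
  return undefined;
}

// In-memory stand-in for localStorage.
class MemoryStore implements KeyValueStore {
  private readonly data = new Map<string, string>();
  get length(): number { return this.data.size; }
  key(index: number): string | null { return Array.from(this.data.keys())[index] ?? null; }
  getItem(key: string): string | null { return this.data.get(key) ?? null; }
  setItem(key: string, value: string): void { this.data.set(key, value); }
  removeItem(key: string): void { this.data.delete(key); }
}
```

Collecting the stale keys first and deleting them afterwards avoids skipping entries while the store's index shifts during removal.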
### 6. Switchblade port hover improved
The old port chips relied only on browser-native `title` behavior.
Now:
- port chips carry `data-tooltip`
- custom tooltip CSS is shipped live
- usage/state text should appear as a real hover bubble
Live Erik file check confirmed:
- `data-tooltip` markers present
- tooltip CSS present
### 7. Changelog self-healing
The public changelog cache in MAGATAMA core previously returned cached data indefinitely as long as it was structurally valid.
Now:
- cached changelog older than 6 hours triggers a rebuild from git history
Live verified on Erik through dashboard proxy:
- `generatedAt = 2026-05-06T15:18:42.708Z`
- latest entries include fresh `2026-04-30` material again
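The staleness rule can be sketched as below. The 6-hour limit and the `generatedAt` field come from these notes; `CachedChangelog`, `isStale`, `getChangelog`, and the injected `rebuild` callback (standing in for the rebuild from git history) are hypothetical names, not the contents of `changelog.ts`.

```typescript
// Hypothetical sketch: names are assumptions; only the 6h limit and the
// generatedAt field are taken from the change notes.
interface CachedChangelog {
  generatedAt: string; // ISO 8601 timestamp
  entries: unknown[];
}

const MAX_CHANGELOG_AGE_MS = 6 * 60 * 60 * 1000;

// A cache entry is stale when its timestamp is unreadable or older than 6h.
function isStale(cached: CachedChangelog, now: Date): boolean {
  const generated = Date.parse(cached.generatedAt);
  if (Number.isNaN(generated)) return true;
  return now.getTime() - generated > MAX_CHANGELOG_AGE_MS;
}

// Serves the cache while fresh, otherwise calls the rebuild function.
function getChangelog(
  cached: CachedChangelog | null,
  rebuild: () => CachedChangelog,
  now: Date = new Date(),
): CachedChangelog {
  if (cached !== null && !isStale(cached, now)) return cached;
  return rebuild();
}
```

Treating an unparseable `generatedAt` as stale means a structurally valid but corrupted cache entry also triggers a rebuild rather than being served indefinitely.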
## Files Touched In MAGATAMA
- `packages/dashboard/public/index-v2.html`
- `packages/dashboard/src/server.ts`
- `packages/core/src/routes/changelog.ts`
## Deployment Status
Built locally and redeployed to Erik:
- dashboard dist synced
- core dist synced
- `index-v2.html` synced
- PM2 restarted:
  - `magatama-dashboard`
  - `magatama`
## Important Live Evidence
- public `api/llm/status` shows lane-export counts, not `1097`
- public `api/findings?limit=1` returns empty findings cleanly
- Erik live dashboard file contains:
  - `API_CACHE_VERSION = '2026-05-06b'`
  - `data-tooltip`
  - `Im aktuellen Scope liegen 0 Assets.` (German UI string: "The current scope contains 0 assets.")
  - `Klicken für Details` (German UI string: "Click for details")
## Open Truths
- current live findings are genuinely `0`; this is not a hidden frontend-only failure
- Attack Paths can still be empty if there are truly no scoped assets or no correlated attack stories
- RunPod serverless still needs endpoint-side reliability; the MAGATAMA-side truthfulness improvements do not by themselves fix a broken RunPod release/worker pipeline