From 830ab57c3c565fe310b4582a9440e1f9329fa5b6 Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Wed, 6 May 2026 17:24:54 +0200
Subject: [PATCH] sync: record magatama ui cache runpod tooltip changelog fix

---
 sync/CURRENT.md                               |  40 ++++-
 ...a-ui-cache-runpod-tooltip-changelog-fix.md | 137 ++++++++++++++++++
 2 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index aabf4e8..6a6379f 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
 
-Updated: 2026-05-06 12:21 UTC
+Updated: 2026-05-06 15:24 UTC
 
 ## Active Policy
 
@@ -27,6 +27,44 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 
 ## Latest Work
 
+- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
+  - dashboard and core were rebuilt locally and redeployed to Erik.
+  - live processes restarted successfully:
+    - `magatama-dashboard`
+    - `magatama`
+  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
+    - `collectedExamples = 15620`
+    - `effectiveExamples = 15620`
+    - `evalExamples = 1736`
+    - `totalExamples = 17356`
+    - `newSinceLastTraining = 15620`
+  - root cause for the stale `1097` display:
+    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
+    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
+    - after dataset refresh the UI now emits the lane manifest totals instead.
+  - RunPod completion handling was hardened:
+    - worker `COMPLETED` is no longer trusted blindly.
+    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
+    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
+  - public findings state remains currently empty:
+    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
+    - this is now rendered with an explicit empty-state row instead of a visually blank table.
+  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
+  - Frontend cache and scope handling were hardened:
+    - cache version bumped to `2026-05-06b`
+    - stale legacy `magatama_api_cache:*` entries are cleared
+    - per-endpoint TTLs added
+    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
+  - Switchblade rack port hover was materially improved:
+    - port chips now carry `data-tooltip`
+    - custom tooltip CSS is live on Erik
+    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
+  - Changelog self-healing was added in core:
+    - stale cached changelog data older than 6h now forces a rebuild from git history
+    - verified live via dashboard proxy on Erik:
+      - `generatedAt = 2026-05-06T15:18:42.708Z`
+      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
+
 - MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
   - the RunPod serverless training start failure was not a RunPod outage.
   - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
diff --git a/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md b/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
new file mode 100644
index 0000000..13e8466
--- /dev/null
+++ b/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
@@ -0,0 +1,137 @@
+# MAGATAMA UI / Cache / RunPod / Tooltip / Changelog Fix
+
+Date: 2026-05-06
+Author: Codex
+
+## Scope
+
+Addressed the current MAGATAMA operator complaints in one block:
+
+- training UI still showed `1097`
+- findings page looked blank
+- attack paths looked empty/broken
+- Switchblade port hover only showed a help cursor / question mark
+- changelog looked stale
+
+## What Was Fixed
+
+### 1. Training truth source
+
+`magatamallm` RunPod launches still logged the old legacy deduplicated `fixes.jsonl` count (`1097`) during SSE startup.
+
+This was corrected so RunPod launches now:
+
+- still dedupe the legacy fix corpus where needed
+- but no longer present that count as the operator-facing training truth
+- instead emit the lane-specific RunPod manifest totals after dataset refresh
+
+Live verified via public MAGATAMA API:
+
+- `collectedExamples = 15620`
+- `effectiveExamples = 15620`
+- `evalExamples = 1736`
+- `totalExamples = 17356`
+- `newSinceLastTraining = 15620`
+
+### 2. RunPod completion truthfulness
+
+RunPod worker jobs could return `COMPLETED` even though the logs contained real training failures.
+
+MAGATAMA now inspects worker logs for markers such as:
+
+- `Traceback`
+- `SyntaxError`
+- non-zero exit status
+- explicit train/fine-tune failure text
+
+If such evidence exists, the run is recorded as worker-failed instead of being treated as a clean success.
+
+### 3. Findings page no longer looks broken when empty
+
+The live findings API currently returns:
+
+- `findings = []`
+- `total = 0`
+
+The UI now renders an explicit empty-state row when there are no open findings or when filters hide everything, instead of leaving the table visually blank.
+
+### 4. Attack Paths empty-state clarified
+
+Attack Paths previously looked broken when the selected scope had zero assets.
+
+The UI now explicitly states:
+
+- the current scope has `0 assets`
+- operators should widen location/datacenter/rack scope
+- the graph stays intentionally empty when no correlated multi-step paths exist
+
+### 5. Frontend cache + scope hardening
+
+Frontend cache handling was improved:
+
+- cache version bumped to `2026-05-06b`
+- stale legacy `magatama_api_cache:*` entries are cleared
+- per-endpoint TTLs were introduced
+- invalid scope selections are normalized
+- empty scoped selections reset rather than silently trapping the UI in misleading empty views
+
+### 6. Switchblade port hover improved
+
+The old port chips relied only on browser-native `title` behavior.
+
+Now:
+
+- port chips carry `data-tooltip`
+- custom tooltip CSS is shipped live
+- usage/state text should appear as a real hover bubble
+
+Live Erik file check confirmed:
+
+- `data-tooltip` markers present
+- tooltip CSS present
+
+### 7. Changelog self-healing
+
+The public changelog cache in MAGATAMA core previously returned cached data indefinitely if structurally valid.
+
+Now:
+
+- cached changelog older than 6 hours triggers a rebuild from git history
+
+Live verified on Erik through dashboard proxy:
+
+- `generatedAt = 2026-05-06T15:18:42.708Z`
+- latest entries include fresh `2026-04-30` material again
+
+## Files Touched In MAGATAMA
+
+- `packages/dashboard/public/index-v2.html`
+- `packages/dashboard/src/server.ts`
+- `packages/core/src/routes/changelog.ts`
+
+## Deployment Status
+
+Built locally and redeployed to Erik:
+
+- dashboard dist synced
+- core dist synced
+- `index-v2.html` synced
+- PM2 restarted:
+  - `magatama-dashboard`
+  - `magatama`
+
+## Important Live Evidence
+
+- public `api/llm/status` shows lane-export counts, not `1097`
+- public `api/findings?limit=1` returns empty findings cleanly
+- Erik live dashboard file contains:
+  - `API_CACHE_VERSION = '2026-05-06b'`
+  - `data-tooltip`
+  - `Im aktuellen Scope liegen 0 Assets.`
+  - `Klicken für Details`
+
+## Open Truths
+
+- current live findings are genuinely `0`; this is not a hidden frontend-only failure
+- Attack Paths can still be empty if there are truly no scoped assets or no correlated attack stories
+- RunPod serverless still needs endpoint-side reliability; the MAGATAMA-side truthfulness improvements do not by themselves fix a broken RunPod release/worker pipeline