sync: record magatama ui cache runpod tooltip changelog fix

2026-05-06 17:24:54 +02:00 · 2026-05-06 17:24:54 +02:00 · 830ab57c3c
commit 830ab57c3c
parent 77a4aab592
2 changed files with 176 additions and 1 deletion
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State

-Updated: 2026-05-06 12:21 UTC
+Updated: 2026-05-06 15:24 UTC

 ## Active Policy

@@ -27,6 +27,44 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr

 ## Latest Work

+- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
+  - dashboard and core were rebuilt locally and redeployed to Erik.
+  - live processes restarted successfully:
+    - `magatama-dashboard`
+    - `magatama`
+  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
+    - `collectedExamples = 15620`
+    - `effectiveExamples = 15620`
+    - `evalExamples = 1736`
+    - `totalExamples = 17356`
+    - `newSinceLastTraining = 15620`
+  - root cause for the stale `1097` display:
+    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
+    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
+    - after dataset refresh the UI now emits the lane manifest totals instead.
+  - RunPod completion handling was hardened:
+    - worker `COMPLETED` is no longer trusted blindly.
+    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
+    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
+  - public findings state is currently empty:
+    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
+    - this is now rendered with an explicit empty-state row instead of a visually blank table.
+  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
+  - Frontend cache and scope handling were hardened:
+    - cache version bumped to `2026-05-06b`
+    - stale legacy `magatama_api_cache:*` entries are cleared
+    - per-endpoint TTLs added
+    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
+  - Switchblade rack port hover was materially improved:
+    - port chips now carry `data-tooltip`
+    - custom tooltip CSS is live on Erik
+    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
+  - Changelog self-healing was added in core:
+    - stale cached changelog data older than 6h now forces a rebuild from git history
+    - verified live via dashboard proxy on Erik:
+      - `generatedAt = 2026-05-06T15:18:42.708Z`
+      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
+
 - MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
--- a/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
+++ b/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
@@ -0,0 +1,137 @@
+# MAGATAMA UI / Cache / RunPod / Tooltip / Changelog Fix
+
+Date: 2026-05-06
+Author: Codex
+
+## Scope
+
+Addressed the current MAGATAMA operator complaints in one block:
+
+- training UI still showed `1097`
+- findings page looked blank
+- attack paths looked empty/broken
+- Switchblade port hover only showed a help cursor / question mark
+- changelog looked stale
+
+## What Was Fixed
+
+### 1. Training truth source
+
+`magatamallm` RunPod launches still logged the legacy deduplicated `fixes.jsonl` count (`1097`) during SSE startup.
+
+This was corrected so RunPod launches now:
+
+- still dedupe the legacy fix corpus where needed
+- but no longer present that count as the operator-facing training truth
+- instead emit the lane-specific RunPod manifest totals after dataset refresh
+
+Live verified via public MAGATAMA API:
+
+- `collectedExamples = 15620`
+- `effectiveExamples = 15620`
+- `evalExamples = 1736`
+- `totalExamples = 17356`
+- `newSinceLastTraining = 15620`
+
+### 2. RunPod completion truthfulness
+
+RunPod worker jobs could return `COMPLETED` even though the logs contained real training failures.
+
+MAGATAMA now inspects worker logs for markers such as:
+
+- `Traceback`
+- `SyntaxError`
+- non-zero exit status
+- explicit train/fine-tune failure text
+
+If such evidence exists, the run is recorded as `completed_with_worker_failure` instead of being treated as a clean success.
+
+### 3. Findings page no longer looks broken when empty
+
+The live findings API currently returns:
+
+- `findings = []`
+- `total = 0`
+
+The UI now renders an explicit empty-state row when there are no open findings or when filters hide everything, instead of leaving the table visually blank.
+
+### 4. Attack Paths empty-state clarified
+
+Attack Paths previously looked broken when the selected scope had zero assets.
+
+The UI now explicitly states:
+
+- the current scope has `0 assets`
+- operators should widen location/datacenter/rack scope
+- the graph stays intentionally empty when no correlated multi-step paths exist
+
+### 5. Frontend cache + scope hardening
+
+Frontend cache handling was improved:
+
+- cache version bumped to `2026-05-06b`
+- stale legacy `magatama_api_cache:*` entries are cleared
+- per-endpoint TTLs were introduced
+- invalid scope selections are normalized
+- empty scoped selections are reset rather than silently trapping the UI in misleading empty views
+
+### 6. Switchblade port hover improved
+
+The old port chips relied only on browser-native `title` behavior.
+
+Now:
+
+- port chips carry `data-tooltip`
+- custom tooltip CSS is shipped live
+- usage/state text should appear as a real hover bubble
+
+Live Erik file check confirmed:
+
+- `data-tooltip` markers present
+- tooltip CSS present
+
+### 7. Changelog self-healing
+
+The public changelog cache in MAGATAMA core previously returned cached data indefinitely as long as it was structurally valid.
+
+Now:
+
+- cached changelog older than 6 hours triggers a rebuild from git history
+
+Live verified on Erik through dashboard proxy:
+
+- `generatedAt = 2026-05-06T15:18:42.708Z`
+- latest entries include fresh `2026-04-30` material again
+
+## Files Touched In MAGATAMA
+
+- `packages/dashboard/public/index-v2.html`
+- `packages/dashboard/src/server.ts`
+- `packages/core/src/routes/changelog.ts`
+
+## Deployment Status
+
+Built locally and redeployed to Erik:
+
+- dashboard dist synced
+- core dist synced
+- `index-v2.html` synced
+- PM2 restarted:
+  - `magatama-dashboard`
+  - `magatama`
+
+## Important Live Evidence
+
+- public `api/llm/status` shows lane-export counts, not `1097`
+- public `api/findings?limit=1` returns empty findings cleanly
+- Erik live dashboard file contains:
+  - `API_CACHE_VERSION = '2026-05-06b'`
+  - `data-tooltip`
+  - `Im aktuellen Scope liegen 0 Assets.` ("The current scope contains 0 assets.")
+  - `Klicken für Details` ("Click for details")
+
+## Open Truths
+
+- current live findings are genuinely `0`; this is not a hidden frontend-only failure
+- Attack Paths can still be empty if there are truly no scoped assets or no correlated attack stories
+- RunPod serverless still needs endpoint-side reliability; the MAGATAMA-side truthfulness improvements do not by themselves fix a broken RunPod release/worker pipeline
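
The worker-log check described under "RunPod completion truthfulness" can be illustrated with a minimal sketch. This is not the code from `packages/dashboard/src/server.ts`: the function name `classifyCompletedRun`, the `RunOutcome` type, and the exit-status regex are all assumptions made for illustration. Only the `Traceback` and `SyntaxError` markers and the `completed_with_worker_failure` status come from the notes above.

```typescript
// Hypothetical sketch; none of these names are taken from the MAGATAMA codebase.
type RunOutcome = "completed" | "completed_with_worker_failure";

// `Traceback` and `SyntaxError` are the markers named in the notes.
// The exit-status pattern is an assumed log format, not a confirmed one.
const FAILURE_MARKERS: RegExp[] = [
  /Traceback/,
  /SyntaxError/,
  /exit(?:ed with)? (?:code|status) [1-9]\d*/i,
];

// Call this only after the worker has already reported COMPLETED:
// the reported status alone is not trusted, so the log text is scanned as well.
function classifyCompletedRun(workerLog: string): RunOutcome {
  const failed = FAILURE_MARKERS.some((marker) => marker.test(workerLog));
  return failed ? "completed_with_worker_failure" : "completed";
}
```

With this sketch, a log containing `Traceback (most recent call last):` is classified as `completed_with_worker_failure`, while a log that ends in `exit code 0` stays `completed`. A real implementation would likely need narrower patterns to avoid false positives from log lines that merely mention these words.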