sync: record magatama ui cache runpod tooltip changelog fix

2026-05-06 17:24:54 +02:00 · 2026-05-06 17:24:54 +02:00 · 830ab57c3c
commit 830ab57c3c
parent 77a4aab592
2 changed files with 176 additions and 1 deletion
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
-Updated: 2026-05-06 12:21 UTC
+Updated: 2026-05-06 15:24 UTC
 ## Active Policy
@@ -27,6 +27,44 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 ## Latest Work
 - MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully:
    - `magatama-dashboard`
    - `magatama`
  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`:
    - `collectedExamples = 15620`
    - `effectiveExamples = 15620`
    - `evalExamples = 1736`
    - `totalExamples = 17356`
    - `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - worker `COMPLETED` is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
  - the public findings state is still empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - stale cached changelog data older than 6h now forces a rebuild from git history
    - verified live via dashboard proxy on Erik:
      - `generatedAt = 2026-05-06T15:18:42.708Z`
      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
 - MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
--- a/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
+++ b/sync/history/2026-05-06-magatama-ui-cache-runpod-tooltip-changelog-fix.md
@@ -0,0 +1,137 @@
 # MAGATAMA UI / Cache / RunPod / Tooltip / Changelog Fix
 Date: 2026-05-06
 Author: Codex
 ## Scope
 Addressed the current MAGATAMA operator complaints in one block:
 - training UI still showed `1097`
 - findings page looked blank
 - attack paths looked empty/broken
 - Switchblade port hover only showed a help cursor / question mark
 - changelog looked stale
 ## What Was Fixed
 ### 1. Training truth source
 `magatamallm` RunPod launches still logged the old legacy deduplicated `fixes.jsonl` count (`1097`) during SSE startup.
 This was corrected so RunPod launches now:
 - still dedupe the legacy fix corpus where needed
 - but no longer present that count as the operator-facing training truth
 - instead emit the lane-specific RunPod manifest totals after dataset refresh
 Live verified via public MAGATAMA API:
 - `collectedExamples = 15620`
 - `effectiveExamples = 15620`
 - `evalExamples = 1736`
 - `totalExamples = 17356`
 - `newSinceLastTraining = 15620`
 ### 2. RunPod completion truthfulness
 RunPod worker jobs could return `COMPLETED` even though the logs contained real training failures.
 MAGATAMA now inspects worker logs for markers such as:
 - `Traceback`
 - `SyntaxError`
 - non-zero exit status
 - explicit train/fine-tune failure text
 If such evidence exists, the run is recorded as `completed_with_worker_failure` instead of being treated as a clean success.
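A minimal sketch of the log check described above, not the code in `packages/dashboard/src/server.ts`. The function name `classifyRun`, the `not_completed` value, and the exact regular expressions for exit status and failure text are assumptions; only the `Traceback` and `SyntaxError` markers and the `completed_with_worker_failure` status come from these notes.

```typescript
// Hypothetical names; the real MAGATAMA implementation may differ.
type RunOutcome = "completed" | "completed_with_worker_failure" | "not_completed";

// Markers named in this note, plus assumed wording for exit status and failure text.
const FAILURE_MARKERS: RegExp[] = [
  /Traceback \(most recent call last\)/,
  /SyntaxError/,
  /exit(?:ed)?(?: with)? (?:code|status)[ :=]+[1-9]\d*/i, // assumed log wording
  /(?:training|fine-?tun(?:e|ing)) failed/i, // assumed log wording
];

// A worker status of COMPLETED only counts as success when no log line
// matches a failure marker.
function classifyRun(workerStatus: string, logLines: string[]): RunOutcome {
  if (workerStatus !== "COMPLETED") return "not_completed";
  const failed = logLines.some((line) =>
    FAILURE_MARKERS.some((marker) => marker.test(line)),
  );
  return failed ? "completed_with_worker_failure" : "completed";
}
```

Keeping the markers in one list makes it cheap to add a pattern when a new kind of hidden failure shows up in worker logs.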
 ### 3. Findings page no longer looks broken when empty
 The live findings API currently returns:
 - `findings = []`
 - `total = 0`
 The UI now renders an explicit empty-state row when there are no open findings or when filters hide everything, instead of leaving the table visually blank.
 ### 4. Attack Paths empty-state clarified
 Attack Paths previously looked broken when the selected scope had zero assets.
 The UI now explicitly states:
 - the current scope has `0 assets`
 - operators should widen location/datacenter/rack scope
 - the graph stays intentionally empty when no correlated multi-step paths exist
 ### 5. Frontend cache + scope hardening
 Frontend cache handling was improved:
 - cache version bumped to `2026-05-06b`
 - stale legacy `magatama_api_cache:*` entries are cleared
 - per-endpoint TTLs were introduced
 - invalid scope selections are normalized
 - empty scoped selections reset rather than silently trapping the UI in misleading empty views
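The cache hardening above could look roughly like the following sketch. The key layout (version embedded after the `magatama_api_cache:` prefix), the TTL values, and all function names are assumptions for illustration; only the prefix and the version string `2026-05-06b` appear in these notes. The `KeyValueStore` interface is a subset of the browser `Storage` API, so `localStorage` would satisfy it.

```typescript
// Sketch only: key layout, TTL values, and helper names are assumptions.
const API_CACHE_VERSION = "2026-05-06b";
const LEGACY_PREFIX = "magatama_api_cache:";
const VERSIONED_PREFIX = `${LEGACY_PREFIX}${API_CACHE_VERSION}:`;

// Subset of the browser Storage interface.
interface KeyValueStore {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Assumed per-endpoint TTLs in milliseconds.
const DEFAULT_TTL_MS = 30_000;
const ENDPOINT_TTL_MS: Record<string, number> = {
  "/api/findings": 15_000,
  "/api/llm/status": 60_000,
};

// Remove every cache entry that does not carry the current cache version.
function purgeStaleEntries(store: KeyValueStore): number {
  const stale: string[] = [];
  for (let i = 0; i < store.length; i++) {
    const k = store.key(i);
    if (k !== null && k.startsWith(LEGACY_PREFIX) && !k.startsWith(VERSIONED_PREFIX)) {
      stale.push(k);
    }
  }
  stale.forEach((k) => store.removeItem(k));
  return stale.length;
}

function writeCache(store: KeyValueStore, endpoint: string, data: unknown, now: number = Date.now()): void {
  store.setItem(VERSIONED_PREFIX + endpoint, JSON.stringify({ storedAt: now, data }));
}

// Return cached data only while it is younger than the endpoint's TTL.
function readCache<T>(store: KeyValueStore, endpoint: string, now: number = Date.now()): T | null {
  const raw = store.getItem(VERSIONED_PREFIX + endpoint);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as { storedAt: number; data: T };
    const ttl = ENDPOINT_TTL_MS[endpoint] ?? DEFAULT_TTL_MS;
    return now - entry.storedAt <= ttl ? entry.data : null;
  } catch {
    return null; // corrupt entries are treated as a cache miss
  }
}
```

Collecting stale keys first and deleting them afterwards avoids skipping entries while the store's index shifts during iteration.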
 ### 6. Switchblade port hover improved
 The old port chips relied only on browser-native `title` behavior.
 Now:
 - port chips carry `data-tooltip`
 - custom tooltip CSS is shipped live
 - usage/state text should appear as a real hover bubble
 Live Erik file check confirmed:
 - `data-tooltip` markers present
 - tooltip CSS present
 ### 7. Changelog self-healing
 The public changelog cache in MAGATAMA core previously kept returning cached data indefinitely as long as it was structurally valid.
 Now:
 - cached changelog older than 6 hours triggers a rebuild from git history
 Live verified on Erik through dashboard proxy:
 - `generatedAt = 2026-05-06T15:18:42.708Z`
 - latest entries include fresh `2026-04-30` material again
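The 6-hour rule can be expressed as a small staleness check. This is a sketch under assumptions, not the code in `packages/core/src/routes/changelog.ts`: the `CachedChangelog` shape, the `entries` field, and the `needsRebuild` name are hypothetical; only `generatedAt` and the 6-hour threshold come from these notes.

```typescript
// Hypothetical cache shape; only `generatedAt` is confirmed by the notes.
interface CachedChangelog {
  generatedAt: string; // ISO 8601 timestamp
  entries: unknown[];
}

const MAX_CHANGELOG_AGE_MS = 6 * 60 * 60 * 1000; // 6 hours

// True when the cache is missing, malformed, or older than the threshold,
// in which case the changelog should be rebuilt from git history.
function needsRebuild(cached: CachedChangelog | null, now: Date = new Date()): boolean {
  if (cached === null || !Array.isArray(cached.entries)) return true;
  const generated = Date.parse(cached.generatedAt);
  if (Number.isNaN(generated)) return true;
  return now.getTime() - generated > MAX_CHANGELOG_AGE_MS;
}
```

Treating an unparseable `generatedAt` as stale means a structurally valid but undated cache can no longer be served forever.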
 ## Files Touched In MAGATAMA
 - `packages/dashboard/public/index-v2.html`
 - `packages/dashboard/src/server.ts`
 - `packages/core/src/routes/changelog.ts`
 ## Deployment Status
 Built locally and redeployed to Erik:
 - dashboard dist synced
 - core dist synced
 - `index-v2.html` synced
 - PM2 restarted:
  - `magatama-dashboard`
  - `magatama`
 ## Important Live Evidence
 - public `api/llm/status` shows lane-export counts, not `1097`
 - public `api/findings?limit=1` returns empty findings cleanly
 - Erik live dashboard file contains:
  - `API_CACHE_VERSION = '2026-05-06b'`
  - `data-tooltip`
  - `Im aktuellen Scope liegen 0 Assets.` ("The current scope contains 0 assets.")
  - `Klicken für Details` ("Click for details")
 ## Open Truths
 - current live findings are genuinely `0`; this is not a hidden frontend-only failure
 - Attack Paths can still be empty if there are truly no scoped assets or no correlated attack stories
 - RunPod serverless still needs endpoint-side reliability; the MAGATAMA-side truthfulness improvements do not by themselves fix a broken RunPod release/worker pipeline