diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index c3789e1..5348cea 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
 
-Updated: 2026-05-06 10:28 UTC
+Updated: 2026-05-06 12:02 UTC
 
 ## Active Policy
 
@@ -27,6 +27,40 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 
 ## Latest Work
 
+- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
+  - the RunPod serverless training start failure was not a RunPod outage.
+  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
+  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
+  - verified on Erik:
+    - `pnpm training:refresh-all` now succeeds.
+    - fresh dataset totals after dedupe:
+      - `magatamallm`: `92,742` raw → `17,356` effective (`15,620 train / 1,736 eval`)
+      - `fo_blogllm`: `32` total (`28 train / 4 eval`)
+      - `tip_llm`: `40` total (`36 train / 4 eval`)
+  - important nuance:
+    - Codex did **not** execute the final Hugging Face publish step from Erik in this chat.
+    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
+- MAGATAMA Attack Paths UX is no longer a misleading blank panel:
+  - the page now distinguishes between:
+    - no live attack paths
+    - historical fallback paths
+    - empty selected scope (`0 assets in scope`)
+  - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
+  - live dashboard HTML on Erik now contains:
+    - `Im aktuellen Scope liegen 0 Assets.`
+    - `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.`
+    - `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.`
+- MAGATAMA code/training hardening was extended:
+  - `scripts/test_runpod_adapter.py` no longer loads tokenizer/model with `trust_remote_code=True`.
+  - `scripts/ollama_adapter_bridge.py` no longer loads tokenizer/model with `trust_remote_code=True`.
+  - this removed the live CODE finding around `HuggingFace trust_remote_code` on Erik.
+- Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
+  - generic `atlas-exposure` findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
+  - internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
+  - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
+  - after rebuild + deploy + health sync:
+    - live Postgres open findings returned to `0`.
 - MAGATAMA was repaired end-to-end to a clean operational baseline:
   - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
   - open findings were reduced all the way to `0` in Postgres.
diff --git a/sync/history/2026-05-06-magatama-runpod-attackpaths-atlas-exposure-fixes.md b/sync/history/2026-05-06-magatama-runpod-attackpaths-atlas-exposure-fixes.md
new file mode 100644
index 0000000..635728a
--- /dev/null
+++ b/sync/history/2026-05-06-magatama-runpod-attackpaths-atlas-exposure-fixes.md
@@ -0,0 +1,152 @@
+# 2026-05-06 — MAGATAMA RunPod / Attack Paths / Atlas Exposure Fixes
+
+## Scope
+
+This handoff captures the follow-up fixes after MAGATAMA had already been cleaned to zero findings earlier in the day, but three practical issues remained:
+
+1. RunPod serverless training start was failing from MAGATAMA UI.
+2. Attack Paths looked empty/broken to the operator.
+3. Atlas exposure findings reopened as noisy internal LAN management alerts.
+
+## What Was Actually Broken
+
+### 1. RunPod training did not fail because of RunPod
+
+User-facing message:
+
+- `RunPod nicht erreichbar`
+
+Real root cause on Erik:
+
+- `/opt/magatama/package.json` already referenced `training:refresh-all` and `training:refresh-all:publish`
+- but `/opt/magatama/scripts/training_full_refresh.ts` and related scripts were missing remotely
+
+Additional follow-up break:
+
+- `scripts/model_registry_build.ts` assumed `training-data/model-registry/external-sources.json` always existed remotely
+
+### 2. Attack Paths page looked dead
+
+The page was not broken, but it was misleading:
+
+- selected system scope in the screenshot had `0 Assets in Scope`
+- at the same time there were either:
+  - no multi-step correlated live paths, or
+  - no open correlated findings
+
+Before the fix the empty canvas looked like a defect instead of an honest empty-state.
+
+### 3. Atlas exposure reopened 28 Guard findings
+
+Live breakdown before the final policy fix:
+
+- `guard | atlas-exposure | high | 9`
+- `guard | atlas-exposure | low | 19`
+
+Examples:
+
+- `Exposure: Open ports on 192.168.178.213`
+- `Exposure: Open ports on 192.168.178.2`
+- `Exposure: Open ports on 192.168.178.5`
+
+These were not “internet exposed” incidents in the meaningful operational sense; they were generic LAN/internal management ports discovered by Atlas.
+
+## Changes Made
+
+### RunPod training pipeline
+
+Synced to Erik:
+
+- full local `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/` tree into `/opt/magatama/scripts/`
+- local `training-data/model-registry/` into `/opt/magatama/training-data/model-registry/`
+
+Patched:
+
+- `magatama/scripts/model_registry_build.ts`
+
+Behavior change:
+
+- missing external metadata files now fall back safely instead of crashing the refresh step
+
+Verified on Erik:
+
+- `pnpm training:refresh-all` now succeeds
+
+Fresh effective dataset totals:
+
+- `magatamallm`: `92,742 raw -> 17,356 effective`
+- `fo_blogllm`: `32 total`
+- `tip_llm`: `40 total`
+
+Important note:
+
+- Codex did **not** perform the final external Hugging Face publish step in this chat.
+- Local refresh/build path is fixed.
+
+### Attack Paths UI
+
+Patched:
+
+- `magatama/packages/core/src/routes/attack-paths.ts`
+- `magatama/packages/dashboard/public/index-v2.html`
+
+Behavior change:
+
+- if no live paths exist, MAGATAMA can still show historical correlated paths when available
+- if the user-selected scope contains `0` assets, the graph now says so explicitly
+- if there are simply no open multi-step correlations, the page says that honestly
+
+Live strings now present on Erik:
+
+- `Im aktuellen Scope liegen 0 Assets.`
+- `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.`
+- `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.`
+
+### trust_remote_code hardening
+
+Patched:
+
+- `magatama/scripts/test_runpod_adapter.py`
+- `magatama/scripts/ollama_adapter_bridge.py`
+
+Behavior change:
+
+- local adapter/tokenizer/model loading no longer uses `trust_remote_code=True`
+
+Reason:
+
+- this was causing a live MAGATAMA CODE finding on Erik:
+  - `HuggingFace trust_remote_code`
+
+### Atlas exposure policy
+
+Patched:
+
+- `magatama/packages/core/src/routes/health-atlas.ts`
+
+Behavior change:
+
+- generic Atlas portscan findings on RFC1918/internal assets are no longer automatically promoted into open Guard findings unless the exposure is critical enough to deserve operational tracking
+- host-audit remains the authoritative place for explicit posture on Erik / Proxmox / Mac Studio
+
+This removed the noisy LAN exposure findings without simply faking closure; the policy itself was corrected.
+
+## Live Verification
+
+After rebuild, deploy, restart, and health-triggered sync:
+
+- `open findings = 0` in Postgres on Erik
+- `scripts/test_runpod_adapter.py` on Erik no longer contains `trust_remote_code=True`
+- dashboard empty-state strings for Attack Paths are present in the live HTML path
+
+## Operational Meaning
+
+- MAGATAMA is no longer reopening Guard noise for normal internal management ports discovered by the broad Atlas scan
+- Attack Paths no longer looks “broken” when scope or data legitimately yields no graph
+- RunPod dataset refresh/build is back to a working state on Erik
+
+## TIP Policy Reminder
+
+- TIPLLM only for robot/crawler planning
+- Erik controller/light only
+- heavy crawlers on Proxmox / Pis
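The RunPod section says `scripts/model_registry_build.ts` now falls back safely when `training-data/model-registry/external-sources.json` is missing. The actual patch is not part of this diff, so what follows is only a minimal TypeScript sketch of that fallback pattern. `ExternalSource`, `ReadText`, and `loadExternalSources` are hypothetical names, and the JSON-array schema is an assumption.

```typescript
// Hypothetical shape: the real external-sources.json schema is not shown in the handoff.
interface ExternalSource {
  name: string;
  [key: string]: unknown;
}

// readText is injected so the sketch does not depend on node:fs.
// It must return undefined when the file does not exist.
type ReadText = (path: string) => string | undefined;

const DEFAULT_PATH = "training-data/model-registry/external-sources.json";

function loadExternalSources(readText: ReadText, path: string = DEFAULT_PATH): ExternalSource[] {
  const raw = readText(path);
  if (raw === undefined) {
    // Missing metadata is not fatal: continue the refresh with no external sources.
    return [];
  }
  // Malformed JSON is still a real error, so JSON.parse is allowed to throw here.
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error(`${path}: expected a JSON array of sources`);
  }
  return parsed as ExternalSource[];
}
```

One design choice worth keeping in any real version: only a missing file falls back to an empty list, while a present but malformed file still throws, so a corrupted registry is not silently treated as empty. In Node, `readText` could check `existsSync(path)` and otherwise return `readFileSync(path, "utf8")`.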
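The reported dataset totals are internally consistent with a 90/10 train/eval split that floors the train count: 17,356 gives 15,620 / 1,736, 32 gives 28 / 4, and 40 gives 36 / 4 (rounding instead of flooring would turn 32 into 29 / 3). The handoff does not state the split rule or the dedupe criterion, so the TypeScript sketch below treats both as assumptions; `Example`, `dedupe`, and `splitCounts` are hypothetical names.

```typescript
// Hypothetical record shape; the real training-data schema is not shown in the handoff.
interface Example {
  prompt: string;
  completion: string;
}

// Assumed dedupe rule: exact match after trimming and lowercasing both fields.
function dedupe(examples: Example[]): Example[] {
  const seen = new Set<string>();
  const unique: Example[] = [];
  for (const ex of examples) {
    // JSON.stringify of a tuple keeps the boundary between the two fields unambiguous.
    const key = JSON.stringify([ex.prompt.trim().toLowerCase(), ex.completion.trim().toLowerCase()]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(ex);
    }
  }
  return unique;
}

// Assumed split: train = floor(90% of effective), computed with integer math; eval gets the rest.
function splitCounts(effective: number): { train: number; evalCount: number } {
  const train = Math.floor((effective * 9) / 10);
  return { train, evalCount: effective - train };
}
```

`splitCounts(17356)` returns `{ train: 15620, evalCount: 1736 }`, matching the `magatamallm` figures above. Multiplying by 9 before dividing by 10 keeps the intermediate value an exact integer, avoiding floating-point surprises from `0.9 * n`.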
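The Attack Paths change distinguishes four cases: an empty selected scope, live paths, historical fallback paths, and no open multi-step correlations. The following TypeScript sketch shows one way to resolve those states in a single place. The real logic lives in `attack-paths.ts` and `index-v2.html` and is not shown here, so `AttackPath`, `GraphState`, and `resolveGraphState` are hypothetical, and checking the empty scope first is an assumed ordering. The German messages are the live strings quoted in the handoff.

```typescript
// Hypothetical path shape; the real attack-paths.ts response format is not shown here.
interface AttackPath {
  id: string;
  steps: string[];
}

// Every outcome is an explicit variant, so the UI never renders an unexplained empty canvas.
type GraphState =
  | { kind: "empty-scope"; message: string }
  | { kind: "live"; paths: AttackPath[] }
  | { kind: "historical"; paths: AttackPath[] }
  | { kind: "no-correlations"; message: string };

function resolveGraphState(
  assetsInScope: number,
  livePaths: AttackPath[],
  historicalPaths: AttackPath[],
): GraphState {
  // Assumed ordering: with zero assets in scope nothing can correlate,
  // so the scope itself is reported as the reason before looking at paths.
  if (assetsInScope === 0) {
    return {
      kind: "empty-scope",
      message:
        "Im aktuellen Scope liegen 0 Assets. " +
        "Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.",
    };
  }
  if (livePaths.length > 0) return { kind: "live", paths: livePaths };
  // Historical correlated paths are only a fallback when no live paths exist.
  if (historicalPaths.length > 0) return { kind: "historical", paths: historicalPaths };
  return {
    kind: "no-correlations",
    message: "Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.",
  };
}
```

Returning a discriminated union forces the dashboard to handle each case explicitly, which is the point of the fix: an empty graph always comes with a stated reason.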
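The Atlas exposure policy keeps generic portscan results on RFC1918 addresses out of open Guard findings unless the exposure is critical enough to track. A minimal TypeScript sketch of that classification follows, assuming a four-level severity scale and `critical` as the threshold for internal assets. The handoff does not name the exact threshold, and `AtlasExposure`, `isRfc1918`, and `shouldOpenGuardFinding` are hypothetical names, not the code in `health-atlas.ts`.

```typescript
// Assumed severity scale; the handoff only shows "high" and "low" Atlas findings.
type Severity = "low" | "medium" | "high" | "critical";

// Hypothetical shape of one Atlas portscan result.
interface AtlasExposure {
  ip: string;
  openPorts: number[];
  severity: Severity;
}

// Parses a dotted-quad IPv4 address; returns undefined for IPv6, hostnames, or bad input.
function parseIPv4(ip: string): number[] | undefined {
  const parts = ip.split(".");
  if (parts.length !== 4) return undefined;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  // NaN fails both comparisons, so any non-numeric part rejects the address.
  return octets.every((o) => o >= 0 && o <= 255) ? octets : undefined;
}

// RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
function isRfc1918(ip: string): boolean {
  const octets = parseIPv4(ip);
  if (octets === undefined) return false;
  const [a, b] = octets;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Assumed policy: non-private exposure with open ports is tracked;
// internal exposure only opens a finding when it is critical.
// Host posture for Erik / Proxmox / Mac Studio stays with the separate host-audit logic.
function shouldOpenGuardFinding(e: AtlasExposure): boolean {
  if (!isRfc1918(e.ip)) return e.openPorts.length > 0;
  return e.severity === "critical";
}
```

Under these assumptions, `high` or `low` results on hosts such as `192.168.178.213` would no longer open findings. Addresses that fail to parse as IPv4 are treated as non-private, so they stay visible rather than being silently suppressed.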