transceiver-db/sync/history/2026-05-06-magatama-runpod-attackpaths-atlas-exposure-fixes.md

# 2026-05-06 — MAGATAMA RunPod / Attack Paths / Atlas Exposure Fixes
## Scope
This handoff captures the follow-up fixes made after MAGATAMA had already been cleaned to zero findings earlier in the day. Three practical issues remained:
1. RunPod serverless training start was failing from MAGATAMA UI.
2. Attack Paths looked empty/broken to the operator.
3. Atlas exposure findings reopened as noisy internal LAN management alerts.
## What Was Actually Broken
### 1. RunPod training did not fail because of RunPod
User-facing message:
- `RunPod nicht erreichbar` ("RunPod not reachable")
Real root cause on Erik:
- `/opt/magatama/package.json` already referenced the `training:refresh-all` and `training:refresh-all:publish` scripts
- `/opt/magatama/scripts/training_full_refresh.ts` and related scripts were missing on Erik, so those entries had nothing to run
Additional follow-up break:
- `scripts/model_registry_build.ts` assumed `training-data/model-registry/external-sources.json` always existed remotely
### 2. Attack Paths page looked dead
The page was not broken, but it was misleading:
- the system scope selected in the screenshot had `0 Assets in Scope`
- at the same time there were either:
  - no multi-step correlated live paths, or
  - no open correlated findings
Before the fix, the empty canvas looked like a defect rather than an honest empty state.
### 3. Atlas exposure reopened 28 Guard findings
Live breakdown before the final policy fix:
- `guard | atlas-exposure | high | 9`
- `guard | atlas-exposure | low | 19`
Examples:
- `Exposure: Open ports on 192.168.178.213`
- `Exposure: Open ports on 192.168.178.2`
- `Exposure: Open ports on 192.168.178.5`
These were not “internet exposed” incidents in any operationally meaningful sense; they were generic LAN/internal management ports discovered by Atlas.
## Changes Made
### RunPod training pipeline
Synced to Erik:
- full local `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/` tree into `/opt/magatama/scripts/`
- local `training-data/model-registry/` into `/opt/magatama/training-data/model-registry/`
Patched:
- `magatama/scripts/model_registry_build.ts`
Behavior change:
- missing external metadata files now fall back safely instead of crashing the refresh step
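The fallback described above could look roughly like the following. This is a minimal sketch, not the patched code from `model_registry_build.ts`: the `ExternalSources` shape, the names `isMissingFile` and `loadExternalSources`, and the choice to rethrow every error other than a missing file are assumptions made for illustration.

```typescript
// Hypothetical shape of the external metadata; the real schema is not shown in this handoff.
interface ExternalSources {
  sources: string[];
}

// True only for "file not found" style errors (Node reports these with code "ENOENT").
function isMissingFile(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ENOENT"
  );
}

// `read` is injected so the sketch does not depend on a particular file API.
function loadExternalSources(read: () => string): ExternalSources {
  let raw: string;
  try {
    raw = read();
  } catch (err) {
    // Assumption: only a missing file triggers the fallback;
    // other I/O errors still surface to the caller.
    if (isMissingFile(err)) {
      return { sources: [] };
    }
    throw err;
  }
  // Malformed JSON is deliberately not swallowed here.
  return JSON.parse(raw) as ExternalSources;
}
```

In the real script the injected reader would presumably wrap the read of `training-data/model-registry/external-sources.json`, for example `() => readFileSync(path, "utf8")` from Node's `fs` module. Injecting it keeps the fallback testable without touching the filesystem.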
Verified on Erik:
- `pnpm training:refresh-all` now succeeds
Fresh effective dataset totals:
- `magatamallm`: `92,742 raw -> 17,356 effective`
- `fo_blogllm`: `32 total`
- `tip_llm`: `40 total`
Important note:
- Codex did **not** perform the final external Hugging Face publish step in this chat.
- Local refresh/build path is fixed.
### Attack Paths UI
Patched:
- `magatama/packages/core/src/routes/attack-paths.ts`
- `magatama/packages/dashboard/public/index-v2.html`
Behavior change:
- if no live paths exist, MAGATAMA can still show historical correlated paths when available
- if the user-selected scope contains `0` assets, the graph now says so explicitly
- if there are simply no open multi-step correlations, the page says that honestly
Live strings now present on Erik:
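- `Im aktuellen Scope liegen 0 Assets.` ("The current scope contains 0 assets.")
- `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.` ("Expand the location or datacenter / rack so that MAGATAMA can display correlatable assets and paths.")
- `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.` ("Without open multi-step correlations, the graph view intentionally stays empty.")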
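The empty-state behavior above can be sketched as a small decision function. This is an illustrative sketch only: the `AttackPathsSummary` fields, the `GraphView` type, the function name `resolveGraphView`, and the order in which the cases are checked are assumptions, not the actual code in `attack-paths.ts` or `index-v2.html`. The two message strings are the ones quoted above.

```typescript
// Strings as deployed on Erik (quoted from this handoff).
const EMPTY_SCOPE_MESSAGE = "Im aktuellen Scope liegen 0 Assets.";
const NO_CORRELATION_MESSAGE =
  "Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.";

// Hypothetical summary of what the route returns; field names are assumptions.
interface AttackPathsSummary {
  assetsInScope: number;
  livePaths: number;
  historicalPaths: number;
}

type GraphView =
  | { kind: "live" }
  | { kind: "historical" }
  | { kind: "empty"; message: string };

function resolveGraphView(s: AttackPathsSummary): GraphView {
  // Assumed precedence: an empty scope is reported before anything else.
  if (s.assetsInScope === 0) {
    return { kind: "empty", message: EMPTY_SCOPE_MESSAGE };
  }
  if (s.livePaths > 0) {
    return { kind: "live" };
  }
  // No live paths: fall back to historical correlated paths when available.
  if (s.historicalPaths > 0) {
    return { kind: "historical" };
  }
  // Nothing to draw: say so explicitly instead of showing a blank canvas.
  return { kind: "empty", message: NO_CORRELATION_MESSAGE };
}
```

Modeling the result as a tagged union makes the explanation part of the data, so the dashboard cannot render an empty graph without also having a message to show.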
### trust_remote_code hardening
Patched:
- `magatama/scripts/test_runpod_adapter.py`
- `magatama/scripts/ollama_adapter_bridge.py`
Behavior change:
- local adapter/tokenizer/model loading no longer uses `trust_remote_code=True`
Reason:
- this was causing a live MAGATAMA CODE finding on Erik:
  - `HuggingFace trust_remote_code`
### Atlas exposure policy
Patched:
- `magatama/packages/core/src/routes/health-atlas.ts`
Behavior change:
- generic Atlas portscan findings on RFC1918/internal assets are no longer automatically promoted into open Guard findings unless the exposure is critical enough to deserve operational tracking
- host-audit remains the authoritative place for explicit posture on Erik / Proxmox / Mac Studio
This removed the noisy LAN exposure findings without simply faking closure; the policy itself was corrected.
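As a rough illustration of the corrected policy, the sketch below gates Guard promotion on address range and severity. It is not the code in `health-atlas.ts`: the `AtlasExposure` shape, the `Severity` values, the function names, the decision to treat only `critical` as "critical enough", and the handling of non-RFC1918 addresses are all assumptions. It also covers only IPv4 RFC1918 ranges, not other notions of "internal".

```typescript
// Hypothetical types; the real Atlas finding schema is not shown in this handoff.
type Severity = "low" | "medium" | "high" | "critical";

interface AtlasExposure {
  ip: string;
  severity: Severity;
}

// Parses a dotted-quad IPv4 address; returns null for anything else.
function parseIpv4(ip: string): [number, number, number, number] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (m === null) return null;
  const octets: [number, number, number, number] = [
    Number(m[1]),
    Number(m[2]),
    Number(m[3]),
    Number(m[4]),
  ];
  return octets.every((o) => o <= 255) ? octets : null;
}

// RFC 1918 private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
function isRfc1918(ip: string): boolean {
  const octets = parseIpv4(ip);
  if (octets === null) return false;
  const [a, b] = octets;
  return (
    a === 10 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function shouldOpenGuardFinding(e: AtlasExposure): boolean {
  // Assumption: non-private (or unparsable) addresses keep the previous
  // behavior and are always promoted, erring toward visibility.
  if (!isRfc1918(e.ip)) return true;
  // Assumption: on internal assets only `critical` exposures are promoted.
  return e.severity === "critical";
}
```

Under these assumptions, an exposure such as `Exposure: Open ports on 192.168.178.213` at `high` severity would stay out of the open Guard findings, while host-audit remains the place for explicit posture decisions.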
## Live Verification
After rebuild, deploy, restart, and health-triggered sync:
- `open findings = 0` in Postgres on Erik
- `scripts/test_runpod_adapter.py` on Erik no longer contains `trust_remote_code=True`
- dashboard empty-state strings for Attack Paths are present in the live HTML path
## Operational Meaning
- MAGATAMA is no longer reopening Guard noise for normal internal management ports discovered by the broad Atlas scan
- Attack Paths no longer looks “broken” when scope or data legitimately yields no graph
- RunPod dataset refresh/build is back to a working state on Erik
## TIP Policy Reminder
- TIPLLM only for robot/crawler planning
- Erik: controller/light workloads only
- heavy crawlers on Proxmox / Pis