transceiver-db/sync/history/2026-05-06-magatama-zero-open-findings-and-resolver-truth.md
2026-05-06 08:38:07 +02:00

# MAGATAMA Handoff — 2026-05-06
## Scope
This handoff captures the MAGATAMA remediation block that started from "does MAGATAMA still really protect us?" and ended with a clean live state:
- all open MAGATAMA findings resolved to `0`
- host-audit false positives corrected
- resolver truth fixed so disabled Codex/Copilot are shown as disabled, not broken
## Binding Outcome
Verified live on Erik and on public MAGATAMA:
- `open findings: 0`
- `/api/health` -> `status: ok`
- `/api/active-resolvers` ->
  - `MAGATAMA Core: working`
  - `MagatamaLLM: working`
  - `Claude (secondary): working`
  - `Codex (secondary/manual): idle`
  - `Copilot (secondary/manual): idle`
- `/api/protection-proof` summary:
  - `knownAssets: 79`
  - `hostsWithTelemetry: 27`
  - `assetsWithoutTelemetry: 52`
  - `queueExecuting: 0`
  - `queueBlocked: 0`
  - `queueFailed: 0`
## What Was Fixed
### 1. Threat/queue/code false positives
Earlier in the same remediation chain, MAGATAMA was repaired so that:
- `threat-news-seed` items are triaged instead of piling up as operational noise
- generated artifacts/reports are excluded from code scanning
- queue counters only count unresolved findings instead of stale historical queue rows
- training metrics prefer verified, deduped, Gitea-backed examples
This had already reduced the system from hundreds of open findings to a single remaining guard host-audit finding.
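
The queue-counter rule above can be sketched as follows. This is only an illustration: the actual counter logic lives in the MAGATAMA TypeScript code and is not shown in this handoff, so `QueueRow`, its fields, and `queue_counters` are hypothetical names. Only the three counter labels are borrowed from the `/api/protection-proof` summary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class QueueRow:
    """Hypothetical queue row; the real MAGATAMA schema is not shown here."""
    finding_id: int
    state: str                          # assumed values: "executing", "blocked", "failed"
    finding_resolved_at: Optional[str]  # None while the linked finding is still open


def queue_counters(rows: Iterable[QueueRow]) -> Dict[str, int]:
    """Count queue rows per state, skipping rows whose finding is already resolved."""
    counters = {"executing": 0, "blocked": 0, "failed": 0}
    for row in rows:
        if row.finding_resolved_at is not None:
            # Stale historical row: its finding is resolved, so it must not
            # inflate the live counters.
            continue
        if row.state in counters:
            counters[row.state] += 1
    return counters
```

With this rule, a `failed` row whose finding was resolved later no longer shows up in `queueFailed`; only rows tied to unresolved findings are counted.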
### 2. Guard host-audit truth
The last remaining open finding was:
- `guard | high | atlas-host-audit | Baseline: NUC Proxmox protection gaps | 192.168.178.10`
The root cause was an overly naive Proxmox baseline rule. It treated these as hostile by default:
- SSH on `22`
- rpcbind on `111`
- Proxmox UI on `8006`
- SPICE proxy on `3128`
- `pve-firewall` reporting `disabled/running`
For this internal management host, that logic was wrong. The audit was updated so that:
- internal management exposure on `22`, `8006`, and `3128` is acceptable
- rpcbind is acceptable when only default `portmapper` / `status` services are present
- `pve-firewall` is accepted when the service is actually running
- only additional RPC services or genuinely missing firewall runtime produce a warning
After redeploying the audit script and rerunning the audit, the Proxmox finding cleared and MAGATAMA reached `0` open findings.
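
A minimal sketch of the relaxed baseline rule, assuming the audit has already collected the registered RPC service names and the firewall runtime state. The function and parameter names are hypothetical; the real logic is in `scripts/security_atlas_host_audit.py` and is not reproduced here.

```python
from typing import Iterable, List

# Ports 22, 8006 and 3128 are accepted as internal management exposure,
# so this sketch does not inspect them at all.
DEFAULT_RPC_SERVICES = {"portmapper", "status"}


def proxmox_baseline_warnings(rpc_services: Iterable[str],
                              firewall_running: bool) -> List[str]:
    """Return warnings for an internal Proxmox management host.

    rpc_services: names registered with rpcbind (hypothetical input shape).
    firewall_running: True when the pve-firewall service is actually running,
    even if it reports `disabled/running`.
    """
    warnings: List[str] = []

    # rpcbind on 111 is acceptable only with the default services.
    extra = sorted(set(rpc_services) - DEFAULT_RPC_SERVICES)
    if extra:
        warnings.append("additional RPC services: " + ", ".join(extra))

    # Only a genuinely missing firewall runtime is a gap.
    if not firewall_running:
        warnings.append("pve-firewall is not running")

    return warnings
```

An empty result corresponds to the cleared state described above; any extra RPC service or a stopped firewall would bring the warning back.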
### 3. Resolver truth
The public resolver view still looked unhealthy because Codex was shown as unavailable. Live inspection found:
- `codex_enabled = false`
- `codex_bridge_url = ''`
- local codex bridge responds, but with `503 auth_required`
This is not the same as a runtime outage: the bridge is reachable but unauthenticated, and Codex is disabled in settings. The dashboard logic was changed so:
- if `codex_enabled=false`, Codex and Copilot show as:
  - `status: idle`
  - `detail: In MAGATAMA settings disabled`
This is now live on the public endpoint and is the correct interpretation unless the team intentionally re-enables Codex through settings and bridge/gateway auth.
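
The resolver decision can be sketched like this. The real change is in the TypeScript dashboard code and is not shown in this handoff, so the Python function below is a hypothetical stand-in. Only the disabled branch is taken from the text above; the enabled branches are assumptions added so that the example is complete.

```python
from typing import Dict, Optional

DISABLED_DETAIL = "In MAGATAMA settings disabled"


def secondary_resolver_view(codex_enabled: bool,
                            bridge_http_status: Optional[int]) -> Dict[str, str]:
    """Map settings plus bridge health to the status shown for Codex/Copilot."""
    if not codex_enabled:
        # Disabled by configuration: the bridge state (for example a 503
        # auth_required response) must not be reported as an outage.
        return {"status": "idle", "detail": DISABLED_DETAIL}

    # Assumption: behaviour when Codex is enabled is illustrative only.
    if bridge_http_status == 200:
        return {"status": "working", "detail": "bridge healthy"}
    return {"status": "unavailable",
            "detail": f"bridge returned {bridge_http_status}"}
```

The point of the ordering is that the settings flag is checked before any bridge health signal, so a deliberately disabled resolver can never surface as broken.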
## Important Files Touched
MAGATAMA repo:
- `packages/core/src/routes/health-builders.ts`
- `packages/core/src/routes/health-atlas.ts`
- `packages/core/src/learning/auto-fix-scheduler.ts`
- `packages/dashboard/src/server.ts`
- `packages/code/src/types.ts`
- `packages/core/src/routes/health-support.ts`
- `scripts/security_atlas_host_audit.py`
This sync repo:
- `sync/CURRENT.md`
- `sync/history/2026-05-06-magatama-zero-open-findings-and-resolver-truth.md`
## Evidence Summary
Live evidence gathered during the remediation:
- local codex bridge health:
  - HTTP `503`
  - `{"status":"auth_required","configured":true,"provider":"codex-cli"}`
- public active resolvers after dashboard patch:
  - Codex/Copilot show `idle`, not `unavailable`
- live DB query on Erik:
  - `SELECT count(*) AS open_findings FROM findings WHERE resolved_at IS NULL;`
  - result: `0`
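
To make the open-findings check reproducible, here is a small sketch that runs the same query against a throwaway database. The database engine and the full `findings` schema on Erik are not stated in this handoff, so the in-memory SQLite table and the `title` column are stand-ins for illustration only.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Query copied verbatim from the evidence list above.
OPEN_FINDINGS_SQL = (
    "SELECT count(*) AS open_findings FROM findings WHERE resolved_at IS NULL;"
)


def demo_connection(findings: Iterable[Tuple[str, Optional[str]]]) -> sqlite3.Connection:
    """Build an in-memory stand-in table from (title, resolved_at) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE findings ("
        "id INTEGER PRIMARY KEY, title TEXT, resolved_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO findings (title, resolved_at) VALUES (?, ?)", list(findings)
    )
    return conn


def count_open_findings(conn: sqlite3.Connection) -> int:
    """Return the number of findings that have no resolution timestamp."""
    return conn.execute(OPEN_FINDINGS_SQL).fetchone()[0]
```

A finding counts as open purely because `resolved_at` is `NULL`, so the `0` result means every row carries a resolution timestamp.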
## Remaining Real Next Step
MAGATAMA is now clean from the findings/queue perspective, but telemetry coverage is still incomplete:
- `52` assets remain discovery/inventory-only without live telemetry
- they are no longer open findings, but they are the next real operational expansion area
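
The coverage gap follows directly from the `/api/protection-proof` numbers. The helper below is a hypothetical convenience, not part of MAGATAMA, and simply makes the arithmetic explicit.

```python
from typing import Tuple


def telemetry_coverage(known_assets: int, hosts_with_telemetry: int) -> Tuple[int, float]:
    """Return (assets without telemetry, fraction of assets with telemetry)."""
    without = known_assets - hosts_with_telemetry
    ratio = hosts_with_telemetry / known_assets
    return without, ratio


# Values taken from the summary in "Binding Outcome".
without, ratio = telemetry_coverage(known_assets=79, hosts_with_telemetry=27)
print(without, f"{ratio:.1%}")  # 52 34.2%
```

So roughly a third of the known assets currently report live telemetry, which is why coverage expansion is the next step rather than further findings cleanup.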
If work resumes on MAGATAMA protection depth, the next correct lane is:
1. expand live telemetry coverage beyond the current `27` hosts
2. keep the verified-only training corpus clean
3. only re-enable Codex/Copilot after gateway or bridge auth is intentionally restored