sync: record magatama zero-open-finding remediation

Rene Fichtmueller 2026-05-06 08:38:07 +02:00
parent 14ad31da46
commit 08a732e9cc
2 changed files with 159 additions and 1 deletions


@@ -1,6 +1,6 @@
# Current TIP Sync State
Updated: 2026-04-29 21:15 UTC
Updated: 2026-05-06 08:35 UTC
## Active Policy
@@ -27,6 +27,35 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
## Latest Work
- MAGATAMA was repaired end-to-end to a clean operational baseline:
- live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
- open findings were reduced all the way to `0` in Postgres.
- false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
- code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
- `open findings: 0`
- `queueExecuting: 0`
- `queueBlocked: 0`
- `queueFailed: 0`
- public `/api/health` returns `status: ok`
- public `/api/active-resolvers` returns:
- `MAGATAMA Core: working`
- `MagatamaLLM: working`
- `Claude (secondary): working`
- `Codex (secondary/manual): idle`
- `Copilot (secondary/manual): idle`
- Important resolver truth fix on 2026-05-06:
- live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
- dashboard logic was updated so disabled Codex/Copilot now show truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
- the local codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
- Remaining real operational gap after findings hit zero:
- MAGATAMA still knows about more assets than it collects live telemetry from.
- last public protection proof showed:
- `knownAssets: 79`
- `hostsWithTelemetry: 27`
- `assetsWithoutTelemetry: 52`
- these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
- Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
- MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
@@ -90,6 +119,11 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
- Relevant local repo:
- `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state:
- `open findings: 0` on `2026-05-06`
- Latest confirmed live resolver state:
- `Codex` and `Copilot` intentionally `idle/disabled`
- not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix:
- `newSinceLastTraining: 49`
- Meaning:


@@ -0,0 +1,124 @@
# MAGATAMA Handoff — 2026-05-06
## Scope
This handoff captures the MAGATAMA remediation block that started from "does MAGATAMA still really protect us?" and ended with a clean live state:
- all open MAGATAMA findings resolved to `0`
- host-audit false positives corrected
- resolver truth fixed so disabled Codex/Copilot are shown as disabled, not broken
## Binding Outcome
Live verified on Erik and public MAGATAMA:
- `open findings: 0`
- `/api/health` -> `status: ok`
- `/api/active-resolvers` ->
- `MAGATAMA Core: working`
- `MagatamaLLM: working`
- `Claude (secondary): working`
- `Codex (secondary/manual): idle`
- `Copilot (secondary/manual): idle`
- `/api/protection-proof` summary:
- `knownAssets: 79`
- `hostsWithTelemetry: 27`
- `assetsWithoutTelemetry: 52`
- `queueExecuting: 0`
- `queueBlocked: 0`
- `queueFailed: 0`
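The asset counters above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming the summary is available as a flat dictionary with exactly the keys listed above; the function name `coverage_summary` and the derived field names are hypothetical and not part of MAGATAMA.

```python
def coverage_summary(proof: dict) -> dict:
    """Derive simple coverage figures from a protection-proof style summary.

    Assumption: `proof` is a flat dict using the key names quoted in this handoff.
    """
    known = proof["knownAssets"]
    with_telemetry = proof["hostsWithTelemetry"]
    without_telemetry = proof["assetsWithoutTelemetry"]
    queue_keys = ("queueExecuting", "queueBlocked", "queueFailed")
    return {
        # The three asset counters should add up.
        "consistent": known - with_telemetry == without_telemetry,
        # Share of known assets that deliver live telemetry, in percent.
        "coverage_pct": round(100 * with_telemetry / known, 1) if known else 0.0,
        # All queue counters are expected to be zero after the remediation.
        "queue_clean": all(proof.get(key, 0) == 0 for key in queue_keys),
    }


summary = coverage_summary({
    "knownAssets": 79,
    "hostsWithTelemetry": 27,
    "assetsWithoutTelemetry": 52,
    "queueExecuting": 0,
    "queueBlocked": 0,
    "queueFailed": 0,
})
print(summary)
```

With the values recorded here, the counters are consistent (79 - 27 = 52) and roughly a third of known assets deliver live telemetry.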
## What Was Fixed
### 1. Threat/queue/code false positives
Earlier in the same remediation chain, MAGATAMA was repaired so that:
- `threat-news-seed` items are triaged instead of piling up as operational noise
- generated artifacts/reports are excluded from code scanning
- queue counters only count unresolved findings instead of stale historical queue rows
- training metrics prefer verified, deduped, Gitea-backed examples
This had already reduced the system from hundreds of open findings to the last remaining guard host-audit finding.
### 2. Guard host-audit truth
The last remaining open finding was:
- `guard | high | atlas-host-audit | Baseline: NUC Proxmox protection gaps | 192.168.178.10`
The root cause was an overly naive Proxmox baseline rule. It treated these as hostile by default:
- SSH on `22`
- rpcbind on `111`
- Proxmox UI on `8006`
- SPICE proxy on `3128`
- `pve-firewall` reporting `disabled/running`
For this internal management host, that logic was wrong. The audit was updated so that:
- internal management exposure on `22`, `8006`, and `3128` is acceptable
- rpcbind is acceptable when only default `portmapper` / `status` services are present
- `pve-firewall` is accepted when the service is actually running
- only additional RPC services or genuinely missing firewall runtime produce a warning
After redeploying the audit script and rerunning the audit, the Proxmox finding cleared and MAGATAMA reached `0` open findings.
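The corrected baseline rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `scripts/security_atlas_host_audit.py`: the function name, parameters, and warning texts are hypothetical, and only the accepted ports, the default RPC service names, and the two warning conditions come from the description above.

```python
# Management ports accepted for this internal host (from the handoff text).
# They are listed for reference only; in this sketch they never raise a warning.
ACCEPTED_MGMT_PORTS = {22, 8006, 3128}

# rpcbind on 111 is acceptable while only these default services are registered.
DEFAULT_RPC_SERVICES = {"portmapper", "status"}


def proxmox_baseline_warnings(rpc_services, firewall_running):
    """Return a list of warnings for the Proxmox baseline check.

    Hypothetical inputs:
      rpc_services     - names of services registered with rpcbind
      firewall_running - True if the pve-firewall service is actually running
    """
    warnings = []

    # Only RPC services beyond the defaults count as a finding.
    extra_rpc = sorted(set(rpc_services) - DEFAULT_RPC_SERVICES)
    if extra_rpc:
        warnings.append("additional RPC services exposed: " + ", ".join(extra_rpc))

    # A running pve-firewall service is accepted; a missing runtime is not.
    if not firewall_running:
        warnings.append("pve-firewall is not running")

    return warnings


print(proxmox_baseline_warnings(["portmapper", "status"], firewall_running=True))
print(proxmox_baseline_warnings(["portmapper", "status", "nfs"], firewall_running=False))
```

The first call models the host as described in this handoff and yields no warnings; the second shows the two conditions that would still raise one.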
### 3. Resolver truth
The public resolver view still looked unhealthy because Codex was shown as unavailable. Live inspection found:
- `codex_enabled = false`
- `codex_bridge_url = ''`
- local codex bridge responds, but with `503 auth_required`
This is not the same as a runtime outage. The dashboard logic was changed so:
- if `codex_enabled=false`, Codex and Copilot show as:
- `status: idle`
- `detail: In MAGATAMA settings disabled`
This is now live on the public endpoint and is the correct interpretation unless the team intentionally re-enables Codex through settings and bridge/gateway auth.
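The settings-first decision can be sketched like this. The real change lives in the TypeScript dashboard code; the Python below only illustrates the ordering of the checks. The function name, the `bridge_health` parameter, and the `ok` health value are assumptions, while the `idle` status and its detail string are the ones quoted above.

```python
def resolver_status(name, enabled, bridge_health=None):
    """Map settings and bridge health to a resolver status entry.

    Hypothetical helper: `enabled` mirrors a flag such as `codex_enabled`,
    `bridge_health` is whatever status string the bridge reported, if any.
    """
    # Settings win: a disabled resolver is idle, regardless of bridge health.
    if not enabled:
        return {"name": name, "status": "idle", "detail": "In MAGATAMA settings disabled"}

    # Assumption: a healthy bridge reports "ok".
    if bridge_health == "ok":
        return {"name": name, "status": "working", "detail": ""}

    # Enabled but unhealthy: only this case is a real runtime problem.
    return {"name": name, "status": "unavailable", "detail": bridge_health or "no bridge response"}


print(resolver_status("Codex (secondary/manual)", enabled=False, bridge_health="auth_required"))
print(resolver_status("Codex (secondary/manual)", enabled=True, bridge_health="auth_required"))
```

Checking the setting before the bridge keeps an intentionally disabled resolver from being reported as an outage, which is the behavior described above.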
## Important Files Touched
MAGATAMA repo:
- `packages/core/src/routes/health-builders.ts`
- `packages/core/src/routes/health-atlas.ts`
- `packages/core/src/learning/auto-fix-scheduler.ts`
- `packages/dashboard/src/server.ts`
- `packages/code/src/types.ts`
- `packages/core/src/routes/health-support.ts`
- `scripts/security_atlas_host_audit.py`
This sync repo:
- `sync/CURRENT.md`
- `sync/history/2026-05-06-magatama-zero-open-findings-and-resolver-truth.md`
## Evidence Summary
Live evidence gathered during the remediation:
- local codex bridge health:
- HTTP `503`
- `{"status":"auth_required","configured":true,"provider":"codex-cli"}`
- public active resolvers after dashboard patch:
- Codex/Copilot show `idle`, not `unavailable`
- live DB query on Erik:
- `SELECT count(*) AS open_findings FROM findings WHERE resolved_at IS NULL;`
- result: `0`
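The open-findings query can be tried out locally without touching the live database. The sketch below uses an in-memory SQLite database as a stand-in for Postgres; only the table name `findings`, the column `resolved_at`, and the query itself come from this handoff, while the other columns and the sample rows are purely illustrative.

```python
import sqlite3

# The query as recorded in the evidence above.
OPEN_FINDINGS_SQL = (
    "SELECT count(*) AS open_findings FROM findings WHERE resolved_at IS NULL;"
)


def count_open_findings(conn):
    """Return the number of findings that have no resolution timestamp."""
    return conn.execute(OPEN_FINDINGS_SQL).fetchone()[0]


# Illustrative schema and rows; the real MAGATAMA schema is not shown in this handoff.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (id INTEGER PRIMARY KEY, title TEXT, resolved_at TEXT)"
)
conn.executemany(
    "INSERT INTO findings (title, resolved_at) VALUES (?, ?)",
    [
        ("example finding A", "2026-05-06T08:00:00Z"),
        ("example finding B", None),
    ],
)

before = count_open_findings(conn)

# Resolving the remaining finding sets its timestamp, so it no longer counts as open.
conn.execute(
    "UPDATE findings SET resolved_at = '2026-05-06T08:30:00Z' WHERE resolved_at IS NULL"
)
after = count_open_findings(conn)

print(before, after)
```

A finding counts as open exactly as long as `resolved_at` is `NULL`, so the counter reaches zero once every row carries a resolution timestamp.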
## Remaining Real Next Step
MAGATAMA is now clean from the findings/queue perspective, but telemetry coverage is not yet complete:
- `52` assets remain discovery/inventory-only without live telemetry
- they are no longer open findings, but they are the next real operational expansion area
If work resumes on MAGATAMA protection depth, the next correct lane is:
1. expand live telemetry coverage beyond the current `27` hosts
2. keep the verified-only training corpus clean
3. only re-enable Codex/Copilot after gateway or bridge auth is intentionally restored