MAGATAMA Coverage-Gap Hardening And Training Fix

Date: 2026-05-06
Author: Codex

Why this handoff exists

MAGATAMA looked clean at one point, then reopened a large batch of medium guard findings that were not all meaningful operational incidents. At the same time, verified MAGATAMA fixes were still at risk of landing in the wrong training output path.

This handoff records the concrete fixes, live verification, and training-pool updates.

Problem observed

Earlier live state:
- 49 open findings
- all guard
- all medium
- source: atlas-coverage-gap

Root cause:
- Atlas coverage logic treated passive inventory/discovery assets as if they were operationally managed assets that had failed telemetry obligations.
- This created noisy coverage findings for:
  - loopback
  - passive inventory-only entries
  - external reference assets
  - assets without explicit operator intent

Code changes made

MAGATAMA repo:

- packages/core/src/routes/health-atlas.ts
- packages/core/src/routes/health-builders.ts
- packages/core/src/routes/health-types.ts
- packages/core/src/learning/fix-tracking.ts

Coverage-gap hardening

Added scope-aware logic so Atlas coverage findings only open for genuinely managed operational assets.

Signals now used to justify an operational coverage finding include:

- exposure evidence (atlas-exposure)
- explicit non-auto owner
- configured telemetry expectation
- high/critical criticality
- infrastructure metadata (vendor, model, platform, rack)
- managed infra device types (server, switch, router, firewall, storage, nas, hypervisor)

Explicit exclusions:

- loopback (127.0.0.1, localhost, ::1)
- passive external reference assets
- inventory-only noise without operational scope signals
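
The scope check described above could be sketched roughly as follows. This is a minimal illustration, not the code in health-atlas.ts: the AtlasAsset shape, its field names, and both function names are hypothetical, and it assumes that exclusions take precedence over scope signals and that any single signal is enough, neither of which this handoff states.

```typescript
// Hypothetical asset shape; field names are illustrative, not the real health-types.ts definitions.
interface AtlasAsset {
  address: string;
  sources: string[]; // e.g. ["atlas-exposure"]
  owner?: string; // "auto" is assumed to mean no operator-assigned owner
  telemetryExpected?: boolean;
  criticality?: "low" | "medium" | "high" | "critical";
  deviceType?: string;
  passiveExternalReference?: boolean;
  metadata?: { vendor?: string; model?: string; platform?: string; rack?: string };
}

const LOOPBACK = new Set(["127.0.0.1", "localhost", "::1"]);
const MANAGED_DEVICE_TYPES = new Set([
  "server", "switch", "router", "firewall", "storage", "nas", "hypervisor",
]);

// True when at least one operational scope signal from the list above is present.
function hasOperationalScopeSignal(asset: AtlasAsset): boolean {
  const meta = asset.metadata ?? {};
  return (
    asset.sources.includes("atlas-exposure") ||
    (!!asset.owner && asset.owner !== "auto") ||
    asset.telemetryExpected === true ||
    asset.criticality === "high" ||
    asset.criticality === "critical" ||
    [meta.vendor, meta.model, meta.platform, meta.rack].some((v) => Boolean(v)) ||
    (asset.deviceType !== undefined &&
      MANAGED_DEVICE_TYPES.has(asset.deviceType.toLowerCase()))
  );
}

// Assumption: exclusions are checked first, so an excluded asset never opens a finding.
function shouldOpenCoverageFinding(asset: AtlasAsset): boolean {
  if (LOOPBACK.has(asset.address.toLowerCase())) return false;
  if (asset.passiveExternalReference) return false;
  return hasOperationalScopeSignal(asset);
}
```

Under these assumptions, an inventory-only entry with an auto owner and no other signals opens no finding, while the same entry with deviceType "firewall" does.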

Training integrity fix

Fixed inverted training output behavior:

Verified successes now write to:
- training-data/gitea-learning-pool/magatamallm/fixes.jsonl

Failed, escalated, and report-only runs now write to:
- training-data/gitea-learning-pool/magatamallm/errors.jsonl
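
The corrected routing can be illustrated with a small sketch. The RunOutcome type and trainingOutputPath function are hypothetical names chosen for illustration, not the actual API of fix-tracking.ts; only the two file paths come from this handoff.

```typescript
// Outcome names mirror the handoff; the type and function names are hypothetical.
type RunOutcome = "verified" | "failed" | "escalated" | "report-only";

const POOL_DIR = "training-data/gitea-learning-pool/magatamallm";

// Only verified successes belong in the fixes corpus; every other outcome goes to errors.
function trainingOutputPath(outcome: RunOutcome): string {
  return outcome === "verified"
    ? `${POOL_DIR}/fixes.jsonl`
    : `${POOL_DIR}/errors.jsonl`;
}
```

Defaulting every non-verified outcome to errors.jsonl means a newly added outcome cannot end up in the fixes corpus by accident.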

Deployment and verification

Build:

local npm run build in MAGATAMA completed successfully

Deploy:

- synced updated core dist/routes/
- synced updated core dist/learning/
- restarted PM2 app: magatama

Live verification on Erik:

deployed file timestamps updated under /opt/magatama/packages/core/dist/...
first post-restart guard scan log:
- guard — first scan
- AutoResolve guard stale findings resolved: 33
Postgres after deploy:
- open findings = 0

Training pool updates

Two new explicit solution entries were appended to the Gitea-backed MAGATAMA fixes corpus:

- Atlas coverage gaps should only become findings for managed operational assets
- Verified fixes must never be written into the error corpus

The updated fixes.jsonl was also synced to Erik:

/opt/magatama/training-data/gitea-learning-pool/magatamallm/fixes.jsonl

Important remaining note

Historic pollution may still exist from older runs where failed/escalated items were appended to fixes.jsonl before this path correction. The pathing bug is fixed for future writes, but a later cleanup pass may still be needed to scrub old invalid rows and backfill errors.jsonl.
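
A later cleanup pass might start from something like the sketch below. The row schema of fixes.jsonl is not shown in this handoff, so the status field and its values are assumptions, and partitionFixRows is a hypothetical helper. Rows that cannot be classified are set aside for manual review rather than guessed at.

```typescript
// Hypothetical row shape: the real fixes.jsonl schema is not shown in this handoff.
interface PoolRow {
  status?: string;
  [key: string]: unknown;
}

interface Partition {
  fixes: string[]; // rows that stay in fixes.jsonl
  errors: string[]; // rows to backfill into errors.jsonl
  review: string[]; // unparsable or unclassifiable rows for manual review
}

const ERROR_STATUSES = new Set(["failed", "escalated", "report-only"]);

// Splits the raw text of a JSONL file into the three buckets, one row per line.
function partitionFixRows(jsonl: string): Partition {
  const out: Partition = { fixes: [], errors: [], review: [] };
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // skip blank lines
    try {
      const row = JSON.parse(trimmed) as PoolRow;
      if (row.status === "verified") {
        out.fixes.push(trimmed);
      } else if (typeof row.status === "string" && ERROR_STATUSES.has(row.status)) {
        out.errors.push(trimmed);
      } else {
        out.review.push(trimmed);
      }
    } catch {
      out.review.push(trimmed); // invalid JSON or a non-object row
    }
  }
  return out;
}
```

The function only classifies rows and leaves reading and writing the files to the caller, so a dry run against a copy of fixes.jsonl changes nothing on disk.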

Operational takeaway

This was a real solution, not a suppression:

- Atlas noise is reduced by narrowing operational scope.
- Real managed assets can still produce coverage findings.
- Verified fixes and failed runs are now separated for future MagatamaLLM training quality.
