transceiver-db/sync/history/2026-05-09-magatama-atlas-rematerialization-and-stale-resolver-fix.md
2026-05-09 08:02:54 +02:00

MAGATAMA Atlas Rematerialization and Stale Resolver Fix

Date: 2026-05-09

Problem

MAGATAMA had fallen back into an untrustworthy state:

  • Atlas raw sources on Erik still existed and were current:
    • security-atlas-audits.json with 3 audits
    • security-atlas-snapshot.json with 32 devices
  • but open findings in Postgres had collapsed back to 0
  • Atlas UI therefore looked implausibly empty / clean

The operator requirement was explicit:

  • this must not silently happen again
  • MAGATAMA must reflect real protection gaps honestly

Root Cause

Two independent backend problems combined:

  1. buildProtectionProofResponse() read the Atlas raw files but never resynced findings from them, so findings that had been resolved in Postgres were not recreated.
  2. The scheduler's generic stale-finding auto-resolution treated Atlas-managed findings like ordinary guard findings and resolved them too aggressively.

Code Changes

packages/core/src/routes/health-builders.ts

  • added readAtlasSnapshot()
  • imported syncAtlasAuditFindings(...)
  • imported syncAtlasExposureFindings(...)
  • introduced syncAtlasOperationalFindings(...)
  • buildProtectionProofResponse() now calls syncAtlasOperationalFindings(...) before building the proof payload

Effect:

  • normal proof/Atlas reads now rematerialize current Atlas findings from the raw audit/snapshot files
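
The sync-before-build ordering can be sketched as follows. This is a minimal illustration, not the code in health-builders.ts: the function names come from this note, but their signatures, the AtlasAudit / AtlasDevice / AtlasRaw shapes, the in-memory FindingStore (standing in for Postgres), and the upsertOpen and countOpen helpers are all assumptions. Coverage-gap handling is left out for brevity.

```typescript
// Hypothetical shapes; the real Atlas file schemas are not shown in this note.
interface AtlasAudit { host: string; issues: string[] }
interface AtlasDevice { id: string; exposed: boolean }
interface AtlasRaw { audits: AtlasAudit[]; devices: AtlasDevice[] }
interface Finding { key: string; source: string; status: "open" | "resolved" }

// In-memory stand-in for the Postgres findings table, keyed by finding key.
type FindingStore = Map<string, Finding>;

// Create the finding, or reopen it if it had been resolved in the meantime.
function upsertOpen(store: FindingStore, source: string, subject: string): void {
  const key = `${source}:${subject}`;
  store.set(key, { key, source, status: "open" });
}

// Assumed behaviour: one atlas-host-audit finding per audited issue.
function syncAtlasAuditFindings(store: FindingStore, audits: AtlasAudit[]): void {
  for (const audit of audits) {
    for (const issue of audit.issues) {
      upsertOpen(store, "atlas-host-audit", `${audit.host}:${issue}`);
    }
  }
}

// Assumed behaviour: one atlas-exposure finding per device flagged as exposed.
function syncAtlasExposureFindings(store: FindingStore, devices: AtlasDevice[]): void {
  for (const device of devices) {
    if (device.exposed) upsertOpen(store, "atlas-exposure", device.id);
  }
}

// Single entry point that rematerializes findings from the raw Atlas data.
function syncAtlasOperationalFindings(store: FindingStore, raw: AtlasRaw): void {
  syncAtlasAuditFindings(store, raw.audits);
  syncAtlasExposureFindings(store, raw.devices);
}

function countOpen(store: FindingStore): number {
  let open = 0;
  store.forEach((finding) => {
    if (finding.status === "open") open++;
  });
  return open;
}

// The sync runs first, so the payload is built from rematerialized findings.
function buildProtectionProofResponse(store: FindingStore, raw: AtlasRaw): { openFindings: number } {
  syncAtlasOperationalFindings(store, raw);
  return { openFindings: countOpen(store) };
}
```

Because the upsert is keyed, repeated proof reads reopen the same findings instead of duplicating them. Any real implementation would need the same idempotence, since the sync now runs on every normal read.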

packages/core/src/scheduler.ts

  • added:
    • ATLAS_MANAGED_FINDING_SOURCES
    • isAtlasManagedFindingSource(...)
  • generic stale resolution now skips:
    • atlas-coverage-gap
    • atlas-exposure
    • atlas-host-audit

Effect:

  • Atlas-managed findings are no longer erased by the generic guard stale resolver
  • they stay under their own verification-aware lifecycle
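
A minimal sketch of the skip, assuming a simple age-based stale rule. ATLAS_MANAGED_FINDING_SOURCES, isAtlasManagedFindingSource and the three source names come from this note. The StaleCandidate shape, the resolveStaleFindings function, its parameters, and the "guard-ssh" source name in the usage note are hypothetical and only illustrate where the guard sits.

```typescript
// Source names as listed in this note.
const ATLAS_MANAGED_FINDING_SOURCES: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

function isAtlasManagedFindingSource(source: string): boolean {
  return ATLAS_MANAGED_FINDING_SOURCES.has(source);
}

// Hypothetical finding shape; timestamps are epoch milliseconds.
interface StaleCandidate {
  id: number;
  source: string;
  lastSeenAt: number;
  status: "open" | "resolved";
}

// Hypothetical generic resolver: resolves open findings not seen for maxAgeMs,
// except Atlas-managed ones, which keep their own lifecycle.
function resolveStaleFindings(findings: StaleCandidate[], now: number, maxAgeMs: number): number[] {
  const resolvedIds: number[] = [];
  for (const finding of findings) {
    if (finding.status !== "open") continue;
    if (isAtlasManagedFindingSource(finding.source)) continue; // the new skip
    if (now - finding.lastSeenAt > maxAgeMs) {
      finding.status = "resolved";
      resolvedIds.push(finding.id);
    }
  }
  return resolvedIds;
}
```

With this guard, a stale "guard-ssh" finding is still resolved, while an equally old atlas-coverage-gap finding stays open until Atlas's own verification closes it.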

Live Deployment

Deployed to Erik:

  • rebuilt @magatama/core
  • synced:
    • /opt/magatama/packages/core/dist/routes/health-builders.js
    • /opt/magatama/packages/core/dist/scheduler.js
  • restarted PM2 app:
    • magatama

Live Verification

Before

  • raw files existed:
    • audits: 3
    • devices: 32
  • DB open findings: 0

After protected proof rebuild

  • triggered /api/protection-proof locally on Erik with an authenticated request
  • DB open findings rematerialized to: 28
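
The before/after contrast suggests a cheap tripwire for the "must not silently happen again" requirement. This is a hypothetical sketch and not something this change deployed: the AtlasStateCounts shape and looksLikeSilentCollapse are invented for illustration. Zero open findings can be legitimate when the raw data is clean, so a hit means "investigate", not "broken".

```typescript
// Hypothetical summary of the three numbers compared in this verification.
interface AtlasStateCounts {
  rawAudits: number;      // audits in security-atlas-audits.json
  rawDevices: number;     // devices in security-atlas-snapshot.json
  dbOpenFindings: number; // open findings in Postgres
}

// Flags the state seen before the fix: raw data present, but nothing open in the DB.
function looksLikeSilentCollapse(counts: AtlasStateCounts): boolean {
  const rawHasData = counts.rawAudits > 0 || counts.rawDevices > 0;
  return rawHasData && counts.dbOpenFindings === 0;
}

// The two states recorded in this note.
const beforeFix: AtlasStateCounts = { rawAudits: 3, rawDevices: 32, dbOpenFindings: 0 };
const afterFix: AtlasStateCounts = { rawAudits: 3, rawDevices: 32, dbOpenFindings: 28 };
```

Here looksLikeSilentCollapse(beforeFix) is true and looksLikeSilentCollapse(afterFix) is false.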

Public verification

Public MAGATAMA APIs expose the real open state again:

  • /api/findings?limit=5
    • returns open atlas-coverage-gap findings again
  • /api/protection-proof
    • knownAssets: 57
    • hostsWithTelemetry: 22
    • assetsWithoutTelemetry: 35
    • auditedHosts: 3
    • queueBlocked: 28
    • switchbladeAssets: 5
    • switchbladeRacks: 1
    • switchbladeNmsNodes: 5
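
The reported counts are internally consistent: hostsWithTelemetry + assetsWithoutTelemetry = 22 + 35 = 57 = knownAssets. A small sanity check along those lines could look like the sketch below. The field names are taken from the payload above, but the ProtectionProofCounts type, the checkProofCounts helper, and the assumption that this sum always has to hold are illustrative, not confirmed API behaviour.

```typescript
// Field names as reported by /api/protection-proof in this note; all counts.
interface ProtectionProofCounts {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
  switchbladeAssets: number;
  switchbladeRacks: number;
  switchbladeNmsNodes: number;
}

// Returns a list of human-readable problems; an empty list means the counts look sane.
function checkProofCounts(proof: ProtectionProofCounts): string[] {
  const problems: string[] = [];
  (Object.keys(proof) as (keyof ProtectionProofCounts)[]).forEach((name) => {
    const value = proof[name];
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} is not a non-negative integer: ${value}`);
    }
  });
  // Assumed invariant: every known asset either has telemetry or does not.
  const sum = proof.hostsWithTelemetry + proof.assetsWithoutTelemetry;
  if (sum !== proof.knownAssets) {
    problems.push(`telemetry split ${sum} does not match knownAssets ${proof.knownAssets}`);
  }
  return problems;
}

// Values recorded in this note.
const reportedProof: ProtectionProofCounts = {
  knownAssets: 57,
  hostsWithTelemetry: 22,
  assetsWithoutTelemetry: 35,
  auditedHosts: 3,
  queueBlocked: 28,
  switchbladeAssets: 5,
  switchbladeRacks: 1,
  switchbladeNmsNodes: 5,
};
```

For the values recorded here, checkProofCounts(reportedProof) returns an empty list.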

Operational Truth

The major Atlas truthfulness regression is fixed:

  • Atlas and Findings no longer silently collapse to a fake clean state when raw Atlas data still contains real problems

What remains true:

  • most currently open Atlas findings are coverage gaps
  • they represent real missing live telemetry on known assets

Remaining Work

Still not fully closed:

  • lane-specific RunPod artifact adoption and automatic version switching
  • further Atlas policy refinement so inventory-only assets can be split more cleanly into:
    • actionable operational gaps
    • informational inventory/discovery context

Operator Note

If the browser still shows the older empty Atlas state after deployment:

  • hard refresh:
    • Cmd + Shift + R