From bba48d3e8411c757e177596f9269385d620f9bb8 Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sat, 9 May 2026 08:02:54 +0200
Subject: [PATCH] sync: record magatama atlas rematerialization fix

---
 sync/CURRENT.md                               |  69 +++++++++-
 ...ematerialization-and-stale-resolver-fix.md | 122 ++++++++++++++++++
 2 files changed, 190 insertions(+), 1 deletion(-)
 create mode 100644 sync/history/2026-05-09-magatama-atlas-rematerialization-and-stale-resolver-fix.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 3833126..4a5aae0 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,76 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 05:45 UTC
+Updated: 2026-05-09 05:58 UTC
 
 ## Newest Work
 
+- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
+  - operator problem:
+    - Atlas / Findings / Protection Proof had become dishonest again
+    - raw files on Erik still contained:
+      - `3` host audits
+      - `32` live Atlas scan devices
+    - but open findings had collapsed back to `0`
+    - Atlas UI therefore showed an implausibly clean state
+  - verified root cause:
+    - `packages/core/src/routes/health-builders.ts`
+      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
+    - `packages/core/src/scheduler.ts`
+      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
+      - newly rematerialized Atlas findings were therefore cleared again almost immediately
+  - code fixed:
+    - `packages/core/src/routes/health-builders.ts`
+      - added `readAtlasSnapshot()`
+      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
+      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
+    - `packages/core/src/scheduler.ts`
+      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
+      - generic stale resolution now skips:
+        - `atlas-coverage-gap`
+        - `atlas-exposure`
+        - `atlas-host-audit`
+      - these sources are now left to their own verification-aware resolution logic
+  - live deployment on Erik:
+    - rebuilt `@magatama/core`
+    - synced:
+      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
+      - `/opt/magatama/packages/core/dist/scheduler.js`
+    - restarted PM2 service:
+      - `magatama`
+  - live verification:
+    - before fix:
+      - Atlas raw files present:
+        - audits: `3`
+        - devices: `32`
+      - DB open findings: `0`
+    - after authenticated `/api/protection-proof` rebuild:
+      - DB open findings: `28`
+      - public `/api/findings?limit=5` now shows real open Atlas findings again
+      - public `/api/protection-proof` now reports:
+        - `knownAssets: 57`
+        - `hostsWithTelemetry: 22`
+        - `assetsWithoutTelemetry: 35`
+        - `auditedHosts: 3`
+        - `queueBlocked: 28`
+        - `switchbladeAssets: 5`
+        - `switchbladeRacks: 1`
+        - `switchbladeNmsNodes: 5`
+  - operational truth now:
+    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
+    - the remaining open state is again honest:
+      - most current open findings are `atlas-coverage-gap`
+      - they reflect missing live telemetry on known inventory/discovery assets
+  - operator note:
+    - browser cache / old UI state may still temporarily show the earlier empty Atlas
+    - hard refresh is required:
+      - `Cmd + Shift + R`
+  - important honest remainder:
+    - this closes the biggest Atlas truthfulness regression
+    - it does **not** yet solve every backend truth issue
+    - still pending:
+      - lane-specific RunPod artifact adoption / automatic version switch
+      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational
+
 - TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
   - operator intent:
     - products should be researched well enough that they do not need manual equivalence validation
diff --git a/sync/history/2026-05-09-magatama-atlas-rematerialization-and-stale-resolver-fix.md b/sync/history/2026-05-09-magatama-atlas-rematerialization-and-stale-resolver-fix.md
new file mode 100644
index 0000000..3dfb921
--- /dev/null
+++ b/sync/history/2026-05-09-magatama-atlas-rematerialization-and-stale-resolver-fix.md
@@ -0,0 +1,122 @@
+# MAGATAMA Atlas Rematerialization and Stale Resolver Fix
+
+Date: 2026-05-09
+
+## Problem
+
+MAGATAMA had fallen back into an untrustworthy state:
+
+- Atlas raw sources on Erik still existed and were current:
+  - `security-atlas-audits.json` with `3` audits
+  - `security-atlas-snapshot.json` with `32` devices
+- but open findings in Postgres had collapsed back to `0`
+- Atlas UI therefore looked implausibly empty / clean
+
+The operator requirement was explicit:
+
+- this must not silently happen again
+- MAGATAMA must reflect real protection gaps honestly
+
+## Root Cause
+
+Two independent backend problems combined:
+
+1. `buildProtectionProofResponse()` read Atlas raw files but did not resync findings from them.
+2. Generic stale finding auto-resolution in the scheduler treated Atlas-managed findings like ordinary guard findings and resolved them too aggressively.
+
+## Code Changes
+
+### `packages/core/src/routes/health-builders.ts`
+
+- added `readAtlasSnapshot()`
+- imported `syncAtlasAuditFindings(...)`
+- imported `syncAtlasExposureFindings(...)`
+- introduced `syncAtlasOperationalFindings(...)`
+- `buildProtectionProofResponse()` now calls that helper before building the proof payload
+
+Effect:
+
+- normal proof/Atlas reads now rematerialize current Atlas findings from the raw audit/snapshot files
+
+### `packages/core/src/scheduler.ts`
+
+- added:
+  - `ATLAS_MANAGED_FINDING_SOURCES`
+  - `isAtlasManagedFindingSource(...)`
+- generic stale resolution now skips:
+  - `atlas-coverage-gap`
+  - `atlas-exposure`
+  - `atlas-host-audit`
+
+Effect:
+
+- Atlas-managed findings are no longer erased by the generic guard stale resolver
+- they stay under their own verification-aware lifecycle
+
+## Live Deployment
+
+Deployed to Erik:
+
+- rebuilt `@magatama/core`
+- synced:
+  - `/opt/magatama/packages/core/dist/routes/health-builders.js`
+  - `/opt/magatama/packages/core/dist/scheduler.js`
+- restarted PM2 app:
+  - `magatama`
+
+## Live Verification
+
+### Before
+
+- raw files existed:
+  - audits: `3`
+  - devices: `32`
+- DB open findings: `0`
+
+### After protected proof rebuild
+
+- authenticated local `/api/protection-proof` trigger on Erik
+- DB open findings rematerialized to: `28`
+
+### Public verification
+
+Public MAGATAMA APIs now again expose real open state:
+
+- `/api/findings?limit=5`
+  - returns open `atlas-coverage-gap` findings again
+- `/api/protection-proof`
+  - `knownAssets: 57`
+  - `hostsWithTelemetry: 22`
+  - `assetsWithoutTelemetry: 35`
+  - `auditedHosts: 3`
+  - `queueBlocked: 28`
+  - `switchbladeAssets: 5`
+  - `switchbladeRacks: 1`
+  - `switchbladeNmsNodes: 5`
+
+## Operational Truth
+
+The major Atlas truthfulness regression is fixed:
+
+- Atlas and Findings no longer silently collapse to a fake clean state when raw Atlas data still contains real problems
+
+What remains true:
+
+- most currently open Atlas findings are coverage gaps
+- they represent real missing live telemetry on known assets
+
+## Remaining Work
+
+Still not fully closed:
+
+- lane-specific RunPod artifact adoption and automatic version switching
+- further Atlas policy refinement so inventory-only assets can be split more cleanly into:
+  - actionable operational gaps
+  - informational inventory/discovery context
+
+## Operator Note
+
+If the browser still shows the older empty Atlas state after deployment:
+
+- hard refresh:
+  - `Cmd + Shift + R`
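
The scheduler change in this patch, where the generic stale resolver skips Atlas-managed sources, can be illustrated with a minimal TypeScript sketch. Only `ATLAS_MANAGED_FINDING_SOURCES`, `isAtlasManagedFindingSource` and the three source names come from the patch; the `Finding` shape, the `resolveStaleFindings` function, the age-based staleness rule and the `guard-scan` source are hypothetical and are not taken from the actual MAGATAMA code.

```typescript
// Hypothetical sketch. Only the set name, the helper name and the three
// source strings come from the patch; everything else is assumed.

interface Finding {
  id: string;
  source: string;
  status: "open" | "resolved";
  lastSeenAt: number; // assumed: epoch ms when a scan last reported this finding
}

// Sources whose lifecycle is owned by Atlas-specific, verification-aware logic.
const ATLAS_MANAGED_FINDING_SOURCES: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

function isAtlasManagedFindingSource(source: string): boolean {
  return ATLAS_MANAGED_FINDING_SOURCES.has(source);
}

// Hypothetical generic stale resolver: closes open findings not seen within
// maxAgeMs, but never touches Atlas-managed findings. Returns resolved IDs.
function resolveStaleFindings(findings: Finding[], now: number, maxAgeMs: number): string[] {
  const resolved: string[] = [];
  for (const finding of findings) {
    if (finding.status !== "open") continue;
    // Skip: Atlas findings are resynced from raw audit/snapshot files instead.
    if (isAtlasManagedFindingSource(finding.source)) continue;
    if (now - finding.lastSeenAt > maxAgeMs) {
      finding.status = "resolved";
      resolved.push(finding.id);
    }
  }
  return resolved;
}

// Usage: both findings are two hours old with a one-hour threshold,
// but only the non-Atlas one ("guard-scan" is a made-up source) is resolved.
const now = Date.now();
const hour = 60 * 60 * 1000;
const findings: Finding[] = [
  { id: "f1", source: "guard-scan", status: "open", lastSeenAt: now - 2 * hour },
  { id: "f2", source: "atlas-coverage-gap", status: "open", lastSeenAt: now - 2 * hour },
];
console.log(resolveStaleFindings(findings, now, hour)); // logs the array ["f1"]
```

Keeping the exclusion in a single set means the generic resolver and any Atlas-specific lifecycle code can agree on which sources belong to Atlas, so adding a new Atlas source needs only one change.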