sync: record magatama atlas rematerialization fix

Rene Fichtmueller 2026-05-09 08:02:54 +02:00
parent 43b7250180
commit bba48d3e84
2 changed files with 190 additions and 1 deletion


@@ -1,9 +1,76 @@
# Current TIP Sync State
Updated: 2026-05-09 05:58 UTC (previously 2026-05-09 05:45 UTC)
## Newest Work
- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
- operator problem:
- Atlas / Findings / Protection Proof had become dishonest again
- raw files on Erik still contained:
- `3` host audits
- `32` live Atlas scan devices
- but open findings had collapsed back to `0`
- Atlas UI therefore showed an implausibly clean state
- verified root cause:
- `packages/core/src/routes/health-builders.ts`
- `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
- `packages/core/src/scheduler.ts`
- generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
- newly rematerialized Atlas findings were therefore cleared again almost immediately
- code fixed:
- `packages/core/src/routes/health-builders.ts`
- added `readAtlasSnapshot()`
      - imported `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` and wired them into a new `syncAtlasOperationalFindings(...)` step
- `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
- `packages/core/src/scheduler.ts`
- introduced `ATLAS_MANAGED_FINDING_SOURCES`
- generic stale resolution now skips:
- `atlas-coverage-gap`
- `atlas-exposure`
- `atlas-host-audit`
- these sources are now left to their own verification-aware resolution logic
- live deployment on Erik:
- rebuilt `@magatama/core`
- synced:
- `/opt/magatama/packages/core/dist/routes/health-builders.js`
- `/opt/magatama/packages/core/dist/scheduler.js`
- restarted PM2 service:
- `magatama`
- live verification:
- before fix:
- Atlas raw files present:
- audits: `3`
- devices: `32`
- DB open findings: `0`
- after authenticated `/api/protection-proof` rebuild:
- DB open findings: `28`
- public `/api/findings?limit=5` now shows real open Atlas findings again
- public `/api/protection-proof` now reports:
- `knownAssets: 57`
- `hostsWithTelemetry: 22`
- `assetsWithoutTelemetry: 35`
- `auditedHosts: 3`
- `queueBlocked: 28`
- `switchbladeAssets: 5`
- `switchbladeRacks: 1`
- `switchbladeNmsNodes: 5`
- operational truth now:
- Atlas and Findings are no longer silently wiped clean by the generic stale resolver
- the remaining open state is again honest:
- most current open findings are `atlas-coverage-gap`
- they reflect missing live telemetry on known inventory/discovery assets
- operator note:
- browser cache / old UI state may still temporarily show the earlier empty Atlas
- hard refresh is required:
- `Cmd + Shift + R`
  - honest assessment of what remains:
- this closes the biggest Atlas truthfulness regression
- it does **not** yet solve every backend truth issue
- still pending:
- lane-specific RunPod artifact adoption / automatic version switch
- deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational
- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation


@@ -0,0 +1,122 @@
# MAGATAMA Atlas Rematerialization and Stale Resolver Fix
Date: 2026-05-09
## Problem
MAGATAMA had fallen back into an untrustworthy state:
- Atlas raw sources on Erik still existed and were current:
- `security-atlas-audits.json` with `3` audits
- `security-atlas-snapshot.json` with `32` devices
- but open findings in Postgres had collapsed back to `0`
- Atlas UI therefore looked implausibly empty / clean

The operator requirement was explicit:
- this must not silently happen again
- MAGATAMA must reflect real protection gaps honestly
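
The operator requirement above can be turned into a cheap heuristic alarm. This is a sketch under assumptions, not something this commit adds: `detectSilentCollapse`, `AtlasRawCounts` and the alarm shape are hypothetical names. It flags the exact pattern seen here, where raw audits and devices are present but zero Atlas findings are open. Zero findings with non-empty raw data is not impossible in principle, so a hit should prompt an investigation rather than count as proof of a bug.

```typescript
// Hypothetical guard against the "implausibly clean" state: raw Atlas sources
// still carry data, but no Atlas finding is open. Names are illustrative only.

interface AtlasRawCounts {
  audits: number;  // entries in security-atlas-audits.json
  devices: number; // devices in security-atlas-snapshot.json
}

interface CollapseAlarm {
  reason: string;
  raw: AtlasRawCounts;
  openAtlasFindings: number;
}

function detectSilentCollapse(
  raw: AtlasRawCounts,
  openAtlasFindings: number,
): CollapseAlarm | null {
  const rawHasData = raw.audits > 0 || raw.devices > 0;
  if (rawHasData && openAtlasFindings === 0) {
    return {
      reason: "raw Atlas sources are non-empty but no Atlas finding is open",
      raw,
      openAtlasFindings,
    };
  }
  return null;
}

// Counts from this incident: 0 open findings before the fix, 28 after
// (28 is the total; any positive Atlas count clears the alarm).
const beforeFix = detectSilentCollapse({ audits: 3, devices: 32 }, 0);
const afterFix = detectSilentCollapse({ audits: 3, devices: 32 }, 28);
console.log(beforeFix !== null, afterFix === null); // true true
```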
## Root Cause
Two independent backend problems combined:
1. `buildProtectionProofResponse()` read Atlas raw files but did not resync findings from them.
2. Generic stale finding auto-resolution in the scheduler treated Atlas-managed findings like ordinary guard findings and resolved them too aggressively.
## Code Changes
### `packages/core/src/routes/health-builders.ts`
- added `readAtlasSnapshot()`
- imported `syncAtlasAuditFindings(...)`
- imported `syncAtlasExposureFindings(...)`
- introduced `syncAtlasOperationalFindings(...)`
- `buildProtectionProofResponse()` now calls that helper before building the proof payload

Effect:
- normal proof/Atlas reads now rematerialize current Atlas findings from the raw audit/snapshot files
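
The "sync before build" ordering can be sketched as follows. Only the names `syncAtlasOperationalFindings`, `readAtlasSnapshot` and `buildProtectionProofResponse` come from this note. Everything else is hypothetical: their signatures, the `readAtlasAudits` reader, the in-memory `FindingStore` standing in for Postgres, the finding-key scheme, and the rules that decide which audits or devices produce a finding.

```typescript
// Sketch of the "sync before build" ordering. The function names mirror
// health-builders.ts; every signature, type and rule below is hypothetical.

type AtlasSource = "atlas-host-audit" | "atlas-exposure" | "atlas-coverage-gap";

interface Finding {
  key: string; // stable identity, e.g. "atlas-exposure:<host>" (assumed scheme)
  source: AtlasSource;
  open: boolean;
}

interface AtlasAudit { host: string; issues: string[] }
interface AtlasSnapshot { devices: { host: string; exposedPorts: number[] }[] }

// In-memory stand-in for the Postgres findings table.
class FindingStore {
  private byKey = new Map<string, Finding>();

  // Insert or re-open a finding; keyed, so repeated syncs do not duplicate rows.
  upsertOpen(key: string, source: AtlasSource): void {
    this.byKey.set(key, { key, source, open: true });
  }

  openCount(): number {
    let n = 0;
    this.byKey.forEach((f) => {
      if (f.open) n += 1;
    });
    return n;
  }
}

function syncAtlasOperationalFindings(
  store: FindingStore,
  audits: AtlasAudit[],
  snapshot: AtlasSnapshot,
): void {
  // Hypothetical rule: one host-audit finding per audited host with issues...
  for (const audit of audits) {
    if (audit.issues.length > 0) {
      store.upsertOpen(`atlas-host-audit:${audit.host}`, "atlas-host-audit");
    }
  }
  // ...and one exposure finding per device that exposes at least one port.
  for (const device of snapshot.devices) {
    if (device.exposedPorts.length > 0) {
      store.upsertOpen(`atlas-exposure:${device.host}`, "atlas-exposure");
    }
  }
}

function buildProtectionProofResponse(
  store: FindingStore,
  readAtlasAudits: () => AtlasAudit[],
  readAtlasSnapshot: () => AtlasSnapshot,
): { openFindings: number } {
  // The fix: resync from the raw files first, then compute the payload,
  // so the proof reflects what the current audits and snapshot contain.
  syncAtlasOperationalFindings(store, readAtlasAudits(), readAtlasSnapshot());
  return { openFindings: store.openCount() };
}

const store = new FindingStore();
const proof = buildProtectionProofResponse(
  store,
  () => [{ host: "h1", issues: ["ssh-password-auth"] }],
  () => ({ devices: [{ host: "d1", exposedPorts: [22] }, { host: "d2", exposedPorts: [] }] }),
);
console.log(proof.openFindings); // 2
```

The sketch upserts by a stable key, so running the rebuild repeatedly keeps the open count at 2 instead of duplicating findings. Whether the real sync functions are idempotent in the same way is an assumption.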
### `packages/core/src/scheduler.ts`
- added:
- `ATLAS_MANAGED_FINDING_SOURCES`
- `isAtlasManagedFindingSource(...)`
- generic stale resolution now skips:
- `atlas-coverage-gap`
- `atlas-exposure`
  - `atlas-host-audit`

Effect:
- Atlas-managed findings are no longer erased by the generic guard stale resolver
- they stay under their own verification-aware lifecycle
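
Here is a minimal sketch of the source-aware skip. It assumes the generic resolver selects findings whose last sighting is older than a threshold. `ATLAS_MANAGED_FINDING_SOURCES` and `isAtlasManagedFindingSource` are named after the additions to `scheduler.ts`, but their real shape is not shown in this note. `OpenFinding`, `lastSeenMs`, `selectStaleFindings` and the `guard-port-scan` source are hypothetical.

```typescript
// Sketch of source-aware stale resolution. The constant and predicate names
// mirror scheduler.ts; their shape and everything else here is assumed.

const ATLAS_MANAGED_FINDING_SOURCES: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

function isAtlasManagedFindingSource(source: string): boolean {
  return ATLAS_MANAGED_FINDING_SOURCES.has(source);
}

interface OpenFinding {
  id: number;
  source: string;
  lastSeenMs: number; // last time the producing scan reported it (assumed field)
}

// Returns ids the generic resolver would auto-resolve. Atlas-managed findings
// are skipped: in this sketch they come from the Atlas sync, not from guard
// scans, so an old lastSeen timestamp is not evidence that the gap is fixed.
function selectStaleFindings(
  findings: OpenFinding[],
  nowMs: number,
  staleAfterMs: number,
): number[] {
  return findings
    .filter((f) => !isAtlasManagedFindingSource(f.source))
    .filter((f) => nowMs - f.lastSeenMs > staleAfterMs)
    .map((f) => f.id);
}

const now = Date.now();
const hour = 60 * 60 * 1000;
const stale = selectStaleFindings(
  [
    { id: 1, source: "guard-port-scan", lastSeenMs: now - 3 * hour },    // stale guard finding
    { id: 2, source: "atlas-coverage-gap", lastSeenMs: now - 3 * hour }, // old, but Atlas-managed
    { id: 3, source: "guard-port-scan", lastSeenMs: now - 5 * 60 * 1000 }, // recently seen
  ],
  now,
  hour,
);
console.log(stale); // [ 1 ]
```

Filtering by source before the age check keeps the two lifecycles separate. The generic resolver never touches Atlas rows, and resolving them is left to Atlas's own verification-aware logic, which this sketch does not model.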
## Live Deployment
Deployed to Erik:
- rebuilt `@magatama/core`
- synced:
- `/opt/magatama/packages/core/dist/routes/health-builders.js`
- `/opt/magatama/packages/core/dist/scheduler.js`
- restarted PM2 app:
- `magatama`
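
The deployment steps above can be captured as a dry-run plan. This sketch rests on guesses that do not come from this note: the `erik` ssh alias, the use of `pnpm --filter` for the build, and running `rsync` from the repository root. Only the remote paths and the PM2 app name `magatama` come from the note. Nothing is executed; the commands are only printed.

```typescript
// Dry-run plan for the deployment steps above: builds and prints commands,
// executes nothing. Assumed: the "erik" ssh alias, pnpm as the package manager,
// and running from the repository root so the relative dist paths resolve.

interface DeployStep {
  description: string;
  command: string[];
}

const REMOTE_HOST = "erik"; // hypothetical ssh alias for the Erik host
const REMOTE_ROOT = "/opt/magatama";
const CHANGED_FILES = [
  "packages/core/dist/routes/health-builders.js",
  "packages/core/dist/scheduler.js",
];

function planDeploy(): DeployStep[] {
  const steps: DeployStep[] = [
    {
      description: "rebuild @magatama/core",
      command: ["pnpm", "--filter", "@magatama/core", "build"],
    },
  ];
  // Copy only the two compiled files that changed, to the same relative path on Erik.
  for (const file of CHANGED_FILES) {
    steps.push({
      description: `sync ${file}`,
      command: ["rsync", "-a", file, `${REMOTE_HOST}:${REMOTE_ROOT}/${file}`],
    });
  }
  steps.push({
    description: "restart PM2 app magatama",
    command: ["ssh", REMOTE_HOST, "pm2", "restart", "magatama"],
  });
  return steps;
}

for (const step of planDeploy()) {
  console.log(`# ${step.description}\n${step.command.join(" ")}`);
}
```

Syncing only the two changed `dist` files keeps the rollout small, but it assumes the rebuild changed nothing else under `dist`. When that is unclear, copying the whole directory is the safer default.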
## Live Verification
### Before
- raw files existed:
- audits: `3`
- devices: `32`
- DB open findings: `0`
### After protected proof rebuild
- authenticated local `/api/protection-proof` trigger on Erik
- DB open findings rematerialized to: `28`
### Public verification
Public MAGATAMA APIs now expose the real open state again:
- `/api/findings?limit=5`
- returns open `atlas-coverage-gap` findings again
- `/api/protection-proof`
- `knownAssets: 57`
- `hostsWithTelemetry: 22`
- `assetsWithoutTelemetry: 35`
- `auditedHosts: 3`
- `queueBlocked: 28`
- `switchbladeAssets: 5`
- `switchbladeRacks: 1`
- `switchbladeNmsNodes: 5`
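
The public numbers are internally consistent: 22 hosts with telemetry plus 35 assets without telemetry equals the 57 known assets. The sketch below turns that into an automated check. The field names come from the response above. The `ProtectionProof` interface, `checkProtectionProof`, and the assumption that the telemetry split must always sum to `knownAssets` are illustrative, not a documented contract of the endpoint.

```typescript
// Sanity checks over a /api/protection-proof payload. Field names are from the
// response above; the checks are an illustrative assumption, not a contract.

interface ProtectionProof {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
}

function checkProtectionProof(p: ProtectionProof): string[] {
  const problems: string[] = [];

  // Every count should be a non-negative integer.
  const counts: Array<[string, number]> = [
    ["knownAssets", p.knownAssets],
    ["hostsWithTelemetry", p.hostsWithTelemetry],
    ["assetsWithoutTelemetry", p.assetsWithoutTelemetry],
    ["auditedHosts", p.auditedHosts],
    ["queueBlocked", p.queueBlocked],
  ];
  for (const [name, value] of counts) {
    if (value < 0 || Math.floor(value) !== value) {
      problems.push(`${name} is not a non-negative integer: ${value}`);
    }
  }

  // Assumed invariant: each known asset either has telemetry or it does not.
  if (p.hostsWithTelemetry + p.assetsWithoutTelemetry !== p.knownAssets) {
    problems.push("hostsWithTelemetry + assetsWithoutTelemetry !== knownAssets");
  }
  return problems;
}

const live: ProtectionProof = {
  knownAssets: 57,
  hostsWithTelemetry: 22,
  assetsWithoutTelemetry: 35,
  auditedHosts: 3,
  queueBlocked: 28,
};
console.log(checkProtectionProof(live)); // []
```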
## Operational Truth
The major Atlas truthfulness regression is fixed:
- Atlas and Findings no longer silently collapse to a fake clean state when raw Atlas data still contains real problems

What remains true:
- most currently open Atlas findings are coverage gaps
- they represent real missing live telemetry on known assets
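
As described here, coverage gaps amount to a set difference: known assets minus those that report live telemetry. The sketch below shows only that idea. `coverageGapCandidates` and the asset ids are made up, and MAGATAMA's real matching rules and filtering may differ.

```typescript
// One plausible way to derive coverage-gap candidates: every known asset that
// has no live telemetry. Function name and asset ids are hypothetical.

function coverageGapCandidates(
  knownAssets: readonly string[],
  hostsWithTelemetry: ReadonlySet<string>,
): string[] {
  return knownAssets.filter((asset) => !hostsWithTelemetry.has(asset));
}

const known = ["fw-01", "sw-core-01", "nas-01", "cam-07"];
const reporting = new Set(["fw-01", "nas-01"]);
console.log(coverageGapCandidates(known, reporting)); // [ 'sw-core-01', 'cam-07' ]
```

The live numbers fit this shape (57 known, 22 with telemetry, 35 without). Yet only 28 findings are open in total, so not every asset without telemetry currently surfaces as an open finding. That gap between candidates and findings is presumably where the inventory-only policy work under Remaining Work applies.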
## Remaining Work
Still not fully closed:
- lane-specific RunPod artifact adoption and automatic version switching
- further Atlas policy refinement so inventory-only assets can be split more cleanly into:
- actionable operational gaps
- informational inventory/discovery context
## Operator Note
If the browser still shows the older empty Atlas state after deployment:
- hard refresh:
- `Cmd + Shift + R`