sync: record magatama atlas rematerialization fix
parent 43b7250180
commit bba48d3e84
@@ -1,9 +1,76 @@
# Current TIP Sync State

Updated: 2026-05-09 05:45 UTC
Updated: 2026-05-09 05:58 UTC

## Newest Work

- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
  - operator problem:
    - Atlas / Findings / Protection Proof had become dishonest again
    - raw files on Erik still contained:
      - `3` host audits
      - `32` live Atlas scan devices
    - but open findings had collapsed back to `0`
    - Atlas UI therefore showed an implausibly clean state
  - verified root cause:
    - `packages/core/src/routes/health-builders.ts`
      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
    - `packages/core/src/scheduler.ts`
      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
      - newly rematerialized Atlas findings were therefore cleared again almost immediately
  - code fixed:
    - `packages/core/src/routes/health-builders.ts`
      - added `readAtlasSnapshot()`
      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
    - `packages/core/src/scheduler.ts`
      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
      - generic stale resolution now skips:
        - `atlas-coverage-gap`
        - `atlas-exposure`
        - `atlas-host-audit`
      - these sources are now left to their own verification-aware resolution logic
  - live deployment on Erik:
    - rebuilt `@magatama/core`
    - synced:
      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
      - `/opt/magatama/packages/core/dist/scheduler.js`
    - restarted PM2 service:
      - `magatama`
  - live verification:
    - before fix:
      - Atlas raw files present:
        - audits: `3`
        - devices: `32`
      - DB open findings: `0`
    - after authenticated `/api/protection-proof` rebuild:
      - DB open findings: `28`
    - public `/api/findings?limit=5` now shows real open Atlas findings again
    - public `/api/protection-proof` now reports:
      - `knownAssets: 57`
      - `hostsWithTelemetry: 22`
      - `assetsWithoutTelemetry: 35`
      - `auditedHosts: 3`
      - `queueBlocked: 28`
      - `switchbladeAssets: 5`
      - `switchbladeRacks: 1`
      - `switchbladeNmsNodes: 5`
  - operational truth now:
    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
    - the remaining open state is again honest:
      - most current open findings are `atlas-coverage-gap`
      - they reflect missing live telemetry on known inventory/discovery assets
  - operator note:
    - browser cache / old UI state may still temporarily show the earlier empty Atlas
    - hard refresh is required:
      - `Cmd + Shift + R`
  - important honest remainder:
    - this closes the biggest Atlas truthfulness regression
    - it does **not** yet solve every backend truth issue
    - still pending:
      - lane-specific RunPod artifact adoption / automatic version switch
      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational

- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation

@@ -0,0 +1,122 @@
# MAGATAMA Atlas Rematerialization and Stale Resolver Fix

Date: 2026-05-09

## Problem

MAGATAMA had fallen back into an untrustworthy state:

- Atlas raw sources on Erik still existed and were current:
  - `security-atlas-audits.json` with `3` audits
  - `security-atlas-snapshot.json` with `32` devices
- but open findings in Postgres had collapsed back to `0`
- Atlas UI therefore looked implausibly empty / clean

The operator requirement was explicit:

- this must not silently happen again
- MAGATAMA must reflect real protection gaps honestly

## Root Cause

Two independent backend problems combined:

1. `buildProtectionProofResponse()` read Atlas raw files but did not resync findings from them.
2. Generic stale finding auto-resolution in the scheduler treated Atlas-managed findings like ordinary guard findings and resolved them too aggressively.

## Code Changes

### `packages/core/src/routes/health-builders.ts`

- added `readAtlasSnapshot()`
- imported `syncAtlasAuditFindings(...)`
- imported `syncAtlasExposureFindings(...)`
- introduced `syncAtlasOperationalFindings(...)`
- `buildProtectionProofResponse()` now calls that helper before building the proof payload

Effect:

- normal proof/Atlas reads now rematerialize current Atlas findings from the raw audit/snapshot files
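
The ordering described above (sync first, then build the proof) can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in `health-builders.ts`: the function names are taken from this note, but their signatures, the `FindingStore` class, and the `AtlasAudit` / `AtlasDevice` shapes are assumptions made only for the example.

```typescript
// Sketch only. Function names come from this note; their real signatures and
// bodies are not shown here, so everything below is an assumed stand-in.

type AtlasSource = "atlas-host-audit" | "atlas-exposure";

interface Finding {
  key: string;
  source: AtlasSource;
  status: "open" | "resolved";
}

// Hypothetical shapes for the two raw Atlas files.
interface AtlasAudit {
  host: string;
  failedChecks: string[];
}
interface AtlasDevice {
  address: string;
  exposedPorts: number[];
}

// Hypothetical in-memory stand-in for the findings table in Postgres.
class FindingStore {
  private findings: Finding[] = [];

  // Upsert by key so repeated syncs never create duplicates.
  // This sketch simply reopens an existing finding; the real
  // verification-aware logic is not described in this note.
  upsertOpen(key: string, source: AtlasSource): void {
    for (const finding of this.findings) {
      if (finding.key === key) {
        finding.status = "open";
        return;
      }
    }
    this.findings.push({ key, source, status: "open" });
  }

  openCount(): number {
    return this.findings.filter((f) => f.status === "open").length;
  }
}

function syncAtlasAuditFindings(store: FindingStore, audits: AtlasAudit[]): void {
  for (const audit of audits) {
    for (const check of audit.failedChecks) {
      store.upsertOpen(`audit:${audit.host}:${check}`, "atlas-host-audit");
    }
  }
}

function syncAtlasExposureFindings(store: FindingStore, devices: AtlasDevice[]): void {
  for (const device of devices) {
    for (const port of device.exposedPorts) {
      store.upsertOpen(`exposure:${device.address}:${port}`, "atlas-exposure");
    }
  }
}

// Single entry point that runs both syncs.
function syncAtlasOperationalFindings(
  store: FindingStore,
  audits: AtlasAudit[],
  devices: AtlasDevice[],
): void {
  syncAtlasAuditFindings(store, audits);
  syncAtlasExposureFindings(store, devices);
}

// The key ordering: rematerialize findings from raw data, then report.
function buildProtectionProofResponse(
  store: FindingStore,
  audits: AtlasAudit[],
  devices: AtlasDevice[],
): { auditedHosts: number; scannedDevices: number; openFindings: number } {
  syncAtlasOperationalFindings(store, audits, devices);
  return {
    auditedHosts: audits.length,
    scannedDevices: devices.length,
    openFindings: store.openCount(),
  };
}
```

Because the sync is keyed, calling the builder on every proof read stays idempotent: the open count tracks the raw files rather than growing with each request.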

### `packages/core/src/scheduler.ts`

- added:
  - `ATLAS_MANAGED_FINDING_SOURCES`
  - `isAtlasManagedFindingSource(...)`
- generic stale resolution now skips:
  - `atlas-coverage-gap`
  - `atlas-exposure`
  - `atlas-host-audit`

Effect:

- Atlas-managed findings are no longer erased by the generic guard stale resolver
- they stay under their own verification-aware lifecycle
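
A minimal sketch of the skip, assuming a simple age-based staleness rule. Only the constant name, the predicate name, and the three source strings come from this note; `SchedulerFinding`, `resolveStaleFindings`, the `guard-scan` source, and the `lastSeenAt` / `maxAgeMs` rule are hypothetical stand-ins for whatever the scheduler actually does.

```typescript
// Source names and identifiers below are taken from this note.
const ATLAS_MANAGED_FINDING_SOURCES: readonly string[] = [
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
];

function isAtlasManagedFindingSource(source: string): boolean {
  return ATLAS_MANAGED_FINDING_SOURCES.indexOf(source) !== -1;
}

// Hypothetical finding shape for this sketch.
interface SchedulerFinding {
  id: number;
  source: string;
  status: "open" | "resolved";
  lastSeenAt: number; // epoch milliseconds
}

// Assumed staleness rule: an open finding that has not been seen within
// maxAgeMs is auto-resolved, unless its source is Atlas-managed.
function resolveStaleFindings(
  findings: SchedulerFinding[],
  now: number,
  maxAgeMs: number,
): SchedulerFinding[] {
  return findings.map((finding): SchedulerFinding => {
    if (finding.status !== "open") return finding;
    // Atlas-managed sources are left to their own lifecycle.
    if (isAtlasManagedFindingSource(finding.source)) return finding;
    if (now - finding.lastSeenAt <= maxAgeMs) return finding;
    return { ...finding, status: "resolved" };
  });
}

// Usage: an old Atlas finding survives, an equally old guard finding does not.
const dayMs = 24 * 60 * 60 * 1000;
const now = 10 * dayMs;
const swept = resolveStaleFindings(
  [
    { id: 1, source: "guard-scan", status: "open", lastSeenAt: now - 3 * dayMs },
    { id: 2, source: "atlas-coverage-gap", status: "open", lastSeenAt: now - 3 * dayMs },
  ],
  now,
  dayMs,
);
console.log(swept.map((f) => `${f.source}: ${f.status}`));
```

Keeping the exclusion in one named list means a future Atlas-managed source only needs to be added in a single place to be protected from the generic resolver.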

## Live Deployment

Deployed to Erik:

- rebuilt `@magatama/core`
- synced:
  - `/opt/magatama/packages/core/dist/routes/health-builders.js`
  - `/opt/magatama/packages/core/dist/scheduler.js`
- restarted PM2 app:
  - `magatama`

## Live Verification

### Before

- raw files existed:
  - audits: `3`
  - devices: `32`
- DB open findings: `0`

### After protected proof rebuild

- authenticated local `/api/protection-proof` trigger on Erik
- DB open findings rematerialized to: `28`

### Public verification

Public MAGATAMA APIs once again expose the real open state:

- `/api/findings?limit=5`
  - returns open `atlas-coverage-gap` findings again
- `/api/protection-proof`
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
  - `switchbladeAssets: 5`
  - `switchbladeRacks: 1`
  - `switchbladeNmsNodes: 5`
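
The reported counts can be sanity-checked mechanically. The sketch below rests on two assumptions that this note does not state outright: that `knownAssets` is the sum of `hostsWithTelemetry` and `assetsWithoutTelemetry` (the reported values are consistent with that, since 22 + 35 = 57), and that assets without telemetry should always leave at least one open finding. `checkProofCounts` is a hypothetical helper, not part of MAGATAMA.

```typescript
// Subset of the /api/protection-proof fields listed in this note.
interface ProtectionProofCounts {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
}

// Hypothetical plausibility check; both rules are assumptions (see lead-in).
function checkProofCounts(proof: ProtectionProofCounts, dbOpenFindings: number): string[] {
  const problems: string[] = [];
  if (proof.hostsWithTelemetry + proof.assetsWithoutTelemetry !== proof.knownAssets) {
    problems.push("telemetry split does not add up to knownAssets");
  }
  if (proof.assetsWithoutTelemetry > 0 && dbOpenFindings === 0) {
    problems.push("assets lack telemetry but no findings are open");
  }
  return problems;
}

// Values reported after the fix.
const reported: ProtectionProofCounts = {
  knownAssets: 57,
  hostsWithTelemetry: 22,
  assetsWithoutTelemetry: 35,
};

console.log(checkProofCounts(reported, 28)); // state after the fix
console.log(checkProofCounts(reported, 0)); // the implausibly clean state
```

A check of this kind, run after each proof rebuild, would flag the "implausibly clean" condition directly instead of relying on an operator to notice an empty Atlas.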

## Operational Truth

The major Atlas truthfulness regression is fixed:

- Atlas and Findings no longer silently collapse to a fake clean state when raw Atlas data still contains real problems

What remains true:

- most currently open Atlas findings are coverage gaps
- they represent real missing live telemetry on known assets

## Remaining Work

Still not fully closed:

- lane-specific RunPod artifact adoption and automatic version switching
- further Atlas policy refinement so inventory-only assets can be split more cleanly into:
  - actionable operational gaps
  - informational inventory/discovery context

## Operator Note

If the browser still shows the older empty Atlas state after deployment:

- hard refresh:
  - `Cmd + Shift + R`