sync: record magatama atlas rematerialization fix
parent 43b7250180
commit bba48d3e84
@@ -1,9 +1,76 @@
# Current TIP Sync State

-Updated: 2026-05-09 05:45 UTC
+Updated: 2026-05-09 05:58 UTC

## Newest Work

- MAGATAMA Atlas rematerialization / anti-auto-resolve hardening completed live on 2026-05-09:
  - operator problem:
    - Atlas / Findings / Protection Proof had become dishonest again
    - raw files on Erik still contained:
      - `3` host audits
      - `32` live Atlas scan devices
    - but open findings had collapsed back to `0`
    - Atlas UI therefore showed an implausibly clean state
  - verified root cause:
    - `packages/core/src/routes/health-builders.ts`
      - `buildProtectionProofResponse()` read Atlas audits/snapshot but did **not** resync findings from those raw sources
    - `packages/core/src/scheduler.ts`
      - generic guard stale-auto-resolve treated Atlas-managed findings like ordinary scan findings
      - newly rematerialized Atlas findings were therefore cleared again almost immediately
  - code fixed:
    - `packages/core/src/routes/health-builders.ts`
      - added `readAtlasSnapshot()`
      - added `syncAtlasAuditFindings(...)` + `syncAtlasExposureFindings(...)` via a new `syncAtlasOperationalFindings(...)` step
      - `buildProtectionProofResponse()` now re-materializes Atlas-managed findings from current raw files before building the proof response
    - `packages/core/src/scheduler.ts`
      - introduced `ATLAS_MANAGED_FINDING_SOURCES`
      - generic stale resolution now skips:
        - `atlas-coverage-gap`
        - `atlas-exposure`
        - `atlas-host-audit`
      - these sources are now left to their own verification-aware resolution logic
  - live deployment on Erik:
    - rebuilt `@magatama/core`
    - synced:
      - `/opt/magatama/packages/core/dist/routes/health-builders.js`
      - `/opt/magatama/packages/core/dist/scheduler.js`
    - restarted PM2 service:
      - `magatama`
  - live verification:
    - before fix:
      - Atlas raw files present:
        - audits: `3`
        - devices: `32`
      - DB open findings: `0`
    - after authenticated `/api/protection-proof` rebuild:
      - DB open findings: `28`
    - public `/api/findings?limit=5` now shows real open Atlas findings again
    - public `/api/protection-proof` now reports:
      - `knownAssets: 57`
      - `hostsWithTelemetry: 22`
      - `assetsWithoutTelemetry: 35`
      - `auditedHosts: 3`
      - `queueBlocked: 28`
      - `switchbladeAssets: 5`
      - `switchbladeRacks: 1`
      - `switchbladeNmsNodes: 5`
  - operational truth now:
    - Atlas and Findings are no longer silently wiped clean by the generic stale resolver
    - the remaining open state is again honest:
      - most current open findings are `atlas-coverage-gap`
      - they reflect missing live telemetry on known inventory/discovery assets
  - operator note:
    - browser cache / old UI state may still temporarily show the earlier empty Atlas
    - hard refresh is required:
      - `Cmd + Shift + R`
  - important honest remainder:
    - this closes the biggest Atlas truthfulness regression
    - it does **not** yet solve every backend truth issue
    - still pending:
      - lane-specific RunPod artifact adoption / automatic version switch
      - deeper Atlas policy refinement for which inventory-only assets should stay actionable vs informational

- TIP automated equivalence research / manual queue cleanup completed on 2026-05-09:
  - operator intent:
    - products should be researched well enough that they do not need manual equivalence validation
@@ -0,0 +1,122 @@
# MAGATAMA Atlas Rematerialization and Stale Resolver Fix

Date: 2026-05-09

## Problem

MAGATAMA had fallen back into an untrustworthy state:

- Atlas raw sources on Erik still existed and were current:
  - `security-atlas-audits.json` with `3` audits
  - `security-atlas-snapshot.json` with `32` devices
- but open findings in Postgres had collapsed back to `0`
- Atlas UI therefore looked implausibly empty / clean

The operator requirement was explicit:

- this must not silently happen again
- MAGATAMA must reflect real protection gaps honestly

## Root Cause

Two independent backend problems combined:

1. `buildProtectionProofResponse()` read Atlas raw files but did not resync findings from them.
2. Generic stale finding auto-resolution in the scheduler treated Atlas-managed findings like ordinary guard findings and resolved them too aggressively.

## Code Changes

### `packages/core/src/routes/health-builders.ts`

- added `readAtlasSnapshot()`
- imported `syncAtlasAuditFindings(...)`
- imported `syncAtlasExposureFindings(...)`
- introduced `syncAtlasOperationalFindings(...)`
- `buildProtectionProofResponse()` now calls that helper before building the proof payload

Effect:

- normal proof/Atlas reads now rematerialize current Atlas findings from the raw audit/snapshot files
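
The resync-before-build ordering can be sketched as follows. This is a minimal illustration, not the actual `health-builders.ts` code: the record shapes, the in-memory `FindingStore`, and the names `resyncAuditFindings`, `resyncExposureFindings`, `resyncOperationalFindings`, and `buildProofSketch` are all hypothetical stand-ins, since this note does not show the real signatures or file formats.

```typescript
// Sketch only: shapes and names below are hypothetical, not MAGATAMA's real types.
type AtlasSource = "atlas-host-audit" | "atlas-exposure";

interface AtlasAudit { host: string; issues: string[] }
interface AtlasDevice { address: string; exposedPorts: number[] }
interface Finding { key: string; source: AtlasSource; status: "open" | "resolved" }

// In-memory stand-in for the findings table, keyed by a stable finding key.
class FindingStore {
  private readonly rows = new Map<string, Finding>();

  // Idempotent upsert: creates the finding, or reopens it if it was resolved.
  upsertOpen(key: string, source: AtlasSource): void {
    this.rows.set(key, { key, source, status: "open" });
  }

  // Simulates an over-eager resolver clearing every open finding.
  resolveAll(): void {
    this.rows.forEach((row) => { row.status = "resolved"; });
  }

  openCount(): number {
    let count = 0;
    this.rows.forEach((row) => { if (row.status === "open") count += 1; });
    return count;
  }
}

// One finding per audited issue, keyed by host and issue name.
function resyncAuditFindings(store: FindingStore, audits: AtlasAudit[]): void {
  for (const audit of audits) {
    for (const issue of audit.issues) {
      store.upsertOpen(`audit:${audit.host}:${issue}`, "atlas-host-audit");
    }
  }
}

// One finding per exposed port, keyed by device address and port.
function resyncExposureFindings(store: FindingStore, devices: AtlasDevice[]): void {
  for (const device of devices) {
    for (const port of device.exposedPorts) {
      store.upsertOpen(`exposure:${device.address}:${port}`, "atlas-exposure");
    }
  }
}

// Combined step, run before any proof payload is assembled.
function resyncOperationalFindings(
  store: FindingStore,
  audits: AtlasAudit[],
  devices: AtlasDevice[],
): void {
  resyncAuditFindings(store, audits);
  resyncExposureFindings(store, devices);
}

// The proof is built from the store only after the raw sources were replayed into it.
function buildProofSketch(
  store: FindingStore,
  audits: AtlasAudit[],
  devices: AtlasDevice[],
): { auditedHosts: number; openFindings: number } {
  resyncOperationalFindings(store, audits, devices);
  return { auditedHosts: audits.length, openFindings: store.openCount() };
}
```

Because the upsert is keyed, calling `buildProofSketch` repeatedly does not duplicate findings, and a store that was wiped in between is repopulated on the next read. The sketch reopens unconditionally; a real implementation would presumably have to respect findings that were resolved through verification.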

### `packages/core/src/scheduler.ts`

- added:
  - `ATLAS_MANAGED_FINDING_SOURCES`
  - `isAtlasManagedFindingSource(...)`
- generic stale resolution now skips:
  - `atlas-coverage-gap`
  - `atlas-exposure`
  - `atlas-host-audit`

Effect:

- Atlas-managed findings are no longer erased by the generic guard stale resolver
- they stay under their own verification-aware lifecycle
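
A rough sketch of how such a skip could look. The three source names and the identifiers `ATLAS_MANAGED_FINDING_SOURCES` and `isAtlasManagedFindingSource` come from this note; the `Set` representation, the single-string signature, the `OpenFinding` shape, the `guard-scan` source name, and `selectStaleForGenericResolve` are assumptions made for illustration only.

```typescript
// Source names are taken from the note; the Set type is an assumption.
const ATLAS_MANAGED_FINDING_SOURCES: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Assumed signature: the note only shows `isAtlasManagedFindingSource(...)`.
function isAtlasManagedFindingSource(source: string): boolean {
  return ATLAS_MANAGED_FINDING_SOURCES.has(source);
}

// Hypothetical row shape; lastSeenAt is epoch milliseconds.
interface OpenFinding { id: number; source: string; lastSeenAt: number }

// Returns the ids the generic resolver may auto-resolve.
// Atlas-managed findings are filtered out before the age check is applied.
function selectStaleForGenericResolve(
  findings: OpenFinding[],
  now: number,
  maxAgeMs: number,
): number[] {
  return findings
    .filter((f) => !isAtlasManagedFindingSource(f.source))
    .filter((f) => now - f.lastSeenAt > maxAgeMs)
    .map((f) => f.id);
}
```

Filtering by source before the age check means an Atlas finding can never reach the generic resolve path, regardless of how old its last sighting is.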

## Live Deployment

Deployed to Erik:

- rebuilt `@magatama/core`
- synced:
  - `/opt/magatama/packages/core/dist/routes/health-builders.js`
  - `/opt/magatama/packages/core/dist/scheduler.js`
- restarted PM2 app:
  - `magatama`

## Live Verification

### Before

- raw files existed:
  - audits: `3`
  - devices: `32`
- DB open findings: `0`
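
The "before" state above is the pattern worth alarming on: raw Atlas data present, zero open findings. A small check along these lines could surface it early. This is a sketch under stated assumptions; `AtlasCounts` and `looksImplausiblyClean` are hypothetical names, and nothing like this is confirmed to exist in MAGATAMA.

```typescript
// Hypothetical summary of the three numbers compared in this section.
interface AtlasCounts { audits: number; devices: number; openFindings: number }

// Flags the suspicious combination: raw data exists, yet nothing is open.
// This is a warning signal, not proof of a bug: a truly clean estate
// would produce the same counts.
function looksImplausiblyClean(counts: AtlasCounts): boolean {
  const hasRawData = counts.audits > 0 || counts.devices > 0;
  return hasRawData && counts.openFindings === 0;
}
```

With the numbers from this note, `{ audits: 3, devices: 32, openFindings: 0 }` is flagged, while the post-fix state with `28` open findings is not.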

### After protected proof rebuild

- authenticated local `/api/protection-proof` trigger on Erik
- DB open findings rematerialized to: `28`

### Public verification

Public MAGATAMA APIs once again expose the real open state:

- `/api/findings?limit=5`
  - returns open `atlas-coverage-gap` findings again
- `/api/protection-proof`
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
  - `switchbladeAssets: 5`
  - `switchbladeRacks: 1`
  - `switchbladeNmsNodes: 5`
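
The reported counters are internally consistent in at least one respect: `22 + 35 = 57`, so telemetry and non-telemetry assets add up to `knownAssets`. A verification script could check that mechanically. The sketch below assumes this sum is an intended invariant and that the counters arrive as a flat object; both are assumptions, and `ProofCounters` and `checkProofCounters` are hypothetical names.

```typescript
// Assumed flat shape; the real payload nesting is not shown in this note.
interface ProofCounters {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
}

// Returns a list of human-readable problems; an empty list means no issue was found.
function checkProofCounters(p: ProofCounters): string[] {
  const problems: string[] = [];
  const entries: Array<[string, number]> = [
    ["knownAssets", p.knownAssets],
    ["hostsWithTelemetry", p.hostsWithTelemetry],
    ["assetsWithoutTelemetry", p.assetsWithoutTelemetry],
    ["auditedHosts", p.auditedHosts],
    ["queueBlocked", p.queueBlocked],
  ];
  for (const [name, value] of entries) {
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} is not a non-negative integer: ${value}`);
    }
  }
  // Assumed invariant: every known asset either has telemetry or does not.
  if (p.hostsWithTelemetry + p.assetsWithoutTelemetry !== p.knownAssets) {
    problems.push("hostsWithTelemetry + assetsWithoutTelemetry does not equal knownAssets");
  }
  return problems;
}
```

Returning a list instead of throwing lets a verification run report every inconsistency at once.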

## Operational Truth

The major Atlas truthfulness regression is fixed:

- Atlas and Findings no longer silently collapse to a fake clean state when raw Atlas data still contains real problems

What remains true:

- most currently open Atlas findings are coverage gaps
- they represent real missing live telemetry on known assets

## Remaining Work

Still not fully closed:

- lane-specific RunPod artifact adoption and automatic version switching
- further Atlas policy refinement so inventory-only assets can be split more cleanly into:
  - actionable operational gaps
  - informational inventory/discovery context

## Operator Note

If the browser still shows the older empty Atlas state after deployment:

- hard refresh:
  - `Cmd + Shift + R`