diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index e6884fa..f82ec32 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,6 +1,6 @@
 # Current TIP Sync State
 
-Updated: 2026-05-06 08:35 UTC
+Updated: 2026-05-06 10:28 UTC
 
 ## Active Policy
 
@@ -68,6 +68,23 @@ When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infr
 - MAGATAMA now filters training metrics to verified/trainable examples only.
 - Failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
 - Gitea-backed training pool remains the default target for training writes.
+- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
+  - the earlier `49` medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
+  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
+    - exposure-backed assets
+    - explicit non-auto owner
+    - configured telemetry expectation
+    - critical/high criticality
+    - infrastructure metadata or managed infra device types
+  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
+  - local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
+  - live Postgres state after deploy: `open findings = 0`.
+  - training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
+    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
+    - failed/escalated/report-only runs now belong in `errors.jsonl`
+  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
+    - atlas coverage scope hardening
+    - training path integrity fix
 - Complete Codex chat sync was added:
   - `sync/history/2026-04-29-codex-complete-chat-sync.md`
   - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
@@ -190,4 +207,5 @@ There are existing uncommitted changes outside `sync/`. Some are Codex work from
   - `6c42ca7 docs: add shared agent sync handoff`
   - `8e7c5aa docs: link llm-gateway sync handoff`
 - Pending after this update:
-  - push the refreshed complete-chat sync including MAGATAMA training/compliance state.
+  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
+  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
diff --git a/sync/history/2026-05-06-magatama-coverage-gap-hardening-and-training-fix.md b/sync/history/2026-05-06-magatama-coverage-gap-hardening-and-training-fix.md
new file mode 100644
index 0000000..da535fb
--- /dev/null
+++ b/sync/history/2026-05-06-magatama-coverage-gap-hardening-and-training-fix.md
@@ -0,0 +1,107 @@
+# MAGATAMA Coverage-Gap Hardening And Training Fix
+
+Date: 2026-05-06
+Author: Codex
+
+## Why this handoff exists
+
+MAGATAMA looked clean at one point, then reopened a large batch of medium guard findings that were not all meaningful operational incidents. At the same time, verified MAGATAMA fixes were still at risk of landing in the wrong training output path.
+
+This handoff records the concrete fixes, live verification, and training-pool updates.
+
+## Problem observed
+
+- Earlier live state:
+  - `49` open findings
+  - all `guard`
+  - all `medium`
+  - source: `atlas-coverage-gap`
+- Root cause:
+  - Atlas coverage logic treated passive inventory/discovery assets as if they were operationally managed assets that had failed telemetry obligations.
+  - This created noisy coverage findings for:
+    - loopback
+    - passive inventory-only entries
+    - external reference assets
+    - assets without explicit operator intent
+
+## Code changes made
+
+MAGATAMA repo:
+
+- `packages/core/src/routes/health-atlas.ts`
+- `packages/core/src/routes/health-builders.ts`
+- `packages/core/src/routes/health-types.ts`
+- `packages/core/src/learning/fix-tracking.ts`
+
+### Coverage-gap hardening
+
+Added scope-aware logic so Atlas coverage findings only open for genuinely managed operational assets.
+
+Signals now used to justify an operational coverage finding include:
+
+- exposure evidence (`atlas-exposure`)
+- explicit non-auto owner
+- configured telemetry expectation
+- high/critical criticality
+- infrastructure metadata (`vendor`, `model`, `platform`, `rack`)
+- managed infra device types (`server`, `switch`, `router`, `firewall`, `storage`, `nas`, `hypervisor`)
+
+Explicit exclusions:
+
+- loopback (`127.0.0.1`, `localhost`, `::1`)
+- passive external reference assets
+- inventory-only noise without operational scope signals
+
+### Training integrity fix
+
+Fixed inverted training output behavior:
+
+- verified successes now write to:
+  - `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
+- failed/escalated/report-only runs should go to:
+  - `training-data/gitea-learning-pool/magatamallm/errors.jsonl`
+
+## Deployment and verification
+
+Build:
+
+- local `npm run build` in MAGATAMA completed successfully
+
+Deploy:
+
+- synced updated core `dist/routes/`
+- synced updated core `dist/learning/`
+- restarted PM2 app:
+  - `magatama`
+
+Live verification on Erik:
+
+- deployed files timestamp updated on `/opt/magatama/packages/core/dist/...`
+- first post-restart guard scan log:
+  - `guard — first scan`
+  - `AutoResolve guard stale findings resolved: 33`
+- Postgres after deploy:
+  - `open findings = 0`
+
+## Training pool updates
+
+Two new explicit solution entries were appended to the Gitea-backed MAGATAMA fixes corpus:
+
+1. `Atlas coverage gaps should only become findings for managed operational assets`
+2. `Verified fixes must never be written into the error corpus`
+
+The updated `fixes.jsonl` was also synced to Erik:
+
+- `/opt/magatama/training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
+
+## Important remaining note
+
+Historic pollution may still exist from older runs where failed/escalated items were appended to `fixes.jsonl` before this path correction. The pathing bug is fixed for future writes, but a later cleanup pass may still be needed to scrub old invalid rows and backfill `errors.jsonl`.
+
+## Operational takeaway
+
+This was a real solution, not a suppression:
+
+- Atlas noise is reduced by narrowing operational scope.
+- Real managed assets can still produce coverage findings.
+- Verified fixes and failed runs are now separated for future MagatamaLLM training quality.
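The coverage-gap hardening in the handoff reduces to a scope predicate. Explicit exclusions are checked first, and then any one operational signal is enough. The TypeScript below is a minimal sketch of that rule, not the code in `health-atlas.ts`. Several pieces are hypothetical:

- the `AtlasAsset` shape and its field names (`passiveReference`, `hasExposureEvidence`, `owner.auto`, `expectsTelemetry`);
- the `isOperationalCoverageScope` and `coverageGapFindings` functions;
- the assumption that a "coverage gap" means an in-scope asset whose host is not reporting telemetry.

```typescript
// Hypothetical asset shape; the real types live in health-types.ts and may differ.
type Criticality = "low" | "medium" | "high" | "critical";

interface AtlasAsset {
  host: string;                  // IP address or hostname
  passiveReference?: boolean;    // external reference / inventory-only entry
  hasExposureEvidence?: boolean; // e.g. an `atlas-exposure` record exists
  owner?: { name: string; auto: boolean };
  expectsTelemetry?: boolean;
  criticality?: Criticality;
  vendor?: string;
  model?: string;
  platform?: string;
  rack?: string;
  deviceType?: string;
}

const LOOPBACK_HOSTS = new Set(["127.0.0.1", "localhost", "::1"]);
const MANAGED_DEVICE_TYPES = new Set([
  "server", "switch", "router", "firewall", "storage", "nas", "hypervisor",
]);

function isOperationalCoverageScope(asset: AtlasAsset): boolean {
  // Exclusions take precedence in this sketch: they never open findings.
  if (LOOPBACK_HOSTS.has(asset.host.trim().toLowerCase())) return false;
  if (asset.passiveReference) return false;

  const hasInfraMetadata = [asset.vendor, asset.model, asset.platform, asset.rack]
    .some((v) => v !== undefined && v.trim() !== "");

  // Any single operational signal justifies a coverage finding.
  return (
    asset.hasExposureEvidence === true ||
    (asset.owner !== undefined && !asset.owner.auto) ||
    asset.expectsTelemetry === true ||
    asset.criticality === "high" ||
    asset.criticality === "critical" ||
    hasInfraMetadata ||
    (asset.deviceType !== undefined &&
      MANAGED_DEVICE_TYPES.has(asset.deviceType.toLowerCase()))
  );
}

// Assumption: a gap is an in-scope asset whose host is not reporting telemetry.
function coverageGapFindings(assets: AtlasAsset[], reportingHosts: Set<string>): string[] {
  return assets
    .filter((a) => isOperationalCoverageScope(a) && !reportingHosts.has(a.host))
    .map((a) => a.host);
}
```

Checking exclusions before signals keeps a loopback entry silent even if it is tagged `critical`. Whether the real implementation orders the checks the same way is an assumption.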
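The training integrity fix comes down to routing each remediation record by its outcome. Below is a minimal sketch under some assumptions: outcomes are plain string labels (`verified`, `failed`, `escalated`, `report-only`), and `trainingOutputPath`, `toJsonlLine`, and the `problem`/`solution` fields are hypothetical. The actual names in `fix-tracking.ts` may differ. The sketch allow-lists the single trainable outcome, so any unrecognized label falls through to `errors.jsonl` rather than polluting the fixes corpus.

```typescript
// Hypothetical sketch of outcome-based routing; not the actual fix-tracking.ts code.
const POOL_DIR = "training-data/gitea-learning-pool/magatamallm";

// Only verified successes are trainable. Any other label, including ones this
// sketch has never seen, is routed to the error corpus.
const TRAINABLE_OUTCOMES: ReadonlySet<string> = new Set(["verified"]);

interface TrainingRecord {
  outcome: string;  // e.g. "verified" | "failed" | "escalated" | "report-only"
  problem: string;  // hypothetical field
  solution: string; // hypothetical field
}

function trainingOutputPath(outcome: string): string {
  return TRAINABLE_OUTCOMES.has(outcome)
    ? `${POOL_DIR}/fixes.jsonl`
    : `${POOL_DIR}/errors.jsonl`;
}

// JSONL: exactly one JSON object per line.
function toJsonlLine(record: TrainingRecord): string {
  return JSON.stringify(record) + "\n";
}
```

A caller would append `toJsonlLine(record)` to the returned path, for example with Node's `fs.appendFileSync`.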
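The remaining cleanup is to scrub historic failed/escalated rows out of `fixes.jsonl` and backfill `errors.jsonl`. That job could start from a pure partition over the existing JSONL text. This sketch assumes each row carries an `outcome` field whose value is `verified` for trainable fixes; `scrubFixesCorpus` is hypothetical. Rows without that field would be moved to errors, possibly including hand-written entries such as the two Codex additions. The real row schema should therefore be checked before running anything like this.

```typescript
// Hypothetical cleanup sketch: partition historic fixes.jsonl rows by outcome.
interface ScrubResult {
  fixes: string[];  // rows that stay in fixes.jsonl
  errors: string[]; // rows to backfill into errors.jsonl
}

function scrubFixesCorpus(jsonl: string): ScrubResult {
  const result: ScrubResult = { fixes: [], errors: [] };
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    let outcome: unknown;
    try {
      // Assumes each row is a JSON object with an `outcome` field.
      outcome = (JSON.parse(line) as { outcome?: unknown }).outcome;
    } catch {
      outcome = undefined; // unparseable (or JSON `null`) rows are never trainable
    }
    (outcome === "verified" ? result.fixes : result.errors).push(line);
  }
  return result;
}
```

Keeping the partition pure means the result can be reviewed before anything is written back to the Gitea-backed pool.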