From d588a20a54ff7a30ccfc3d682294b62f27ff6038 Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sat, 9 May 2026 09:38:22 +0200
Subject: [PATCH] sync: record magatama atlas fallback and lane registry fix

---
 sync/CURRENT.md                               | 78 ++++++++++++++++--
 ...ve-atlas-fallback-and-lane-registry-fix.md | 79 +++++++++++++++++++
 2 files changed, 152 insertions(+), 5 deletions(-)
 create mode 100644 sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md

diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 8332aa8..8c142ad 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -54,8 +54,8 @@ Updated: 2026-05-09 07:34 UTC
   - HTML product-like rows: `626`
     - price verified: `626`
     - image verified: `622`
-    - details verified: `624`
-    - price+image+details verified: `620`
+    - details verified: `626`
+    - price+image+details verified: `622`
     - fully verified: `620`
   - filter/category rows with no verification: `108`
   - other non-product/generic rows with no verification: `10`
@@ -74,9 +74,9 @@ Updated: 2026-05-09 07:34 UTC
 - remaining truth:
   - active/product-like Flexoptix rows are much closer to complete
   - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
-  - remaining HTML product-like gaps observed before SSH became unavailable:
-    - `4` product-like rows without image verification
-    - `2` FLEXBOX/accessory-like rows without reach/details
+  - remaining HTML product-like gaps after final source check:
+    - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
+    - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
 - operational note:
   - Erik SSH became unavailable with `connection refused` after the last verification checks
   - public TIP HTTPS still responded through Cloudflare
@@ -1314,6 +1314,74 @@ There are existing uncommitted changes outside `sync/`. Some are Codex work from
 - `6c42ca7 docs: add shared agent sync handoff`
 - `8e7c5aa docs: link llm-gateway sync handoff`
+- `bba48d3 sync: record magatama atlas rematerialization fix`
+- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
+- `8b42077 sync: refresh cross-agent chat handoff`
 
 - Pending after this update:
   - watch whether any future guard exposure findings are genuine operational issues or new false positives.
   - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
+
+## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth
+
+### Atlas / Findings
+
+- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
+  - `knownAssets: 57`
+  - `hostsWithTelemetry: 22`
+  - `assetsWithoutTelemetry: 35`
+  - `auditedHosts: 3`
+  - `queueBlocked: 28`
+- Root causes fixed live:
+  1. `packages/core/src/routes/health-builders.ts`
+     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
+  2. `packages/core/src/scheduler.ts`
+     - generic stale auto-resolve no longer auto-closes:
+       - `atlas-coverage-gap`
+       - `atlas-exposure`
+       - `atlas-host-audit`
+  3. `packages/dashboard/public/index-v2.html`
+     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
+- Live public verification after deploy:
+  - `/api/protection-proof` shows non-zero Atlas truth again.
+  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
+
+### Training / Lane Registry
+
+- The public training status is now honest for the current live state:
+  - `magatamallm`
+    - `datasetSource: url`
+    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
+    - `15679 train`
+    - `1743 eval`
+    - `17422 total`
+    - `lastRegistryRunStatus: completed_without_model_artifact`
+  - `fo_blogllm`
+    - lane registry rebuilt on Erik
+    - `lastRunStatus: completed_without_model_artifact`
+  - `tip_llm`
+    - lane registry rebuilt on Erik
+    - `lastRunStatus: completed_without_model_artifact`
+- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
+  - lane datasets
+  - lane RunPod manifests
+  - `training-runs.json`
+- Live compiled registry on Erik now no longer sits at all-`null`; it exposes:
+  - `activeModel`
+  - `version`
+  - `lastRunId`
+  - `lastRunStatus`
+  - `datasetSource`
+  - `collectionsPath`
+
+### Still Outstanding
+
+- Full automatic training is still blocked by the managed RunPod Axolotl endpoint:
+  - jobs reach `COMPLETED`
+  - but no adoptable artifact is returned
+  - therefore MAGATAMA correctly records:
+    - `completed_without_model_artifact`
+- That means:
+  - no new model version can be truthfully activated yet
+  - no Ollama alias switch should happen yet
+- Remaining real blocker:
+  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.
diff --git a/sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md b/sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md
new file mode 100644
index 0000000..ab4267c
--- /dev/null
+++ b/sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md
@@ -0,0 +1,79 @@
+# 2026-05-09 — MAGATAMA live Atlas fallback and lane registry fix
+
+## Summary
+
+Two remaining truthfulness gaps were closed:
+
+1. Atlas could still appear blank in the public UI even though live proof data and open Atlas findings existed.
+2. The per-lane training registry on Erik still exposed `null` metadata, even though lane manifests and training-run state existed.
+
+## Atlas
+
+### Live truth after fix
+
+- Public `protection-proof` now shows non-zero Atlas state again:
+  - `knownAssets: 57`
+  - `hostsWithTelemetry: 22`
+  - `assetsWithoutTelemetry: 35`
+  - `auditedHosts: 3`
+  - `queueBlocked: 28`
+- Public findings API again shows open `atlas-coverage-gap` findings.
+
+### Technical fix
+
+- `packages/dashboard/public/index-v2.html`
+  - added `deriveProofFromAtlasSnapshot(...)`
+  - if the live/cached proof is empty or stale while the Atlas snapshot is still useful, the UI now synthesizes a fallback proof model from the snapshot
+  - result: Atlas top cards and sections no longer render as misleading blanks
+
+## Lane registry
+
+### Technical fix
+
+- `scripts/model_registry_build.ts`
+  - now also reads:
+    - `training-data/model-registry/training-runs.json`
+    - `training-data/runpod//manifest.json`
+  - compiled lane output now includes:
+    - `activeModel`
+    - `version`
+    - `lastTrainingAt`
+    - `lastRunId`
+    - `lastRunStatus`
+    - `datasetSource`
+    - `collectionsPath`
+    - lane runpod counts
+
+### Live Erik verification
+
+- `magatamallm`
+  - `activeModel: magatama-coder:latest`
+  - `lastRunStatus: completed_without_model_artifact`
+  - `datasetSource: url`
+  - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
+- `fo_blogllm`
+  - `activeModel: fo-blog-v7`
+  - `lastRunStatus: completed_without_model_artifact`
+- `tip_llm`
+  - `activeModel: tip-llm-v1`
+  - `lastRunStatus: completed_without_model_artifact`
+
+## Training reality
+
+The managed RunPod Axolotl endpoint still does not return an adoptable model artifact. MAGATAMA is now honest about that:
+
+- jobs can reach `COMPLETED`
+- but the resulting lane registry and run registry record:
+  - `completed_without_model_artifact`
+- therefore:
+  - no version bump is treated as successful
+  - no Ollama alias switch is performed
+
+## Meaning
+
+The frontend is now much harder to fool:
+
+- Atlas no longer looks empty when only the proof route is stale
+- lane registry metadata no longer collapses to all-`null`
+- training state remains explicit about the real remaining blocker:
+  - custom RunPod worker still required for fully automatic artifact return and adoption