sync: record magatama atlas fallback and lane registry fix

2026-05-09 09:38:22 +02:00 · d588a20a54
commit d588a20a54
parent 549b4430df
2 changed files with 152 additions and 5 deletions
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -54,8 +54,8 @@ Updated: 2026-05-09 07:34 UTC
      - HTML product-like rows: `626`
      - price verified: `626`
      - image verified: `622`
-      - details verified: `624`
-      - price+image+details verified: `620`
+      - details verified: `626`
+      - price+image+details verified: `622`
      - fully verified: `620`
      - filter/category rows with no verification: `108`
      - other non-product/generic rows with no verification: `10`
@@ -74,9 +74,9 @@ Updated: 2026-05-09 07:34 UTC
  - remaining truth:
    - active/product-like Flexoptix rows are much closer to complete
    - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
-    - remaining HTML product-like gaps observed before SSH became unavailable:
-      - `4` product-like rows without image verification
-      - `2` FLEXBOX/accessory-like rows without reach/details
+    - remaining HTML product-like gaps after final source check:
+      - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
+      - `2` FLEXBOX/accessory-like rows are now classified as `Accessory` with `reach_label=N/A` and `details_verified=true`
  - operational note:
    - Erik SSH became unavailable with `connection refused` after the last verification checks
    - public TIP HTTPS still responded through Cloudflare
@@ -1314,6 +1314,74 @@ There are existing uncommitted changes outside `sync/`. Some are Codex work from

 - `6c42ca7 docs: add shared agent sync handoff`
 - `8e7c5aa docs: link llm-gateway sync handoff`
+- `bba48d3 sync: record magatama atlas rematerialization fix`
+- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
+- `8b42077 sync: refresh cross-agent chat handoff`
 - Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
+
+## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth
+
+### Atlas / Findings
+
+- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
+  - `knownAssets: 57`
+  - `hostsWithTelemetry: 22`
+  - `assetsWithoutTelemetry: 35`
+  - `auditedHosts: 3`
+  - `queueBlocked: 28`
+- Root causes fixed live:
+  1. `packages/core/src/routes/health-builders.ts`
+     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
+  2. `packages/core/src/scheduler.ts`
+     - generic stale auto-resolve no longer auto-closes:
+       - `atlas-coverage-gap`
+       - `atlas-exposure`
+       - `atlas-host-audit`
+  3. `packages/dashboard/public/index-v2.html`
+     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
+- Live public verification after deploy:
+  - `/api/protection-proof` shows non-zero Atlas truth again.
+  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
+
+### Training / Lane Registry
+
+- The public training status is now honest for the current live state:
+  - `magatamallm`
+    - `datasetSource: url`
+    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
+    - `15679 train`
+    - `1743 eval`
+    - `17422 total`
+    - `lastRegistryRunStatus: completed_without_model_artifact`
+  - `fo_blogllm`
+    - lane registry rebuilt on Erik
+    - `lastRunStatus: completed_without_model_artifact`
+  - `tip_llm`
+    - lane registry rebuilt on Erik
+    - `lastRunStatus: completed_without_model_artifact`
+- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
+  - lane datasets
+  - lane RunPod manifests
+  - `training-runs.json`
+- The live compiled registry on Erik no longer sits at all-`null`; it now exposes:
+  - `activeModel`
+  - `version`
+  - `lastRunId`
+  - `lastRunStatus`
+  - `datasetSource`
+  - `collectionsPath`
+
+### Still Outstanding
+
+- Fully automatic training is still blocked by the managed RunPod Axolotl endpoint:
+  - jobs reach `COMPLETED`
+  - but no adoptable artifact is returned
+  - therefore MAGATAMA correctly records:
+    - `completed_without_model_artifact`
+- That means:
+  - no new model version can be truthfully activated yet
+  - no Ollama alias switch should happen yet
+- Remaining real blocker:
+  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.
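
Reviewer note on the `packages/core/src/scheduler.ts` change described above: the scheduler code itself is not part of this diff, so the following is only a minimal sketch of how a generic stale auto-resolve pass could skip the three Atlas finding types. The `Finding` shape, the `autoResolveStale` name, and the `maxAgeMs` parameter are assumptions made for illustration; only the three finding type strings come from the commit.

```typescript
// Hypothetical finding shape; the real scheduler types are not shown in this commit.
interface Finding {
  id: string;
  type: string;
  status: "open" | "resolved";
  lastSeenAt: number; // epoch milliseconds
}

// Finding types that, per the commit notes, the generic stale pass must not auto-close.
const STALE_AUTO_RESOLVE_EXEMPT: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Returns a new list in which stale, non-exempt open findings are marked resolved.
// Exempt Atlas findings stay open no matter how old they are.
function autoResolveStale(
  findings: Finding[],
  now: number,
  maxAgeMs: number,
): Finding[] {
  return findings.map((f): Finding => {
    const stale = f.status === "open" && now - f.lastSeenAt > maxAgeMs;
    if (!stale || STALE_AUTO_RESOLVE_EXEMPT.has(f.type)) {
      return f;
    }
    return { ...f, status: "resolved" };
  });
}
```

Keeping the exemption as a set of type strings means a further Atlas finding type could be protected by adding one entry, without touching the staleness logic.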
--- a/sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md
+++ b/sync/history/2026-05-09-magatama-live-atlas-fallback-and-lane-registry-fix.md
@@ -0,0 +1,79 @@
+# 2026-05-09 — MAGATAMA live Atlas fallback and lane registry fix
+
+## Summary
+
+Two remaining truthfulness gaps were closed:
+
+1. Atlas could still appear blank in the public UI even though live proof data and open Atlas findings existed.
+2. The per-lane training registry on Erik still exposed `null` metadata, even though lane manifests and training-run state existed.
+
+## Atlas
+
+### Live truth after fix
+
+- Public `protection-proof` now shows non-zero Atlas state again:
+  - `knownAssets: 57`
+  - `hostsWithTelemetry: 22`
+  - `assetsWithoutTelemetry: 35`
+  - `auditedHosts: 3`
+  - `queueBlocked: 28`
+- Public findings API again shows open `atlas-coverage-gap` findings.
+
+### Technical fix
+
+- `packages/dashboard/public/index-v2.html`
+  - added `deriveProofFromAtlasSnapshot(...)`
+  - if the live/cached proof is empty or stale while the Atlas snapshot is still useful, the UI now synthesizes a fallback proof model from the snapshot
+  - result: Atlas top cards and sections no longer render as misleading blanks
+
+## Lane registry
+
+### Technical fix
+
+- `scripts/model_registry_build.ts`
+  - now also reads:
+    - `training-data/model-registry/training-runs.json`
+    - `training-data/runpod/<lane>/manifest.json`
+  - compiled lane output now includes:
+    - `activeModel`
+    - `version`
+    - `lastTrainingAt`
+    - `lastRunId`
+    - `lastRunStatus`
+    - `datasetSource`
+    - `collectionsPath`
+    - lane RunPod counts
+
+### Live Erik verification
+
+- `magatamallm`
+  - `activeModel: magatama-coder:latest`
+  - `lastRunStatus: completed_without_model_artifact`
+  - `datasetSource: url`
+  - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
+- `fo_blogllm`
+  - `activeModel: fo-blog-v7`
+  - `lastRunStatus: completed_without_model_artifact`
+- `tip_llm`
+  - `activeModel: tip-llm-v1`
+  - `lastRunStatus: completed_without_model_artifact`
+
+## Training reality
+
+The managed RunPod Axolotl endpoint still does not return an adoptable model artifact. MAGATAMA is now honest about that:
+
+- jobs can reach `COMPLETED`
+- but the resulting lane registry and run registry record:
+  - `completed_without_model_artifact`
+- therefore:
+  - no version bump is treated as successful
+  - no Ollama alias switch is performed
+
+## Meaning
+
+The frontend is now much harder to fool:
+
+- Atlas no longer looks empty when only the proof route is stale
+- lane registry metadata no longer collapses to all-`null`
+- training state remains explicit about the real remaining blocker:
+  - custom RunPod worker still required for fully automatic artifact return and adoption
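
Reviewer note on the Atlas fallback described in the history file: the body of `deriveProofFromAtlasSnapshot(...)` is not included in this diff, so the sketch below only illustrates the general idea of preferring a fresh proof and otherwise synthesizing one from the snapshot. The five counter names come from the commit; the snapshot shape, the `generatedAt` field, the `maxAgeMs` threshold, and the function names `deriveProofSketch` and `selectProof` are assumptions.

```typescript
// Counter names taken from the commit; generatedAt is a hypothetical freshness marker.
interface AtlasProof {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
  generatedAt: number; // epoch milliseconds
}

// Hypothetical snapshot shape, assumed for illustration only.
interface AtlasAsset {
  host: string;
  hasTelemetry: boolean;
  audited: boolean;
  queueBlocked: boolean;
}
interface AtlasSnapshot {
  assets: AtlasAsset[];
}

// A proof counts as usable when it exists, is non-empty, and is not too old.
function isUsableProof(proof: AtlasProof | null, now: number, maxAgeMs: number): boolean {
  if (proof === null || proof.knownAssets === 0) {
    return false;
  }
  return now - proof.generatedAt <= maxAgeMs;
}

// Builds a fallback proof by counting snapshot assets.
function deriveProofSketch(snapshot: AtlasSnapshot, now: number): AtlasProof {
  const assets = snapshot.assets;
  const withTelemetry = assets.filter((a) => a.hasTelemetry).length;
  return {
    knownAssets: assets.length,
    hostsWithTelemetry: withTelemetry,
    assetsWithoutTelemetry: assets.length - withTelemetry,
    auditedHosts: assets.filter((a) => a.audited).length,
    queueBlocked: assets.filter((a) => a.queueBlocked).length,
    generatedAt: now,
  };
}

// Prefers the live/cached proof; falls back to the snapshot only when that helps.
function selectProof(
  proof: AtlasProof | null,
  snapshot: AtlasSnapshot | null,
  now: number,
  maxAgeMs: number,
): AtlasProof | null {
  if (isUsableProof(proof, now, maxAgeMs)) {
    return proof;
  }
  if (snapshot === null || snapshot.assets.length === 0) {
    return proof; // nothing better available; the caller decides how to render
  }
  return deriveProofSketch(snapshot, now);
}
```

In this sketch the fallback is only used when the snapshot has at least one asset, so an empty snapshot never replaces a stale proof that still carries data.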