sync: record magatama atlas fallback and lane registry fix

Rene Fichtmueller 2026-05-09 09:38:22 +02:00
parent 549b4430df
commit d588a20a54
2 changed files with 152 additions and 5 deletions

@@ -54,8 +54,8 @@ Updated: 2026-05-09 07:34 UTC
- HTML product-like rows: `626`
- price verified: `626`
- image verified: `622`
- - details verified: `624`
- - price+image+details verified: `620`
+ - details verified: `626`
+ - price+image+details verified: `622`
- fully verified: `620`
- filter/category rows with no verification: `108`
- other non-product/generic rows with no verification: `10`
@@ -74,9 +74,9 @@ Updated: 2026-05-09 07:34 UTC
- remaining truth:
- active/product-like Flexoptix rows are much closer to complete
- not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
- - remaining HTML product-like gaps observed before SSH became unavailable:
- - `4` product-like rows without image verification
- - `2` FLEXBOX/accessory-like rows without reach/details
+ - remaining HTML product-like gaps after final source check:
+ - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
+ - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
- operational note:
- Erik SSH became unavailable with `connection refused` after the last verification checks
- public TIP HTTPS still responded through Cloudflare
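The two classification rules recorded in this hunk (a placeholder `og:image` does not count as image verification, and an accessory row with `reach_label=N/A` counts as details-verified) could be expressed roughly as below. This is a minimal sketch under stated assumptions: the `ProductRow` shape, its field names, and both helper functions are hypothetical and are not taken from the actual verification code. The rule for non-accessory rows is also an assumption.

```typescript
// Hypothetical row shape; the field names are illustrative, not the real schema.
interface ProductRow {
  url: string;
  ogImage: string | null;
  category: string;
  reachLabel: string | null;
}

// Placeholder file name taken from the notes above.
const PLACEHOLDER_IMAGE = "placeholder-flexoptix.jpg";

// An image counts as verified only if og:image is present and is not the placeholder.
function isImageVerified(row: ProductRow): boolean {
  if (row.ogImage === null || row.ogImage.trim() === "") return false;
  return !row.ogImage.includes(PLACEHOLDER_IMAGE);
}

// Accessories have no optical reach, so "N/A" is treated as a complete answer for them.
// Assumption: every other category needs a concrete reach label.
function isDetailsVerified(row: ProductRow): boolean {
  if (row.category === "Accessory") return row.reachLabel === "N/A";
  return row.reachLabel !== null && row.reachLabel !== "N/A";
}
```

Keeping the placeholder check separate from the details check mirrors how the counts are reported: a row can be details-verified while still lacking a real image.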
@@ -1314,6 +1314,74 @@ There are existing uncommitted changes outside `sync/`. Some are Codex work from
- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- `bba48d3 sync: record magatama atlas rematerialization fix`
- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
- `8b42077 sync: refresh cross-agent chat handoff`
- Pending after this update:
- watch whether any future guard exposure findings are genuine operational issues or new false positives.
- if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
## 2026-05-09 Addendum — Live Atlas + Lane Registry Truth

### Atlas / Findings

- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Root causes fixed live:
  1. `packages/core/src/routes/health-builders.ts`
     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
  2. `packages/core/src/scheduler.ts`
     - generic stale auto-resolve no longer auto-closes:
       - `atlas-coverage-gap`
       - `atlas-exposure`
       - `atlas-host-audit`
  3. `packages/dashboard/public/index-v2.html`
     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
- Live public verification after deploy:
  - `/api/protection-proof` shows non-zero Atlas truth again.
  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
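
The scheduler change described above (generic stale auto-resolve skipping the three Atlas finding types) can be pictured as an exemption set consulted before closing anything. The sketch below is an assumption about the general shape only: the `Finding` interface, the `autoResolveStale` function, and the age threshold parameter are hypothetical and do not reproduce `packages/core/src/scheduler.ts`.

```typescript
// Finding types that the generic stale sweep must never close on its own.
// The three names come from the notes above.
const STALE_RESOLVE_EXEMPT: ReadonlySet<string> = new Set([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Hypothetical finding shape.
interface Finding {
  id: string;
  type: string;
  status: "open" | "resolved";
  lastSeenAt: number; // epoch milliseconds
}

// Returns a new list in which stale, non-exempt open findings are resolved.
function autoResolveStale(findings: Finding[], now: number, maxAgeMs: number): Finding[] {
  return findings.map((f): Finding => {
    const stale = now - f.lastSeenAt > maxAgeMs;
    if (f.status === "open" && stale && !STALE_RESOLVE_EXEMPT.has(f.type)) {
      return { ...f, status: "resolved" };
    }
    return f;
  });
}
```

With an exemption like this, Atlas findings stay open until the Atlas pipeline itself decides they are fixed, which is why they reappear in `/api/findings` after the deploy.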

### Training / Lane Registry

- The public training status is now honest for the current live state:
  - `magatamallm`
    - `datasetSource: url`
    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
    - `15679 train`
    - `1743 eval`
    - `17422 total`
    - `lastRegistryRunStatus: completed_without_model_artifact`
  - `fo_blogllm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
  - `tip_llm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
  - lane datasets
  - lane RunPod manifests
  - `training-runs.json`
- The live compiled registry on Erik is no longer all-`null`; it now exposes:
  - `activeModel`
  - `version`
  - `lastRunId`
  - `lastRunStatus`
  - `datasetSource`
  - `collectionsPath`

### Still Outstanding

- Full automatic training is still blocked by the managed RunPod Axolotl endpoint:
  - jobs reach `COMPLETED`
  - but no adoptable artifact is returned
  - therefore MAGATAMA correctly records:
    - `completed_without_model_artifact`
- That means:
  - no new model version can be truthfully activated yet
  - no Ollama alias switch should happen yet
- Remaining real blocker:
  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.
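
The rule in this section (a job that reaches `COMPLETED` without an adoptable artifact is recorded as `completed_without_model_artifact` and must not trigger activation) can be sketched as a small classifier. Only those two status strings come from these notes. The `RunResult` shape, the other status values, and the function names are hypothetical.

```typescript
// Hypothetical shape of what a training endpoint reports back.
interface RunResult {
  runId: string;
  endpointStatus: "COMPLETED" | "FAILED" | "IN_PROGRESS";
  artifactUrl: string | null; // null when no adapter/model artifact was published
}

// Only "completed_without_model_artifact" is taken from the notes; the rest are illustrative.
type RegistryStatus =
  | "completed"
  | "completed_without_model_artifact"
  | "failed"
  | "running";

function classifyRun(result: RunResult): RegistryStatus {
  switch (result.endpointStatus) {
    case "COMPLETED":
      // A finished job without an artifact is not a usable training result.
      return result.artifactUrl ? "completed" : "completed_without_model_artifact";
    case "FAILED":
      return "failed";
    case "IN_PROGRESS":
      return "running";
  }
}

// Only a run with an adoptable artifact may bump the version or move an alias.
function mayActivate(result: RunResult): boolean {
  return classifyRun(result) === "completed";
}
```

Gating activation on the classified status rather than on the endpoint status keeps a bare `COMPLETED` from being mistaken for a new model.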

@@ -0,0 +1,79 @@
# 2026-05-09 — MAGATAMA live Atlas fallback and lane registry fix

## Summary

Two remaining truthfulness gaps were closed:

1. Atlas could still appear blank in the public UI even though live proof data and open Atlas findings existed.
2. The per-lane training registry on Erik still exposed `null` metadata, even though lane manifests and training-run state existed.

## Atlas

### Live truth after fix

- Public `protection-proof` now shows non-zero Atlas state again:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Public findings API again shows open `atlas-coverage-gap` findings.

### Technical fix

- `packages/dashboard/public/index-v2.html`
  - added `deriveProofFromAtlasSnapshot(...)`
  - if the live/cached proof is empty or stale while the Atlas snapshot is still useful, the UI now synthesizes a fallback proof model from the snapshot
  - result: Atlas top cards and sections no longer render as misleading blanks
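
The fallback described above might look roughly like the following. Only the function name `deriveProofFromAtlasSnapshot` and the five counter names come from these notes. Everything else is assumed for illustration: the snapshot and asset shapes, the emptiness check, and the `proofForRender` wrapper. The real code lives in a plain HTML/JS file and may differ substantially, and staleness detection is left out here.

```typescript
// Proof counters named in the notes above.
interface AtlasProof {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
}

// Hypothetical snapshot shapes.
interface AtlasAsset {
  id: string;
  hasTelemetry: boolean;
  audited: boolean;
  queueBlocked: boolean;
}

interface AtlasSnapshot {
  assets: AtlasAsset[];
}

// Assumption: a proof with zero known assets is treated as empty.
function isProofEmpty(proof: AtlasProof | null): boolean {
  return proof === null || proof.knownAssets === 0;
}

// Recompute the top-card counters directly from the snapshot.
function deriveProofFromAtlasSnapshot(snapshot: AtlasSnapshot): AtlasProof {
  const withTelemetry = snapshot.assets.filter((a) => a.hasTelemetry).length;
  return {
    knownAssets: snapshot.assets.length,
    hostsWithTelemetry: withTelemetry,
    assetsWithoutTelemetry: snapshot.assets.length - withTelemetry,
    auditedHosts: snapshot.assets.filter((a) => a.audited).length,
    queueBlocked: snapshot.assets.filter((a) => a.queueBlocked).length,
  };
}

// Prefer the real proof; fall back to the snapshot only when the proof is empty.
function proofForRender(proof: AtlasProof | null, snapshot: AtlasSnapshot | null): AtlasProof | null {
  if (!isProofEmpty(proof)) return proof;
  if (snapshot !== null && snapshot.assets.length > 0) {
    return deriveProofFromAtlasSnapshot(snapshot);
  }
  return proof;
}
```

In this sketch the derived counters satisfy `knownAssets = hostsWithTelemetry + assetsWithoutTelemetry` by construction, the same relation the live numbers show (57 = 22 + 35).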

## Lane registry

### Technical fix

- `scripts/model_registry_build.ts`
  - now also reads:
    - `training-data/model-registry/training-runs.json`
    - `training-data/runpod/<lane>/manifest.json`
  - compiled lane output now includes:
    - `activeModel`
    - `version`
    - `lastTrainingAt`
    - `lastRunId`
    - `lastRunStatus`
    - `datasetSource`
    - `collectionsPath`
    - lane RunPod counts
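
To illustrate the merge described above, here is a minimal sketch of compiling one lane entry from an already-parsed manifest and run list. It is not the code in `scripts/model_registry_build.ts`. The `TrainingRun` and `LaneManifest` shapes, the `trainRows`/`evalRows` names, and the `compileLane` function are assumptions, and the sketch covers only a subset of the output fields listed above.

```typescript
// Hypothetical record shapes for parsed training-runs.json and manifest.json content.
interface TrainingRun {
  runId: string;
  lane: string;
  status: string;
  finishedAt: string; // ISO 8601 timestamp in UTC
}

interface LaneManifest {
  datasetSource: string;
  trainRows: number;
  evalRows: number;
}

interface LaneEntry {
  activeModel: string | null;
  lastRunId: string | null;
  lastRunStatus: string | null;
  lastTrainingAt: string | null;
  datasetSource: string | null;
  collectionsPath: string | null;
  counts: { train: number; eval: number; total: number } | null;
}

function compileLane(
  lane: string,
  activeModel: string | null,
  manifest: LaneManifest | null,
  manifestPath: string,
  runs: TrainingRun[],
): LaneEntry {
  // Same-format ISO timestamps sort chronologically as plain strings.
  const laneRuns = runs
    .filter((r) => r.lane === lane)
    .sort((a, b) => (a.finishedAt < b.finishedAt ? -1 : a.finishedAt > b.finishedAt ? 1 : 0));
  const last = laneRuns.length > 0 ? laneRuns[laneRuns.length - 1] : undefined;
  return {
    activeModel,
    lastRunId: last?.runId ?? null,
    lastRunStatus: last?.status ?? null,
    lastTrainingAt: last?.finishedAt ?? null,
    datasetSource: manifest ? manifest.datasetSource : null,
    collectionsPath: manifest ? manifestPath : null,
    counts: manifest
      ? { train: manifest.trainRows, eval: manifest.evalRows, total: manifest.trainRows + manifest.evalRows }
      : null,
  };
}
```

Fields stay `null` only when the corresponding source is missing, so an all-`null` entry signals that neither a manifest nor any run was found for the lane.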

### Live Erik verification

- `magatamallm`
  - `activeModel: magatama-coder:latest`
  - `lastRunStatus: completed_without_model_artifact`
  - `datasetSource: url`
  - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
- `fo_blogllm`
  - `activeModel: fo-blog-v7`
  - `lastRunStatus: completed_without_model_artifact`
- `tip_llm`
  - `activeModel: tip-llm-v1`
  - `lastRunStatus: completed_without_model_artifact`

## Training reality

The managed RunPod Axolotl endpoint still does not return an adoptable model artifact. MAGATAMA is now honest about that:

- jobs can reach `COMPLETED`
- but the resulting lane registry and run registry record:
  - `completed_without_model_artifact`
- therefore:
  - no version bump is treated as successful
  - no Ollama alias switch is performed

## Meaning

The frontend is now much harder to fool:

- Atlas no longer looks empty when only the proof route is stale
- lane registry metadata no longer collapses to all-`null`
- training state remains explicit about the real remaining blocker:
  - a custom RunPod worker is still required for fully automatic artifact return and adoption