sync: record magatama atlas fallback and lane registry fix
parent 549b4430df
commit d588a20a54
@@ -54,8 +54,8 @@ Updated: 2026-05-09 07:34 UTC
 - HTML product-like rows: `626`
 - price verified: `626`
 - image verified: `622`
-- details verified: `624`
-- price+image+details verified: `620`
+- details verified: `626`
+- price+image+details verified: `622`
 - fully verified: `620`
 - filter/category rows with no verification: `108`
 - other non-product/generic rows with no verification: `10`
@@ -74,9 +74,9 @@ Updated: 2026-05-09 07:34 UTC
 - remaining truth:
   - active/product-like Flexoptix rows are much closer to complete
   - not all `744` Flexoptix rows can honestly be 100% verified because `118` are filter/category/generic/non-product URLs rather than concrete product pages
-  - remaining HTML product-like gaps observed before SSH became unavailable:
-    - `4` product-like rows without image verification
-    - `2` FLEXBOX/accessory-like rows without reach/details
+  - remaining HTML product-like gaps after final source check:
+    - `4` product-like rows without image verification because Flexoptix exposes only `placeholder-flexoptix.jpg` as `og:image`
+    - `2` FLEXBOX/accessory-like rows were classified as `Accessory`, `reach_label=N/A`, `details_verified=true`
 - operational note:
   - Erik SSH became unavailable with `connection refused` after the last verification checks
   - public TIP HTTPS still responded through Cloudflare
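
The two reclassifications recorded in this hunk (placeholder `og:image` means no image verification; accessory rows get `reach_label=N/A` and `details_verified=true`) can be sketched as a small rule set. This is only an illustration: the `ProductRow` shape, the `ogImage`, `category` and `price_verified` fields, and all three function names are assumptions, not the crawler's actual code.

```typescript
// Hypothetical row shape: only reach_label and details_verified are named in the notes.
interface ProductRow {
  url: string;
  ogImage: string | null; // value of the page's og:image meta tag, if any
  category: string;
  price_verified: boolean;
  reach_label: string | null;
  details_verified: boolean;
}

const PLACEHOLDER_IMAGE = "placeholder-flexoptix.jpg";

// An image counts as verified only when og:image exists and is not the placeholder.
function imageVerified(row: ProductRow): boolean {
  return (
    row.ogImage !== null &&
    row.ogImage !== "" &&
    !row.ogImage.includes(PLACEHOLDER_IMAGE)
  );
}

// Assumed rule: rows categorised as Accessory record "N/A" for reach
// and are treated as having complete details.
function classifyAccessory(row: ProductRow): ProductRow {
  if (row.category !== "Accessory") return row;
  return { ...row, reach_label: "N/A", details_verified: true };
}

// A row is fully verified when price, image and details all check out.
function fullyVerified(row: ProductRow): boolean {
  return row.price_verified && imageVerified(row) && row.details_verified;
}
```

Under these rules a placeholder-only row can never become fully verified, which would be consistent with four image gaps remaining after the final source check.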

@@ -1314,6 +1314,74 @@ There are existing uncommitted changes outside `sync/`. Some are Codex work from

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- `bba48d3 sync: record magatama atlas rematerialization fix`
- `fd29bee sync: record magatama atlas fallback and port detail live fixes`
- `8b42077 sync: refresh cross-agent chat handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.

## 2026-05-09 Addendum - Live Atlas + Lane Registry Truth

### Atlas / Findings

- MAGATAMA Atlas was not actually empty; the public UI could still look blank while live proof data already showed:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Root causes fixed live:
  1. `packages/core/src/routes/health-builders.ts`
     - Atlas audits / exposure now rematerialize operational findings before proof rendering.
  2. `packages/core/src/scheduler.ts`
     - generic stale auto-resolve no longer auto-closes:
       - `atlas-coverage-gap`
       - `atlas-exposure`
       - `atlas-host-audit`
  3. `packages/dashboard/public/index-v2.html`
     - if proof data is temporarily empty or stale, Atlas now derives a fallback proof model from the current snapshot so the top cards do not render as blank.
- Live public verification after deploy:
  - `/api/protection-proof` shows non-zero Atlas truth again.
  - `/api/findings?limit=10` shows open `atlas-coverage-gap` findings again.
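
The scheduler change in item 2 amounts to an exemption list for the generic stale sweep. A minimal sketch, assuming a hypothetical `Finding` shape and a hypothetical `sweepStale` helper; the actual code in `packages/core/src/scheduler.ts` is not shown in these notes and may be structured differently:

```typescript
// Finding types the generic stale sweep must not close on its own (from the notes).
const STALE_SWEEP_EXEMPT = new Set<string>([
  "atlas-coverage-gap",
  "atlas-exposure",
  "atlas-host-audit",
]);

// Hypothetical finding shape; field names are assumptions.
interface Finding {
  id: string;
  type: string;
  status: "open" | "resolved";
  lastSeenAt: number; // epoch milliseconds
}

// Returns the findings after one sweep: open findings that are older than
// maxAgeMs and not exempt become resolved; everything else is unchanged.
function sweepStale(findings: Finding[], now: number, maxAgeMs: number): Finding[] {
  return findings.map((f) => {
    const stale = now - f.lastSeenAt > maxAgeMs;
    if (f.status !== "open" || !stale || STALE_SWEEP_EXEMPT.has(f.type)) return f;
    return { ...f, status: "resolved" as const };
  });
}
```

With an exemption like this, Atlas findings stay open until the Atlas pipeline itself re-evaluates them, which is why `/api/findings` can show open `atlas-coverage-gap` entries again.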

### Training / Lane Registry

- The public training status is now honest for the current live state:
  - `magatamallm`
    - `datasetSource: url`
    - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
    - `15679 train`
    - `1743 eval`
    - `17422 total`
    - `lastRegistryRunStatus: completed_without_model_artifact`
  - `fo_blogllm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
  - `tip_llm`
    - lane registry rebuilt on Erik
    - `lastRunStatus: completed_without_model_artifact`
- `scripts/model_registry_build.ts` now compiles per-lane metadata from:
  - lane datasets
  - lane RunPod manifests
  - `training-runs.json`
- The live compiled registry on Erik no longer sits at all-`null`; it now exposes:
  - `activeModel`
  - `version`
  - `lastRunId`
  - `lastRunStatus`
  - `datasetSource`
  - `collectionsPath`

### Still Outstanding

- Fully automatic training is still blocked by the managed RunPod Axolotl endpoint:
  - jobs reach `COMPLETED`
  - but no adoptable artifact is returned
  - therefore MAGATAMA correctly records:
    - `completed_without_model_artifact`
- That means:
  - no new model version can be truthfully activated yet
  - no Ollama alias switch should happen yet
- Remaining real blocker:
  - move to `custom-magatama` RunPod worker with explicit adapter/model artifact publication.
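
The gating described above (a job can be `COMPLETED` and still not be adoptable) can be sketched as a two-step check. Only `COMPLETED` and `completed_without_model_artifact` come from the notes; the `RunResult` shape, the `completed` and `not_completed` labels, and both function names are assumptions made for illustration:

```typescript
// Hypothetical summary of a finished training job; field names are assumptions.
interface RunResult {
  runId: string;
  jobState: string; // state reported by the endpoint, e.g. "COMPLETED"
  artifactUri: string | null; // null when nothing adoptable was returned
}

// "completed" and "not_completed" are illustrative labels, not confirmed statuses.
type RunStatus = "completed" | "completed_without_model_artifact" | "not_completed";

// A COMPLETED job only counts as a real completion when it carries an artifact.
function classifyRun(run: RunResult): RunStatus {
  if (run.jobState !== "COMPLETED") return "not_completed";
  return run.artifactUri ? "completed" : "completed_without_model_artifact";
}

// Only a run with an artifact may bump the version or move the Ollama alias.
function mayActivate(run: RunResult): boolean {
  return classifyRun(run) === "completed";
}
```

Keeping classification and activation separate means the registry can record the run accurately while the alias switch stays blocked.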

@@ -0,0 +1,79 @@
# 2026-05-09 - MAGATAMA live Atlas fallback and lane registry fix

## Summary

Two remaining truthfulness gaps were closed:

1. Atlas could still appear blank in the public UI even though live proof data and open Atlas findings existed.
2. The per-lane training registry on Erik still exposed `null` metadata, even though lane manifests and training-run state existed.

## Atlas

### Live truth after fix

- Public `protection-proof` now shows non-zero Atlas state again:
  - `knownAssets: 57`
  - `hostsWithTelemetry: 22`
  - `assetsWithoutTelemetry: 35`
  - `auditedHosts: 3`
  - `queueBlocked: 28`
- Public findings API again shows open `atlas-coverage-gap` findings.

### Technical fix

- `packages/dashboard/public/index-v2.html`
  - added `deriveProofFromAtlasSnapshot(...)`
  - if the live/cached proof is empty or stale while the Atlas snapshot is still useful, the UI now synthesizes a fallback proof model from the snapshot
  - result: Atlas top cards and sections no longer render as misleading blanks
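
The body of `deriveProofFromAtlasSnapshot(...)` is not shown in these notes. As a rough idea of what such a fallback could look like, here is a sketch under stated assumptions: the five counter names are taken from the notes, while the `AtlasSnapshot` shape, the staleness parameters, and the names `deriveProofSketch` and `proofForRender` are hypothetical.

```typescript
// Counter names come from the notes; every other name here is an assumption.
interface AtlasProof {
  knownAssets: number;
  hostsWithTelemetry: number;
  assetsWithoutTelemetry: number;
  auditedHosts: number;
  queueBlocked: number;
}

// Hypothetical snapshot shape.
interface AtlasSnapshot {
  assets: { id: string; hasTelemetry: boolean; audited: boolean }[];
  queueBlocked: number;
}

// Build a proof model purely from the snapshot.
function deriveProofSketch(snapshot: AtlasSnapshot): AtlasProof {
  const withTelemetry = snapshot.assets.filter((a) => a.hasTelemetry).length;
  return {
    knownAssets: snapshot.assets.length,
    hostsWithTelemetry: withTelemetry,
    assetsWithoutTelemetry: snapshot.assets.length - withTelemetry,
    auditedHosts: snapshot.assets.filter((a) => a.audited).length,
    queueBlocked: snapshot.queueBlocked,
  };
}

// Prefer the real proof; fall back only when it is missing, empty or stale
// and the snapshot still has something to show.
function proofForRender(
  proof: AtlasProof | null,
  proofAgeMs: number,
  maxAgeMs: number,
  snapshot: AtlasSnapshot | null,
): AtlasProof | null {
  const usable = proof !== null && proof.knownAssets > 0 && proofAgeMs <= maxAgeMs;
  if (usable || snapshot === null || snapshot.assets.length === 0) return proof;
  return deriveProofSketch(snapshot);
}
```

The fallback never overrides a usable proof, so the cards show derived numbers only while the proof route is empty or stale.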

## Lane registry

### Technical fix

- `scripts/model_registry_build.ts`
  - now also reads:
    - `training-data/model-registry/training-runs.json`
    - `training-data/runpod/<lane>/manifest.json`
  - compiled lane output now includes:
    - `activeModel`
    - `version`
    - `lastTrainingAt`
    - `lastRunId`
    - `lastRunStatus`
    - `datasetSource`
    - `collectionsPath`
    - lane RunPod counts
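
The merge described above can be pictured as a pure function that combines a lane's manifest with its most recent training run. The output field names follow the list above; the input shapes (`LaneManifest`, `TrainingRun`, `LaneState`), the `counts` layout, and the `compileLane` name are assumptions, since the real JSON layouts are not shown in these notes.

```typescript
// Hypothetical input shapes; the real JSON layouts may differ.
interface LaneManifest { datasetSource: string; trainCount: number; evalCount: number; }
interface TrainingRun { lane: string; runId: string; status: string; finishedAt: string; } // ISO-8601 UTC
interface LaneState { activeModel: string | null; version: string | null; }

interface LaneEntry {
  activeModel: string | null;
  version: string | null;
  lastTrainingAt: string | null;
  lastRunId: string | null;
  lastRunStatus: string | null;
  datasetSource: string | null;
  collectionsPath: string | null;
  counts: { train: number; eval: number; total: number } | null;
}

function compileLane(
  lane: string,
  state: LaneState,
  manifestPath: string,
  manifest: LaneManifest | null,
  runs: TrainingRun[],
): LaneEntry {
  // Latest run for this lane; ISO-8601 UTC strings in one format sort chronologically.
  const laneRuns = runs
    .filter((r) => r.lane === lane)
    .sort((a, b) => (a.finishedAt < b.finishedAt ? -1 : a.finishedAt > b.finishedAt ? 1 : 0));
  const last = laneRuns.length > 0 ? laneRuns[laneRuns.length - 1] : null;
  return {
    activeModel: state.activeModel,
    version: state.version,
    lastTrainingAt: last ? last.finishedAt : null,
    lastRunId: last ? last.runId : null,
    lastRunStatus: last ? last.status : null,
    datasetSource: manifest ? manifest.datasetSource : null,
    // Assumption: collectionsPath points at the manifest file, as in the live values.
    collectionsPath: manifest ? manifestPath : null,
    counts: manifest
      ? {
          train: manifest.trainCount,
          eval: manifest.evalCount,
          total: manifest.trainCount + manifest.evalCount,
        }
      : null,
  };
}
```

A lane with neither a manifest nor any runs still compiles to an entry whose fields are all `null`, so the all-`null` state would only appear when the inputs are genuinely absent.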

### Live Erik verification

- `magatamallm`
  - `activeModel: magatama-coder:latest`
  - `lastRunStatus: completed_without_model_artifact`
  - `datasetSource: url`
  - `collectionsPath: /opt/magatama/training-data/runpod/magatamallm/manifest.json`
- `fo_blogllm`
  - `activeModel: fo-blog-v7`
  - `lastRunStatus: completed_without_model_artifact`
- `tip_llm`
  - `activeModel: tip-llm-v1`
  - `lastRunStatus: completed_without_model_artifact`

## Training reality

The managed RunPod Axolotl endpoint still does not return an adoptable model artifact. MAGATAMA is now honest about that:

- jobs can reach `COMPLETED`
- but the resulting lane registry and run registry record:
  - `completed_without_model_artifact`
- therefore:
  - no version bump is treated as successful
  - no Ollama alias switch is performed

## Meaning

The frontend is now much harder to fool:

- Atlas no longer looks empty when only the proof route is stale
- lane registry metadata no longer collapses to all-`null`
- training state remains explicit about the real remaining blocker:
  - custom RunPod worker still required for fully automatic artifact return and adoption