diff --git a/sync/CURRENT.md b/sync/CURRENT.md
index 475c1f4..6f10387 100644
--- a/sync/CURRENT.md
+++ b/sync/CURRENT.md
@@ -1,9 +1,36 @@
 # Current TIP Sync State
 
-Updated: 2026-05-09 23:38 UTC
+Updated: 2026-05-10 02:58 UTC
 
 ## Newest Work
 
+- MAGATAMA all-lane RunPod training completion on 2026-05-10:
+  - RunPod training/adoption is now verified end-to-end for all five active MAGATAMA LLM lanes:
+    - `magatamallm`: active `magatama-coder:latest`, model version `magatama-coder-r2`, dataset `1375 train / 153 eval / 1528 total`
+    - `fo_blogllm`: active `fo-blog-v8`, model version `fo-blog-v8-r2`, dataset `17342 train / 1929 eval / 19271 total`
+    - `tip_llm`: active `tip-llm-v2`, model version `tip-llm-v2-r2`, dataset `276 train / 31 eval / 307 total`
+    - `pulso_llm`: active `pulso-llm-v1`, model version `pulso-llm-v1-r1`, dataset `28 train / 5 eval / 33 total`
+    - `contact_llm`: active `contact-llm-v1`, model version `contact-llm-v1-r1`, dataset `18 train / 4 eval / 22 total`
+  - strict adoption rule is now validated in production:
+    - RunPod `COMPLETED` alone is not a success
+    - success requires uploaded adapter artifact, local Mac adoption, Ollama model registration, smoke tests, registry write, dashboard registry rebuild and active alias switch
+  - fixed/verified automation behavior:
+    - local Mac adoption service exposes authenticated adoption reports per lane via `/adoption-report/{lane}`
+    - dashboard adoption path can recover from transient network/fetch errors by reading the local adoption report
+    - reconciler can adopt already-completed RunPod jobs when the live SSE path failed after artifact upload
+    - registry events now include top-level `active_model`, `release_alias`, `model_version`, `version_counter` and `candidate_model`
+  - resolved concrete failures:
+    - `pulso_llm` training had succeeded, but old local lane mapping caused `unknown lane: pulso_llm`; Pulso is now adopted and active
+    - `tip_llm` training succeeded but local adoption failed due to low Mac disk space before GGUF conversion; safe obsolete Ollama versions and imported intermediate GGUFs were removed, then TIP was reconciled successfully
+    - `contact_llm` was still `neverTrained`; it is now trained, adopted and active
+  - ContactLLM smoke test result:
+    - `4/5` checks passed
+    - remaining improvement: provenance prompt should always include source URL, timestamp, confidence and contact type; add this as a next training/eval item
+  - public MAGATAMA `/api/llm/status?lane=...` checks after dashboard restart show all five lanes as `completed_and_adopted`
+  - operational note:
+    - keep enough Mac free space before another adoption; each new 7B adapter adoption needs merge + GGUF conversion workspace
+    - obsolete non-active Ollama versions can be removed after verifying active aliases and release aliases exist
+
 - TIP price/source verification closure on 2026-05-10 local / 2026-05-09 UTC:
   - fixed SFPcables scraper to persist `product_page_url`
   - added product-page price fallback for SFPcables when listing pages omit price markup
diff --git a/sync/history/2026-05-10-magatama-all-lane-runpod-training-complete.md b/sync/history/2026-05-10-magatama-all-lane-runpod-training-complete.md
new file mode 100644
index 0000000..83ddfdf
--- /dev/null
+++ b/sync/history/2026-05-10-magatama-all-lane-runpod-training-complete.md
@@ -0,0 +1,48 @@
+# MAGATAMA All-Lane RunPod Training Complete
+
+Date: 2026-05-10 02:58 UTC
+
+## Result
+
+All five MAGATAMA trainable LLM lanes completed a real RunPod training/adoption cycle and are now visible as adopted in the public MAGATAMA status API.
+
+## Verified Lanes
+
+- `magatamallm`: active `magatama-coder:latest`, model version `magatama-coder-r2`, `1375 train / 153 eval / 1528 total`
+- `fo_blogllm`: active `fo-blog-v8`, model version `fo-blog-v8-r2`, `17342 train / 1929 eval / 19271 total`
+- `tip_llm`: active `tip-llm-v2`, model version `tip-llm-v2-r2`, `276 train / 31 eval / 307 total`
+- `pulso_llm`: active `pulso-llm-v1`, model version `pulso-llm-v1-r1`, `28 train / 5 eval / 33 total`
+- `contact_llm`: active `contact-llm-v1`, model version `contact-llm-v1-r1`, `18 train / 4 eval / 22 total`
+
+## Fixes Made
+
+- Added/verified first-class local adoption support for `pulso_llm` and `contact_llm`.
+- Added authenticated adoption-report recovery endpoint on the Mac training/adoption service.
+- Hardened dashboard adoption flow so transient network/fetch errors can recover from local adoption reports.
+- Hardened RunPod reconciler so completed jobs can be adopted after a failed live SSE/browser path.
+- Registry success events now include explicit active model, release alias, model version, version counter and candidate model.
+- Rebuilt the MAGATAMA model registry and restarted `magatama-dashboard` after successful TIP and Contact adoption.
+
+## Issues Resolved
+
+- `pulso_llm` showed `unknown lane: pulso_llm` after RunPod finished; this was a local adoption mapping issue, not a training failure. Pulso is now active.
+- `tip_llm` failed local adoption because Mac disk space dropped below the GGUF conversion threshold. Obsolete non-active Ollama versions and already imported intermediate GGUFs were removed, then TIP was reconciled successfully.
+- `contact_llm` had never been trained before this block. It now has a first adopted version.
+
+## Evaluation Notes
+
+- ContactLLM smoke test passed `4/5`.
+- Open improvement: ContactLLM should consistently return provenance fields for public business contacts: source URL, timestamp, confidence and contact type.
+
+## Operating Rule
+
+Do not mark RunPod training successful on `COMPLETED` alone. A successful lane run must have:
+
+- uploaded adapter artifact
+- successful local Mac adoption
+- Ollama candidate + release alias + active alias
+- smoke tests meeting threshold
+- registry entry with `completed_and_adopted`
+- public MAGATAMA `/api/llm/status?lane=...` showing the new active model/version
+
+No secrets, tokens or credentials are recorded in this handoff.
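
Reviewer note: the Operating Rule above can be read as a conjunction of independent checks. The sketch below is one way to express it, not the project's actual reconciler code. The `LaneRun` class, its field names, and the `unmet_criteria` helper are all hypothetical, and the smoke-test threshold is passed in as a parameter because the handoff does not state its value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LaneRun:
    # Hypothetical evidence record; field names mirror the checklist, not a real schema.
    runpod_status: str
    adapter_uploaded: bool
    local_adoption_ok: bool
    ollama_candidate: bool
    release_alias: bool
    active_alias: bool
    smoke_passed: int
    smoke_total: int
    registry_status: str
    expected_model: str        # model the run was supposed to activate
    public_active_model: str   # model reported by the public status API


def unmet_criteria(run: LaneRun, smoke_threshold: float) -> list[str]:
    """Return the checklist items the run fails; an empty list means success."""
    # Guard against a run with no smoke tests at all: treat it as a 0.0 pass ratio.
    smoke_ratio = run.smoke_passed / run.smoke_total if run.smoke_total else 0.0
    checks = {
        "RunPod job COMPLETED": run.runpod_status == "COMPLETED",
        "uploaded adapter artifact": run.adapter_uploaded,
        "successful local Mac adoption": run.local_adoption_ok,
        "Ollama candidate + release alias + active alias": (
            run.ollama_candidate and run.release_alias and run.active_alias
        ),
        "smoke tests meeting threshold": smoke_ratio >= smoke_threshold,
        "registry entry completed_and_adopted": (
            run.registry_status == "completed_and_adopted"
        ),
        "public status shows new active model": (
            run.public_active_model == run.expected_model
        ),
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative values only: a job that finished on RunPod but was never adopted.
completed_only = LaneRun(
    "COMPLETED", False, False, False, False, False,
    0, 5, "pending", "tip-llm-v2", "tip-llm-v1",
)
print(unmet_criteria(completed_only, smoke_threshold=0.8))
```

Returning the list of unmet items rather than a single boolean keeps a failure such as the `tip_llm` disk-space case distinguishable from a training failure. The `0.8` threshold in the usage line is a placeholder, not a documented value.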