# Current TIP Sync State

Updated: 2026-05-06 22:55 UTC

## Active Policy

- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Check sibling project sync folders first when context may span repos.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.

## Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

- Repo: `rene/llm-gateway`
- Path: `sync/`
- Commit shown by Claude: `e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)`
- Gitea path: `http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/`

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`

## Latest Work

- MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:
  - live root cause:
    - Switchblade itself already had the rich SG350 data (`description`, LLDP neighbor, peer port, octets), but MAGATAMA had still shown mostly flat port chips.
  - verified live on Erik:
    - the real Switchblade runtime is the PM2 app `switchblade` under `/opt/switchblade-app`, not the older `/opt/switchblade` tree.
    - `GET http://127.0.0.1:3000/api/discovery/snmp` for `192.168.178.2` already returned rich rows such as:
      - `GigabitEthernet3` → description `Aruba-1830-UNUSED`, neighbor `VN46KYC0G0`, peer port `11`
      - `GigabitEthernet5` → description `Tashi-204`, neighbor `fritz.box`, peer `LAN:1`
      - `GigabitEthernet25` → description `to Cisco Business 220 Series`, neighbor `Switch39688E`, peer `gi9`
    - the remaining loss point was MAGATAMA’s own Switchblade sync/persistence path.
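
  A minimal sketch of a per-port merge that avoids this kind of loss. It keeps whichever source has more populated detail fields and lets the other source fill any gaps. The `PortRow` shape, the field-count `richness` score, and the helper names are hypothetical, not the real `scripts/switchblade_live_sync.ts` code. Only the field names mirror the ones listed in this handoff.

  ```typescript
  // Hypothetical port record; field names follow the Switchblade fields named in this handoff.
  interface PortRow {
    name: string;
    description?: string;
    peerDevice?: string;
    peerPort?: string;
    connectedHost?: string;
    inOctets?: number;
    outOctets?: number;
  }

  const DETAIL_FIELDS = [
    "description",
    "peerDevice",
    "peerPort",
    "connectedHost",
    "inOctets",
    "outOctets",
  ] as const;

  // Count detail fields that carry real data (numbers, or non-blank strings).
  function richness(row: PortRow): number {
    return DETAIL_FIELDS.filter((key) => {
      const value = row[key];
      return typeof value === "number" || (typeof value === "string" && value.trim() !== "");
    }).length;
  }

  // Drop undefined and blank-string fields so they cannot overwrite real values when spread.
  function withoutEmpty(row: PortRow): Partial<PortRow> {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(row)) {
      if (value === undefined) continue;
      if (typeof value === "string" && value.trim() === "") continue;
      out[key] = value;
    }
    return out as Partial<PortRow>;
  }

  // Prefer the richer row (ties go to SNMP discovery); the other row only fills gaps.
  function mergePort(fromDevices: PortRow, fromSnmp: PortRow): PortRow {
    const snmpWins = richness(fromSnmp) >= richness(fromDevices);
    const base = snmpWins ? fromSnmp : fromDevices;
    const fallback = snmpWins ? fromDevices : fromSnmp;
    return { ...fallback, ...withoutEmpty(base) };
  }
  ```

  For example, merging a flat `/api/devices/` row for `GigabitEthernet5` with the SNMP row above keeps `Tashi-204` / `fritz.box` / `LAN:1`, plus any counters that only the devices API reported.
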
  - MAGATAMA sync hardening:
    - `scripts/switchblade_live_sync.ts`
      - now prefers live SNMP discovery data when it is richer than `/api/devices/`
      - now maps `description`, `peerDevice`, `peerPort`, `connectedHost`, `inOctets`, `outOctets` into rack device ports
      - added optional debug snapshot dump support via `SWITCHBLADE_DEBUG_SNAPSHOT_FILE`
      - sanitizes unreadable peer-port strings and drops synthetic high-index numeric pseudo-ports
    - verified with a forced live run on Erik:
      - `Top of Rack Switch` now exports `28` real SG350 ports into the rack snapshot instead of the earlier flattened/odd set
      - sample verified payloads before POST:
        - port 3 → `Aruba-1830-UNUSED` / `VN46KYC0G0` / `11`
        - port 5 → `Tashi-204` / `fritz.box` / `LAN:1`
        - port 25 → `to Cisco Business 220 Series` / `Switch39688E` / `gi9`
  - MAGATAMA core hardening:
    - `packages/core/src/routes/health-types.ts`
      - `SwitchbladePortSnapshot` now preserves:
        - `description`
        - `vlan`
        - `macCount`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `transceiver`
        - `inOctets`
        - `outOctets`
    - `packages/core/src/routes/health-support.ts`
      - `normalizeSwitchbladePort()` now keeps those additional port fields instead of silently truncating them
    - rebuilt locally and re-rsynced the new `packages/core/dist` to Erik
  - dashboard/UI hardening:
    - `packages/dashboard/public/index-v2.html`
      - port chips already had custom tooltip support; now they also carry native `title=` fallback text
      - this reduces the old “question mark / unclear hover” problem in browsers that do not immediately show the custom bubble
  - live public verification after deploy:
    - `GET https://magatama.fichtmueller.org/api/switchblade/snapshot`
      - now contains enriched SG350 rack-port records with:
        - `description`
        - `peerDevice`
        - `peerPort`
        - `connectedHost`
        - `inOctets`
        - `outOctets`
      - public snapshot timestamp verified:
        - `receivedAt = 2026-05-06T22:51:59.247Z`
      - `Top of Rack Switch` in the public snapshot now exposes meaningful peer/use-case data instead of only flat status counters
  - operator impact:
    - MAGATAMA can now answer the actual operational question per port:
      - what is on this port
      - what is it talking to
      - what does the link look like
    - this is now grounded in Switchblade live SNMP/LLDP data, not guesswork.
- TIP/Blog lane separation was materially corrected on 2026-05-06:
  - root cause:
    - `TIP_LLM` was still ingesting blog-/writer-shaped rows from the canonical lane pool and shared transceiver corpora.
    - local inspection showed the old TIP export had `6250` train rows, of which `6087` still matched blog/writer patterns.
  - dataset builder and Gitea sync were hardened:
    - `scripts/runpod_dataset_builder.ts`
      - added strict `tipDatasetAllowed(...)`
      - `TIP_LLM` now rejects blog-shaped source rows at dataset-build time
      - `TIP_LLM` now rejects blog-like `system`, `user`, and markdown-article `assistant` patterns
      - registry fallback for `TIP_LLM` now only uses lane-compatible datasets
    - `scripts/sync_gitea_training_pool.ts`
      - canonical TIP pool refresh now uses the stricter lane-alignment rules
      - redundant `merged.jsonl` copies for `fo_blogllm` and `tip_llm` are no longer rewritten, to avoid local disk exhaustion from duplicate lane artifacts
  - local disk issue encountered and fixed:
    - full refresh failed with `ENOSPC` while writing `training-data/gitea-learning-pool/tip_llm/merged.jsonl`
    - redundant lane `merged` artifacts for `fo_blogllm` and `tip_llm` were truncated and the sync script was changed to stop recreating them
    - free disk space recovered from `377Mi` to `17Gi`
  - locally verified after rebuild:
    - `TIP_LLM` RunPod export:
      - `train = 233`
      - `eval = 26`
      - `total = 259`
      - `blog/writer matches = 0`
    - first TIP rows now use the correct TIP system prompt:
      - `You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...`
  - corrected artifacts and scripts were synced to Erik and `pnpm training:refresh-all` was rerun there.
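
  A lane filter in the spirit of `tipDatasetAllowed(...)` can be sketched as a predicate over chat-format rows. The row shape, the `tipRowAllowed` name, and every regular expression below are hypothetical stand-ins. The real blog/writer patterns are not listed in this handoff.

  ```typescript
  // Hypothetical chat-format training row.
  interface ChatMessage {
    role: "system" | "user" | "assistant";
    content: string;
  }

  interface TrainingRow {
    source?: string; // e.g. the pool or lane the row came from
    messages: ChatMessage[];
  }

  // Illustrative patterns only; the real builder's blog/writer rules are not reproduced here.
  const BLOG_SOURCE = /blog|writer/i;
  const BLOG_SYSTEM = /\b(blog|writer|ghost|linkedin)\b/i;
  const BLOG_USER = /\b(write|draft)\b.*\b(blog|article|post)\b/i;
  const MARKDOWN_ARTICLE = /^#{1,3} \S/m; // a markdown heading inside the answer

  function tipRowAllowed(row: TrainingRow): boolean {
    if (row.source !== undefined && BLOG_SOURCE.test(row.source)) return false;
    return row.messages.every((msg) => {
      if (msg.role === "system") return !BLOG_SYSTEM.test(msg.content);
      if (msg.role === "user") return !BLOG_USER.test(msg.content);
      return !MARKDOWN_ARTICLE.test(msg.content);
    });
  }
  ```

  Applying such a predicate at dataset-build time, as the builder now does, keeps blog-shaped rows out of every exported artifact instead of relying on later cleanup.
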
  - live verified on Erik/public API:
    - `magatamallm`
      - `datasetSource = url`
      - `collectedExamples = 15679`
      - `evalExamples = 1743`
      - `totalExamples = 17422`
      - `newSinceLastTraining = 15679`
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectedExamples = 17322`
      - `evalExamples = 1926`
      - `totalExamples = 19254`
      - `neverTrained = true`
    - `tip_llm`
      - `datasetSource = url`
      - `collectedExamples = 231`
      - `evalExamples = 26`
      - `totalExamples = 257`
      - `neverTrained = true`
  - operational conclusion:
    - Erik now reports real lane-specific dataset counts.
    - `TIP_LLM` is no longer silently borrowing the FO_Blog behavior lane.
    - the remaining hard problem is now RunPod artifact adoption/validation, not lane contamination.
- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully:
    - `magatama-dashboard`
    - `magatama`
  - public `/api/llm/status` now shows the true lane-export totals for `magatamallm`:
    - `collectedExamples = 15620`
    - `effectiveExamples = 15620`
    - `evalExamples = 1736`
    - `totalExamples = 17356`
    - `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - worker `COMPLETED` is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records this as `completed_with_worker_failure` instead of pretending the run succeeded.
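
  The RunPod completion rules in this handoff can be combined into one classifier. These are the log scan for hidden failures, no success without an artifact, and no trust in a job that vanishes after submit. This is a sketch only. The input shape, the `output.modelArtifact` field, and the failure regexes are assumptions. The labels `completed_with_worker_failure`, `completed_without_model_artifact`, and `not_found_after_submit` come from this handoff, and `completed_pending_adoption` is a hypothetical name for the state before local adoption.

  ```typescript
  // RunPod serverless job states as reported by GET /status/{jobId}.
  type RunPodStatus = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "CANCELLED" | "TIMED_OUT";

  interface StatusSnapshot {
    httpStatus: number; // HTTP code of the status request
    status?: RunPodStatus;
    output?: { modelArtifact?: string } | null; // hypothetical output shape
    workerLog: string;
  }

  type Outcome =
    | "not_found_after_submit"
    | "pending"
    | "failed"
    | "completed_with_worker_failure"
    | "completed_without_model_artifact"
    | "completed_pending_adoption"; // hypothetical label

  // Illustrative failure markers; the real scan may look for more.
  const FAILURE_PATTERNS: RegExp[] = [
    /Traceback \(most recent call last\)/,
    /\bSyntaxError\b/,
    /exit(?:ed)? (?:with )?(?:code|status) [1-9]\d*/i,
  ];

  function classifyRun(snap: StatusSnapshot): Outcome {
    // A job that vanishes after submit was never a usable run.
    if (snap.httpStatus === 404) return "not_found_after_submit";
    if (snap.status === undefined || snap.status === "IN_QUEUE" || snap.status === "IN_PROGRESS") {
      return "pending";
    }
    if (snap.status !== "COMPLETED") return "failed";
    // COMPLETED alone is not proof: check worker logs first, then require an artifact.
    if (FAILURE_PATTERNS.some((re) => re.test(snap.workerLog))) return "completed_with_worker_failure";
    if (!snap.output?.modelArtifact) return "completed_without_model_artifact";
    return "completed_pending_adoption";
  }
  ```

  Under this sketch, only `completed_pending_adoption` would move on to the adoption steps listed under RunPod Completion Hardening (artifact import, smoke tests, alias switch).
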
  - the public findings state is currently empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - Attack Paths empty-state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - cached changelog data older than 6h is now treated as stale and forces a rebuild from git history
    - verified live via dashboard proxy on Erik:
      - `generatedAt = 2026-05-06T15:18:42.708Z`
      - latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
- MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:
  - root cause:
    - the training modal always fetched `/api/llm/status` without a lane, so `FO_BlogLLM` and `TIP_LLM` still showed the `magatamallm` pool.
  - dashboard/server were updated so `/api/llm/status?lane=...` is now truly lane-aware.
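
  This lane fix amounts to always passing the selected lane and resolving it to that lane's own manifest. A minimal sketch, assuming a hypothetical `LANES` list and helper names. The `?lane=` query parameter and the `training-data/runpod/<lane>/manifest.json` layout come from this handoff.

  ```typescript
  const LANES = ["magatamallm", "fo_blogllm", "tip_llm"] as const;
  type Lane = (typeof LANES)[number];

  // Accept UI labels such as "TIP_LLM" or "FO_BlogLLM"; reject anything else loudly.
  function parseLane(input: string | null | undefined): Lane {
    const normalized = (input ?? "").trim().toLowerCase();
    const match = LANES.find((lane) => lane === normalized);
    if (match === undefined) {
      throw new Error(`unknown training lane: "${input}"`);
    }
    return match;
  }

  // Always send the lane explicitly instead of relying on a server-side default.
  function statusUrl(baseUrl: string, lane: Lane): string {
    return `${baseUrl.replace(/\/+$/, "")}/api/llm/status?lane=${encodeURIComponent(lane)}`;
  }

  function manifestPath(root: string, lane: Lane): string {
    return `${root}/training-data/runpod/${lane}/manifest.json`;
  }
  ```

  This sketch throws on an unknown lane on purpose. A silent fallback to `magatamallm` is exactly what made every lane show the same pool.
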
  - the training modal now refreshes per selected lane and rewrites:
    - title
    - runtime label
    - pool path
    - counts
    - dataset source
  - MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via `ecosystem.config.cjs`:
    - `RUNPOD_DATASET_SOURCE=url`
    - `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
    - `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
    - `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
  - live verified on Erik after restart:
    - `fo_blogllm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`
      - `train = 28`
      - `eval = 4`
      - `total = 32`
    - `tip_llm`
      - `datasetSource = url`
      - `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`
      - `train = 36`
      - `eval = 4`
      - `total = 40`
    - `magatamallm`
      - remains on lane-export counts (`15620 / 1736 / 17356`)
  - operator impact:
    - no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
    - every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing `magatamallm`.
- MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause was missing training scripts on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
  - verified on Erik:
    - `pnpm training:refresh-all` now succeeds.
    - fresh dataset totals after dedupe:
      - `magatamallm`: `92,742` raw → `17,356` effective (`15,620 train / 1,736 eval`)
      - `fo_blogllm`: `32` total (`28 train / 4 eval`)
      - `tip_llm`: `40` total (`36 train / 4 eval`)
  - important nuance:
    - Codex did **not** execute the final Hugging Face publish step from Erik in this chat.
    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
  - MAGATAMA Attack Paths UX is no longer a misleading blank panel:
    - the page now distinguishes between:
      - no live attack paths
      - historical fallback paths
      - empty selected scope (`0 assets in scope`)
    - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
    - live dashboard HTML on Erik now contains:
      - `Im aktuellen Scope liegen 0 Assets.` ("The current scope contains 0 assets.")
      - `Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann.` ("Expand the location or datacenter / rack so MAGATAMA can display correlatable assets and paths.")
      - `Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer.` ("Without open multi-stage correlations, the graph view intentionally stays empty.")
  - MAGATAMA code/training hardening was extended:
    - `scripts/test_runpod_adapter.py` no longer loads tokenizer/model with `trust_remote_code=True`.
    - `scripts/ollama_adapter_bridge.py` no longer loads tokenizer/model with `trust_remote_code=True`.
    - this removed the live CODE finding around `HuggingFace trust_remote_code` on Erik.
  - Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
    - generic `atlas-exposure` findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
    - internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
    - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
  - after rebuild + deploy + health sync:
    - live Postgres open findings returned to `0`.
  - Follow-up hardening on the same block:
    - the earlier RunPod error path in MAGATAMA dashboard was made more truthful.
    - dataset preparation now distinguishes:
      - local `training:refresh-all` failure
      - optional Hugging Face publish failure
      - URL-based dataset mode with no external publish required
    - the training SSE flow now explicitly tells the operator whether RunPod is using:
      - Hugging Face dataset source
      - or MAGATAMA URL-bundle dataset source
    - this avoids misleading `RunPod not reachable` wording when the actual failure is in dataset preparation.
  - follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
    - MAGATAMA submit logic now verifies that a RunPod job really exists under `/status/{jobId}` instead of trusting `/run`.
    - payloads were aligned more closely with the official Axolotl serverless schema:
      - `model_type=AutoModelForCausalLM`
      - `tokenizer_type=AutoTokenizer`
      - dataset `split: train`
      - optimizer `adamw_torch_fused`
    - verified full run attempt:
      - job id `9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2`
      - disappeared as `not_found_after_submit` (`404 job not found`)
    - verified canary after payload fix:
      - job id `a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2`
      - immediately materialized as `IN_QUEUE`
      - then still disappeared on later reconcile as `not_found_after_submit`
    - current conclusion:
      - the MAGATAMA-side submit and payload issues are fixed.
      - the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
    - operational rule:
      - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
      - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
  - follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
    - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
    - dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
    - synced current lane export to Erik and restarted `magatama-dashboard`.
    - verified public API now returns:
      - `collectedExamples = 1367`
      - `effectiveExamples = 1367`
      - `evalExamples = 152`
      - `totalExamples = 1519`
      - `newSinceLastTraining = 1367`
    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
  - open findings were reduced all the way to `0` in Postgres.
  - false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
  - code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
  - `open findings: 0`
  - `queueExecuting: 0`
  - `queueBlocked: 0`
  - `queueFailed: 0`
  - public `/api/health` returns `status: ok`
  - public `/api/active-resolvers` returns:
    - `MAGATAMA Core: working`
    - `MagatamaLLM: working`
    - `Claude (secondary): working`
    - `Codex (secondary/manual): idle`
    - `Copilot (secondary/manual): idle`
- Important resolver truth fix on 2026-05-06:
  - live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
  - dashboard logic was updated so disabled Codex/Copilot now show truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
  - the local codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
- Remaining real operational gap after findings hit zero:
  - MAGATAMA still knows more assets than it actively telemeters.
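
  The gap is a set difference between the asset inventory and the hosts that report telemetry. A sketch with hypothetical names. It assumes every telemetry-reporting host is also a known asset, in which case the proof figures in this handoff satisfy 79 - 27 = 52.

  ```typescript
  interface CoverageProof {
    knownAssets: number;
    hostsWithTelemetry: number;
    assetsWithoutTelemetry: number;
    withoutTelemetry: string[];
  }

  function coverageGap(knownAssetIds: Iterable<string>, telemetryHostIds: Iterable<string>): CoverageProof {
    const known = Array.from(new Set(knownAssetIds));
    const telemetered = new Set(telemetryHostIds);
    // Telemetry from hosts outside the inventory is ignored here (assumption).
    const covered = known.filter((id) => telemetered.has(id));
    const missing = known.filter((id) => !telemetered.has(id)).sort();
    return {
      knownAssets: known.length,
      hostsWithTelemetry: covered.length,
      assetsWithoutTelemetry: missing.length,
      withoutTelemetry: missing,
    };
  }
  ```
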
  - last public protection proof showed:
    - `knownAssets: 79`
    - `hostsWithTelemetry: 27`
    - `assetsWithoutTelemetry: 52`
  - these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
  - Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
  - MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
  - Live verified/deduped MAGATAMA training state after the fix:
    - `collectedExamples: 49`
    - `rawExamples: 58`
    - `duplicateExamples: 9`
    - `effectiveExamples: 49`
    - `newSinceLastTraining: 49`
  - MAGATAMA now filters training metrics to verified/trainable examples only.
  - Failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
  - Gitea-backed training pool remains the default target for training writes.
- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
  - the earlier `49` medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
    - exposure-backed assets
    - explicit non-auto owner
    - configured telemetry expectation
    - critical/high criticality
    - infrastructure metadata or managed infra device types
  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
  - local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
  - live Postgres state after deploy: `open findings = 0`.
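
  The tightened Atlas rule can be read as an eligibility predicate over asset metadata. A sketch, assuming a hypothetical `AtlasAsset` shape, a hypothetical set of managed device types, and that any one listed criterion makes an asset "managed". Loopback and passive reference/inventory assets are excluded first. A coverage-gap finding would then open only for an eligible asset that lacks telemetry, which is also an assumption.

  ```typescript
  type Criticality = "low" | "medium" | "high" | "critical";

  // Hypothetical asset shape; the real Atlas model is not reproduced here.
  interface AtlasAsset {
    address: string;
    role?: "operational" | "reference" | "inventory";
    hasExposure: boolean;
    owner?: string; // "auto" marks auto-discovered ownership (assumption)
    expectsTelemetry: boolean;
    criticality: Criticality;
    hasInfraMetadata: boolean;
    deviceType?: string;
  }

  // Hypothetical set of managed infrastructure device types.
  const MANAGED_INFRA_TYPES = new Set(["switch", "router", "firewall", "hypervisor"]);

  function isLoopback(address: string): boolean {
    return address === "::1" || address === "localhost" || address.startsWith("127.");
  }

  // Managed operational asset: any one listed criterion qualifies (assumption).
  function coverageFindingEligible(asset: AtlasAsset): boolean {
    if (isLoopback(asset.address)) return false;
    if (asset.role === "reference" || asset.role === "inventory") return false;
    return (
      asset.hasExposure ||
      (asset.owner !== undefined && asset.owner !== "auto") ||
      asset.expectsTelemetry ||
      asset.criticality === "critical" ||
      asset.criticality === "high" ||
      asset.hasInfraMetadata ||
      (asset.deviceType !== undefined && MANAGED_INFRA_TYPES.has(asset.deviceType))
    );
  }

  // A coverage-gap finding opens only for an eligible asset that lacks telemetry.
  function shouldOpenCoverageFinding(asset: AtlasAsset, hasTelemetry: boolean): boolean {
    return coverageFindingEligible(asset) && !hasTelemetry;
  }
  ```
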
  - training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
    - failed/escalated/report-only runs now belong in `errors.jsonl`
  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
    - atlas coverage scope hardening
    - training path integrity fix
  - corpus cleanup + dedupe was executed afterward:
    - pre-dedupe backup kept locally as:
      - `magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl`
    - resulting verified corpus:
      - `fixes.jsonl = 1,368` unique verified training rows
    - resulting failure corpus:
      - `errors.jsonl = 4` tracked failed/escalated rows
  - integrity report now exists at:
    - `magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json`
  - latest integrity totals:
    - `scanned: 1368`
    - `verified: 1368`
    - `movedToErrors: 4`
    - `parseErrors: 0`
    - `invalidVerifiedFlag: 0`
- Complete Codex chat sync was added:
  - `sync/history/2026-04-29-codex-complete-chat-sync.md`
  - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
  - confirms no secrets were written into sync.
  - confirms TIP crawler/robot planning remains TIPLLM-only.
  - confirms Erik remains controller/light `erik-safe` only, with heavy crawler work assigned to Proxmox/Pi workers.
- Codex sync-start confirmation was added:
  - `sync/history/2026-04-29-codex-sync-start-confirmation.md`
  - confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating `sync/` as binding.
  - no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
- Codex follow-up on 2026-04-29 clarified the active BlogLLM model:
  - TIP shows `fo-blog-v7`, but this is not a normal Ollama GGUF manifest.
  - It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter: `/Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter`
  - Bridge definition: `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py`
  - TIP API default: `packages/api/src/llm/client.ts` uses `OLLAMA_LLM_MODEL || "fo-blog-v7"`.
  - `fo-blog-v8` remains the next training candidate, not the currently active TIP BlogLLM model.
- Full Codex session handoff was added:
  - `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller:
  - `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing:
  - `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to TIP learning-pool build:
  - `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
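
  Importing `qa-pairs/*.jsonl` into a lane comes down to parsing each JSONL file line by line and tagging every row with its lane and origin. The sketch below works on file contents that have already been read, so it needs no filesystem access. The `LaneRow` shape and the `importQaPairs` name are hypothetical. Malformed lines are counted rather than silently dropped.

  ```typescript
  interface LaneRow {
    lane: "tip_llm";
    sourceFile: string;
    record: Record<string, unknown>;
  }

  interface ImportResult {
    rows: LaneRow[];
    parseErrors: { file: string; line: number }[];
  }

  // `files` maps a file name (e.g. "robot-control-high.jsonl") to its text content.
  function importQaPairs(files: Record<string, string>): ImportResult {
    const result: ImportResult = { rows: [], parseErrors: [] };
    for (const [file, text] of Object.entries(files)) {
      if (!file.endsWith(".jsonl")) continue;
      text.split("\n").forEach((raw, index) => {
        const line = raw.trim();
        if (line === "") return; // tolerate blank lines and trailing newlines
        try {
          const parsed: unknown = JSON.parse(line);
          if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
            throw new Error("not a JSON object");
          }
          result.rows.push({ lane: "tip_llm", sourceFile: file, record: parsed as Record<string, unknown> });
        } catch {
          result.parseErrors.push({ file, line: index + 1 }); // 1-based line number
        }
      });
    }
    return result;
  }
  ```
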
- Added docs:
  - `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script:
  - `packages/scraper/package.json`
    - `robots:verification`

## Gitea Training Pool

- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit:
  - `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`

## MAGATAMA Training / Operations State

- Relevant local repo:
  - `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state:
  - `open findings: 0` on `2026-05-06`
- Latest confirmed live resolver state:
  - `Codex` and `Copilot` intentionally `idle/disabled`
  - not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix:
  - `newSinceLastTraining: 49`
  - Meaning:
    - the old `0` was incorrect.
    - the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
- Latest corpus integrity state after cleanup:
  - operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
    - `1368` unique verified rows
    - `4` live failure/escalation rows in `errors.jsonl`
  - do not confuse raw historical volume with real trainable signal.
- Important training integrity rule:
  - report-only or failed/escalated records must not be treated as verified training fixes.
  - keep them separated from the main verified training corpus.

## Erik Status

- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale, misplaced, remote-only duplicate files:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.
- Shared Erik note from the same chat:
  - MAGATAMA dashboard/core were redeployed during compliance/training fixes.
  - TIP crawler policy remains unchanged: Erik is a controller/light runner only, not a heavy crawl execution host.

## Last Live Verification Snapshot

From 2026-04-29:

- Total transceivers: `13,546`
- Price verified: `7,250`
- Image verified: `7,025`
- Details verified: `6,243`
- Fully verified: `5,812`
- Last price observation: `2026-04-29 19:15:53 UTC`
- Last stock observation: `2026-04-29 19:15:56 UTC`

## Latest MAGATAMA Training / RunPod Truth

Confirmed on `2026-05-06`:

- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
- Live Erik dashboard API now reports:
  - `magatamallm`
    - `1367 train`
    - `152 eval`
    - `1519 total`
    - `newSinceLastTraining = 1367`
  - `fo_blogllm`
    - `17353 train`
    - `1929 eval`
    - `19282 total`
    - `newSinceLastTraining = 17353`
    - active local model resolves to `fo-blog-v7`
  - `tip_llm`
    - `6482 train`
    - `721 eval`
    - `7203 total`
    - `newSinceLastTraining = 6482`
    - target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
- Result:
  - the `1097` previously shown everywhere was stale/wrong.
  - the selected lane now controls its own manifest, model label, and training counts.

### Gitea-backed Pool Materialization

- `magatamallm` Gitea pool remains canonical and populated.
- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
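
  Most train/eval pairs reported in this handoff fit a split where eval is 10% of the deduplicated total, rounded up. Examples are 17,356 → 15,620 / 1,736, 1,519 → 1,367 / 152, and 32 → 28 / 4. That rule is inferred from the numbers and not confirmed from the builder code. The sketch below materializes a lane export under that assumption. The `LaneManifest` shape and the first-seen dedupe on each row's JSON text are hypothetical.

  ```typescript
  interface LaneManifest {
    lane: string;
    train: number;
    eval: number;
    total: number;
  }

  // Keep the first occurrence of each row, comparing rows by their JSON text.
  function dedupe<T>(rows: T[]): T[] {
    const seen = new Set<string>();
    return rows.filter((row) => {
      const key = JSON.stringify(row);
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    });
  }

  // Eval share is 10% of the total, rounded up; integer math avoids float drift.
  function splitCounts(total: number): { train: number; eval: number } {
    const evalCount = Math.floor((total + 9) / 10);
    return { train: total - evalCount, eval: evalCount };
  }

  function buildLaneExport<T>(lane: string, rawRows: T[]): { manifest: LaneManifest; train: T[]; eval: T[] } {
    const rows = dedupe(rawRows);
    const counts = splitCounts(rows.length);
    return {
      manifest: { lane, train: counts.train, eval: counts.eval, total: rows.length },
      train: rows.slice(0, counts.train), // deterministic tail split (assumption; no shuffle)
      eval: rows.slice(counts.train),
    };
  }
  ```
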
- Lane manifests and JSONL exports now exist under:
  - `training-data/gitea-learning-pool/fo_blogllm/`
  - `training-data/gitea-learning-pool/tip_llm/`

### RunPod Completion Hardening

- MAGATAMA dashboard code now treats RunPod `COMPLETED` as success only after:
  1. the target model artifact is referenced
  2. the local Mac training API adopts/imports the artifact
  3. lane-specific smoke tests pass
  4. the active Ollama alias is updated
- New local adoption endpoint:
  - `POST /adopt-runpod-model`

### Mac Training API State

- The old LaunchAgent on Mac Studio was still serving the legacy training API from:
  - `~/magatama-llm/service/training_api.py`
- It has now been upgraded in place so Erik sees the new adoption-capable API.
- Verified from Erik:
  - `http://192.168.178.213:3214/health` returns the new service
  - it now exposes `register_script` pointing into the MAGATAMA repo
  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live

### Still Outstanding

- A fully successful end-to-end RunPod fine-tune has not yet been re-verified since the new adoption pipeline was wired in. Proof still requires:
  - real worker success
  - a real artifact
  - successful local Ollama import
  - an active alias switch
  - smoke-test proof
- Latest live proof run on `2026-05-06`:
  - job id: `2112a7ab-68c2-4411-a44f-6edb7ad377df-e1`
  - materialized correctly
  - reached `IN_PROGRESS`
  - then `COMPLETED`
  - but RunPod `status/{job}` returned no `output` object, no model artifact reference, and no Hugging Face repo result
  - current MAGATAMA handling now correctly classifies this as `completed_without_model_artifact`, not as success
- `tip-llm-v1` is still not installed locally in Ollama.

### Pulso AI Recommendation

- Keep a shared network/transceiver/switch core corpus with TIP.
- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
- Recommended split:
  - `TIP_LLM`
    - research
    - crawler / scraper / robot planning
    - vendor / firmware / issue extraction
  - `Pulso AI`
    - product responses
    - support
    - diagnostics
    - operator explanation layer

## Safe Next Steps

1. Clone or pull Gitea `origin` on laptop/Claude Code.
2. Read this folder first.
3. For BlogLLM work, treat `fo-blog-v7` as Adapter Bridge / PEFT adapter, not as a `~/.ollama` GGUF model.
4. Also read `llm-gateway/sync/CURRENT.md` when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
8. If testing robots, start with dry runs only:

   ```bash
   npm run robots:verification -w packages/scraper -- --status
   npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
   npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
   ```

9. Only dispatch real crawl work after deciding the target host:
   - Erik: `erik-safe`, tiny batches only.
   - Pi: `pi-fetch`.
   - Proxmox: `proxmox-heavy`.

## Dirty Worktree Note

There are existing uncommitted changes outside `sync/`. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.

## Latest Sync Commits

- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.
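
  For this pending scrub, a minimal sketch that re-partitions existing `fixes.jsonl` text into rows that stay and rows that move to `errors.jsonl`. It returns counters named after the integrity report fields listed earlier in this handoff. The per-row `verified` and `status` fields and the status values are assumptions about the fix-tracking schema. They are not confirmed from `fix-tracking.ts`.

  ```typescript
  interface ScrubReport {
    scanned: number;
    verified: number;
    movedToErrors: number;
    parseErrors: number;
    invalidVerifiedFlag: number;
  }

  // Statuses that must never count as verified training fixes (assumed values).
  const NON_TRAINABLE = new Set(["failed", "escalated", "report_only"]);

  function scrubFixes(fixesJsonl: string): { fixes: string[]; errors: string[]; report: ScrubReport } {
    const fixes: string[] = [];
    const errors: string[] = [];
    const report: ScrubReport = { scanned: 0, verified: 0, movedToErrors: 0, parseErrors: 0, invalidVerifiedFlag: 0 };

    for (const raw of fixesJsonl.split("\n")) {
      const line = raw.trim();
      if (line === "") continue;
      report.scanned += 1;

      let parsed: unknown;
      try {
        parsed = JSON.parse(line);
      } catch {
        parsed = undefined;
      }
      if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
        // Unparseable rows leave the training corpus but are kept for inspection.
        report.parseErrors += 1;
        errors.push(line);
        continue;
      }

      const row = parsed as Record<string, unknown>;
      if (typeof row["verified"] !== "boolean") report.invalidVerifiedFlag += 1;
      const status = typeof row["status"] === "string" ? row["status"] : "";
      if (row["verified"] === true && !NON_TRAINABLE.has(status)) {
        fixes.push(line);
        report.verified += 1;
      } else {
        errors.push(line);
        report.movedToErrors += 1;
      }
    }
    return { fixes, errors, report };
  }
  ```
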