transceiver-db/sync/CURRENT.md
2026-05-07 01:16:25 +02:00

Current TIP Sync State

Updated: 2026-05-07 01:16 UTC

Active Policy

  • Put coordination notes and handoffs in this sync/ folder and push to Gitea.
  • Check sibling project sync folders first when context may span repos.
  • Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
  • Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
  • Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
  • Use Proxmox/Pi workers for crawl load.

Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

  • Repo: rene/llm-gateway
  • Path: sync/
  • Commit shown by Claude: e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
  • Gitea path: http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

  • transceiver-db/sync/CURRENT.md
  • llm-gateway/sync/CURRENT.md

Latest Work

  • MAGATAMA local MagatamaLLM training state was re-verified on 2026-05-07:

    • result:
      • the lane export / dataset refresh worked
      • a new locally adopted MagatamaLLM model did not land
      • active MAGATAMA provider remains the older alias:
        • ollama:magatama-coder:latest
    • live/public evidence:
      • GET https://magatama.fichtmueller.org/api/llm/status
        • activeProvider = ollama:magatama-coder:latest
        • autoFixProvider = ollama:magatama-coder:latest
        • training.lastTrainingAt = 2026-05-06T22:43:20Z
        • training.modelVersion = magatama-coder:latest
        • training.activeRun = null
      • this means the UI timestamp currently reflects the latest dataset/training-state update, not proof of a newly adopted local model.
    • local Mac evidence:
      • ollama list still shows:
        • magatama-coder:latest → modified 3 weeks ago
        • magatama-llm-v2-0:latest → modified 11 days ago
      • no newer Magatama candidate/import alias appeared locally
    • registry/adoption evidence:
      • Erik lane manifest exists and is fresh:
        • /opt/magatama/training-data/runpod/magatamallm/manifest.json
        • generatedAt = 2026-05-06T22:45:15.944Z
        • train = 15679
        • eval = 1743
        • total = 17422
      • but Erik had no populated local adoption/registry state files in:
        • /opt/magatama/training-data/model-registry/models.json
        • /opt/magatama/training-data/model-registry/runs.json
        • /opt/magatama/training-data/model-registry/active.json
        • /opt/magatama/data/llm-status.json
      • local repo only had historical training-data/model-registry/training-runs.json
    • historical run evidence:
      • recent magatamallm training-run records still show:
        • submitted
        • then not_found_after_submit
        • or other non-adopted / worker-failure states
      • there is still no verified “completed_and_adopted” proof for a new MagatamaLLM local model.
    • operational conclusion:
      • current truth:
        • dataset/lane preparation works
        • local model adoption is still the missing step
        • MAGATAMA does not currently know more than the already active magatama-coder:latest alias
      • next fix block remains:
        • make RunPod/local completion count only when adoption succeeds
        • persist adoption report + model registry state
        • update active alias and version only after smoke-tested import succeeds
  • MAGATAMA Switchblade port intelligence is now truly flowing end-to-end on 2026-05-06:

    • live root cause:
      • Switchblade itself already had the rich SG350 data (description, LLDP neighbor, peer port, octets), but MAGATAMA had still shown mostly flat port chips.
      • verified live on Erik:
        • the real Switchblade runtime is the PM2 app switchblade under /opt/switchblade-app, not the older /opt/switchblade tree.
        • GET http://127.0.0.1:3000/api/discovery/snmp for 192.168.178.2 already returned rich rows such as:
          • GigabitEthernet3 → description Aruba-1830-UNUSED, neighbor VN46KYC0G0, peer port 11
          • GigabitEthernet5 → description Tashi-204, neighbor fritz.box, peer LAN:1
          • GigabitEthernet25 → description to Cisco Business 220 Series, neighbor Switch39688E, peer gi9
      • the remaining loss point was MAGATAMA's own Switchblade sync/persistence path.
    • MAGATAMA sync hardening:
      • scripts/switchblade_live_sync.ts
        • now prefers live SNMP discovery data when it is richer than /api/devices/<ip>
        • now maps description, peerDevice, peerPort, connectedHost, inOctets, outOctets into rack device ports
        • added optional debug snapshot dump support via SWITCHBLADE_DEBUG_SNAPSHOT_FILE
        • sanitizes unreadable peer-port strings and drops synthetic high-index numeric pseudo-ports
      • verified with a forced live run on Erik:
        • Top of Rack Switch now exports 28 real SG350 ports into the rack snapshot instead of the earlier flattened/odd set
        • sample verified payloads before POST:
          • port 3 → Aruba-1830-UNUSED / VN46KYC0G0 / 11
          • port 5 → Tashi-204 / fritz.box / LAN:1
          • port 25 → to Cisco Business 220 Series / Switch39688E / gi9
    • MAGATAMA core hardening:
      • packages/core/src/routes/health-types.ts
        • SwitchbladePortSnapshot now preserves:
          • description
          • vlan
          • macCount
          • peerDevice
          • peerPort
          • connectedHost
          • transceiver
          • inOctets
          • outOctets
      • packages/core/src/routes/health-support.ts
        • normalizeSwitchbladePort() now keeps those additional port fields instead of silently truncating them
      • rebuilt locally and re-rsynced the new packages/core/dist to Erik
    • dashboard/UI hardening:
      • packages/dashboard/public/index-v2.html
        • port chips already had custom tooltip support; now they also carry native title= fallback text
        • this reduces the old “question mark / unclear hover” problem in browsers that do not immediately show the custom bubble
    • live public verification after deploy:
      • GET https://magatama.fichtmueller.org/api/switchblade/snapshot
        • now contains enriched SG350 rack-port records with:
          • description
          • peerDevice
          • peerPort
          • connectedHost
          • inOctets
          • outOctets
        • public snapshot timestamp verified:
          • receivedAt = 2026-05-06T22:51:59.247Z
      • Top of Rack Switch in the public snapshot now exposes meaningful peer/use-case data instead of only flat status counters
    • operator impact:
      • MAGATAMA can now answer the actual operational question per port:
        • what is on this port
        • what is it talking to
        • what does the link look like
      • this is now grounded in Switchblade live SNMP/LLDP data, not guesswork.
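
The field preservation and peer-port sanitization described in this entry could look roughly like the sketch below. The single PortSnapshot input shape, the MAX_REAL_PORT_INDEX cutoff, and the printable-ASCII rule are assumptions for illustration; the real normalizeSwitchbladePort() and scripts/switchblade_live_sync.ts are not shown in this handoff.

```typescript
// Field names follow the SwitchbladePortSnapshot list above; the shape itself is assumed.
interface PortSnapshot {
  port: number;
  description?: string;
  peerDevice?: string;
  peerPort?: string;
  connectedHost?: string;
  inOctets?: number;
  outOctets?: number;
}

// Hypothetical cutoff: indexes above this are treated as synthetic pseudo-ports.
const MAX_REAL_PORT_INDEX = 1000;

// Keep a peer-port string only if it is non-empty printable ASCII after trimming.
function sanitizePeerPort(value: string | undefined): string | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  return /^[\x20-\x7e]+$/.test(trimmed) ? trimmed : undefined;
}

// Drop pseudo-ports, keep every other field, and clean the peer-port string.
function normalizePorts(rows: PortSnapshot[]): PortSnapshot[] {
  return rows
    .filter((row) => Number.isInteger(row.port) && row.port > 0 && row.port <= MAX_REAL_PORT_INDEX)
    .map((row) => {
      const { peerPort, ...rest } = row;
      const clean = sanitizePeerPort(peerPort);
      return clean === undefined ? rest : { ...rest, peerPort: clean };
    });
}
```

The point of the spread is that unknown-but-present fields survive normalization instead of being silently truncated; only the peer-port string is rewritten.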
  • TIP/Blog lane separation was materially corrected on 2026-05-06:

    • root cause:
      • TIP_LLM was still ingesting blog-/writer-shaped rows from the canonical lane pool and shared transceiver corpora.
      • local inspection showed the old TIP export had 6250 train rows, of which 6087 still matched blog/writer patterns.
    • dataset builder and Gitea sync were hardened:
      • scripts/runpod_dataset_builder.ts
        • added strict tipDatasetAllowed(...)
        • TIP_LLM now rejects blog-shaped source rows at dataset-build time
        • TIP_LLM now rejects blog-like system, user, and markdown-article assistant patterns
        • registry fallback for TIP_LLM now only uses lane-compatible datasets
      • scripts/sync_gitea_training_pool.ts
        • canonical TIP pool refresh now uses the stricter lane-alignment rules
        • redundant merged.jsonl copies for fo_blogllm and tip_llm are no longer rewritten, to avoid local disk exhaustion from duplicate lane artifacts
    • local disk issue encountered and fixed:
      • full refresh failed with ENOSPC while writing training-data/gitea-learning-pool/tip_llm/merged.jsonl
      • redundant lane merged artifacts for fo_blogllm and tip_llm were truncated and the sync script was changed to stop recreating them
      • free disk space returned from 377Mi to 17Gi
    • locally verified after rebuild:
      • TIP_LLM RunPod export:
        • train = 233
        • eval = 26
        • total = 259
        • blog/writer matches = 0
      • first TIP rows now use the correct TIP system prompt:
        • You are TIP_LLM, a research and market-intelligence analyst for transceivers, switches, and vendor ecosystems...
    • corrected artifacts and scripts were synced to Erik and pnpm training:refresh-all was rerun there.
    • live verified on Erik/public API:
      • magatamallm
        • datasetSource = url
        • collectedExamples = 15679
        • evalExamples = 1743
        • totalExamples = 17422
        • newSinceLastTraining = 15679
      • fo_blogllm
        • datasetSource = url
        • collectedExamples = 17322
        • evalExamples = 1926
        • totalExamples = 19254
        • neverTrained = true
      • tip_llm
        • datasetSource = url
        • collectedExamples = 231
        • evalExamples = 26
        • totalExamples = 257
        • neverTrained = true
    • operational conclusion:
      • lane-specific dataset truth is now real on Erik.
      • TIP_LLM is no longer silently borrowing the FO_Blog behavior lane.
      • the next remaining hard problem is now RunPod artifact adoption/validation, not lane contamination.
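
A build-time lane filter of the kind described in this entry might be sketched as follows. The row shape, the function name tipRowAllowed, and every pattern below are hypothetical; the actual tipDatasetAllowed(...) rules in scripts/runpod_dataset_builder.ts are not reproduced in this handoff.

```typescript
// Chat-style SFT row; this shape is an assumption.
interface SftRow {
  system: string;
  user: string;
  assistant: string;
}

// Hypothetical blog/writer markers checked against system and user text.
const BLOG_PATTERNS: RegExp[] = [
  /\bblog\b/i,
  /\bghost\s?writer\b|\bcopywriter\b/i,
  /\bwrite (an?|the) (article|post)\b/i,
];

// Markdown-article shape: opens with a level-1 heading and has at least one level-2 heading.
function looksLikeMarkdownArticle(text: string): boolean {
  return /^#\s+\S/.test(text.trimStart()) && /^##\s+\S/m.test(text);
}

// Reject rows whose prompts look blog-shaped or whose answer is a markdown article.
function tipRowAllowed(row: SftRow): boolean {
  const promptText = `${row.system}\n${row.user}`;
  if (BLOG_PATTERNS.some((pattern) => pattern.test(promptText))) return false;
  return !looksLikeMarkdownArticle(row.assistant);
}
```

Filtering at dataset-build time, rather than at training time, keeps the exported manifest counts honest about what the lane actually contains.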
  • MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:

    • dashboard and core were rebuilt locally and redeployed to Erik.
    • live processes restarted successfully:
      • magatama-dashboard
      • magatama
    • public api/llm/status now shows the true lane-export totals for magatamallm:
      • collectedExamples = 15620
      • effectiveExamples = 15620
      • evalExamples = 1736
      • totalExamples = 17356
      • newSinceLastTraining = 15620
    • root cause for the stale 1097 display:
      • the RunPod start SSE path still logged the legacy deduplicated fixes.jsonl corpus.
      • this was changed so RunPod launches no longer present the legacy 1097 count as the active training truth.
      • after dataset refresh the UI now emits the lane manifest totals instead.
    • RunPod completion handling was hardened:
      • worker COMPLETED is no longer trusted blindly.
      • MAGATAMA now scans RunPod worker logs for real training failures (Traceback, SyntaxError, non-zero exit, etc.) before treating the run as successful.
      • if the worker logs show a hidden failure, MAGATAMA records this as completed_with_worker_failure instead of pretending the run succeeded.
    • public findings state remains currently empty:
      • GET /api/findings?limit=1 returned {"findings":[],"total":0}
      • this is now rendered with an explicit empty-state row instead of a visually blank table.
    • Attack Paths empty-state is now intentionally explicit rather than looking broken.
    • Frontend cache and scope handling were hardened:
      • cache version bumped to 2026-05-06b
      • stale legacy magatama_api_cache:* entries are cleared
      • per-endpoint TTLs added
      • invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
    • Switchblade rack port hover was materially improved:
      • port chips now carry data-tooltip
      • custom tooltip CSS is live on Erik
      • the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
    • Changelog self-healing was added in core:
      • stale cached changelog data older than 6h now forces a rebuild from git history
      • verified live via dashboard proxy on Erik:
        • generatedAt = 2026-05-06T15:18:42.708Z
        • latest visible entries include 2026-04-30 items again instead of appearing frozen at 30.05
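
The worker-log check described under RunPod completion handling can be sketched as a simple marker scan. The regular expressions and the function name are assumptions; only the marker categories (Traceback, SyntaxError, non-zero exit) and the completed_with_worker_failure label come from the notes above.

```typescript
type RunOutcome = "completed" | "completed_with_worker_failure";

// Failure markers named in the notes; the exact regexes are assumptions.
const FAILURE_MARKERS: RegExp[] = [
  /Traceback \(most recent call last\)/,
  /\bSyntaxError\b/,
  /exit(?:ed with)? (?:code|status) [1-9]\d*/i,
];

// A worker-reported COMPLETED only counts as completed if no marker appears in its log.
function classifyCompletedRun(workerLog: string): RunOutcome {
  const failed = FAILURE_MARKERS.some((marker) => marker.test(workerLog));
  return failed ? "completed_with_worker_failure" : "completed";
}
```

A marker scan like this is deliberately conservative: a false alarm costs a manual log review, while a missed failure would mark a broken run as successful.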
  • MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:

    • root cause:
      • the training modal always fetched /api/llm/status without a lane, so FO_BlogLLM and TIP_LLM still showed the magatamallm pool.
    • dashboard/server were updated so /api/llm/status?lane=... is now truly lane-aware.
    • the training modal now refreshes per selected lane and rewrites:
      • title
      • runtime label
      • pool path
      • counts
      • dataset source
    • MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via ecosystem.config.cjs:
      • RUNPOD_DATASET_SOURCE=url
      • RUNPOD_DATASET_SOURCE_MAGATAMALLM=url
      • RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url
      • RUNPOD_DATASET_SOURCE_TIP_LLM=url
    • live verified on Erik after restart:
      • fo_blogllm
        • datasetSource = url
        • collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json
        • train = 28
        • eval = 4
        • total = 32
      • tip_llm
        • datasetSource = url
        • collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json
        • train = 36
        • eval = 4
        • total = 40
      • magatamallm
        • remains on lane-export counts (15620 / 1736 / 17356)
    • operator impact:
      • no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
      • every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing magatamallm.
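
One plausible way to resolve the per-lane dataset source from the environment variables listed above is sketched here. The precedence (lane-specific variable first, then RUNPOD_DATASET_SOURCE, then a default of url) and the function name are assumptions; the handoff only lists the variables and their values.

```typescript
type Env = Record<string, string | undefined>;

// Lane-specific variable wins; RUNPOD_DATASET_SOURCE is the assumed shared fallback.
function datasetSourceForLane(lane: string, env: Env, fallback = "url"): string {
  const laneKey = `RUNPOD_DATASET_SOURCE_${lane.toUpperCase()}`;
  return env[laneKey] ?? env["RUNPOD_DATASET_SOURCE"] ?? fallback;
}
```

With the lane names used in this document, tip_llm maps to RUNPOD_DATASET_SOURCE_TIP_LLM and fo_blogllm to RUNPOD_DATASET_SOURCE_FO_BLOGLLM, matching the variable names in ecosystem.config.cjs.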
  • MAGATAMA training + Attack Paths + Atlas exposure were corrected again on 2026-05-06:

    • the RunPod serverless training start failure was not a RunPod outage.
    • root cause was missing training scripts on Erik (training_full_refresh.ts and related helpers were absent under /opt/magatama/scripts).
    • Codex synced the full local magatama/scripts/ tree to Erik, added a safe fallback in scripts/model_registry_build.ts, and synced the local training-data/model-registry/ directory.
    • verified on Erik:
      • pnpm training:refresh-all now succeeds.
      • fresh dataset totals after dedupe:
        • magatamallm: 92,742 raw → 17,356 effective (15,620 train / 1,736 eval)
        • fo_blogllm: 32 total (28 train / 4 eval)
        • tip_llm: 40 total (36 train / 4 eval)
    • important nuance:
      • Codex did not execute the final Hugging Face publish step from Erik in this chat.
      • local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
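
The reported totals are consistent with a 90/10 train/eval split in which the train side is rounded down: 17,356 gives 15,620 / 1,736, 32 gives 28 / 4, and 40 gives 36 / 4. The sketch below reproduces that arithmetic. The 90 percent ratio is inferred from the numbers rather than stated by the scripts, and exact-line matching as the dedupe key is an assumption.

```typescript
// Drop exact duplicate JSONL lines, keeping first occurrences in their original order.
function dedupeLines(lines: string[]): string[] {
  return Array.from(new Set(lines));
}

// Split sizes under an assumed 90/10 rule with the train side rounded down.
function splitCounts(total: number, trainPercent = 90): { train: number; evalCount: number } {
  const train = Math.floor((total * trainPercent) / 100);
  return { train, evalCount: total - train };
}
```

Computing eval as the remainder guarantees that train plus eval always equals the effective total, which is a useful invariant to check against any manifest.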
  • MAGATAMA Attack Paths UX is no longer a misleading blank panel:

    • the page now distinguishes between:
      • no live attack paths
      • historical fallback paths
      • empty selected scope (0 assets in scope)
    • when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
    • live dashboard HTML on Erik now contains:
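      • Im aktuellen Scope liegen 0 Assets. (English: "The current scope contains 0 assets.")
      • Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann. (English: "Widen the location or datacenter / rack so that MAGATAMA can display correlatable assets and paths.")
      • Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer. (English: "Without open multi-step correlations, the graph view intentionally stays empty.")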
  • MAGATAMA code/training hardening was extended:

    • scripts/test_runpod_adapter.py no longer loads tokenizer/model with trust_remote_code=True.
    • scripts/ollama_adapter_bridge.py no longer loads tokenizer/model with trust_remote_code=True.
    • this removed the live CODE finding around HuggingFace trust_remote_code on Erik.
  • Atlas exposure logic was tightened to stop reopening noisy LAN management findings:

    • generic atlas-exposure findings now only stay operationally open for exposure that is meaningful enough to track as a finding.
    • internal RFC1918 management/service ports discovered by the broad atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
    • host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
    • after rebuild + deploy + health sync:
      • live Postgres open findings returned to 0.
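
The RFC1918 test that underlies this exposure rule can be sketched as below. The address ranges are the standard private IPv4 blocks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); the function names are hypothetical, and how Atlas actually wires this into finding promotion is not shown in this handoff.

```typescript
// Parse a dotted-quad IPv4 address into four octets, or null if malformed.
function parseIPv4(address: string): number[] | null {
  const parts = address.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((part) => (/^\d{1,3}$/.test(part) ? Number(part) : NaN));
  return octets.every((octet) => octet >= 0 && octet <= 255) ? octets : null;
}

// True for addresses inside the RFC 1918 private ranges.
function isRfc1918(address: string): boolean {
  const octets = parseIPv4(address);
  if (octets === null) return false;
  const [a, b] = octets;
  if (a === undefined || b === undefined) return false;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}
```

Under the rule described above, a management port discovered on an address for which isRfc1918 returns true would stay inventory-only unless explicit host-audit logic says otherwise.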
  • Follow-up hardening on the same block:

    • the earlier RunPod error path in MAGATAMA dashboard was made more truthful.
    • dataset preparation now distinguishes:
      • local training:refresh-all failure
      • optional Hugging Face publish failure
      • URL-based dataset mode with no external publish required
    • the training SSE flow now explicitly tells the operator whether RunPod is using:
      • Hugging Face dataset source
      • or MAGATAMA URL-bundle dataset source
    • this avoids the misleading "RunPod not reachable" wording when the actual failure is in dataset preparation.
    • follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
      • MAGATAMA submit logic now verifies that a RunPod job really exists under /status/{jobId} instead of trusting /run.
      • payloads were aligned more closely with the official Axolotl serverless schema:
        • model_type=AutoModelForCausalLM
        • tokenizer_type=AutoTokenizer
        • dataset split: train
        • optimizer adamw_torch_fused
      • verified full run attempt:
        • job id 9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2
        • disappeared as not_found_after_submit (404 job not found)
      • verified canary after payload fix:
        • job id a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2
        • immediately materialized as IN_QUEUE
        • then still disappeared on later reconcile as not_found_after_submit
      • current conclusion:
        • the old MAGATAMA bug is fixed.
        • the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
      • operational rule:
        • do not treat submitted or a brief IN_QUEUE as proof of a usable serverless training run.
        • only trust the run once it reaches IN_PROGRESS or a durable terminal state with artifact evidence.
    • follow-up training count fix on 2026-05-06 corrected the Training UI source-of-truth:
      • MAGATAMA had still shown 1097 because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
      • dashboard now prefers training-data/runpod/magatamallm/manifest.json for the visible MagatamaLLM training count.
      • synced current lane export to Erik and restarted magatama-dashboard.
      • verified public API now returns:
        • collectedExamples = 1367
        • effectiveExamples = 1367
        • evalExamples = 152
        • totalExamples = 1519
        • newSinceLastTraining = 1367
      • if the browser still shows 1097, treat it as stale cached UI and hard reload.
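
The operational rule from the serverless verification (do not trust submitted or a brief IN_QUEUE) can be expressed as a small decision function. This is a sketch: NOT_FOUND is a hypothetical marker for a 404 from /status/{jobId}, and the Trust labels other than not_found_after_submit are invented for illustration.

```typescript
// Status poll results for one job; NOT_FOUND stands in for a 404 response.
type Observation = "IN_QUEUE" | "IN_PROGRESS" | "COMPLETED" | "NOT_FOUND";
type Trust = "untrusted" | "trusted" | "not_found_after_submit";

// observations are ordered oldest first; only the latest one decides the verdict.
function assessRun(observations: Observation[], hasArtifact: boolean): Trust {
  const last = observations[observations.length - 1];
  if (last === undefined) return "untrusted"; // submitted, never polled
  if (last === "NOT_FOUND") return "not_found_after_submit";
  if (last === "IN_PROGRESS") return "trusted";
  if (last === "COMPLETED") return hasArtifact ? "trusted" : "untrusted";
  return "untrusted"; // IN_QUEUE alone proves nothing
}
```

Applied to the canary above, the sequence IN_QUEUE followed by a 404 yields not_found_after_submit rather than a false positive.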
  • MAGATAMA was repaired end-to-end to a clean operational baseline:

    • live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
    • open findings were reduced all the way to 0 in Postgres.
    • false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
    • code scanner false positives from generated/report artifacts remain excluded.
  • Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:

    • open findings: 0
    • queueExecuting: 0
    • queueBlocked: 0
    • queueFailed: 0
    • public /api/health returns status: ok
    • public /api/active-resolvers returns:
      • MAGATAMA Core: working
      • MagatamaLLM: working
      • Claude (secondary): working
      • Codex (secondary/manual): idle
      • Copilot (secondary/manual): idle
  • Important resolver truth fix on 2026-05-06:

    • live codex_enabled=false in MAGATAMA settings was causing Codex to show as a broken resolver.
    • dashboard logic was updated so disabled Codex/Copilot now show truthfully as idle with the note "In MAGATAMA settings disabled", instead of pretending there is a runtime outage.
    • the local codex bridge on Erik is reachable but currently reports auth_required; do not treat that as a production outage while Codex is intentionally disabled in settings.
  • Remaining real operational gap after findings hit zero:

    • MAGATAMA still knows more assets than it actively telemeters.
    • last public protection proof showed:
      • knownAssets: 79
      • hostsWithTelemetry: 27
      • assetsWithoutTelemetry: 52
    • these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
  • MAGATAMA cross-repo state from the same chat is now synced into this handoff:

    • Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
    • MAGATAMA training status was corrected so New Since Last Training no longer falsely shows 0.
    • Live verified/deduped MAGATAMA training state after the fix:
      • collectedExamples: 49
      • rawExamples: 58
      • duplicateExamples: 9
      • effectiveExamples: 49
      • newSinceLastTraining: 49
    • MAGATAMA now filters training metrics to verified/trainable examples only.
    • Failed/escalated MAGATAMA remediation records should go to errors.jsonl, not the main fixes.jsonl, so the next MagatamaLLM run does not train on junk.
    • Gitea-backed training pool remains the default target for training writes.
  • MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:

    • the earlier 49 medium atlas-coverage-gap findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
    • core logic was tightened so Atlas coverage findings now open only for managed operational assets:
      • exposure-backed assets
      • explicit non-auto owner
      • configured telemetry expectation
      • critical/high criticality
      • infrastructure metadata or managed infra device types
    • loopback and passive reference/inventory assets no longer reopen noisy guard findings.
    • local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
    • live Postgres state after deploy: open findings = 0.
    • training integrity bug was fixed in packages/core/src/learning/fix-tracking.ts:
      • verified fixes now append to training-data/gitea-learning-pool/magatamallm/fixes.jsonl
      • failed/escalated/report-only runs now belong in errors.jsonl
    • two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
      • atlas coverage scope hardening
      • training path integrity fix
    • corpus cleanup + dedupe was executed afterward:
      • pre-dedupe backup kept locally as:
        • magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl
      • resulting verified corpus:
        • fixes.jsonl = 1,368 unique verified training rows
      • resulting failure corpus:
        • errors.jsonl = 4 tracked failed/escalated rows
      • integrity report now exists at:
        • magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json
      • latest integrity totals:
        • scanned: 1368
        • verified: 1368
        • movedToErrors: 4
        • parseErrors: 0
        • invalidVerifiedFlag: 0
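
The tightened Atlas coverage scope could be modelled as a predicate like the one below. The Asset shape is hypothetical, and treating the listed criteria as alternatives (any one of them makes an asset managed and operational) is an assumption; the notes above list the criteria without stating how they combine.

```typescript
// Hypothetical asset shape covering only the criteria listed above.
interface Asset {
  address: string;
  owner: string; // "auto" is assumed to mean no explicit owner
  hasExposure: boolean;
  telemetryExpected: boolean;
  criticality: "low" | "medium" | "high" | "critical";
  isInfrastructure: boolean;
  passiveReference: boolean;
}

// Coverage-gap findings open only for assets this predicate accepts.
function isManagedOperational(asset: Asset): boolean {
  // Loopback and passive reference/inventory assets never open findings.
  if (asset.passiveReference || asset.address.startsWith("127.")) return false;
  return (
    asset.hasExposure ||
    asset.owner !== "auto" ||
    asset.telemetryExpected ||
    asset.criticality === "high" ||
    asset.criticality === "critical" ||
    asset.isInfrastructure
  );
}
```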
  • Complete Codex chat sync was added:

    • sync/history/2026-04-29-codex-complete-chat-sync.md
    • captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
    • confirms no secrets were written into sync.
    • confirms TIP crawler/robot planning remains TIPLLM-only.
    • confirms Erik remains a controller/light runner (erik-safe) only, with heavy crawler work assigned to Proxmox/Pi workers.
  • Codex sync-start confirmation was added:

    • sync/history/2026-04-29-codex-sync-start-confirmation.md
    • confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating sync/ as binding.
    • no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
  • Codex follow-up on 2026-04-29 clarified the active BlogLLM model:

    • TIP shows fo-blog-v7, but this is not a normal Ollama GGUF manifest.
    • It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter: /Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter
    • Bridge definition: /Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py
    • TIP API default: packages/api/src/llm/client.ts uses OLLAMA_LLM_MODEL || "fo-blog-v7".
    • fo-blog-v8 remains the next training candidate, not the currently active TIP BlogLLM model.
  • Full Codex session handoff was added:

    • sync/history/2026-04-29-codex-full-session-handoff.md
    • covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
  • Added a verification robot controller:

    • packages/scraper/src/robots/verification-robots.ts
    • command: npm run robots:verification -w packages/scraper -- --status
  • Added TIPLLM robot experience writing:

    • packages/scraper/src/crawler-llm/training-data-writer.ts
    • writes raw robot audit rows and SFT records.
  • Added Gitea training pool import to TIP learning-pool build:

    • scripts/tip-learning-pool-build.ts
    • imports TIP_TRAINING_REPO/qa-pairs/*.jsonl into the tip_llm lane.
  • Added docs:

    • docs/TIP_SELFLEARNING_WORKFLOW.md
  • Added package script:

    • packages/scraper/package.json
    • robots:verification

Gitea Training Pool

  • Existing local clone: /tmp/tip-training-data
  • Gitea repo: rene/tip-training-data
  • Latest pushed training commit:
    • f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
  • First robot experience record was written to:
    • /tmp/tip-training-data/qa-pairs/robot-control-high.jsonl
    • /tmp/tip-training-data/robot-experiences/2026-04-29.jsonl

MAGATAMA Training / Operations State

  • Relevant local repo:
    • /Users/renefichtmueller/Desktop/Claude Code/magatama
  • Latest confirmed live MAGATAMA findings state:
    • open findings: 0 on 2026-05-06
  • Latest confirmed live resolver state:
    • Codex and Copilot intentionally idle/disabled
    • not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
  • Latest confirmed live MAGATAMA training metric after dashboard fix:
    • newSinceLastTraining: 49
  • Meaning:
    • the old 0 was incorrect.
    • the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
  • Latest corpus integrity state after cleanup:
    • operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
      • 1368 unique verified rows
      • 4 live failure/escalation rows in errors.jsonl
    • do not confuse raw historical volume with real trainable signal.
  • Important training integrity rule:
    • report-only or failed/escalated records must not be treated as verified training fixes.
    • keep them separated from the main verified training corpus.
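
The integrity rule above amounts to a routing decision per record. A minimal sketch, assuming a status field with these four values (the real record schema in fix-tracking.ts is not shown here):

```typescript
// Assumed status values; only "verified" records count as training fixes.
type RecordStatus = "verified" | "failed" | "escalated" | "report_only";

interface TrainingRecord {
  id: string;
  status: RecordStatus;
}

// Verified fixes go to fixes.jsonl; everything else goes to errors.jsonl.
function targetFile(record: TrainingRecord): "fixes.jsonl" | "errors.jsonl" {
  return record.status === "verified" ? "fixes.jsonl" : "errors.jsonl";
}

// Split a batch into the two corpora without dropping any record.
function partitionRecords(records: TrainingRecord[]): { fixes: TrainingRecord[]; errors: TrainingRecord[] } {
  const fixes = records.filter((record) => targetFile(record) === "fixes.jsonl");
  const errors = records.filter((record) => targetFile(record) === "errors.jsonl");
  return { fixes, errors };
}
```

Routing on write, instead of cleaning up afterwards, is what keeps the verified count and the trainable count identical.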

Erik Status

  • Synced TIPLLM robot/training code to /opt/tip.
  • Did not start crawler jobs.
  • Did not enqueue robot waves.
  • Did not restart PM2 services.
  • Remote scraper TypeScript build is passing after removing two stale misplaced remote-only duplicate files:
    • /opt/tip/packages/scraper/src/scrapers/scheduler.ts
    • /opt/tip/packages/scraper/src/vendor-discovery-crawler.ts
  • tip-api and tip-scraper-daemon are online.
  • Shared Erik note from the same chat:
    • MAGATAMA dashboard/core were redeployed during compliance/training fixes.
    • TIP crawler policy remains unchanged: Erik is controller/light runner only, not heavy crawl execution host.

Last Live Verification Snapshot

From 2026-04-29:

  • Total transceivers: 13,546
  • Price verified: 7,250
  • Image verified: 7,025
  • Details verified: 6,243
  • Fully verified: 5,812
  • Last price observation: 2026-04-29 19:15:53 UTC
  • Last stock observation: 2026-04-29 19:15:56 UTC

Latest MAGATAMA Training / RunPod Truth

Confirmed on 2026-05-06:

  • Lane-specific training pools are now materially separated and no longer all fall back to magatamallm.
  • Live Erik dashboard API now reports:
    • magatamallm
      • 1367 train
      • 152 eval
      • 1519 total
      • newSinceLastTraining = 1367
    • fo_blogllm
      • 17353 train
      • 1929 eval
      • 19282 total
      • newSinceLastTraining = 17353
      • active local model resolves to fo-blog-v7
    • tip_llm
      • 6482 train
      • 721 eval
      • 7203 total
      • newSinceLastTraining = 6482
      • target active model is tip-llm-v1, but this model is not yet present locally in Ollama
  • Result:
    • previous 1097 everywhere was stale / wrong.
    • selected lane now controls its own manifest, model label, and training counts.

Gitea-backed Pool Materialization

  • magatamallm Gitea pool remains canonical and populated.
  • fo_blogllm and tip_llm Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
  • Lane manifests and JSONL exports now exist under:
    • training-data/gitea-learning-pool/fo_blogllm/
    • training-data/gitea-learning-pool/tip_llm/

RunPod Completion Hardening

  • MAGATAMA dashboard code now treats RunPod COMPLETED as success only after:
    1. target model artifact is referenced
    2. local Mac training API adopts/imports the artifact
    3. lane-specific smoke tests pass
    4. active Ollama alias is updated
  • New local adoption endpoint is:
    • POST /adopt-runpod-model
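
The four-step success gate above can be sketched as a pure function over the collected evidence. The AdoptionEvidence shape and the completed_not_adopted label are hypothetical; completed_and_adopted and completed_without_model_artifact are the labels used elsewhere in this handoff.

```typescript
// One boolean per gate step, in the order listed above.
interface AdoptionEvidence {
  artifactReferenced: boolean;
  importedLocally: boolean;
  smokeTestsPassed: boolean;
  aliasUpdated: boolean;
}

type AdoptionResult =
  | "completed_and_adopted"
  | "completed_without_model_artifact"
  | "completed_not_adopted";

// RunPod COMPLETED counts as success only when every step has evidence.
function evaluateCompletion(evidence: AdoptionEvidence): AdoptionResult {
  if (!evidence.artifactReferenced) return "completed_without_model_artifact";
  const adopted = evidence.importedLocally && evidence.smokeTestsPassed && evidence.aliasUpdated;
  return adopted ? "completed_and_adopted" : "completed_not_adopted";
}
```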

Mac Training API State

  • The old LaunchAgent on Mac Studio was still serving the legacy training API from:
    • ~/magatama-llm/service/training_api.py
  • It has now been upgraded in place so Erik sees the new adoption-capable API.
  • Verified from Erik:
    • http://192.168.178.213:3214/health returns the new service
    • it now exposes register_script pointing into the MAGATAMA repo
    • POST /adopt-runpod-model exists and rejects unauthenticated requests with 401, proving the route is live

Still Outstanding

  • Not yet re-verified after the new adoption pipeline was wired in: a fully successful end-to-end RunPod fine-tune with:
    • real worker success
    • real artifact
    • successful local Ollama import
    • active alias switch
    • smoke-test proof
  • Latest live proof run on 2026-05-06:
    • job id: 2112a7ab-68c2-4411-a44f-6edb7ad377df-e1
    • materialized correctly
    • reached IN_PROGRESS
    • then COMPLETED
    • but RunPod status/{job} returned no output object, no model artifact reference, and no Hugging Face repo result
    • current MAGATAMA handling now correctly classifies this as completed_without_model_artifact, not as success
  • tip-llm-v1 is still not installed locally in Ollama.

Pulso AI Recommendation

  • Keep a shared network/transceiver/switch core corpus with TIP.
  • Do not collapse Pulso AI into the same instruction lane as TIP_LLM.
  • Recommended split:
    • TIP_LLM
      • research
      • crawler / scraper / robot planning
      • vendor / firmware / issue extraction
    • Pulso AI
      • product responses
      • support
      • diagnostics
      • operator explanation layer

Safe Next Steps

  1. Clone or pull Gitea origin on laptop/Claude Code.
  2. Read this folder first.
  3. For BlogLLM work, treat fo-blog-v7 as Adapter Bridge / PEFT adapter, not as a ~/.ollama GGUF model.
  4. Also read llm-gateway/sync/CURRENT.md when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
  5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
  6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
  7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
  8. If testing robots, start with dry runs only:
     npm run robots:verification -w packages/scraper -- --status
     npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
     npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
  9. Only dispatch real crawl work after deciding the target host:
    • Erik: erik-safe, tiny batches only.
    • Pi: pi-fetch.
    • Proxmox: proxmox-heavy.

Dirty Worktree Note

There are existing uncommitted changes outside sync/. Some are Codex work from this session, some appear pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review git status --short before committing broader changes.

Latest Sync Commits

  • 6c42ca7 docs: add shared agent sync handoff
  • 8e7c5aa docs: link llm-gateway sync handoff
  • Pending after this update:
    • watch whether any future guard exposure findings are genuine operational issues or new false positives.
    • if failures still appear inside fixes.jsonl, scrub historic pollution and backfill errors.jsonl.