Current TIP Sync State
Updated: 2026-05-06 15:48 UTC
Active Policy
- Put coordination notes and handoffs in this `sync/` folder and push to Gitea.
- Check sibling project sync folders first when context may span repos.
- Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
- Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
- Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
- Use Proxmox/Pi workers for crawl load.
Cross-Repo Sync
Claude Code also created a Gitea sync handoff in the LLM Gateway repo:
- Repo: `rene/llm-gateway`
- Path: `sync/`
- Commit shown by Claude: `e272105 sync: add chat handoff + context scaffolding for Codex integration` (2026-04-29)
- Gitea path: http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/
When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:
- `transceiver-db/sync/CURRENT.md`
- `llm-gateway/sync/CURRENT.md`
Latest Work
- MAGATAMA frontend/runtime consistency was repaired again on 2026-05-06:
  - dashboard and core were rebuilt locally and redeployed to Erik.
  - live processes restarted successfully: `magatama-dashboard`, `magatama`
  - public `api/llm/status` now shows the true lane-export totals for `magatamallm`: `collectedExamples = 15620`, `effectiveExamples = 15620`, `evalExamples = 1736`, `totalExamples = 17356`, `newSinceLastTraining = 15620`
  - root cause for the stale `1097` display:
    - the RunPod start SSE path still logged the legacy deduplicated `fixes.jsonl` corpus.
    - this was changed so RunPod launches no longer present the legacy `1097` count as the active training truth.
    - after dataset refresh, the UI now emits the lane manifest totals instead.
  - RunPod completion handling was hardened:
    - a worker `COMPLETED` status is no longer trusted blindly.
    - MAGATAMA now scans RunPod worker logs for real training failures (`Traceback`, `SyntaxError`, non-zero exit, etc.) before treating the run as successful.
    - if the worker logs show a hidden failure, MAGATAMA records the run as `completed_with_worker_failure` instead of pretending it succeeded.
  - public findings state currently remains empty:
    - `GET /api/findings?limit=1` returned `{"findings":[],"total":0}`.
    - this is now rendered with an explicit empty-state row instead of a visually blank table.
  - the Attack Paths empty state is now intentionally explicit rather than looking broken.
  - Frontend cache and scope handling were hardened:
    - cache version bumped to `2026-05-06b`
    - stale legacy `magatama_api_cache:*` entries are cleared
    - per-endpoint TTLs added
    - invalid or empty scope selections are normalized instead of silently leaving the UI in misleading empty views
  - Switchblade rack port hover was materially improved:
    - port chips now carry `data-tooltip`
    - custom tooltip CSS is live on Erik
    - the old browser-native “question mark only” behavior should be replaced by a readable hover bubble
  - Changelog self-healing was added in core:
    - cached changelog data older than 6h is treated as stale and forces a rebuild from git history
    - verified live via the dashboard proxy on Erik: `generatedAt = 2026-05-06T15:18:42.708Z`
    - the latest visible entries include `2026-04-30` items again instead of appearing frozen at `30.05`
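The cache rules above (version-scoped keys, purge of stale `magatama_api_cache:*` entries, per-endpoint TTLs) and the 6h changelog rule boil down to freshness checks. The TypeScript below is a minimal sketch, not the dashboard code: the `KeyValueStore` interface, the `prefix:version:endpoint` key layout, and the TTL values are hypothetical; only the version string, the key prefix, and the 6h limit come from these notes.

```typescript
// Hypothetical sketch: only CACHE_VERSION, the "magatama_api_cache:" prefix,
// and the 6h changelog limit come from the sync notes.
interface KeyValueStore {
  keys(): string[];
  delete(key: string): void;
}

const CACHE_VERSION = "2026-05-06b";
const LEGACY_PREFIX = "magatama_api_cache:";
const CURRENT_PREFIX = `${LEGACY_PREFIX}${CACHE_VERSION}:`;

// Hypothetical per-endpoint TTLs in milliseconds.
const TTL_MS: Record<string, number | undefined> = {
  "/api/findings": 30000,
  "/api/llm/status": 60000,
};
const DEFAULT_TTL_MS = 15000;
const CHANGELOG_MAX_AGE_MS = 6 * 60 * 60 * 1000; // 6h

// Delete every cache entry that was not written under the current version.
function purgeLegacyEntries(store: KeyValueStore): number {
  let removed = 0;
  for (const key of store.keys()) {
    if (key.startsWith(LEGACY_PREFIX) && !key.startsWith(CURRENT_PREFIX)) {
      store.delete(key);
      removed++;
    }
  }
  return removed;
}

// An entry is fresh while it is younger than its endpoint's TTL.
function isFresh(endpoint: string, storedAtMs: number, nowMs: number): boolean {
  const ttl = TTL_MS[endpoint] ?? DEFAULT_TTL_MS;
  return nowMs - storedAtMs < ttl;
}

// Changelog self-healing: unparseable or >6h-old data forces a rebuild.
function changelogNeedsRebuild(generatedAtIso: string, nowMs: number): boolean {
  const generatedAt = Date.parse(generatedAtIso);
  return Number.isNaN(generatedAt) || nowMs - generatedAt > CHANGELOG_MAX_AGE_MS;
}
```

Scoping keys by version means a version bump alone invalidates everything, so the purge only has to reclaim space.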
- MAGATAMA lane-specific training pools and RunPod dataset automation were finished on 2026-05-06:
  - root cause:
    - the training modal always fetched `/api/llm/status` without a lane, so `FO_BlogLLM` and `TIP_LLM` still showed the `magatamallm` pool.
  - dashboard/server were updated so `/api/llm/status?lane=...` is now truly lane-aware.
  - the training modal now refreshes per selected lane and rewrites:
    - title
    - runtime label
    - pool path
    - counts
    - dataset source
  - the MAGATAMA dashboard env on Erik was switched to URL dataset mode for all lanes via `ecosystem.config.cjs`:
    - `RUNPOD_DATASET_SOURCE=url`
    - `RUNPOD_DATASET_SOURCE_MAGATAMALLM=url`
    - `RUNPOD_DATASET_SOURCE_FO_BLOGLLM=url`
    - `RUNPOD_DATASET_SOURCE_TIP_LLM=url`
  - live verified on Erik after restart:
    - `fo_blogllm`: `datasetSource = url`, `collectionsPath = /opt/magatama/training-data/runpod/fo_blogllm/manifest.json`, `train = 28`, `eval = 4`, `total = 32`
    - `tip_llm`: `datasetSource = url`, `collectionsPath = /opt/magatama/training-data/runpod/tip_llm/manifest.json`, `train = 36`, `eval = 4`, `total = 40`
    - `magatamallm`: remains on lane-export counts (`15620 / 1736 / 17356`)
  - operator impact:
    - no Hugging Face dataset publish is required anymore for MAGATAMA RunPod launches.
    - every supported LLM lane now points to its own local/Gitea-backed lane export instead of reusing `magatamallm`.
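The per-lane switches in `ecosystem.config.cjs` suggest a lookup where a lane-specific variable such as `RUNPOD_DATASET_SOURCE_TIP_LLM` overrides the global `RUNPOD_DATASET_SOURCE`. This sketch assumes that precedence, a hypothetical `hf` default, and the `/opt/magatama/training-data/runpod/<lane>/manifest.json` layout seen in the verified `collectionsPath` values; the real dashboard may resolve lanes differently.

```typescript
// Hypothetical sketch of lane-aware dataset resolution. The env names and the
// manifest path layout come from the notes; the precedence and "hf" default are assumed.
type Lane = "magatamallm" | "fo_blogllm" | "tip_llm";
type DatasetSource = "url" | "hf";
type Env = Record<string, string | undefined>;

const RUNPOD_ROOT = "/opt/magatama/training-data/runpod";

// Accepts UI labels like "TIP_LLM" or "FO_BlogLLM" and normalizes them.
function parseLane(raw: string | null | undefined): Lane {
  const lane = (raw ?? "").trim().toLowerCase();
  if (lane === "") return "magatamallm"; // no ?lane= given: the old implicit default
  if (lane === "magatamallm" || lane === "fo_blogllm" || lane === "tip_llm") return lane;
  throw new Error(`unknown training lane: ${raw}`);
}

// Lane-specific override first, then the global setting, then the assumed default.
function datasetSourceFor(lane: Lane, env: Env): DatasetSource {
  const specific = env[`RUNPOD_DATASET_SOURCE_${lane.toUpperCase()}`];
  const value = (specific ?? env["RUNPOD_DATASET_SOURCE"] ?? "hf").trim().toLowerCase();
  return value === "url" ? "url" : "hf";
}

function manifestPathFor(lane: Lane): string {
  return `${RUNPOD_ROOT}/${lane}/manifest.json`;
}

// What a lane-aware /api/llm/status?lane=... handler could report.
function laneStatus(rawLane: string | null, env: Env) {
  const lane = parseLane(rawLane);
  return {
    lane,
    datasetSource: datasetSourceFor(lane, env),
    collectionsPath: manifestPathFor(lane),
  };
}
```

Rejecting unknown lanes instead of silently falling back is what keeps a typo from reproducing the original "every lane shows `magatamallm`" symptom.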
- MAGATAMA training, Attack Paths, and Atlas exposure were corrected again on 2026-05-06:
  - the RunPod serverless training start failure was not a RunPod outage.
  - root cause: training scripts were missing on Erik (`training_full_refresh.ts` and related helpers were absent under `/opt/magatama/scripts`).
  - Codex synced the full local `magatama/scripts/` tree to Erik, added a safe fallback in `scripts/model_registry_build.ts`, and synced the local `training-data/model-registry/` directory.
  - verified on Erik: `pnpm training:refresh-all` now succeeds.
  - fresh dataset totals after dedupe:
    - `magatamallm`: 92,742 raw → 17,356 effective (15,620 train / 1,736 eval)
    - `fo_blogllm`: 32 total (28 train / 4 eval)
    - `tip_llm`: 40 total (36 train / 4 eval)
  - important nuance:
    - Codex did not execute the final Hugging Face publish step from Erik in this chat.
    - local/script/build failures are fixed; external dataset publish still depends on the selected dataset source and explicit publish intent.
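The raw → effective reduction (92,742 → 17,356 for `magatamallm`) is a dedupe pass followed by a train/eval split. A minimal sketch, assuming duplicates are detected on a whitespace- and case-normalized `instruction`/`output` pair and that eval rows are picked deterministically by hash bucket; the record shape, key normalization, and 10% eval share are assumptions, not the behavior of `training:refresh-all`.

```typescript
// Hypothetical record shape; the real lane exports may carry more fields.
interface TrainingRow {
  instruction: string;
  output: string;
}

// Collapse whitespace and case so trivially different copies dedupe together.
function dedupeKey(row: TrainingRow): string {
  const norm = (s: string) => s.trim().replace(/\s+/g, " ").toLowerCase();
  return `${norm(row.instruction)}\u0000${norm(row.output)}`;
}

// FNV-1a-style 32-bit hash over UTF-16 code units: stable across runs,
// so a given row always lands in the same split.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function dedupeAndSplit(rows: TrainingRow[], evalPercent = 10) {
  const seen = new Set<string>();
  const train: TrainingRow[] = [];
  const evalRows: TrainingRow[] = [];
  for (const row of rows) {
    const key = dedupeKey(row);
    if (seen.has(key)) continue; // keep only the first copy
    seen.add(key);
    (fnv1a(key) % 100 < evalPercent ? evalRows : train).push(row);
  }
  return { raw: rows.length, effective: seen.size, train, eval: evalRows };
}
```

A hash-based split (rather than a random shuffle) keeps eval rows from leaking into train when the corpus is rebuilt.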
- MAGATAMA Attack Paths UX is no longer a misleading blank panel:
  - the page now distinguishes between:
    - no live attack paths
    - historical fallback paths
    - empty selected scope (`0 assets in scope`)
  - when a user narrows the scope to a rack/location with zero scoped assets, the graph explicitly says so instead of looking broken.
  - live dashboard HTML on Erik now contains:
    - "Im aktuellen Scope liegen 0 Assets." (There are 0 assets in the current scope.)
    - "Erweitere Standort oder Datacenter / Rack, damit MAGATAMA korrelierbare Assets und Pfade darstellen kann." (Widen the location or datacenter / rack so MAGATAMA can display correlatable assets and paths.)
    - "Ohne offene mehrstufige Korrelationen bleibt die Graph-Sicht bewusst leer." (Without open multi-stage correlations, the graph view intentionally stays empty.)
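The three empty states can be modeled as an explicit view-state decision, so a blank graph is never ambiguous. This is a sketch with hypothetical input fields (`scopedAssetCount`, `livePaths`, `historicalPaths`) and hypothetical state names; checking the empty scope first is an assumption that matches the behavior described above.

```typescript
// Hypothetical view states for the Attack Paths panel.
type AttackPathView =
  | { kind: "empty_scope" }
  | { kind: "live"; paths: string[] }
  | { kind: "historical_fallback"; paths: string[] }
  | { kind: "no_live_paths" };

interface AttackPathInput {
  scopedAssetCount: number;
  livePaths: string[];
  historicalPaths: string[];
}

function resolveAttackPathView(input: AttackPathInput): AttackPathView {
  // A scope with zero assets can never produce paths, so report that first.
  if (input.scopedAssetCount === 0) return { kind: "empty_scope" };
  if (input.livePaths.length > 0) return { kind: "live", paths: input.livePaths };
  if (input.historicalPaths.length > 0) {
    return { kind: "historical_fallback", paths: input.historicalPaths };
  }
  // Assets exist but no open multi-stage correlations: stay empty on purpose.
  return { kind: "no_live_paths" };
}
```

Each `kind` maps to one of the explanatory messages, so the UI never has to infer intent from an empty array.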
- MAGATAMA code/training hardening was extended:
  - `scripts/test_runpod_adapter.py` no longer loads the tokenizer/model with `trust_remote_code=True`.
  - `scripts/ollama_adapter_bridge.py` no longer loads the tokenizer/model with `trust_remote_code=True`.
  - this removed the live CODE finding around HuggingFace `trust_remote_code` on Erik.
- Atlas exposure logic was tightened to stop reopening noisy LAN management findings:
  - generic `atlas-exposure` findings now stay operationally open only for exposure that is meaningful enough to track as a finding.
  - internal RFC1918 management/service ports discovered by the broad Atlas scan are no longer promoted into open Guard findings just because they exist on the LAN.
  - host-specific posture for Proxmox / Erik / Mac Studio remains the job of explicit host-audit logic.
  - after rebuild + deploy + health sync:
    - live Postgres open findings returned to `0`.
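The exposure rule above hinges on recognizing RFC1918 addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). A minimal sketch of such a gate, assuming IPv4 only and a hypothetical `exposedBeyondLan` flag standing in for "exposure meaningful enough to track"; it is not the Atlas implementation.

```typescript
// Parse a dotted-quad IPv4 address; returns null for anything else.
function parseIPv4(ip: string): number[] | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => Number.isInteger(o) && o <= 255) ? octets : null;
}

// RFC1918 private ranges: 10/8, 172.16/12, 192.168/16.
function isRfc1918(ip: string): boolean {
  const o = parseIPv4(ip);
  if (!o) return false;
  return (
    o[0] === 10 ||
    (o[0] === 172 && o[1] >= 16 && o[1] <= 31) ||
    (o[0] === 192 && o[1] === 168)
  );
}

interface ExposureObservation {
  ip: string;
  port: number;
  exposedBeyondLan: boolean; // hypothetical: reachable from outside the LAN
}

// Internal management/service ports on RFC1918 hosts stay inventory-only;
// host-specific posture is left to the explicit host audits.
function shouldOpenExposureFinding(obs: ExposureObservation): boolean {
  if (obs.exposedBeyondLan) return true;
  return !isRfc1918(obs.ip);
}
```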
- Follow-up hardening on the same block:
  - the earlier RunPod error path in the MAGATAMA dashboard was made more truthful.
  - dataset preparation now distinguishes:
    - local `training:refresh-all` failure
    - optional Hugging Face publish failure
    - URL-based dataset mode with no external publish required
  - the training SSE flow now explicitly tells the operator whether RunPod is using:
    - a Hugging Face dataset source
    - or the MAGATAMA URL-bundle dataset source
  - this avoids misleading `RunPod not reachable` wording when the actual failure is in dataset preparation.
  - follow-up serverless verification on 2026-05-06 narrowed the remaining fault further:
    - MAGATAMA submit logic now verifies that a RunPod job really exists under `/status/{jobId}` instead of trusting `/run`.
    - payloads were aligned more closely with the official Axolotl serverless schema:
      - `model_type=AutoModelForCausalLM`
      - `tokenizer_type=AutoTokenizer`
      - dataset `split: train`
      - optimizer `adamw_torch_fused`
    - verified full run attempt:
      - job id `9bc4b16b-755b-465b-aadf-b46f2fe467a3-e2`
      - disappeared as `not_found_after_submit` (`404 job not found`)
    - verified canary after the payload fix:
      - job id `a4ac6951-7ed7-43cb-80d8-5ab61533c2da-e2`
      - immediately materialized as `IN_QUEUE`
      - then still disappeared on a later reconcile as `not_found_after_submit`
    - current conclusion:
      - the old MAGATAMA bug is fixed.
      - the remaining problem is now likely on the RunPod endpoint/release side: jobs are accepted and briefly queued, but do not survive long enough to produce a durable serverless status lifecycle.
    - operational rule:
      - do not treat `submitted` or a brief `IN_QUEUE` as proof of a usable serverless training run.
      - only trust the run once it reaches `IN_PROGRESS` or a durable terminal state with artifact evidence.
  - a follow-up training count fix on 2026-05-06 corrected the Training UI source of truth:
    - MAGATAMA had still shown `1097` because the dashboard was counting the legacy deduplicated fix corpus instead of the current lane-specific RunPod export.
    - the dashboard now prefers `training-data/runpod/magatamallm/manifest.json` for the visible MagatamaLLM training count.
    - the current lane export was synced to Erik and `magatama-dashboard` was restarted.
    - verified that the public API now returns: `collectedExamples = 1367`, `effectiveExamples = 1367`, `evalExamples = 152`, `totalExamples = 1519`, `newSinceLastTraining = 1367`
    - if the browser still shows `1097`, treat it as stale cached UI and hard reload.
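The serverless operational rule above can be written as a small classifier over what MAGATAMA has observed for a job. This is a sketch under assumptions: the `JobObservation` shape and the `trusted`/`unproven`/`failed` labels are hypothetical, the status strings are RunPod's serverless job states, and `not_found_after_submit` is the MAGATAMA marker quoted above.

```typescript
type RunPodStatus =
  | "IN_QUEUE"
  | "IN_PROGRESS"
  | "COMPLETED"
  | "FAILED"
  | "CANCELLED"
  | "TIMED_OUT";

interface JobObservation {
  submitted: boolean;           // /run accepted the request
  statusHttpCode?: number;      // last GET /status/{jobId} HTTP code
  status?: RunPodStatus;        // last reported job status, if any
  hasArtifactEvidence: boolean; // hypothetical: output artifact confirmed
}

type JobTrust = "trusted" | "unproven" | "failed" | "not_found_after_submit";

function classifyJob(obs: JobObservation): JobTrust {
  // The job vanished even though /run accepted it.
  if (obs.submitted && obs.statusHttpCode === 404) return "not_found_after_submit";
  switch (obs.status) {
    case "IN_PROGRESS":
      return "trusted";
    case "COMPLETED":
      // A terminal state only counts together with artifact evidence.
      return obs.hasArtifactEvidence ? "trusted" : "unproven";
    case "FAILED":
    case "CANCELLED":
    case "TIMED_OUT":
      return "failed";
    default:
      // Submitted-only or a brief IN_QUEUE is not proof of a usable run.
      return "unproven";
  }
}
```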
- MAGATAMA was repaired end-to-end to a clean operational baseline:
  - live guard host audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
  - open findings were reduced all the way to `0` in Postgres.
  - false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
  - code scanner false positives from generated/report artifacts remain excluded.
- Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:
  - `open findings: 0`, `queueExecuting: 0`, `queueBlocked: 0`, `queueFailed: 0`
  - public `/api/health` returns `status: ok`
  - public `/api/active-resolvers` returns:
    - MAGATAMA Core: working
    - MagatamaLLM: working
    - Claude (secondary): working
    - Codex (secondary/manual): idle
    - Copilot (secondary/manual): idle
- Important resolver truth fix on 2026-05-06:
  - live `codex_enabled=false` in MAGATAMA settings was causing Codex to show as a broken resolver.
  - dashboard logic was updated so a disabled Codex/Copilot now shows truthfully as `idle` with `In MAGATAMA settings disabled`, instead of pretending there is a runtime outage.
  - the local Codex bridge on Erik is reachable but currently reports `auth_required`; do not treat that as a production outage while Codex is intentionally disabled in settings.
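The resolver fix separates "disabled by settings" from "broken at runtime". A sketch with hypothetical field names (`enabledInSettings`, `bridgeReachable`, `bridgeAuth`); only `idle`, `working`, `auth_required`, and the `In MAGATAMA settings disabled` label come from these notes, and the `error` state is an assumption.

```typescript
type ResolverState = "working" | "idle" | "error";

interface ResolverProbe {
  enabledInSettings: boolean; // e.g. codex_enabled
  bridgeReachable: boolean;
  bridgeAuth: "ok" | "auth_required";
}

function resolverStatus(probe: ResolverProbe): { state: ResolverState; detail: string } {
  // Settings win: a disabled resolver is idle, whatever its bridge reports.
  if (!probe.enabledInSettings) {
    return { state: "idle", detail: "In MAGATAMA settings disabled" };
  }
  if (!probe.bridgeReachable) return { state: "error", detail: "bridge unreachable" };
  if (probe.bridgeAuth === "auth_required") return { state: "error", detail: "auth_required" };
  return { state: "working", detail: "ok" };
}
```

With the current Erik state (reachable bridge, `auth_required`, Codex disabled) this yields `idle`, not an outage.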
- Remaining real operational gap after findings hit zero:
  - MAGATAMA still knows more assets than it actively telemeters.
  - the last public protection proof showed: `knownAssets: 79`, `hostsWithTelemetry: 27`, `assetsWithoutTelemetry: 52`
  - these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
- MAGATAMA cross-repo state from the same chat is now synced into this handoff:
  - Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
  - MAGATAMA training status was corrected so `New Since Last Training` no longer falsely shows `0`.
  - live verified/deduped MAGATAMA training state after the fix: `collectedExamples: 49`, `rawExamples: 58`, `duplicateExamples: 9`, `effectiveExamples: 49`, `newSinceLastTraining: 49`
  - MAGATAMA now filters training metrics to verified/trainable examples only.
  - failed/escalated MAGATAMA remediation records should go to `errors.jsonl`, not the main `fixes.jsonl`, so the next MagatamaLLM run does not train on junk.
  - the Gitea-backed training pool remains the default target for training writes.
- MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:
  - the earlier 49 medium `atlas-coverage-gap` findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
  - core logic was tightened so Atlas coverage findings now open only for managed operational assets:
    - exposure-backed assets
    - explicit non-auto owner
    - configured telemetry expectation
    - critical/high criticality
    - infrastructure metadata or managed infra device types
  - loopback and passive reference/inventory assets no longer reopen noisy guard findings.
  - the local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
  - live Postgres state after deploy: `open findings = 0`.
  - a training integrity bug was fixed in `packages/core/src/learning/fix-tracking.ts`:
    - verified fixes now append to `training-data/gitea-learning-pool/magatamallm/fixes.jsonl`
    - failed/escalated/report-only runs now belong in `errors.jsonl`
  - two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
    - atlas coverage scope hardening
    - training path integrity fix
  - corpus cleanup + dedupe was executed afterward:
    - pre-dedupe backup kept locally as: `magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl`
    - resulting verified corpus: `fixes.jsonl` = 1,368 unique verified training rows
    - resulting failure corpus: `errors.jsonl` = 4 tracked failed/escalated rows
    - integrity report now exists at: `magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json`
    - latest integrity totals: `scanned: 1368`, `verified: 1368`, `movedToErrors: 4`, `parseErrors: 0`, `invalidVerifiedFlag: 0`
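The managed-operational-asset rule can be read as: skip loopback and passive reference/inventory assets, then open a coverage finding only when at least one listed criterion holds and telemetry is missing. A sketch under that reading; the `Asset` fields, the "any one criterion suffices" interpretation, the `"auto"` owner marker, and the example device types are all assumptions.

```typescript
type Criticality = "low" | "medium" | "high" | "critical";

interface Asset {
  ip: string;
  role: "operational" | "reference" | "inventory"; // hypothetical classification
  hasExposure: boolean;
  owner: string | null; // "auto" marks an auto-assigned owner (assumed)
  telemetryExpected: boolean;
  criticality: Criticality;
  hasInfraMetadata: boolean;
  deviceType: string;
}

// Hypothetical list of managed infrastructure device types.
const MANAGED_INFRA_TYPES = new Set(["hypervisor", "switch", "router", "firewall"]);

function isLoopback(ip: string): boolean {
  return ip.startsWith("127.") || ip === "::1";
}

function isManagedOperationalAsset(a: Asset): boolean {
  if (isLoopback(a.ip) || a.role !== "operational") return false;
  return (
    a.hasExposure ||
    (a.owner !== null && a.owner !== "auto") ||
    a.telemetryExpected ||
    a.criticality === "high" ||
    a.criticality === "critical" ||
    a.hasInfraMetadata ||
    MANAGED_INFRA_TYPES.has(a.deviceType)
  );
}

// A coverage gap is a managed operational asset that reports no telemetry.
function opensCoverageFinding(a: Asset, hasTelemetry: boolean): boolean {
  return !hasTelemetry && isManagedOperationalAsset(a);
}
```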
- Complete Codex chat sync was added: `sync/history/2026-04-29-codex-complete-chat-sync.md`
  - captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
  - confirms no secrets were written into sync.
  - confirms TIP crawler/robot planning remains TIPLLM-only.
  - confirms Erik remains controller/light `erik-safe` only, with heavy crawler work assigned to Proxmox/Pi workers.
- Codex sync-start confirmation was added: `sync/history/2026-04-29-codex-sync-start-confirmation.md`
  - confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating `sync/` as binding.
  - no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
- Codex follow-up on 2026-04-29 clarified the active BlogLLM model:
  - TIP shows `fo-blog-v7`, but this is not a normal Ollama GGUF manifest.
  - It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter: `/Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter`
  - Bridge definition: `/Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py`
  - TIP API default: `packages/api/src/llm/client.ts` uses `OLLAMA_LLM_MODEL || "fo-blog-v7"`.
  - `fo-blog-v8` remains the next training candidate, not the currently active TIP BlogLLM model.
- Full Codex session handoff was added: `sync/history/2026-04-29-codex-full-session-handoff.md`
  - covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
- Added a verification robot controller: `packages/scraper/src/robots/verification-robots.ts`
  - command: `npm run robots:verification -w packages/scraper -- --status`
- Added TIPLLM robot experience writing: `packages/scraper/src/crawler-llm/training-data-writer.ts`
  - writes raw robot audit rows and SFT records.
- Added Gitea training pool import to the TIP learning-pool build: `scripts/tip-learning-pool-build.ts`
  - imports `TIP_TRAINING_REPO/qa-pairs/*.jsonl` into the `tip_llm` lane.
- Added docs: `docs/TIP_SELFLEARNING_WORKFLOW.md`
- Added package script: `robots:verification` in `packages/scraper/package.json`
Gitea Training Pool
- Existing local clone: `/tmp/tip-training-data`
- Gitea repo: `rene/tip-training-data`
- Latest pushed training commit: `f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]`
- First robot experience record was written to:
  - `/tmp/tip-training-data/qa-pairs/robot-control-high.jsonl`
  - `/tmp/tip-training-data/robot-experiences/2026-04-29.jsonl`
MAGATAMA Training / Operations State
- Relevant local repo: `/Users/renefichtmueller/Desktop/Claude Code/magatama`
- Latest confirmed live MAGATAMA findings state: `open findings: 0` on 2026-05-06
- Latest confirmed live resolver state:
  - `Codex` and `Copilot` intentionally `idle`/disabled
  - not a runtime outage, but a settings choice until gateway/bridge auth is intentionally re-enabled
- Latest confirmed live MAGATAMA training metric after dashboard fix: `newSinceLastTraining: 49`
- Meaning:
  - the old `0` was incorrect.
  - the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
- Latest corpus integrity state after cleanup:
  - the operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner: `1368` unique verified rows, `4` live failure/escalation rows in `errors.jsonl`
  - do not confuse raw historical volume with real trainable signal.
- Important training integrity rule:
  - report-only or failed/escalated records must not be treated as verified training fixes.
  - keep them separated from the main verified training corpus.
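The integrity rule above amounts to one routing pass over JSONL lines: verified, trainable rows go to `fixes.jsonl`, everything else to `errors.jsonl`. A sketch with an assumed record shape (`verified: boolean`, `outcome` string); the outcome values are hypothetical, and the counters are only loosely modeled on `corpus-integrity-report.json` rather than reproducing its exact semantics or the logic in `fix-tracking.ts`.

```typescript
// Hypothetical outcomes that must never be trained on as verified fixes.
const NON_TRAINABLE_OUTCOMES = new Set(["failed", "escalated", "report_only"]);

interface CorpusCounts {
  inputRows: number;
  toFixes: number;
  toErrors: number;
  parseErrors: number;
  invalidVerifiedFlag: number;
}

function splitCorpus(lines: string[]) {
  const fixes: string[] = [];
  const errors: string[] = [];
  const counts: CorpusCounts = { inputRows: 0, toFixes: 0, toErrors: 0, parseErrors: 0, invalidVerifiedFlag: 0 };

  for (const line of lines) {
    if (line.trim() === "") continue; // blank lines are not rows
    counts.inputRows++;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      counts.parseErrors++;
      continue;
    }
    if (typeof parsed !== "object" || parsed === null) {
      counts.parseErrors++; // valid JSON, but not a record object
      continue;
    }
    const { verified, outcome } = parsed as { verified?: unknown; outcome?: unknown };

    // Rows without a real boolean flag stay out of the verified corpus.
    if (typeof verified !== "boolean") {
      counts.invalidVerifiedFlag++;
      errors.push(line);
      counts.toErrors++;
      continue;
    }
    if (!verified || NON_TRAINABLE_OUTCOMES.has(String(outcome))) {
      errors.push(line);
      counts.toErrors++;
      continue;
    }
    fixes.push(line);
    counts.toFixes++;
  }
  return { fixes, errors, counts };
}
```

Routing unparseable-flag rows to `errors.jsonl` instead of dropping them keeps them reviewable without letting them reach training.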
Erik Status
- Synced TIPLLM robot/training code to `/opt/tip`.
- Did not start crawler jobs.
- Did not enqueue robot waves.
- Did not restart PM2 services.
- Remote scraper TypeScript build is passing after removing two stale, misplaced, remote-only duplicate files:
  - `/opt/tip/packages/scraper/src/scrapers/scheduler.ts`
  - `/opt/tip/packages/scraper/src/vendor-discovery-crawler.ts`
- `tip-api` and `tip-scraper-daemon` are online.
- Shared Erik note from the same chat:
  - MAGATAMA dashboard/core were redeployed during compliance/training fixes.
  - TIP crawler policy remains unchanged: Erik is a controller/light runner only, not a heavy crawl execution host.
Last Live Verification Snapshot
From 2026-04-29:
- Total transceivers: 13,546
- Price verified: 7,250
- Image verified: 7,025
- Details verified: 6,243
- Fully verified: 5,812
- Last price observation: 2026-04-29 19:15:53 UTC
- Last stock observation: 2026-04-29 19:15:56 UTC
Latest MAGATAMA Training / RunPod Truth
Confirmed on 2026-05-06:
- Lane-specific training pools are now materially separated and no longer all fall back to `magatamallm`.
- The live Erik dashboard API now reports:
  - `magatamallm`: 1367 train / 152 eval / 1519 total, `newSinceLastTraining = 1367`
  - `fo_blogllm`: 17353 train / 1929 eval / 19282 total, `newSinceLastTraining = 17353`
    - the active local model resolves to `fo-blog-v7`
  - `tip_llm`: 6482 train / 721 eval / 7203 total, `newSinceLastTraining = 6482`
    - the target active model is `tip-llm-v1`, but this model is not yet present locally in Ollama
- Result:
  - the previous `1097` everywhere was stale / wrong.
  - the selected lane now controls its own manifest, model label, and training counts.
Gitea-backed Pool Materialization
- `magatamallm` Gitea pool remains canonical and populated.
- `fo_blogllm` and `tip_llm` Gitea-backed pool folders were previously almost empty; they are now materialized from the local RunPod lane exports.
- Lane manifests and JSONL exports now exist under:
  - `training-data/gitea-learning-pool/fo_blogllm/`
  - `training-data/gitea-learning-pool/tip_llm/`
RunPod Completion Hardening
- MAGATAMA dashboard code now treats a RunPod `COMPLETED` status as success only after:
  - the target model artifact is referenced
  - the local Mac training API adopts/imports the artifact
  - lane-specific smoke tests pass
  - the active Ollama alias is updated
- New local adoption endpoint: `POST /adopt-runpod-model`
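Combining the worker-log scan from the 2026-05-06 frontend/runtime notes with the four adoption conditions above, a completion gate might look like this. Sketch only: the regexes are guesses at how `Traceback`, `SyntaxError`, and non-zero exits show up in logs, the `AdoptionEvidence` fields are hypothetical, and `completed_pending_adoption` is an invented label; `completed_with_worker_failure` is the state named in these notes.

```typescript
// Guessed patterns for failures hidden behind a worker COMPLETED status.
const WORKER_FAILURE_PATTERNS: RegExp[] = [
  /Traceback \(most recent call last\)/,
  /\bSyntaxError\b/,
  /exit(?:ed with)? (?:code|status) [1-9]\d*/i, // non-zero exit only
];

// Returns the first matching failure snippet, or null if the logs look clean.
function findWorkerFailure(logs: string): string | null {
  for (const pattern of WORKER_FAILURE_PATTERNS) {
    const match = logs.match(pattern);
    if (match) return match[0];
  }
  return null;
}

interface AdoptionEvidence {
  artifactReferenced: boolean;
  adoptedByMacApi: boolean; // POST /adopt-runpod-model succeeded
  smokeTestsPassed: boolean;
  ollamaAliasUpdated: boolean;
}

type CompletionVerdict =
  | "succeeded"
  | "completed_with_worker_failure"
  | "completed_pending_adoption";

function judgeCompletedRun(workerLogs: string, evidence: AdoptionEvidence): CompletionVerdict {
  if (findWorkerFailure(workerLogs) !== null) return "completed_with_worker_failure";
  const adopted =
    evidence.artifactReferenced &&
    evidence.adoptedByMacApi &&
    evidence.smokeTestsPassed &&
    evidence.ollamaAliasUpdated;
  return adopted ? "succeeded" : "completed_pending_adoption";
}
```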
Mac Training API State
- The old LaunchAgent on Mac Studio was still serving the legacy training API from `~/magatama-llm/service/training_api.py`.
- It has now been upgraded in place so Erik sees the new adoption-capable API.
- Verified from Erik:
  - `http://192.168.178.213:3214/health` returns the new service
  - it now exposes `register_script` pointing into the MAGATAMA repo
  - `POST /adopt-runpod-model` exists and rejects unauthenticated requests with `401`, proving the route is live
Still Outstanding
- A fully successful end-to-end RunPod fine-tune has not yet been re-verified since the new adoption pipeline was wired in. That still requires:
  - real worker success
  - a real artifact
  - successful local Ollama import
  - an active alias switch
  - smoke-test proof
- `tip-llm-v1` is still not installed locally in Ollama.
Pulso AI Recommendation
- Keep a shared network/transceiver/switch core corpus with TIP.
- Do not collapse `Pulso AI` into the same instruction lane as `TIP_LLM`.
- Recommended split:
  - `TIP_LLM`:
    - research
    - crawler / scraper / robot planning
    - vendor / firmware / issue extraction
  - `Pulso AI`:
    - product responses
    - support
    - diagnostics
    - operator explanation layer
Safe Next Steps
- Clone or pull Gitea `origin` on the laptop/Claude Code.
- Read this folder first.
- For BlogLLM work, treat `fo-blog-v7` as an Adapter Bridge / PEFT adapter, not as a `~/.ollama` GGUF model.
- Also read `llm-gateway/sync/CURRENT.md` when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
- For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
- When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
- For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
- If testing robots, start with dry runs only:
  - `npm run robots:verification -w packages/scraper -- --status`
  - `npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3`
  - `npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run`
- Only dispatch real crawl work after deciding the target host:
  - Erik: `erik-safe`, tiny batches only.
  - Pi: `pi-fetch`.
  - Proxmox: `proxmox-heavy`.
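The host/profile pairing above can be enforced before anything is enqueued. A sketch with hypothetical batch limits and a hypothetical `DispatchRequest` shape; only the host roles and profile names come from this handoff. Dry runs always pass; real work must use the host's own profile and stay within its batch ceiling.

```typescript
type Host = "erik" | "pi" | "proxmox";
type Profile = "erik-safe" | "pi-fetch" | "proxmox-heavy";

const PROFILE_FOR_HOST: Record<Host, Profile> = {
  erik: "erik-safe",
  pi: "pi-fetch",
  proxmox: "proxmox-heavy",
};

// Hypothetical ceilings; "tiny batches only" on Erik is the binding rule.
const MAX_BATCH: Record<Host, number> = { erik: 5, pi: 50, proxmox: 500 };

interface DispatchRequest {
  host: Host;
  profile: Profile;
  batchSize: number;
  dryRun: boolean;
}

// Returns a refusal reason, or null when the dispatch may proceed.
function checkDispatch(req: DispatchRequest): string | null {
  if (req.dryRun) return null; // dry runs create no crawl load
  const expected = PROFILE_FOR_HOST[req.host];
  if (req.profile !== expected) {
    return `profile ${req.profile} is not allowed on ${req.host}; use ${expected}`;
  }
  if (req.batchSize > MAX_BATCH[req.host]) {
    return `batch of ${req.batchSize} exceeds the ${req.host} limit of ${MAX_BATCH[req.host]}`;
  }
  return null;
}
```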
Dirty Worktree Note
There are existing uncommitted changes outside `sync/`. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review `git status --short` before committing broader changes.
Latest Sync Commits
- `6c42ca7 docs: add shared agent sync handoff`
- `8e7c5aa docs: link llm-gateway sync handoff`
- Pending after this update:
  - watch whether any future guard exposure findings are genuine operational issues or new false positives.
  - if failures still appear inside `fixes.jsonl`, scrub historic pollution and backfill `errors.jsonl`.