transceiver-db/sync/CURRENT.md
2026-05-06 10:46:56 +02:00

Current TIP Sync State

Updated: 2026-05-06 10:28 UTC

Active Policy

  • Put coordination notes and handoffs in this sync/ folder and push to Gitea.
  • Check sibling project sync folders first when context may span repos.
  • Use TIPLLM only for TIP crawler/robot planning and extraction feedback.
  • Write robot/crawler experience into the Gitea-backed TIPLLM training pool.
  • Keep Erik safe: no heavy crawler waves or uncontrolled Playwright/discovery jobs on Erik.
  • Use Proxmox/Pi workers for crawl load.

Cross-Repo Sync

Claude Code also created a Gitea sync handoff in the LLM Gateway repo:

  • Repo: rene/llm-gateway
  • Path: sync/
  • Commit shown by Claude: e272105 sync: add chat handoff + context scaffolding for Codex integration (2026-04-29)
  • Gitea path: http://192.168.178.196:3000/rene/llm-gateway/src/main/sync/

When work touches TIP, Magatama, LLM Gateway, bridges, auth, or shared Erik infrastructure, read both:

  • transceiver-db/sync/CURRENT.md
  • llm-gateway/sync/CURRENT.md

Latest Work

  • MAGATAMA was repaired end-to-end to a clean operational baseline:

    • live guard host-audits for Erik, Mac Studio, and Proxmox were corrected and rerun.
    • open findings were reduced to 0 in Postgres.
    • false-positive Proxmox baseline findings were removed by teaching the audit to treat internal-only management ports and default-only rpcbind exposure as acceptable for this host.
    • code scanner false positives from generated/report artifacts remain excluded.
  • Live MAGATAMA protection/runtime state after the 2026-05-06 remediation:

    • open findings: 0
    • queueExecuting: 0
    • queueBlocked: 0
    • queueFailed: 0
    • public /api/health returns status: ok
    • public /api/active-resolvers returns:
      • MAGATAMA Core: working
      • MagatamaLLM: working
      • Claude (secondary): working
      • Codex (secondary/manual): idle
      • Copilot (secondary/manual): idle
  • Important resolver truth fix on 2026-05-06:

    • live codex_enabled=false in MAGATAMA settings was causing Codex to show as a broken resolver.
    • dashboard logic was updated so that a disabled Codex/Copilot now shows truthfully as idle with the note "In MAGATAMA settings disabled", instead of pretending there is a runtime outage.
    • the local codex bridge on Erik is reachable but currently reports auth_required; do not treat that as a production outage while Codex is intentionally disabled in settings.
  • Remaining real operational gap after findings hit zero:

    • MAGATAMA still knows about more assets than it actively collects telemetry from.
    • last public protection proof showed:
      • knownAssets: 79
      • hostsWithTelemetry: 27
      • assetsWithoutTelemetry: 52
    • these are currently inventory/discovery-only assets, not open findings, but they remain the next real coverage expansion area.
  • MAGATAMA cross-repo state from the same chat is now synced into this handoff:

    • Compliance framework cards in MAGATAMA are clickable and open per-framework requirement details.
    • MAGATAMA training status was corrected so "New Since Last Training" no longer falsely shows 0.
    • Live verified/deduped MAGATAMA training state after the fix:
      • collectedExamples: 49
      • rawExamples: 58
      • duplicateExamples: 9
      • effectiveExamples: 49
      • newSinceLastTraining: 49
    • MAGATAMA now filters training metrics to verified/trainable examples only.
    • Failed/escalated MAGATAMA remediation records should go to errors.jsonl, not the main fixes.jsonl, so the next MagatamaLLM run does not train on junk.
    • Gitea-backed training pool remains the default target for training writes.
  • MAGATAMA coverage-gap and training-integrity hardening on 2026-05-06:

    • the earlier 49 medium atlas-coverage-gap findings were traced to Atlas treating inventory-only and discovery-only assets as operational protection failures.
    • core logic was tightened so Atlas coverage findings now open only for managed operational assets:
      • exposure-backed assets
      • explicit non-auto owner
      • configured telemetry expectation
      • critical/high criticality
      • infrastructure metadata or managed infra device types
    • loopback and passive reference/inventory assets no longer reopen noisy guard findings.
    • local build succeeded, the new core dist was deployed to Erik, and the first post-deploy guard scan resolved stale findings.
    • live Postgres state after deploy: open findings = 0.
    • training integrity bug was fixed in packages/core/src/learning/fix-tracking.ts:
      • verified fixes now append to training-data/gitea-learning-pool/magatamallm/fixes.jsonl
      • failed/escalated/report-only runs now belong in errors.jsonl
    • two explicit Codex-written training entries were appended to the MAGATAMA Gitea-backed fixes corpus:
      • atlas coverage scope hardening
      • training path integrity fix
    • corpus cleanup + dedupe was executed afterward:
      • pre-dedupe backup kept locally as:
        • magatama/training-data/gitea-learning-pool/magatamallm/fixes-pre-dedupe-20260506.jsonl
      • resulting verified corpus:
        • fixes.jsonl = 1,368 unique verified training rows
      • resulting failure corpus:
        • errors.jsonl = 4 tracked failed/escalated rows
      • integrity report now exists at:
        • magatama/training-data/gitea-learning-pool/magatamallm/corpus-integrity-report.json
      • latest integrity totals:
        • scanned: 1368
        • verified: 1368
        • movedToErrors: 4
        • parseErrors: 0
        • invalidVerifiedFlag: 0
  • Complete Codex chat sync was added:

    • sync/history/2026-04-29-codex-complete-chat-sync.md
    • captures Ghost/blog updates, LinkedIn voice preferences, LPO/AI-fabric blog edits, Rest-Is-Not-Laziness scheduling replacement, and security notes.
    • confirms no secrets were written into sync.
    • confirms TIP crawler/robot planning remains TIPLLM-only.
    • confirms Erik remains a controller/light runner on the erik-safe profile only, with heavy crawler work assigned to Proxmox/Pi workers.
  • Codex sync-start confirmation was added:

    • sync/history/2026-04-29-codex-sync-start-confirmation.md
    • confirms Codex read this TIP handoff, checked the sibling LLM Gateway handoff, and is treating sync/ as binding.
    • no code changes, crawler jobs, queue waves, PM2 restarts, or Erik load were initiated during this confirmation.
  • Codex follow-up on 2026-04-29 clarified the active BlogLLM model:

    • TIP shows fo-blog-v7, but this is not a normal Ollama GGUF manifest.
    • It is a local Adapter Bridge / Mac Studio model backed by the RunPod-trained PEFT adapter: /Users/renefichtmueller/Desktop/Claude Code/magatama/training-data/runpod/pod-runs/2026-04-25-fo-tip/final/adapters/fo_blogllm/final-adapter
    • Bridge definition: /Users/renefichtmueller/Desktop/Claude Code/magatama/scripts/ollama_adapter_bridge.py
    • TIP API default: packages/api/src/llm/client.ts uses OLLAMA_LLM_MODEL || "fo-blog-v7".
    • fo-blog-v8 remains the next training candidate, not the currently active TIP BlogLLM model.
  • Full Codex session handoff was added:

    • sync/history/2026-04-29-codex-full-session-handoff.md
    • covers TIP verification, product image/detail crawling, Blog Engine Hot Topics, TIPLLM robots, training pool, Erik status, and cross-repo sync.
  • Added a verification robot controller:

    • packages/scraper/src/robots/verification-robots.ts
    • command: npm run robots:verification -w packages/scraper -- --status
  • Added TIPLLM robot experience writing:

    • packages/scraper/src/crawler-llm/training-data-writer.ts
    • writes raw robot audit rows and SFT records.
  • Added Gitea training pool import to TIP learning-pool build:

    • scripts/tip-learning-pool-build.ts
    • imports TIP_TRAINING_REPO/qa-pairs/*.jsonl into the tip_llm lane.
  • Added docs:

    • docs/TIP_SELFLEARNING_WORKFLOW.md
  • Added package script:

    • packages/scraper/package.json
    • robots:verification
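
The Atlas coverage scope rule described under "MAGATAMA coverage-gap and training-integrity hardening" can be sketched as a single predicate. This is a minimal illustration, not the actual code in packages/core: the Asset shape and every field name are hypothetical, and it assumes that any one of the listed criteria is enough to make an asset "managed operational" and that loopback and passive reference/inventory assets are excluded before the criteria are checked.

```typescript
// Hypothetical asset shape. Field names are assumptions for illustration,
// not the real MAGATAMA core types.
type Criticality = "low" | "medium" | "high" | "critical";

interface Asset {
  name: string;
  isLoopback: boolean;
  isPassiveReference: boolean; // passive reference/inventory record
  hasExposure: boolean;        // exposure-backed asset
  owner: string | null;        // "auto" stands for an auto-assigned owner
  expectsTelemetry: boolean;   // configured telemetry expectation
  criticality: Criticality;
  isManagedInfra: boolean;     // infrastructure metadata or managed infra device type
}

// Assumption: one matching criterion is sufficient (logical OR).
function isManagedOperationalAsset(asset: Asset): boolean {
  // Loopback and passive reference/inventory assets never qualify.
  if (asset.isLoopback || asset.isPassiveReference) return false;

  const hasExplicitOwner = asset.owner !== null && asset.owner !== "auto";
  const isHighCriticality =
    asset.criticality === "critical" || asset.criticality === "high";

  return (
    asset.hasExposure ||
    hasExplicitOwner ||
    asset.expectsTelemetry ||
    isHighCriticality ||
    asset.isManagedInfra
  );
}

// A coverage finding opens only when a managed operational asset lacks telemetry.
function shouldOpenCoverageFinding(asset: Asset, hasTelemetry: boolean): boolean {
  return !hasTelemetry && isManagedOperationalAsset(asset);
}
```

Under these assumptions, an inventory-only asset with an auto owner and low criticality never opens a finding, which is the behavior that cleared the earlier 49 atlas-coverage-gap findings.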

Gitea Training Pool

  • Existing local clone: /tmp/tip-training-data
  • Gitea repo: rene/tip-training-data
  • Latest pushed training commit:
    • f1c83f8 crawl: add robot-status training records [2026-04-29T20:11:24.091Z]
  • First robot experience record was written to:
    • /tmp/tip-training-data/qa-pairs/robot-control-high.jsonl
    • /tmp/tip-training-data/robot-experiences/2026-04-29.jsonl

MAGATAMA Training / Operations State

  • Relevant local repo:
    • /Users/renefichtmueller/Desktop/Claude Code/magatama
  • Latest confirmed live MAGATAMA findings state:
    • open findings: 0 on 2026-05-06
  • Latest confirmed live resolver state:
    • Codex and Copilot intentionally idle/disabled
    • this is a settings choice, not a runtime outage, and it stays in place until gateway/bridge auth is intentionally re-enabled
  • Latest confirmed live MAGATAMA training metric after dashboard fix:
    • newSinceLastTraining: 49
  • Meaning:
    • the old 0 was incorrect.
    • the currently visible trainable MAGATAMA corpus is based on verified and deduplicated examples only.
  • Latest corpus integrity state after cleanup:
    • operational Gitea-backed MAGATAMA training corpus is now much smaller but cleaner:
      • 1368 unique verified rows
      • 4 live failure/escalation rows in errors.jsonl
    • do not confuse raw historical volume with real trainable signal.
  • Important training integrity rule:
    • report-only or failed/escalated records must not be treated as verified training fixes.
    • keep them separated from the main verified training corpus.
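
The training integrity rule above (verified fixes go to fixes.jsonl, everything else goes to errors.jsonl, and duplicates are dropped from the verified corpus) can be sketched as follows. This is a hedged illustration and not the contents of packages/core/src/learning/fix-tracking.ts: the RemediationRecord shape, the outcome labels, and the use of an id field as the dedupe key are all assumptions. The sketch is pure and does no file I/O.

```typescript
// Hypothetical record shape and outcome labels (assumptions for illustration).
type Outcome = "verified" | "failed" | "escalated" | "report-only";

interface RemediationRecord {
  id: string;      // assumed dedupe key
  outcome: Outcome;
  summary: string;
}

const FIXES_FILE = "fixes.jsonl";
const ERRORS_FILE = "errors.jsonl";

// Only verified records are allowed into the main training corpus.
function targetFile(record: RemediationRecord): string {
  return record.outcome === "verified" ? FIXES_FILE : ERRORS_FILE;
}

interface CorpusSplit {
  fixes: RemediationRecord[];  // unique verified rows
  errors: RemediationRecord[]; // failed, escalated, and report-only rows
  duplicates: number;          // verified rows dropped as duplicates
}

function splitCorpus(records: RemediationRecord[]): CorpusSplit {
  const seen = new Set<string>();
  const fixes: RemediationRecord[] = [];
  const errors: RemediationRecord[] = [];
  let duplicates = 0;

  for (const record of records) {
    if (targetFile(record) === ERRORS_FILE) {
      errors.push(record);
      continue;
    }
    if (seen.has(record.id)) {
      duplicates += 1; // keep only the first verified occurrence
      continue;
    }
    seen.add(record.id);
    fixes.push(record);
  }
  return { fixes, errors, duplicates };
}

// Serialize rows as JSON Lines: one JSON object per line.
function toJsonl(records: RemediationRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n") + (records.length > 0 ? "\n" : "");
}
```

Counting duplicates separately mirrors the dashboard metrics, where the effective example count is the raw count minus the duplicate count.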

Erik Status

  • Synced TIPLLM robot/training code to /opt/tip.
  • Did not start crawler jobs.
  • Did not enqueue robot waves.
  • Did not restart PM2 services.
  • Remote scraper TypeScript build is passing after removing two stale, misplaced duplicate files that existed only on the remote:
    • /opt/tip/packages/scraper/src/scrapers/scheduler.ts
    • /opt/tip/packages/scraper/src/vendor-discovery-crawler.ts
  • tip-api and tip-scraper-daemon are online.
  • Shared Erik note from the same chat:
    • MAGATAMA dashboard/core were redeployed during compliance/training fixes.
    • TIP crawler policy remains unchanged: Erik is controller/light runner only, not heavy crawl execution host.

Last Live Verification Snapshot

From 2026-04-29:

  • Total transceivers: 13,546
  • Price verified: 7,250
  • Image verified: 7,025
  • Details verified: 6,243
  • Fully verified: 5,812
  • Last price observation: 2026-04-29 19:15:53 UTC
  • Last stock observation: 2026-04-29 19:15:56 UTC

Safe Next Steps

  1. Clone or pull Gitea origin on laptop/Claude Code.
  2. Read this folder first.
  3. For BlogLLM work, treat fo-blog-v7 as Adapter Bridge / PEFT adapter, not as a ~/.ollama GGUF model.
  4. Also read llm-gateway/sync/CURRENT.md when work touches shared Erik infrastructure, LLM routing, bridges, auth, TIPLLM, or crawler orchestration.
  5. For TIP robot/crawler planning, use TIPLLM only. Do not route this lane through external AI providers.
  6. When training pools or model stats look suspicious, prefer verified-only counts and check whether failed/escalated rows polluted the corpus.
  7. For MAGATAMA-adjacent work, keep writing learnings back into the Gitea-backed pool and avoid training on report-only pseudo-fixes.
  8. If testing robots, start with dry runs only:
     npm run robots:verification -w packages/scraper -- --status
     npm run robots:verification -w packages/scraper -- --tipllm-plan --limit=3
     npm run robots:verification -w packages/scraper -- --enqueue=details-fast-lane --profile=erik-safe --dry-run
  9. Only dispatch real crawl work after deciding the target host:
    • Erik: erik-safe, tiny batches only.
    • Pi: pi-fetch.
    • Proxmox: proxmox-heavy.
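
The host-to-profile mapping in the last step can be captured in a small helper that builds the enqueue command with dry-run as the default. This is a sketch under stated assumptions: the helper name and the Host type are hypothetical, it assumes pi-fetch and proxmox-heavy are passed through the same --profile flag as erik-safe, and it assumes that leaving out --dry-run is what triggers a real dispatch. Only the erik-safe dry-run form appears in the steps above.

```typescript
// Hypothetical helper; host names and the mapping follow the list above.
type Host = "erik" | "pi" | "proxmox";

const PROFILE_BY_HOST: Record<Host, string> = {
  erik: "erik-safe",        // tiny batches only
  pi: "pi-fetch",
  proxmox: "proxmox-heavy",
};

// Returns the argv for the robots:verification script.
// dryRun defaults to true so a real dispatch has to be requested explicitly.
function buildEnqueueCommand(host: Host, lane: string, dryRun: boolean = true): string[] {
  const args = [
    "npm", "run", "robots:verification",
    "-w", "packages/scraper",
    "--",
    `--enqueue=${lane}`,
    `--profile=${PROFILE_BY_HOST[host]}`,
  ];
  // Assumption: omitting --dry-run performs the real dispatch.
  if (dryRun) args.push("--dry-run");
  return args;
}
```

Calling buildEnqueueCommand("erik", "details-fast-lane") and joining the result with spaces reproduces the dry-run command from step 8.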

Dirty Worktree Note

There are existing uncommitted changes outside sync/. Some are Codex work from this session; others appear to be pre-existing or from earlier Claude/Codex work. Do not blindly revert them. Review git status --short before committing broader changes.

Latest Sync Commits

  • 6c42ca7 docs: add shared agent sync handoff
  • 8e7c5aa docs: link llm-gateway sync handoff
  • Pending after this update:
    • watch whether any future guard exposure findings are genuine operational issues or new false positives.
    • if failures still appear inside fixes.jsonl, scrub historic pollution and backfill errors.jsonl.