2 Commits

Author SHA1 Message Date
Rene Fichtmueller
c50af63389 feat(ctx-health): add proxmox-pvestatd + opnsense-disk health checks
- Add SSH-based health check for pvestatd D-state detection on Proxmox host
  (heal via cgroup move + lock file removal + reset-failed)
- Add SSH-based disk check for OPNsense VM (threshold 75%, auto-cleanup)
- knowledge/fixes.json: add 48 training fixes including post-reboot DNS
  recovery (fix-046), cloudflared DNS-wait boot fix (fix-047), and
  vzdump load-crash scenario with recovery steps (fix-048)
2026-04-13 05:42:24 +02:00
Rene Fichtmueller
e0b9fa1f53 feat: add CtxHealth self-healing daemon as new workspace package
New package @llm-gateway/ctx-health (packages/ctx-health/) — a TypeScript
infrastructure monitoring and auto-healing daemon. Monitors 8 subsystems
every 60s (PM2, PostgreSQL, Ollama, Cloudflare tunnel, disk, memory,
network, WireGuard), gets AI-powered root cause analysis via the gateway
(ctxhealer caller / ctx_health_diagnose task_type), executes healing
actions with cooldown (5min) and escalation guards (3+ failures → human
escalation), persists all incidents to ctx_health_incidents and
ctx_health_status tables. Dry-run mode via CTX_HEALTH_DRY_RUN=true.
Runs as ctx-health PM2 process on Erik server.
2026-04-03 00:16:08 +02:00