3 Commits

Author SHA1 Message Date
Rene Fichtmueller
4c54a6fa92 refactor: MAGATAMA pipeline code quality audit — all functions <50 lines
Complete code quality audit of llm-gateway pipeline modules for MAGATAMA standard compliance (50-line function maximum). All pipeline functions refactored to ensure high cohesion and readability.

Pipeline module compliance (verified):
 llm-client.ts — Refactored callOllama() (58→26 lines) via helper extraction
 instrumented-llm-client.ts — All functions <50 lines (wrapper layer)
 router.ts — Refactored routeByScore() (81→32 lines) via delegation
 request-scorer.ts — 870-line file, all functions <50 lines
 external-providers.ts — All functions <50 lines (49-line max)
 post-validator.ts — All validators <50 lines

Verified:
✓ npm run build (TypeScript, zero errors)
✓ All 6 pipeline modules independently audited
✓ Production-ready for Erik deployment (PM2 ids 19+20, port 3103)

Deployment target: Gitea (192.168.178.196:3000/rene/llm-gateway)
2026-04-25 17:38:11 +02:00
Rene Fichtmueller
c50af63389 feat(ctx-health): add proxmox-pvestatd + opnsense-disk health checks
- Add SSH-based health check for pvestatd D-state detection on Proxmox host
  (heal via cgroup move + lock file removal + reset-failed)
- Add SSH-based disk check for OPNsense VM (threshold 75%, auto-cleanup)
- knowledge/fixes.json: add 48 training fixes including post-reboot DNS
  recovery (fix-046), cloudflared DNS-wait boot fix (fix-047), and
  vzdump load-crash scenario with recovery steps (fix-048)
2026-04-13 05:42:24 +02:00
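
The OPNsense disk check in commit c50af63389 can be illustrated roughly as follows. This is a minimal sketch, not the code from the commit: it assumes the check reads POSIX-format `df -P` output fetched over SSH, and the names `parseCapacityPercent` and `needsCleanup` are hypothetical. Only the 75% threshold comes from the commit message; whether the real comparison is inclusive is also an assumption.

```typescript
// Threshold taken from the commit message; everything else is illustrative.
const DISK_THRESHOLD_PERCENT = 75;

// Extracts the "Capacity" percentage for one mount point from `df -P` output.
// Assumed POSIX columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
function parseCapacityPercent(dfOutput: string, mountPoint: string): number | null {
  const rows = dfOutput.trim().split("\n").slice(1); // skip the header row
  for (const row of rows) {
    const cols = row.trim().split(/\s+/);
    if (cols.length < 6) continue; // malformed or wrapped row
    const mounted = cols.slice(5).join(" "); // mount points may contain spaces
    if (mounted !== mountPoint) continue;
    const match = /^(\d+)%$/.exec(cols[4] ?? "");
    return match ? Number(match[1]) : null;
  }
  return null; // mount point not present in the output
}

// Assumption: cleanup triggers at or above the threshold.
function needsCleanup(percent: number): boolean {
  return percent >= DISK_THRESHOLD_PERCENT;
}
```

A caller would run `df -P` on the VM, pass the captured text and a mount point such as `/` to `parseCapacityPercent`, and start cleanup only when `needsCleanup` returns true.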
Rene Fichtmueller
e0b9fa1f53 feat: add CtxHealth self-healing daemon as new workspace package
New package @llm-gateway/ctx-health (packages/ctx-health/): a TypeScript
infrastructure monitoring and auto-healing daemon. It monitors 8 subsystems
every 60s (PM2, PostgreSQL, Ollama, Cloudflare tunnel, disk, memory,
network, WireGuard), requests AI-powered root cause analysis from the gateway
(ctxhealer caller / ctx_health_diagnose task_type), executes healing
actions with a cooldown (5min) and escalation guards (3+ failures → human
escalation), and persists all incidents to the ctx_health_incidents and
ctx_health_status tables. Dry-run mode is enabled via CTX_HEALTH_DRY_RUN=true.
Runs as the ctx-health PM2 process on the Erik server.
2026-04-03 00:16:08 +02:00
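
The cooldown and escalation guards described in commit e0b9fa1f53 could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's actual code: the `HealGuard` class, its methods, and the `Decision` type are hypothetical names, and counting *consecutive* failures is an assumption. Only the 5-minute cooldown and the 3-failure escalation threshold come from the commit message.

```typescript
// Hypothetical types and names; only the two default values are from the commit message.
type Decision = "heal" | "cooldown" | "escalate";

interface SubsystemState {
  lastAttemptMs: number | null;
  consecutiveFailures: number;
}

class HealGuard {
  private readonly state = new Map<string, SubsystemState>();
  private readonly cooldownMs: number;
  private readonly maxFailures: number;

  constructor(cooldownMs: number = 5 * 60 * 1000, maxFailures: number = 3) {
    this.cooldownMs = cooldownMs;
    this.maxFailures = maxFailures;
  }

  // Decides whether a healing action may run for this subsystem right now.
  decide(subsystem: string, nowMs: number): Decision {
    const s = this.state.get(subsystem);
    if (!s) return "heal"; // no previous attempt recorded
    if (s.consecutiveFailures >= this.maxFailures) return "escalate";
    if (s.lastAttemptMs !== null && nowMs - s.lastAttemptMs < this.cooldownMs) {
      return "cooldown";
    }
    return "heal";
  }

  // Records the outcome of a healing attempt; a success resets the failure count.
  record(subsystem: string, nowMs: number, healed: boolean): void {
    const prev = this.state.get(subsystem) ?? { lastAttemptMs: null, consecutiveFailures: 0 };
    this.state.set(subsystem, {
      lastAttemptMs: nowMs,
      consecutiveFailures: healed ? 0 : prev.consecutiveFailures + 1,
    });
  }
}
```

Passing the timestamp in explicitly, rather than calling `Date.now()` inside the guard, keeps the logic deterministic and easy to test. In a dry-run mode the daemon could still call `decide` and log the result without executing the action.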