42 Commits

Author SHA1 Message Date
Rene Fichtmueller
0191c60b64 chore: commit deployed gateway state (dashboard, streaming, routing, bridges, cost-tracking)
Live production state on Erik that had drifted from Gitea — deployed across several
sessions but never committed. Excludes deploy/ecosystem.config.cjs (holds live tokens).

- dashboard: passive usage-report endpoint, per-device entries, CEST timezone, cost-panel rounding
- completion: SSE + HTTP/2 streaming
- pipeline: routing-rules, request-scorer, external-providers (subscription bridges)
- cost-tracking: tokenvault migration, cost-calculator, request-logger
- infra: docker-compose bridge env, server/health/tls, deps
2026-06-05 20:23:33 +00:00
Rene Fichtmueller
91384dbb2a fix: SSE stream endpoint with proper HTTP/2 stream handling and heartbeat
- Fixed /api/stream/requests endpoint HTTP/2 INTERNAL_ERROR
- Use reply.raw.writeHead() instead of Fastify headers API for SSE
- Added 30s heartbeat to keep connection alive
- Proper event format with 'event:' and 'data:' fields
- Comprehensive error handling and cleanup on disconnect
- Mirrors working pattern from /api/stream/costs endpoint
- Resolves dashboard perpetual 'Loading...' state
2026-04-26 23:52:13 +02:00
Rene Fichtmueller
255bd90e7e fix: add missing jose dependency for JWT validation 2026-04-26 20:45:05 +02:00
Rene Fichtmueller
d614795545 fix: SQL errors in learning engine for best model selection 2026-04-26 20:42:40 +02:00
Rene Fichtmueller
9808184384 fix: remove non-existent @llm-gateway/types dependency
The @llm-gateway/types package was referenced in dependencies but never
defined or imported. Removing to fix npm installation failures.
2026-04-26 20:39:41 +02:00
Rene Fichtmueller
93bbb44bf0 fix: remove broken @shieldx/core dependency
The @shieldx/core dependency was referenced at an invalid file path and is
not actually used in the codebase (import is commented as TODO). Removing
this dependency resolves npm installation failures on deployment.
2026-04-26 20:36:21 +02:00
Rene Fichtmueller
1d4be52c83 fix: only send HSTS header on HTTPS connections, not HTTP
The learning process was failing to communicate with the gateway because:
1. Gateway was sending 'Strict-Transport-Security' header on HTTP responses
2. Node.js fetch respects HSTS and upgrades subsequent requests to HTTPS
3. Gateway only has HTTP listener (localhost:3103), no HTTPS
4. Result: SSL 'packet length too long' error on second request attempt

Solution: Modified registerHSTSMiddleware to only send HSTS header when
the connection is already secure (HTTPS or x-forwarded-proto: https).
HTTP connections will not get the HSTS header, preventing the forced upgrade.
2026-04-26 19:01:41 +02:00
Rene Fichtmueller
ff090de82b fix: Update request logging to use request_tracking table instead of dashboard_request_log 2026-04-26 00:42:58 +02:00
Rene Fichtmueller
4c54a6fa92 refactor: MAGATAMA pipeline code quality audit — all functions <50 lines
Complete code quality audit of llm-gateway pipeline modules for MAGATAMA standard compliance (50-line function maximum). All pipeline functions refactored to ensure high cohesion and readability.

Pipeline module compliance (verified):
 llm-client.ts — Refactored callOllama() (58→26 lines) via helper extraction
 instrumented-llm-client.ts — All functions <50 lines (wrapper layer)
 router.ts — Refactored routeByScore() (81→32 lines) via delegation
 request-scorer.ts — 870-line file, all functions <50 lines
 external-providers.ts — All functions <50 lines (49-line max)
 post-validator.ts — All validators <50 lines

Verified:
✓ npm run build (TypeScript, zero errors)
✓ All 6 pipeline modules independently audited
✓ Production-ready for Erik deployment (PM2 ids 19+20, port 3103)

Deployment target: Gitea (192.168.178.196:3000/rene/llm-gateway)
2026-04-25 17:38:11 +02:00
Rene Fichtmueller
128e18b751 feat: integrate GitHub Copilot as third LLM provider via copilot-bridge
Add GitHub Copilot API proxy integration to LLM Gateway:

* Implement copilot-bridge service:
  - HTTP wrapper managing copilot-api (GitHub Copilot API proxy)
  - OpenAI-compatible /v1/chat/completions endpoint (port 3252)
  - Graceful startup and SIGTERM shutdown handling
  - Health check endpoint with service diagnostics

* Register copilot-bridge in provider fallback chain:
  - Position: After OpenAI, before free LLM APIs (tier 4)
  - Rate limit: 60 requests/min (GitHub Copilot API limit)
  - Models: gpt-4 (reasoning), gpt-3.5-turbo (medium)
  - Authentication: GitHub Copilot subscription (internal to copilot-api)

* Update PM2 ecosystem configuration:
  - Add copilot-bridge service definition (port 3252)
  - Configure COPILOT_BRIDGE_URL in gateway environment
  - Add copilot to LLM_PROVIDERS list

* Enhance deployment automation:
  - Update ensure-bridges.sh with copilot-bridge deployment
  - Copy service files from repo to /opt/copilot-bridge
  - Run npm install for copilot-api dependency

* Comprehensive documentation:
  - Expand DEPLOYMENT-BRIDGES.md with copilot-bridge section
  - Prerequisites: Node.js 20+, GitHub Copilot subscription
  - Authentication workflow: npm run auth with GitHub OAuth
  - Troubleshooting: subscription verification, auth cache reset

Provider chain now supports:
1. Ollama (local, free)
2. claude-bridge (Claude subscription)
3. openai-bridge (OpenAI subscription)
4. copilot-bridge (GitHub Copilot subscription) ← NEW
5. Free APIs: Cerebras, Groq, Mistral, NVIDIA, Cloudflare

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-25 12:38:30 +02:00
Rene Fichtmueller
7599f33866 feat: integrate OpenAI Codex and ChatGPT as primary LLM providers via subscription
- Add openai-bridge service (port 3251) for ChatGPT and Codex integration
- Update external-providers.ts with openai and chatgpt provider definitions
- Add GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models to provider registry
- Modify getApiKey() to handle bridge provider authentication
- Modify getBaseUrl() to construct URLs from env vars
- Update ecosystem.config.cjs with OPENAI_BRIDGE_URL and OPENAI_API_KEY config
- Add openai-bridge PM2 service configuration (port 3251)
- Support both claude-bridge (port 3250) and openai-bridge (port 3251) as subscription services
- Extend fallback chain: claude → openai/chatgpt → cerebras → groq → mistral → nvidia → cloudflare

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-25 12:29:55 +02:00
Rene Fichtmueller
b34b835b47 feat: integrate claude-bridge as primary LLM provider with fallback chain
- Add claude-bridge provider to external-providers.ts with Claude models (opus, sonnet, haiku)
- Modify getApiKey to handle claude-bridge authentication (CLAUDE_BRIDGE_ENABLED flag)
- Update getBaseUrl to construct URL from CLAUDE_BRIDGE_URL environment variable
- Remove Authorization header for claude-bridge (uses subscription-based auth)
- claude-bridge now first in fallback chain: Claude → Cerebras → Groq → Mistral → NVIDIA → Cloudflare
2026-04-25 12:18:33 +02:00
Rene Fichtmueller
f5e2357f20 docs: Add Phase 2 delivery summary and getting started guides
- PHASE_2_DELIVERY.md: Complete delivery summary with all components
- GETTING_STARTED.md: Quick start guide (40 min end-to-end)
- scripts/verify_local_setup.sh: Local environment verification
2026-04-25 05:48:33 +02:00
Rene Fichtmueller
a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
 Query latency p95: <500ms
 Recall@10: ≥85% (vs 72% FTS baseline)
 Entity extraction accuracy: ≥90%
 Ingestion throughput: ≥100 docs/sec
 Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00
Rene Fichtmueller
282403d34b feat: Implement Phase 2G.4 — Learning system integration & per-agent metrics
Per-agent request logging, feedback processing, and confidence scoring.

- Per-agent metric collection: request_id, model, latency_ms, tokens_in/out, confidence, fallback_used, success
- Agent feedback loop: outcome tracking (success/fallback/timeout/error/user_rejected)
- Confidence scoring: 50% success + 25% quality + 25% satisfaction (per-agent independent of global)
- Cost attribution: Monthly cost report per agent (tokens × model rate)
- SLO monitoring: p50/p95/p99 latencies vs per-agent targets
- Anomaly detection: σ-based latency spikes, success rate drops, confidence degradation
- Full TypeScript types, database schema initialization, comprehensive documentation
2026-04-19 22:22:17 +02:00
Rene Fichtmueller
1d327720d5 feat: Implement Phase 2G.3 — ChatGPT/OpenAI API compatibility adapter
HTTP server providing OpenAI API compatibility for LLM Gateway.

- OpenAI client SDK drop-in replacement (baseURL only change)
- POST /v1/chat/completions endpoint with streaming support
- GET /v1/models for client library discovery
- Automatic model mapping: gpt-4 → qwen2.5:32b, etc.
- Server-Sent Events (SSE) streaming implementation
- Full TypeScript types and comprehensive test suite
- Graceful shutdown handling (SIGTERM/SIGINT)
- Health check endpoint with gateway status
- Performance: Same as gateway (100-500ms with fallback to Ollama)
2026-04-19 22:05:20 +02:00
Rene Fichtmueller
63171645da feat: Implement Phase 2G.2 — Codex/Copilot LSP adapter
Language Server Protocol bridge for GitHub Copilot and Copilot-compatible editors.

- Implements LSP transport layer (vscode-languageserver)
- Completion with trigger characters: '.', ' ', '('
- Hover documentation with model/confidence metadata
- Code action placeholders for explain/refactor/test/fix
- Automatic fallback to local Ollama (192.168.178.213:11434)
- Full TypeScript types and test coverage
- CLI entry point: codex-lsp (stdio transport)
- Performance: Gateway 100-500ms, Ollama 200-2000ms
2026-04-19 22:04:15 +02:00
Rene Fichtmueller
b943bb1d59 feat: Implement Phase 2G.1 — Claude Code IDE bridge
- Create @llm-gateway/claude-code-bridge package
- Support explain, refactor, test, document, fix commands
- Automatic fallback to local Ollama when gateway unavailable
- Health monitoring and confidence tracking
- Comprehensive test suite covering all completion methods
- Follows ADR-0005 agent integration protocol
2026-04-19 22:02:06 +02:00
Rene Fichtmueller
2ca77d0aee feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)
- ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
- ADR-0002: Tier Assignment Strategy for Model Selection (cost-first escalation)
- ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals (6h/12h/24h cycles)
- ADR-0004: External Provider Fallback Chain Ordering (Cerebras → Groq → Mistral)
- Enhanced client SDK: Offline Ollama fallback, health checks, exponential backoff retry
- Integration tests: claude-code-integration.test.ts (14 test cases)
- PHASE_2F_DEPLOYMENT.md: Pre-deployment checklist, automated deploy, rollback plan
- Post-deployment verification procedures for health, client fallback, metrics
2026-04-19 21:39:44 +02:00
Rene Fichtmueller
c3ab87b167 feat: add fo-blog-v8 training pipeline (Qwen2.5-14B, SFT+DPO)
Full v8 training pipeline for the optical networking blog model:
- train_blog_v8.py: SFT (LoRA r=64, 5 epochs) + DPO (2 epochs) on Qwen2.5-14B-Instruct
  Fixed for trl 1.2.x: SFTConfig instead of TrainingArguments, processing_class= instead
  of tokenizer=, eval_strategy= instead of deprecated evaluation_strategy=
- consolidate_v8_dataset.py: weighted merge of all data sources (820 effective SFT / 235 DPO)
- crawl_v8_sources.py: APNIC/RIPE Labs/potaroo/Cloudflare crawler with balanced div extraction
- process_v6_blogs.py: converts 101 real v6 TIP blog outputs into SFT + DPO pairs
- label_v7_quality.py: Claude-judged quality labels → v8 quality DPO pairs
- parse_real_posts.py: parses blog.fichtmueller.org Ghost CMS HTML → gold SFT records
- run_v8_pipeline.sh: autopilot (consolidate → SFT → DPO → GGUF → Ollama)
- blog-v8-training.yaml: training config reference

Dataset breakdown: 19 real posts ×3 + 196 v7-gen + 28 v6blogs ×2 + 135 external ×1.5
2026-04-19 11:44:09 +02:00
Rene Fichtmueller
2fb0992c71 feat: add MAGATAMA まがたま security intelligence model to LLM Gateway
- Add magatama:32b to models.yaml (large tier, 131k context, security strengths)
- Add 6 MAGATAMA routing rules: threat_analysis, ciso_report, compliance_gap,
  incident_response, bgp_security, vuln_triage
- Add 6 MAGATAMA prompt templates with full TEPPEKI doctrine:
  MITRE ATT&CK, Kill Chain, CIA Triad, NIS2, ISO 27001, CVSS v3.1
- Fine-tuned on Qwen2.5-32B-Instruct with 22831 MAGATAMA security samples
  LoRA adapter: r=8, alpha=16
2026-04-16 14:31:17 +02:00
Rene Fichtmueller
c50af63389 feat(ctx-health): add proxmox-pvestatd + opnsense-disk health checks
- Add SSH-based health check for pvestatd D-state detection on Proxmox host
  (heal via cgroup move + lock file removal + reset-failed)
- Add SSH-based disk check for OPNsense VM (threshold 75%, auto-cleanup)
- knowledge/fixes.json: add 48 training fixes including post-reboot DNS
  recovery (fix-046), cloudflared DNS-wait boot fix (fix-047), and
  vzdump load-crash scenario with recovery steps (fix-048)
2026-04-13 05:42:24 +02:00
Rene Fichtmueller
b4593b6582 feat: integrate real @shieldx/core library into gateway pipeline
Replace recursive HTTP-based ShieldX scan with direct library integration.
- 547+ rules, 50+ languages, sub-millisecond scans
- Enables: rules, entropy, indirect injection, behavioral, unicode,
  tokenizer, compressed payload detection
- Disables Ollama-dependent scanners for zero external dependency
- Response now includes threat_level, kill_chain_phase, shieldx_latency_ms
2026-04-07 09:03:02 +02:00
Rene Fichtmueller
8123343361 feat: add Flexoptix blog pipeline templates (tip_blog_angle + tip_blog_draft)
Two-stage LLM pipeline for Flexoptix-style blog generation:
- tip_blog_angle: identifies the real production situation as article angle (JSON)
- tip_blog_draft: writes continuous prose article — no headers, no bullets,
  no AI filler. Gold article included as few-shot reference. v2.0.0 with
  absolute format rules, DR4 wavelength correction (1310nm), fiber scope
  vs OPM distinction, no invented firmware versions.
2026-04-03 01:03:09 +02:00
49c673b683 feat: add ctx_morning_briefing routing rule and prompt template
Adds LLM Gateway support for CtxReport daily intelligence reports.
Uses qwen2.5:32b to generate German-language morning briefings from
infrastructure metrics collected overnight.
2026-04-03 00:44:04 +02:00
Rene Fichtmueller
e0b9fa1f53 feat: add CtxHealth self-healing daemon as new workspace package
New package @llm-gateway/ctx-health (packages/ctx-health/) — a TypeScript
infrastructure monitoring and auto-healing daemon. Monitors 8 subsystems
every 60s (PM2, PostgreSQL, Ollama, Cloudflare tunnel, disk, memory,
network, WireGuard), gets AI-powered root cause analysis via the gateway
(ctxhealer caller / ctx_health_diagnose task_type), executes healing
actions with cooldown (5min) and escalation guards (3+ failures → human
escalation), persists all incidents to ctx_health_incidents and
ctx_health_status tables. Dry-run mode via CTX_HEALTH_DRY_RUN=true.
Runs as ctx-health PM2 process on Erik server.
2026-04-03 00:16:08 +02:00
Rene Fichtmueller
a8a77e689c feat: add CtxHealth + CtxSecurity to gateway — ctxhealer:latest model, 5 routing rules, 2 templates 2026-04-03 00:14:23 +02:00
Rene Fichtmueller
9b4d1caa8a fix: routing-optimizer uses status='approved' not non-existent validation_passed column 2026-04-03 00:01:19 +02:00
Rene Fichtmueller
52697bc6fc fix: replace hardcoded Mac paths with relative paths in learning engine (routing-optimizer, prompt-optimizer, few-shot-curator) 2026-04-02 23:58:53 +02:00
Rene Fichtmueller
719336bded fix: map input as fallback for all 20+ template content variables (ocr_text, alert_data, bgp_data, etc.) 2026-04-02 23:41:36 +02:00
Rene Fichtmueller
f1c1d107ca fix: map input to source_data fallback and spread context vars into template variables 2026-04-02 23:38:22 +02:00
Rene Fichtmueller
3bb9923255 fix: fine-tuner uses FT_DB_URL/FT_GATEWAY_URL/FT_OLLAMA_URL env vars, not DATABASE_URL 2026-04-02 23:35:27 +02:00
Rene Fichtmueller
d8deecdb32 feat: SSH tunnel launch script for fine-tuner (IONOS blocks port 5432 externally) 2026-04-02 23:28:30 +02:00
Rene Fichtmueller
499e600239 fix: fine-tuner config points to Erik DB + CF tunnel URLs
- database_url: Erik PostgreSQL (217.154.82.179:5432) with correct password
- gateway_url: https://llm-gateway.context-x.org (public CF tunnel)
- ollama_url: localhost:11434 (local Mac Studio, fine-tuner runs locally)
2026-04-02 23:23:17 +02:00
Rene Fichtmueller
0803fdb722 feat: add confidence_scorer prompt template (internal self-evaluation) 2026-04-02 23:20:31 +02:00
Rene Fichtmueller
b68d5c3fbf fix: client CompletionResponse matches actual gateway response fields
- Match field names: id, status, confidence, model, task_type, latency_ms, tokens, output
- Default URL now https://llm-gateway.context-x.org (public endpoint)
- ShieldX client uses 'shieldx' caller (not 'internal')
- tokens.in/tokens.out instead of token_count.input/output
2026-04-02 23:17:14 +02:00
Rene Fichtmueller
ac33476666 feat: add 55 prompt templates + ShieldX/LinkedIn routing rules + ban lists in Gitea
Templates (55 total, exceeds 49 target):
- TIP: transceiver_enrich, datasheet_extract, compatibility_parse, blog_generator,
  faq_answer, hype_cycle_narrative, price_anomaly, vendor_classify, product_description
- EO Global Pulse: business_card_ocr, voice_to_crm, event_prep_brief, attendee_enrich,
  meeting_suggest, lead_qualify, debrief_generate, ticket_summarize
- SwitchBlade: root_cause, alert_narrative, cve_remediation, csrd_narrative,
  transceiver_advisor, bandwidth_report, ticket_draft, firmware_assess, topology_explain
- PeerCortex: as_narrative, health_summary, rpki_explain, anomaly_hypothesis,
  peer_recommendation, incident_brief
- NOGnet: cfp_evaluate, cfp_feedback, topic_gap_analysis, meeting_match, speaker_enrich,
  sponsor_pitch, event_debrief, agenda_summary, session_intro
- ShieldX: threat_classify, pattern_describe, healing_recommend, compliance_report, false_positive
- Content: linkedin_post_de, linkedin_post_en, newsletter_dispatch_de, email_draft_de
- Internal: ban_detect, prompt_improve
- Routing rules: +55 entries for all template-based task types
- Ban lists: en.csv, de.csv, auto.csv created in Gitea (llm-banlists repo)
2026-04-02 23:14:30 +02:00
Rene Fichtmueller
c82b187548 feat: fix template resolution + add 40 routing rules for all project task types
- completion.ts now uses taskType directly for resolvePrompt (not decision.prompt_template)
  so tip_transceiver_enrich.yaml is used instead of generic_qa fallback template
- routing-rules.yaml: +40 task type entries for TIP (8), EO Pulse (8), SwitchBlade (9),
  PeerCortex (6), NOGnet (9), internal (2) — all with correct model tier assignments
- qwen2.5:3b for fast tasks (classify, short outputs)
- qwen2.5:14b for medium (most analysis tasks)
- qwen2.5:32b for large (blog posts, detailed reports, CSRD)
2026-04-02 23:11:21 +02:00
Rene Fichtmueller
2c5f7f6ebe fix: OLLAMA_URL env var takes precedence over hardcoded models.yaml URL
Gateway was reading ollama_base_url from YAML (192.168.178.169) instead of
OLLAMA_URL env var (https://ollama.fichtmueller.org). Fix getOllamaBaseUrl()
to prefer process.env['OLLAMA_URL'] and update YAML default to CF tunnel.
2026-04-02 23:05:13 +02:00
Rene Fichtmueller
773fd368e0 fix: parse DATABASE_URL in pool clients + extend Ollama health timeout to 15s
Gateway and learning DB clients now prefer DATABASE_URL connection string
over individual DB_* env vars — matches ecosystem.config.cjs convention.
Ollama health check timeout increased 5→15s for Cloudflare tunnel latency.
2026-04-02 23:03:31 +02:00
Rene Fichtmueller
4c5003f9fc feat: fix OLLAMA_URL to use Cloudflare tunnel + add 35 prompt templates
- Update OLLAMA_URL from 192.168.178.169 to https://ollama.fichtmueller.org
- Fix port from 3100 to 3103 (3100 was taken by Docker proxy on Erik)
- Fix DATABASE_URL password to llm_secure_2026
- Add GITEA_URL env var for ban list sync
- Add 35 prompt templates: TIP (10), EO Global Pulse (8), SwitchBlade (9),
  PeerCortex (3), internal (3), ShieldX (1), general (1)
2026-04-02 23:00:37 +02:00
Rene Fichtmueller
3a00ff4d33 feat: initial llm-gateway implementation
- Complete Fastify gateway with 8-stage pipeline
- Circuit breaker (opossum) per model tier
- Rate limiting per caller
- Ban list validation (EN/DE/auto-detected)
- TIP validator (SFF-8024, part numbers, wavelengths)
- Prometheus metrics
- pg-boss async queue
- PostgreSQL audit log + review queue
- 9 prompt templates (TIP, LinkedIn, ShieldX)
- Learning engine scaffolding
- Auto-learning: ban-list, few-shot, routing, prompt optimizer
2026-04-02 22:48:55 +02:00