llm-gateway

Author	SHA1	Message	Date
Rene Fichtmueller	afe3597311	feat: add automated bridge deployment script and comprehensive deployment guide - ensure-bridges.sh: Idempotent startup script that deploys openai-bridge if not present - DEPLOYMENT-BRIDGES.md: Complete deployment guide with setup, configuration, verification, and troubleshooting steps - Enables autonomous deployment of ChatGPT/Codex bridge service on Erik - Supports both automatic and manual setup workflows	2026-04-25 12:32:31 +02:00
Rene Fichtmueller	e128d39818	chore: add openai-bridge deployment script for Erik	2026-04-25 12:31:11 +02:00
Rene Fichtmueller	7599f33866	feat: integrate OpenAI Codex and ChatGPT as primary LLM providers via subscription - Add openai-bridge service (port 3251) for ChatGPT and Codex integration - Update external-providers.ts with openai and chatgpt provider definitions - Add GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models to provider registry - Modify getApiKey() to handle bridge provider authentication - Modify getBaseUrl() to construct URLs from env vars - Update ecosystem.config.cjs with OPENAI_BRIDGE_URL and OPENAI_API_KEY config - Add openai-bridge PM2 service configuration (port 3251) - Support both claude-bridge (port 3250) and openai-bridge (port 3251) as subscription services - Extend fallback chain: claude → openai/chatgpt → cerebras → groq → mistral → nvidia → cloudflare Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>	2026-04-25 12:29:55 +02:00
Rene Fichtmueller	590d3797c9	chore: update ecosystem.config.cjs with claude-bridge and fixed ollama URL - CLAUDE_BRIDGE_URL: http://localhost:3250 - CLAUDE_BRIDGE_ENABLED: true - OLLAMA_URL: http://192.168.178.213:11434 (direct IP instead of HTTPS tunnel) - LLM_PROVIDERS: claude,cerebras,groq,mistral,nvidia - Add free LLM API keys (empty, to be filled with actual keys)	2026-04-25 12:19:09 +02:00
Rene Fichtmueller	b34b835b47	feat: integrate claude-bridge as primary LLM provider with fallback chain - Add claude-bridge provider to external-providers.ts with Claude models (opus, sonnet, haiku) - Modify getApiKey to handle claude-bridge authentication (CLAUDE_BRIDGE_ENABLED flag) - Update getBaseUrl to construct URL from CLAUDE_BRIDGE_URL environment variable - Remove Authorization header for claude-bridge (uses subscription-based auth) - claude-bridge now first in fallback chain: Claude → Cerebras → Groq → Mistral → NVIDIA → Cloudflare	2026-04-25 12:18:33 +02:00
Rene Fichtmueller	f5e2357f20	docs: Add Phase 2 delivery summary and getting started guides - PHASE_2_DELIVERY.md: Complete delivery summary with all components - GETTING_STARTED.md: Quick start guide (40 min end-to-end) - scripts/verify_local_setup.sh: Local environment verification	2026-04-25 05:48:33 +02:00
Rene Fichtmueller	a04c1d67f2	feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search. COMPONENTS: - RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights) - IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings - EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison - Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models - API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health INFRASTRUCTURE: - FastAPI 0.104 async server on port 3140 - PostgreSQL 17 + pgvector for knowledge graph storage - Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3) - Ollama qwen2.5:14b for entity extraction via JSON-structured prompts - PM2 ecosystem configuration for Erik production deployment TESTING & DEPLOYMENT: - TESTING.md: 5-phase local testing workflow with examples - DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide - eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain - populate_eval_set.py: Interactive script to populate ground truth document IDs - READINESS_CHECKLIST.md: Pre-deployment verification checklist - bootstrap_tip_data.py: Load TIP blog documents via API PERFORMANCE TARGETS: ✅ Query latency p95: <500ms ✅ Recall@10: ≥85% (vs 72% FTS baseline) ✅ Entity extraction accuracy: ≥90% ✅ Ingestion throughput: ≥100 docs/sec ✅ Memory usage: <1GB Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.	2026-04-25 05:47:18 +02:00
Rene Fichtmueller	282403d34b	feat: Implement Phase 2G.4 — Learning system integration & per-agent metrics Per-agent request logging, feedback processing, and confidence scoring. - Per-agent metric collection: request_id, model, latency_ms, tokens_in/out, confidence, fallback_used, success - Agent feedback loop: outcome tracking (success/fallback/timeout/error/user_rejected) - Confidence scoring: 50% success + 25% quality + 25% satisfaction (per-agent independent of global) - Cost attribution: Monthly cost report per agent (tokens × model rate) - SLO monitoring: p50/p95/p99 latencies vs per-agent targets - Anomaly detection: σ-based latency spikes, success rate drops, confidence degradation - Full TypeScript types, database schema initialization, comprehensive documentation	2026-04-19 22:22:17 +02:00
Rene Fichtmueller	1d327720d5	feat: Implement Phase 2G.3 — ChatGPT/OpenAI API compatibility adapter HTTP server providing OpenAI API compatibility for LLM Gateway. - OpenAI client SDK drop-in replacement (baseURL only change) - POST /v1/chat/completions endpoint with streaming support - GET /v1/models for client library discovery - Automatic model mapping: gpt-4 → qwen2.5:32b, etc. - Server-Sent Events (SSE) streaming implementation - Full TypeScript types and comprehensive test suite - Graceful shutdown handling (SIGTERM/SIGINT) - Health check endpoint with gateway status - Performance: Same as gateway (100-500ms with fallback to Ollama)	2026-04-19 22:05:20 +02:00
Rene Fichtmueller	63171645da	feat: Implement Phase 2G.2 — Codex/Copilot LSP adapter Language Server Protocol bridge for GitHub Copilot and Copilot-compatible editors. - Implements LSP transport layer (vscode-languageserver) - Completion with trigger characters: '.', ' ', '(' - Hover documentation with model/confidence metadata - Code action placeholders for explain/refactor/test/fix - Automatic fallback to local Ollama (192.168.178.213:11434) - Full TypeScript types and test coverage - CLI entry point: codex-lsp (stdio transport) - Performance: Gateway 100-500ms, Ollama 200-2000ms	2026-04-19 22:04:15 +02:00
Rene Fichtmueller	b943bb1d59	feat: Implement Phase 2G.1 — Claude Code IDE bridge - Create @llm-gateway/claude-code-bridge package - Support explain, refactor, test, document, fix commands - Automatic fallback to local Ollama when gateway unavailable - Health monitoring and confidence tracking - Comprehensive test suite covering all completion methods - Follows ADR-0005 agent integration protocol	2026-04-19 22:02:06 +02:00
Rene Fichtmueller	4d7e251322	feat: Add ADR-0005 for Phase 2G agent integration protocol - Define three-layer integration stack (transport, adapters, protocol) - JSON-RPC 2.0 over HTTP for unified agent communication - Support for Claude Code, Codex/Copilot, ChatGPT, Ollama fallback - Establishes foundation for Phase 2G multi-agent integration - Decision on authentication, rate limiting, streaming TBD in implementation	2026-04-19 22:01:17 +02:00
Rene Fichtmueller	8e83e5fa6e	chore: Document Phase 2F deployment blocker — Erik unreachable (network issue)	2026-04-19 21:41:12 +02:00
Rene Fichtmueller	2ca77d0aee	feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests) - ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator - ADR-0002: Tier Assignment Strategy for Model Selection (cost-first escalation) - ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals (6h/12h/24h cycles) - ADR-0004: External Provider Fallback Chain Ordering (Cerebras → Groq → Mistral) - Enhanced client SDK: Offline Ollama fallback, health checks, exponential backoff retry - Integration tests: claude-code-integration.test.ts (14 test cases) - PHASE_2F_DEPLOYMENT.md: Pre-deployment checklist, automated deploy, rollback plan - Post-deployment verification procedures for health, client fallback, metrics	2026-04-19 21:39:44 +02:00
Rene Fichtmueller	c3ab87b167	feat: add fo-blog-v8 training pipeline (Qwen2.5-14B, SFT+DPO) Full v8 training pipeline for the optical networking blog model: - train_blog_v8.py: SFT (LoRA r=64, 5 epochs) + DPO (2 epochs) on Qwen2.5-14B-Instruct Fixed for trl 1.2.x: SFTConfig instead of TrainingArguments, processing_class= instead of tokenizer=, eval_strategy= instead of deprecated evaluation_strategy= - consolidate_v8_dataset.py: weighted merge of all data sources (820 effective SFT / 235 DPO) - crawl_v8_sources.py: APNIC/RIPE Labs/potaroo/Cloudflare crawler with balanced div extraction - process_v6_blogs.py: converts 101 real v6 TIP blog outputs into SFT + DPO pairs - label_v7_quality.py: Claude-judged quality labels → v8 quality DPO pairs - parse_real_posts.py: parses blog.fichtmueller.org Ghost CMS HTML → gold SFT records - run_v8_pipeline.sh: autopilot (consolidate → SFT → DPO → GGUF → Ollama) - blog-v8-training.yaml: training config reference Dataset breakdown: 19 real posts ×3 + 196 v7-gen + 28 v6blogs ×2 + 135 external ×1.5	2026-04-19 11:44:09 +02:00
Rene Fichtmueller	79d434434f	chore: MAGATAMA deployment state — download in progress, Pi-hole bypassed	2026-04-16 16:35:54 +02:00
Rene Fichtmueller	2fb0992c71	feat: add MAGATAMA まがたま security intelligence model to LLM Gateway - Add magatama:32b to models.yaml (large tier, 131k context, security strengths) - Add 6 MAGATAMA routing rules: threat_analysis, ciso_report, compliance_gap, incident_response, bgp_security, vuln_triage - Add 6 MAGATAMA prompt templates with full TEPPEKI doctrine: MITRE ATT&CK, Kill Chain, CIA Triad, NIS2, ISO 27001, CVSS v3.1 - Fine-tuned on Qwen2.5-32B-Instruct with 22831 MAGATAMA security samples LoRA adapter: r=8, alpha=16	2026-04-16 14:31:17 +02:00
Rene Fichtmueller	132cf835b0	fix: add fixes 049-050 (OPNsense WireGuard FritzBox NAT + VLAN WAN)	2026-04-13 08:20:06 +02:00
Rene Fichtmueller	c50af63389	feat(ctx-health): add proxmox-pvestatd + opnsense-disk health checks - Add SSH-based health check for pvestatd D-state detection on Proxmox host (heal via cgroup move + lock file removal + reset-failed) - Add SSH-based disk check for OPNsense VM (threshold 75%, auto-cleanup) - knowledge/fixes.json: add 48 training fixes including post-reboot DNS recovery (fix-046), cloudflared DNS-wait boot fix (fix-047), and vzdump load-crash scenario with recovery steps (fix-048)	2026-04-13 05:42:24 +02:00
Rene Fichtmueller	b4593b6582	feat: integrate real @shieldx/core library into gateway pipeline Replace recursive HTTP-based ShieldX scan with direct library integration. - 547+ rules, 50+ languages, sub-millisecond scans - Enables: rules, entropy, indirect injection, behavioral, unicode, tokenizer, compressed payload detection - Disables Ollama-dependent scanners for zero external dependency - Response now includes threat_level, kill_chain_phase, shieldx_latency_ms	2026-04-07 09:03:02 +02:00
Rene Fichtmueller	8123343361	feat: add Flexoptix blog pipeline templates (tip_blog_angle + tip_blog_draft) Two-stage LLM pipeline for Flexoptix-style blog generation: - tip_blog_angle: identifies the real production situation as article angle (JSON) - tip_blog_draft: writes continuous prose article — no headers, no bullets, no AI filler. Gold article included as few-shot reference. v2.0.0 with absolute format rules, DR4 wavelength correction (1310nm), fiber scope vs OPM distinction, no invented firmware versions.	2026-04-03 01:03:09 +02:00
rene	49c673b683	feat: add ctx_morning_briefing routing rule and prompt template Adds LLM Gateway support for CtxReport daily intelligence reports. Uses qwen2.5:32b to generate German-language morning briefings from infrastructure metrics collected overnight.	2026-04-03 00:44:04 +02:00
Rene Fichtmueller	e0b9fa1f53	feat: add CtxHealth self-healing daemon as new workspace package New package @llm-gateway/ctx-health (packages/ctx-health/) — a TypeScript infrastructure monitoring and auto-healing daemon. Monitors 8 subsystems every 60s (PM2, PostgreSQL, Ollama, Cloudflare tunnel, disk, memory, network, WireGuard), gets AI-powered root cause analysis via the gateway (ctxhealer caller / ctx_health_diagnose task_type), executes healing actions with cooldown (5min) and escalation guards (3+ failures → human escalation), persists all incidents to ctx_health_incidents and ctx_health_status tables. Dry-run mode via CTX_HEALTH_DRY_RUN=true. Runs as ctx-health PM2 process on Erik server.	2026-04-03 00:16:08 +02:00
Rene Fichtmueller	a8a77e689c	feat: add CtxHealth + CtxSecurity to gateway — ctxhealer:latest model, 5 routing rules, 2 templates	2026-04-03 00:14:23 +02:00
Rene Fichtmueller	9b4d1caa8a	fix: routing-optimizer uses status='approved' not non-existent validation_passed column	2026-04-03 00:01:19 +02:00
Rene Fichtmueller	52697bc6fc	fix: replace hardcoded Mac paths with relative paths in learning engine (routing-optimizer, prompt-optimizer, few-shot-curator)	2026-04-02 23:58:53 +02:00
Rene Fichtmueller	c3248da6c0	chore: add pending changelog entries for 2026-04-02 fixes	2026-04-02 23:52:17 +02:00
Rene Fichtmueller	719336bded	fix: map input as fallback for all 20+ template content variables (ocr_text, alert_data, bgp_data, etc.)	2026-04-02 23:41:36 +02:00
Rene Fichtmueller	f1c1d107ca	fix: map input to source_data fallback and spread context vars into template variables	2026-04-02 23:38:22 +02:00
Rene Fichtmueller	3bb9923255	fix: fine-tuner uses FT_DB_URL/FT_GATEWAY_URL/FT_OLLAMA_URL env vars, not DATABASE_URL	2026-04-02 23:35:27 +02:00
Rene Fichtmueller	d8deecdb32	feat: SSH tunnel launch script for fine-tuner (IONOS blocks port 5432 externally)	2026-04-02 23:28:30 +02:00
Rene Fichtmueller	499e600239	fix: fine-tuner config points to Erik DB + CF tunnel URLs - database_url: Erik PostgreSQL (217.154.82.179:5432) with correct password - gateway_url: https://llm-gateway.context-x.org (public CF tunnel) - ollama_url: localhost:11434 (local Mac Studio, fine-tuner runs locally)	2026-04-02 23:23:17 +02:00
Rene Fichtmueller	0803fdb722	feat: add confidence_scorer prompt template (internal self-evaluation)	2026-04-02 23:20:31 +02:00
Rene Fichtmueller	b68d5c3fbf	fix: client CompletionResponse matches actual gateway response fields - Match field names: id, status, confidence, model, task_type, latency_ms, tokens, output - Default URL now https://llm-gateway.context-x.org (public endpoint) - ShieldX client uses 'shieldx' caller (not 'internal') - tokens.in/tokens.out instead of token_count.input/output	2026-04-02 23:17:14 +02:00
Rene Fichtmueller	ac33476666	feat: add 55 prompt templates + ShieldX/LinkedIn routing rules + ban lists in Gitea Templates (55 total, exceeds 49 target): - TIP: transceiver_enrich, datasheet_extract, compatibility_parse, blog_generator, faq_answer, hype_cycle_narrative, price_anomaly, vendor_classify, product_description - EO Global Pulse: business_card_ocr, voice_to_crm, event_prep_brief, attendee_enrich, meeting_suggest, lead_qualify, debrief_generate, ticket_summarize - SwitchBlade: root_cause, alert_narrative, cve_remediation, csrd_narrative, transceiver_advisor, bandwidth_report, ticket_draft, firmware_assess, topology_explain - PeerCortex: as_narrative, health_summary, rpki_explain, anomaly_hypothesis, peer_recommendation, incident_brief - NOGnet: cfp_evaluate, cfp_feedback, topic_gap_analysis, meeting_match, speaker_enrich, sponsor_pitch, event_debrief, agenda_summary, session_intro - ShieldX: threat_classify, pattern_describe, healing_recommend, compliance_report, false_positive - Content: linkedin_post_de, linkedin_post_en, newsletter_dispatch_de, email_draft_de - Internal: ban_detect, prompt_improve - Routing rules: +55 entries for all template-based task types - Ban lists: en.csv, de.csv, auto.csv created in Gitea (llm-banlists repo)	2026-04-02 23:14:30 +02:00
Rene Fichtmueller	c82b187548	feat: fix template resolution + add 40 routing rules for all project task types - completion.ts now uses taskType directly for resolvePrompt (not decision.prompt_template) so tip_transceiver_enrich.yaml is used instead of generic_qa fallback template - routing-rules.yaml: +40 task type entries for TIP (8), EO Pulse (8), SwitchBlade (9), PeerCortex (6), NOGnet (9), internal (2) — all with correct model tier assignments - qwen2.5:3b for fast tasks (classify, short outputs) - qwen2.5:14b for medium (most analysis tasks) - qwen2.5:32b for large (blog posts, detailed reports, CSRD)	2026-04-02 23:11:21 +02:00
Rene Fichtmueller	2c5f7f6ebe	fix: OLLAMA_URL env var takes precedence over hardcoded models.yaml URL Gateway was reading ollama_base_url from YAML (192.168.178.169) instead of OLLAMA_URL env var (https://ollama.fichtmueller.org). Fix getOllamaBaseUrl() to prefer process.env['OLLAMA_URL'] and update YAML default to CF tunnel.	2026-04-02 23:05:13 +02:00
Rene Fichtmueller	773fd368e0	fix: parse DATABASE_URL in pool clients + extend Ollama health timeout to 15s Gateway and learning DB clients now prefer DATABASE_URL connection string over individual DB_* env vars — matches ecosystem.config.cjs convention. Ollama health check timeout increased 5→15s for Cloudflare tunnel latency.	2026-04-02 23:03:31 +02:00
Rene Fichtmueller	4c5003f9fc	feat: fix OLLAMA_URL to use Cloudflare tunnel + add 35 prompt templates - Update OLLAMA_URL from 192.168.178.169 to https://ollama.fichtmueller.org - Fix port from 3100 to 3103 (3100 was taken by Docker proxy on Erik) - Fix DATABASE_URL password to llm_secure_2026 - Add GITEA_URL env var for ban list sync - Add 35 prompt templates: TIP (10), EO Global Pulse (8), SwitchBlade (9), PeerCortex (3), internal (3), ShieldX (1), general (1)	2026-04-02 23:00:37 +02:00
Rene Fichtmueller	3a00ff4d33	feat: initial llm-gateway implementation - Complete Fastify gateway with 8-stage pipeline - Circuit breaker (opossum) per model tier - Rate limiting per caller - Ban list validation (EN/DE/auto-detected) - TIP validator (SFF-8024, part numbers, wavelengths) - Prometheus metrics - pg-boss async queue - PostgreSQL audit log + review queue - 9 prompt templates (TIP, LinkedIn, ShieldX) - Learning engine scaffolding - Auto-learning: ban-list, few-shot, routing, prompt optimizer	2026-04-02 22:48:55 +02:00

40 Commits