# LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

**Port:** 3100
**Production:** http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)

---

## Architecture

```
Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
        ↓  @llm-gateway/client
LLM Gateway :3100
  ├── Prompt Engine    (versioned templates per task_type)
  ├── ShieldX Guard    (prompt injection validation)
  ├── Ollama Router    (model tier selection: 3b / 14b / 32b / 70b)
  └── Learning Engine  (feedback loop, self-improvement)
        ↓
PostgreSQL (llm_gateway DB)    Ollama (Mac Studio :11434)
```

---

## Prerequisites

| Dependency | Version | Notes                       |
|------------|---------|-----------------------------|
| Node.js    | 22+     | `node --version`            |
| PostgreSQL | 17      | Local or remote             |
| Ollama     | latest  | Running on Mac Studio .169  |
| PM2        | latest  | `npm install -g pm2` (Erik) |

---

## 1. Local Development Setup

```bash
# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning
```

Gateway is available at http://localhost:3100.

---

## 2. Environment Variables

See `.env.example` for all variables with descriptions.

| Variable           | Required | Default             | Description                            |
|--------------------|----------|---------------------|----------------------------------------|
| `DATABASE_URL`     | YES      | -                   | PostgreSQL DSN for llm_gateway         |
| `TIP_DATABASE_URL` | NO       | -                   | TIP DB (read-only)                     |
| `OLLAMA_URL`       | YES      | http://...169:11434 | Ollama inference server                |
| `SHIELDX_URL`      | NO       | -                   | ShieldX endpoint (leave blank to skip) |
| `PORT`             | NO       | 3100                | HTTP port                              |
| `LOG_LEVEL`        | NO       | info                | error / warn / info / debug            |

---

## 3. Running Migrations

```bash
# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
```

Migration files live in:

- `packages/gateway/src/db/migrations/001_initial.sql`
- `packages/learning/src/db/migrations/002_learning.sql`

---

## 4. Pulling Ollama Models

```bash
bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
```

Required models:

| Model             | Tier      | Use case                         |
|-------------------|-----------|----------------------------------|
| `qwen2.5:3b`      | Fast      | Low-complexity, sub-second tasks |
| `qwen2.5:14b`     | Medium    | Standard completions             |
| `qwen2.5:32b`     | Large     | Complex analysis                 |
| `deepseek-r1:14b` | Reasoning | Step-by-step logic               |
| `llama3.3:70b`    | Premium   | Best quality, used sparingly     |

---

## 5. API Usage

### Completion

```bash
curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'
```

Response:

```json
{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
```

### Classify input

```bash
curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "input": "What transceivers work with Cisco ASR9k?"
  }'
```

### Health

```bash
curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
```

---

## 6. Project-specific Client Usage

Install the client in any workspace project:

```bash
npm install @llm-gateway/client
```

### TIP (Transceiver Intelligence Platform)

```typescript
import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}
```

### EO Global Pulse

```typescript
import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
```

### SwitchBlade

```typescript
import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);
```

### Custom client (any project)

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});
```

---

## 7. Deployment to Erik

### One-command deploy (from local Mac)

```bash
bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only
```

### First-time setup on Erik

```bash
# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent, safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh
```

### PM2 management

```bash
ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"
```

---

## 8. Monitoring

### Prometheus metrics

```
GET http://localhost:3100/metrics
```

### Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from `deploy/grafana-dashboard.json` (if present).

### Key metrics to watch

| Metric                        | Alert threshold |
|-------------------------------|-----------------|
| `gateway_request_latency_p99` | > 5 000 ms      |
| `gateway_error_rate`          | > 5%            |
| `ollama_queue_depth`          | > 20            |
| `learning_feedback_lag`       | > 1 h           |

### Log locations (Erik)

```
/var/log/llm-gateway/out.log            # gateway stdout
/var/log/llm-gateway/error.log          # gateway stderr
/var/log/llm-gateway/learning-out.log   # learning engine stdout
/var/log/llm-gateway/learning-error.log
```

---

## 9. Cloudflare Tunnel

See `deploy/cloudflare-tunnel.md` for instructions to expose the gateway via `https://llm-gateway.context-x.org`.

---

## 10. Docker (alternative to PM2)

```bash
# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down
```

---

## Repository structure

```
llm-gateway/
├── packages/
│   ├── gateway/                # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/            # Versioned prompt templates
│   ├── learning/               # Self-improving feedback engine
│   │   └── src/
│   └── client/                 # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh           # First-time server setup
│   ├── deploy.sh               # One-command local → Erik deploy
│   ├── ecosystem.config.cjs    # PM2 config
│   ├── nginx.conf              # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh              # Database initialization
│   └── pull-models.sh          # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json                # npm workspaces root
```
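
---

## Example: validating environment variables

The environment variable table in section 2 can be read as a small validation contract: two required values, two optional ones, and two with defaults. The sketch below shows one way a service might enforce that contract at startup. It is illustrative only: `loadConfig`, `GatewayConfig`, and the port range check are hypothetical names and choices, not part of the gateway codebase, and because the documented `OLLAMA_URL` default is truncated in the table, the sketch treats that variable as required with no fallback.

```typescript
// Hypothetical startup check mirroring the variable table in section 2.
// None of these names come from the gateway source.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl?: string;
  shieldxUrl?: string; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ['error', 'warn', 'info', 'debug'];

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Blank or whitespace-only values are treated the same as unset.
  const optional = (name: string): string | undefined =>
    env[name]?.trim() || undefined;

  const required = (name: string): string => {
    const value = optional(name);
    if (value === undefined) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  // PORT defaults to 3100, as documented in the table.
  const port = Number(optional('PORT') ?? '3100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('PORT must be an integer between 1 and 65535');
  }

  // LOG_LEVEL defaults to info and must be one of the four documented levels.
  const logLevel = optional('LOG_LEVEL') ?? 'info';
  if (!LOG_LEVELS.includes(logLevel as LogLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(', ')}`);
  }

  return {
    databaseUrl: required('DATABASE_URL'),
    ollamaUrl: required('OLLAMA_URL'),
    tipDatabaseUrl: optional('TIP_DATABASE_URL'),
    shieldxUrl: optional('SHIELDX_URL'),
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js process this would typically be called once as `loadConfig(process.env)` before the server starts, so a missing `DATABASE_URL` fails fast instead of surfacing on the first request.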