llm-gateway/README.md
Rene Fichtmueller 3a00ff4d33 feat: initial llm-gateway implementation
- Complete Fastify gateway with 8-stage pipeline
- Circuit breaker (opossum) per model tier
- Rate limiting per caller
- Ban list validation (EN/DE/auto-detected)
- TIP validator (SFF-8024, part numbers, wavelengths)
- Prometheus metrics
- pg-boss async queue
- PostgreSQL audit log + review queue
- 9 prompt templates (TIP, LinkedIn, ShieldX)
- Learning engine scaffolding
- Auto-learning: ban-list, few-shot, routing, prompt optimizer
2026-04-02 22:48:55 +02:00


# LLM Gateway

Centralized AI inference layer for all Context X projects. It routes requests to local Ollama models on the Mac Studio (192.168.178.169), validates outputs with ShieldX, and records every interaction for the self-improving learning engine.

**Port:** 3100

**Production:** https://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)

---
## Architecture
```
Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
↓ @llm-gateway/client
LLM Gateway :3100
├── Prompt Engine (versioned templates per task_type)
├── ShieldX Guard (prompt injection validation)
├── Ollama Router (model tier selection: 3b / 14b / 32b / 70b)
└── Learning Engine (feedback loop, self-improvement)
PostgreSQL (llm_gateway DB)
Ollama (Mac Studio :11434)
```
---
## Prerequisites
| Dependency | Version | Notes |
|----------------|---------|--------------------------------|
| Node.js | 22+ | `node --version` |
| PostgreSQL | 17 | Local or remote |
| Ollama | latest | Running on Mac Studio .169 |
| PM2 | latest | `npm install -g pm2` (on Erik) |
---
## 1. Local Development Setup
```bash
# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway
# Install all workspace dependencies
npm install
# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum
# Initialize database
bash scripts/init-db.sh
# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh
# Start gateway
npm run dev
# In a separate terminal: start learning engine
npm run learning
```
Gateway is available at http://localhost:3100.

---
## 2. Environment Variables
See `.env.example` for all variables with descriptions.
| Variable | Required | Default | Description |
|-------------------|----------|--------------------------|---------------------------------|
| `DATABASE_URL` | YES | — | PostgreSQL DSN for llm_gateway |
| `TIP_DATABASE_URL`| NO | — | TIP DB (read-only) |
| `OLLAMA_URL` | YES | http://...169:11434 | Ollama inference server |
| `SHIELDX_URL` | NO | — | ShieldX endpoint (leave blank to skip) |
| `PORT` | NO | 3100 | HTTP port |
| `LOG_LEVEL` | NO | info | error / warn / info / debug |
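
As a rough illustration of how the table above could be enforced at startup, the sketch below validates an environment map and applies the documented defaults for `PORT` and `LOG_LEVEL`. The names `GatewayConfig`, `Env`, and `loadConfig` are hypothetical and not taken from the gateway source. The function takes the environment as a plain object (pass `process.env` in Node) so it is easy to test, treats `DATABASE_URL` and `OLLAMA_URL` as mandatory, and does not guess a default for either.

```typescript
// Hypothetical config loader; names are illustrative, not the gateway's actual code.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl: string | undefined;
  shieldxUrl: string | undefined; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

type Env = { [key: string]: string | undefined };

function isLogLevel(value: string): value is LogLevel {
  return value === 'error' || value === 'warn' || value === 'info' || value === 'debug';
}

function loadConfig(env: Env): GatewayConfig {
  // Both variables are marked "YES" in the Required column.
  const required = ['DATABASE_URL', 'OLLAMA_URL'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error('Missing required environment variables: ' + missing.join(', '));
  }

  // PORT defaults to 3100 and must be a valid TCP port.
  const port = Number(env['PORT'] || '3100');
  if (!(port > 0 && port < 65536) || Math.floor(port) !== port) {
    throw new Error('PORT must be an integer between 1 and 65535');
  }

  // LOG_LEVEL defaults to "info" and must be one of the four documented levels.
  const level = env['LOG_LEVEL'] || 'info';
  if (!isLogLevel(level)) {
    throw new Error('LOG_LEVEL must be one of: error, warn, info, debug');
  }

  return {
    databaseUrl: env['DATABASE_URL'] as string,
    ollamaUrl: env['OLLAMA_URL'] as string,
    tipDatabaseUrl: env['TIP_DATABASE_URL'] || undefined,
    shieldxUrl: env['SHIELDX_URL'] || undefined, // blank string counts as "skip"
    port: port,
    logLevel: level,
  };
}
```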
---
## 3. Running Migrations
```bash
# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh
# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
```
Migration files live in:
- `packages/gateway/src/db/migrations/001_initial.sql`
- `packages/learning/src/db/migrations/002_learning.sql`
---
## 4. Pulling Ollama Models
```bash
bash scripts/pull-models.sh
# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
```
Required models:
| Model | Tier | Use case |
|-------------------|-----------|-----------------------------------|
| `qwen2.5:3b` | Fast | Low-complexity, sub-second tasks |
| `qwen2.5:14b` | Medium | Standard completions |
| `qwen2.5:32b` | Large | Complex analysis |
| `deepseek-r1:14b` | Reasoning | Step-by-step logic |
| `llama3.3:70b` | Premium | Best quality, used sparingly |
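
The table above can be mirrored as a small lookup, which is handy for checking that every required model has actually been pulled. This is only a sketch: the tier keys, `modelForTier`, and `missingModels` are hypothetical names, and the gateway's real routing rules are not shown in this README. The `installed` list is assumed to come from wherever you enumerate models, for example the names returned by Ollama's `GET /api/tags`.

```typescript
// Tier-to-model lookup transcribed from the "Required models" table.
// Tier keys and function names are illustrative, not the gateway's own.
type Tier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

const MODEL_BY_TIER: { [T in Tier]: string } = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Returns the model name documented for a tier.
function modelForTier(tier: Tier): string {
  return MODEL_BY_TIER[tier];
}

// Returns the required models that are absent from the installed list,
// in the same order as the table.
function missingModels(installed: string[]): string[] {
  const tiers = Object.keys(MODEL_BY_TIER) as Tier[];
  return tiers
    .map((tier) => MODEL_BY_TIER[tier])
    .filter((model) => installed.indexOf(model) === -1);
}
```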
---
## 5. API Usage
### Completion
```bash
curl -X POST http://localhost:3100/v1/completion \
-H "Content-Type: application/json" \
-d '{
"caller": "my-project",
"task_type": "summarize",
"input": "Long document text here...",
"language": "en"
}'
```
Response:
```json
{
"request_id": "uuid",
"status": "approved",
"output": "Summary...",
"confidence": 0.92,
"model_used": "qwen2.5:14b",
"prompt_version": "summarize/v2",
"token_count": { "input": 512, "output": 128 },
"latency_ms": 1240
}
```
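Callers that talk to the endpoint directly, without the client library, may want to check the response shape before trusting `output`. The sketch below transcribes the fields from the example response into a type and a runtime guard. `CompletionResponse`, `isCompletionResponse`, and `approvedOutput` are hypothetical names, and only the `"approved"` status is taken from this README; any other status value is treated as "not approved" rather than interpreted.

```typescript
// Response shape transcribed from the example above (illustrative typing only).
interface CompletionResponse {
  request_id: string;
  status: string; // only "approved" is documented in this README
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Runtime check that an unknown JSON value has every documented field.
function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as { [key: string]: unknown };
  const tokens = v['token_count'] as { [key: string]: unknown } | null | undefined;
  return (
    typeof v['request_id'] === 'string' &&
    typeof v['status'] === 'string' &&
    typeof v['output'] === 'string' &&
    typeof v['confidence'] === 'number' &&
    typeof v['model_used'] === 'string' &&
    typeof v['prompt_version'] === 'string' &&
    typeof v['latency_ms'] === 'number' &&
    typeof tokens === 'object' &&
    tokens !== null &&
    typeof tokens['input'] === 'number' &&
    typeof tokens['output'] === 'number'
  );
}

// Returns the output text only for well-formed, approved responses; otherwise null.
function approvedOutput(value: unknown): string | null {
  return isCompletionResponse(value) && value.status === 'approved' ? value.output : null;
}
```

Pass the parsed JSON body of a `/v1/completion` call to `approvedOutput` and handle `null` as "do not use this result".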
### Classify input
```bash
curl -X POST http://localhost:3100/v1/classify \
-H "Content-Type: application/json" \
-d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'
```
### Health
```bash
curl http://localhost:3100/health
curl http://localhost:3100/health/live # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready # readiness probe
```
---
## 6. Project-specific Client Usage
Install the client in any workspace project:
```bash
npm install @llm-gateway/client
```
### TIP (Transceiver Intelligence Platform)
```typescript
import { createTIPClient } from '@llm-gateway/client';
const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env
const result = await llm.completion({
task_type: 'extract_specs',
input: rawHtml,
context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});
if (result.status === 'approved') {
console.log(result.output);
}
```
### EO Global Pulse
```typescript
import { createEOPulseClient } from '@llm-gateway/client';
const llm = createEOPulseClient();
// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
task_type: 'meeting_summary',
input: transcriptText,
language: 'de',
});
```
### SwitchBlade
```typescript
import { createSwitchBladeClient } from '@llm-gateway/client';
const llm = createSwitchBladeClient();
const { batch_id } = await llm.batch(
tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
'http://switchblade.context-x.org/webhooks/llm-batch',
);
```
### Custom client (any project)
```typescript
import { LLMGatewayClient } from '@llm-gateway/client';
const llm = new LLMGatewayClient({
caller: 'my-service',
baseUrl: process.env.LLM_GATEWAY_URL,
timeout: 20_000,
});
```
---
## 7. Deployment to Erik
### One-command deploy (from local Mac)
```bash
bash deploy/deploy.sh
# Skip local build (if already built):
bash deploy/deploy.sh --skip-build
# Health check only:
bash deploy/deploy.sh --health-only
```
### First-time setup on Erik
```bash
# SSH to Erik
ssh root@217.154.82.179
# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh
```
### PM2 management
```bash
ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"
```
---
## 8. Monitoring
### Prometheus metrics
```
GET http://localhost:3100/metrics
```
### Grafana
Metrics are scraped by the existing Prometheus instance. Import the dashboard from `deploy/grafana-dashboard.json` (if present).
### Key metrics to watch
| Metric | Alert threshold |
|-----------------------------|------------------------|
| `gateway_request_latency_p99` | > 5 000 ms |
| `gateway_error_rate` | > 5% |
| `ollama_queue_depth` | > 20 |
| `learning_feedback_lag` | > 1 h |
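
To make the thresholds above concrete, the sketch below compares one sample of the four metrics against them and lists the ones that breach. It is an illustration, not the project's alerting setup: `MetricSample`, `THRESHOLDS`, and `breachedAlerts` are hypothetical names, and the units are assumptions (error rate as a fraction where 0.05 means 5%, feedback lag in seconds where 3600 means 1 h). In practice these rules would more likely live in Prometheus alerting configuration.

```typescript
// One reading of the four key metrics. Units are assumptions, see comments.
interface MetricSample {
  gateway_request_latency_p99: number; // milliseconds
  gateway_error_rate: number;          // assumed fraction: 0.05 = 5%
  ollama_queue_depth: number;          // queued requests
  learning_feedback_lag: number;       // assumed seconds: 3600 = 1 h
}

// Alert thresholds from the table, expressed in the units above.
const THRESHOLDS: MetricSample = {
  gateway_request_latency_p99: 5000,
  gateway_error_rate: 0.05,
  ollama_queue_depth: 20,
  learning_feedback_lag: 3600,
};

// Returns the names of metrics strictly above their threshold, in table order.
function breachedAlerts(sample: MetricSample): string[] {
  const names = Object.keys(THRESHOLDS) as (keyof MetricSample)[];
  return names.filter((name) => sample[name] > THRESHOLDS[name]);
}
```

A value exactly at the threshold (for example a queue depth of 20) does not count as a breach, matching the `>` in the table.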
### Log locations (Erik)
```
/var/log/llm-gateway/out.log # gateway stdout
/var/log/llm-gateway/error.log # gateway stderr
/var/log/llm-gateway/learning-out.log # learning engine stdout
/var/log/llm-gateway/learning-error.log
```
---
## 9. Cloudflare Tunnel
See `deploy/cloudflare-tunnel.md` for instructions on exposing the gateway at `https://llm-gateway.context-x.org`.

---
## 10. Docker (alternative to PM2)
```bash
# Build and start all services
cp .env.example .env # fill in DATABASE_URL
docker compose up -d
# Check status
docker compose ps
docker compose logs llm-gateway
# Stop
docker compose down
```
---
## Repository structure
```
llm-gateway/
├── packages/
│ ├── gateway/ # Core HTTP server (Fastify + Ollama + ShieldX)
│ │ ├── src/
│ │ │ ├── server.ts
│ │ │ ├── routes/
│ │ │ ├── db/
│ │ │ │ └── migrations/
│ │ │ └── prompts/
│ │ └── prompts/ # Versioned prompt templates
│ ├── learning/ # Self-improving feedback engine
│ │ └── src/
│ └── client/ # @llm-gateway/client TypeScript library
│ └── src/index.ts
├── deploy/
│ ├── setup-erik.sh # First-time server setup
│ ├── deploy.sh # One-command local → Erik deploy
│ ├── ecosystem.config.cjs # PM2 config
│ ├── nginx.conf # Optional nginx reverse proxy
│ └── cloudflare-tunnel.md
├── scripts/
│ ├── init-db.sh # Database initialization
│ └── pull-models.sh # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json # npm workspaces root
```