# LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

**Port:** 3100
**Production:** http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)

---

## Architecture

```
Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
        ↓  @llm-gateway/client
LLM Gateway :3100
  ├── Prompt Engine    (versioned templates per task_type)
  ├── ShieldX Guard    (prompt injection validation)
  ├── Ollama Router    (model tier selection: 3b / 14b / 32b / 70b)
  └── Learning Engine  (feedback loop, self-improvement)
        ↓
PostgreSQL (llm_gateway DB)
Ollama (Mac Studio :11434)
```

---

## Prerequisites

| Dependency | Version | Notes                       |
|------------|---------|-----------------------------|
| Node.js    | 22+     | `node --version`            |
| PostgreSQL | 17      | Local or remote             |
| Ollama     | latest  | Running on Mac Studio .169  |
| PM2        | latest  | `npm install -g pm2` (Erik) |

---

## 1. Local Development Setup

```bash
# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning
```

The gateway is then available at http://localhost:3100.

---

## 2. Environment Variables

See `.env.example` for all variables with descriptions.

| Variable           | Required | Default               | Description                            |
|--------------------|----------|-----------------------|----------------------------------------|
| `DATABASE_URL`     | YES      | -                     | PostgreSQL DSN for llm_gateway         |
| `TIP_DATABASE_URL` | NO       | -                     | TIP DB (read-only)                     |
| `OLLAMA_URL`       | YES      | http://...169:11434   | Ollama inference server                |
| `SHIELDX_URL`      | NO       | -                     | ShieldX endpoint (leave blank to skip) |
| `PORT`             | NO       | 3100                  | HTTP port                              |
| `LOG_LEVEL`        | NO       | info                  | error / warn / info / debug            |
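
A minimal sketch of how a service could validate these variables at startup. The `loadConfig` function and the `GatewayConfig` shape are hypothetical and not taken from the gateway source; the defaults for `PORT` and `LOG_LEVEL` come from the table. Because the documented `OLLAMA_URL` default is abbreviated in the table, the sketch treats the variable as required instead of guessing a value. In Node.js it would be called as `loadConfig(process.env)`.

```typescript
// Hypothetical config shape; fields mirror the variables in the table above.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl: string | undefined;
  shieldxUrl: string | undefined; // undefined means ShieldX is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ['error', 'warn', 'info', 'debug'];

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Blank or whitespace-only values count as unset.
  const optional = (name: string): string | undefined => {
    const value = (env[name] ?? '').trim();
    return value === '' ? undefined : value;
  };
  const required = (name: string): string => {
    const value = optional(name);
    if (value === undefined) {
      throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
  };

  const port = Number(optional('PORT') ?? '3100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error('PORT must be an integer between 1 and 65535');
  }

  const logLevel = optional('LOG_LEVEL') ?? 'info';
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(', ')}`);
  }

  return {
    databaseUrl: required('DATABASE_URL'),
    ollamaUrl: required('OLLAMA_URL'),
    tipDatabaseUrl: optional('TIP_DATABASE_URL'),
    shieldxUrl: optional('SHIELDX_URL'),
    port,
    logLevel: logLevel as LogLevel,
  };
}
```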

---

## 3. Running Migrations

```bash
# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
```

Migration files live in:
- `packages/gateway/src/db/migrations/001_initial.sql`
- `packages/learning/src/db/migrations/002_learning.sql`

---

## 4. Pulling Ollama Models

```bash
bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
```

Required models:

| Model             | Tier      | Use case                         |
|-------------------|-----------|----------------------------------|
| `qwen2.5:3b`      | Fast      | Low-complexity, sub-second tasks |
| `qwen2.5:14b`     | Medium    | Standard completions             |
| `qwen2.5:32b`     | Large     | Complex analysis                 |
| `deepseek-r1:14b` | Reasoning | Step-by-step logic               |
| `llama3.3:70b`    | Premium   | Best quality, used sparingly     |
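
The tier table can be expressed as a lookup, for example to check which required models an Ollama instance still lacks. This is an illustrative sketch: the tier keys, `MODEL_BY_TIER`, and `missingModels` are assumed names, and this README does not describe how the gateway's router actually picks a tier.

```typescript
// Tier names follow the "Tier" column above; the lowercase keys are an assumption.
type Tier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

const MODEL_BY_TIER: Record<Tier, string> = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Returns the required models that are absent from the given list of
// installed model names (e.g. collected from the Ollama instance).
function missingModels(installed: readonly string[]): string[] {
  const have = new Set(installed);
  return Object.values(MODEL_BY_TIER).filter((model) => !have.has(model));
}
```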

---

## 5. API Usage

### Completion

```bash
curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'
```

Response:
```json
{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
```
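
For TypeScript callers that use the HTTP API directly, the response above can be described with a type and a runtime check. This is a hedged sketch: the field names come from the sample response, while `CompletionResponse`, `isCompletionResponse`, and `approvedOutput` are hypothetical names. `status` is typed as a plain string because only `approved` appears in this README.

```typescript
interface CompletionResponse {
  request_id: string;
  status: string; // only "approved" is documented in this README
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Runtime check that a parsed JSON value has the documented fields.
function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const tokens = v['token_count'];
  if (typeof tokens !== 'object' || tokens === null) return false;
  const t = tokens as Record<string, unknown>;
  return (
    typeof v['request_id'] === 'string' &&
    typeof v['status'] === 'string' &&
    typeof v['output'] === 'string' &&
    typeof v['confidence'] === 'number' &&
    typeof v['model_used'] === 'string' &&
    typeof v['prompt_version'] === 'string' &&
    typeof v['latency_ms'] === 'number' &&
    typeof t['input'] === 'number' &&
    typeof t['output'] === 'number'
  );
}

// Returns the output text for approved responses, null for any other status.
function approvedOutput(rawBody: string): string | null {
  const parsed: unknown = JSON.parse(rawBody);
  if (!isCompletionResponse(parsed)) {
    throw new Error('Unexpected completion response shape');
  }
  return parsed.status === 'approved' ? parsed.output : null;
}
```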

### Classify input

```bash
curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'
```

### Health

```bash
curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
```

---

## 6. Project-specific Client Usage

Install the client in any workspace project:

```bash
npm install @llm-gateway/client
```

### TIP (Transceiver Intelligence Platform)

```typescript
import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}
```

### EO Global Pulse

```typescript
import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
```
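
Because `safeCompletion` can return `null`, callers need an explicit fallback path. A minimal sketch, assuming the result carries the `status` and `output` fields shown in the completion response in section 5; the `CompletionResult` type and the `outputOrFallback` helper are hypothetical and not part of `@llm-gateway/client`.

```typescript
// Assumed subset of the completion result; see the sample response in section 5.
interface CompletionResult {
  status: string;
  output: string;
}

// Picks the model output when it is usable, otherwise the caller's fallback.
function outputOrFallback(result: CompletionResult | null, fallback: string): string {
  if (result === null) return fallback; // gateway was unreachable
  if (result.status !== 'approved') return fallback; // output was not approved
  return result.output;
}
```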

### SwitchBlade

```typescript
import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);
```

### Custom client (any project)

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});
```

---

## 7. Deployment to Erik

### One-command deploy (from local Mac)

```bash
bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only
```

### First-time setup on Erik

```bash
# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent, safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh
```

### PM2 management

```bash
ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"
```

---

## 8. Monitoring

### Prometheus metrics

```
GET http://localhost:3100/metrics
```

### Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from `deploy/grafana-dashboard.json` (if present).

### Key metrics to watch

| Metric                        | Alert threshold |
|-------------------------------|-----------------|
| `gateway_request_latency_p99` | > 5000 ms       |
| `gateway_error_rate`          | > 5%            |
| `ollama_queue_depth`          | > 20            |
| `learning_feedback_lag`       | > 1 h           |
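
The thresholds above could also be checked programmatically, for instance in a smoke test after a deploy. The sketch below is illustrative only: the snapshot shape, the units (milliseconds, a 0 to 1 fraction, seconds), and `breachedThresholds` are assumptions, and in practice these alerts would more likely be defined as Prometheus alerting rules.

```typescript
// Assumed units: latency in ms, error rate as a fraction (0.05 = 5%),
// queue depth as a count, feedback lag in seconds.
interface MetricSnapshot {
  gateway_request_latency_p99: number;
  gateway_error_rate: number;
  ollama_queue_depth: number;
  learning_feedback_lag: number;
}

// Alert thresholds from the table above, converted to the assumed units.
const THRESHOLDS: MetricSnapshot = {
  gateway_request_latency_p99: 5000,
  gateway_error_rate: 0.05,
  ollama_queue_depth: 20,
  learning_feedback_lag: 3600,
};

// Returns the names of all metrics whose value exceeds its threshold.
function breachedThresholds(snapshot: MetricSnapshot): string[] {
  const names = Object.keys(THRESHOLDS) as (keyof MetricSnapshot)[];
  return names.filter((name) => snapshot[name] > THRESHOLDS[name]);
}
```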

### Log locations (Erik)

```
/var/log/llm-gateway/out.log             # gateway stdout
/var/log/llm-gateway/error.log           # gateway stderr
/var/log/llm-gateway/learning-out.log    # learning engine stdout
/var/log/llm-gateway/learning-error.log
```

---

## 9. Cloudflare Tunnel

See `deploy/cloudflare-tunnel.md` for instructions on exposing the gateway at `https://llm-gateway.context-x.org`.

---

## 10. Docker (alternative to PM2)

```bash
# Build and start all services
cp .env.example .env  # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down
```

---

## Repository structure

```
llm-gateway/
├── packages/
│   ├── gateway/              # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/          # Versioned prompt templates
│   ├── learning/             # Self-improving feedback engine
│   │   └── src/
│   └── client/               # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh         # First-time server setup
│   ├── deploy.sh             # One-command local → Erik deploy
│   ├── ecosystem.config.cjs  # PM2 config
│   ├── nginx.conf            # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh            # Database initialization
│   └── pull-models.sh        # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json              # npm workspaces root
```