
# LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

- **Port:** 3100
- **Production:** https://llm-gateway.context-x.org (Cloudflare Tunnel → Erik, the production host)

---

## Architecture

```
Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
    ↓  @llm-gateway/client
LLM Gateway :3100
    ├── Prompt Engine   (versioned templates per task_type)
    ├── ShieldX Guard   (prompt injection validation)
    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
    └── Learning Engine (feedback loop, self-improvement)
         ↓
    PostgreSQL (llm_gateway DB)
    Ollama     (Mac Studio :11434)
```

---

## Prerequisites

| Dependency     | Version | Notes                          |
|----------------|---------|--------------------------------|
| Node.js        | 22+     | `node --version`               |
| PostgreSQL     | 17      | Local or remote                |
| Ollama         | latest  | Running on Mac Studio .169     |
| PM2            | latest  | `npm install -g pm2` (on Erik) |

---

## 1. Local Development Setup

```bash
# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning
```

Gateway is available at http://localhost:3100.

---

## 2. Environment Variables

See `.env.example` for all variables with descriptions.

| Variable           | Required | Default             | Description                            |
|--------------------|----------|---------------------|----------------------------------------|
| `DATABASE_URL`     | YES      | —                   | PostgreSQL DSN for llm_gateway         |
| `TIP_DATABASE_URL` | NO       | —                   | TIP DB (read-only)                     |
| `OLLAMA_URL`       | YES      | http://...169:11434 | Ollama inference server                |
| `SHIELDX_URL`      | NO       | —                   | ShieldX endpoint (leave blank to skip) |
| `PORT`             | NO       | 3100                | HTTP port                              |
| `LOG_LEVEL`        | NO       | info                | error / warn / info / debug            |

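The variable table can be read as a small validation contract. The following is a minimal sketch of a loader that enforces it, not the gateway's actual startup code: `loadConfig`, `GatewayConfig`, and the field names are hypothetical. It treats `OLLAMA_URL` as strictly required instead of applying the default from the table, and it takes the environment as a plain object so it can be called with `process.env` or with a test fixture.

```typescript
// Hypothetical config loader. Names are illustrative, not the gateway's real code.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl: string | undefined;
  shieldxUrl: string | undefined; // undefined: ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ['error', 'warn', 'info', 'debug'];

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Fail fast on variables the table marks as required.
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const port = Number(env['PORT'] ?? '3100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer in 1..65535, got "${env['PORT']}"`);
  }

  const logLevel = env['LOG_LEVEL'] ?? 'info';
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(', ')}`);
  }

  return {
    databaseUrl: required('DATABASE_URL'),
    ollamaUrl: required('OLLAMA_URL'),
    // Blank optional values are normalized to undefined.
    tipDatabaseUrl: env['TIP_DATABASE_URL'] || undefined,
    shieldxUrl: env['SHIELDX_URL'] || undefined,
    port,
    logLevel: logLevel as LogLevel,
  };
}
```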
---

## 3. Running Migrations

```bash
# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
```

Migration files live in:
- `packages/gateway/src/db/migrations/001_initial.sql`
- `packages/learning/src/db/migrations/002_learning.sql`

---

## 4. Pulling Ollama Models

```bash
bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
```

Required models:

| Model             | Tier      | Use case                          |
|-------------------|-----------|-----------------------------------|
| `qwen2.5:3b`      | Fast      | Low-complexity, sub-second tasks  |
| `qwen2.5:14b`     | Medium    | Standard completions              |
| `qwen2.5:32b`     | Large     | Complex analysis                  |
| `deepseek-r1:14b` | Reasoning | Step-by-step logic                |
| `llama3.3:70b`    | Premium   | Best quality, used sparingly      |

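To make the tier idea concrete, here is a minimal sketch of how a router could map a task to one of the models in the table. The tier-to-model mapping comes from the table; everything else is an assumption for illustration: `pickTier`, `TaskHints`, and the character thresholds are hypothetical and do not describe the gateway's actual routing rules.

```typescript
type Tier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

// Mirrors the required-models table.
const MODEL_BY_TIER: Record<Tier, string> = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Hypothetical hints a caller or task_type could provide.
interface TaskHints {
  inputChars: number;
  needsReasoning?: boolean;
  bestQuality?: boolean;
}

// Hypothetical heuristic: the thresholds are illustrative, not the gateway's.
function pickTier(hints: TaskHints): Tier {
  if (hints.bestQuality) return 'premium'; // used sparingly, so opt-in only
  if (hints.needsReasoning) return 'reasoning';
  if (hints.inputChars < 500) return 'fast';
  if (hints.inputChars < 8000) return 'medium';
  return 'large';
}

function pickModel(hints: TaskHints): string {
  return MODEL_BY_TIER[pickTier(hints)];
}
```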
---

## 5. API Usage

### Completion

```bash
curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'
```

Response:
```json
{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
```
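Callers that talk to the HTTP API directly may want to check the response shape before using it. The sketch below derives a type from the sample response and validates a raw body against it. `CompletionResponse` and `parseCompletionResponse` are hypothetical names, and the sketch assumes every field in the sample is always present; responses with a status other than `approved` are not documented here and may look different.

```typescript
// Shape inferred from the sample response; not an official schema.
interface CompletionResponse {
  request_id: string;
  status: string; // "approved" is the only value shown in this README
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

const STRING_FIELDS = ['request_id', 'status', 'output', 'model_used', 'prompt_version'];
const NUMBER_FIELDS = ['confidence', 'latency_ms'];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

function parseCompletionResponse(body: string): CompletionResponse {
  const data: unknown = JSON.parse(body);
  if (!isRecord(data)) throw new Error('Response body is not a JSON object');

  for (const field of STRING_FIELDS) {
    if (typeof data[field] !== 'string') throw new Error(`Expected string field "${field}"`);
  }
  for (const field of NUMBER_FIELDS) {
    if (typeof data[field] !== 'number') throw new Error(`Expected numeric field "${field}"`);
  }
  const tokens = data['token_count'];
  if (!isRecord(tokens) || typeof tokens['input'] !== 'number' || typeof tokens['output'] !== 'number') {
    throw new Error('Expected token_count with numeric "input" and "output"');
  }
  // Every field was checked above, so the cast is safe.
  return data as unknown as CompletionResponse;
}
```

A caller would pass the raw response text, for example `parseCompletionResponse(await res.text())`, and branch on `status` afterwards.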

### Classify input

```bash
curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'
```

### Health

```bash
curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
```
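Deploy scripts and dependent services often need to wait for the readiness probe before sending traffic. The following is a minimal polling sketch under stated assumptions: `waitUntilReady` and `readyProbe` are hypothetical helpers, and the sketch assumes `/health/ready` answers with a 2xx status once the gateway is ready, which this README does not confirm.

```typescript
type Probe = () => Promise<boolean>;

// Polls the probe until it reports ready or the attempts run out.
async function waitUntilReady(probe: Probe, maxAttempts = 10, delayMs = 1000): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected probe (e.g. connection refused) counts as "not ready yet".
    const ready = await probe().catch(() => false);
    if (ready) return true;
    if (attempt < maxAttempts) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Probe built on the global fetch (Node 18+). Assumes any 2xx means ready.
const readyProbe = (baseUrl: string): Probe => async () =>
  (await fetch(`${baseUrl}/health/ready`)).ok;
```

For example, `await waitUntilReady(readyProbe('http://localhost:3100'))` resolves to `true` as soon as one probe succeeds and to `false` after ten failed attempts.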

---

## 6. Project-specific Client Usage

Install the client in any workspace project:

```bash
npm install @llm-gateway/client
```

### TIP (Transceiver Intelligence Platform)

```typescript
import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}
```

### EO Global Pulse

```typescript
import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
```
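The graceful-degradation pattern behind `safeCompletion` can be sketched independently of the client. The helper below is an illustration only: `safely` and `summarizeOrTruncate` are hypothetical names, and the client's real implementation may differ. It shows the general idea of turning a failed call into `null` so the caller can fall back to a non-LLM path.

```typescript
// Hypothetical helper: resolves to null instead of rejecting.
async function safely<T>(
  call: () => Promise<T>,
  onError?: (err: unknown) => void,
): Promise<T | null> {
  try {
    return await call();
  } catch (err) {
    if (onError) onError(err); // e.g. log the failure without breaking the caller
    return null;
  }
}

// Example fallback: use the summary if available, otherwise a plain excerpt.
async function summarizeOrTruncate(
  text: string,
  summarize: (input: string) => Promise<string>,
  maxChars = 200,
): Promise<string> {
  const summary = await safely(() => summarize(text));
  return summary ?? text.slice(0, maxChars);
}
```

With this shape, an outage of the gateway degrades the feature (an excerpt instead of a summary) without raising an error in the calling project.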

### SwitchBlade

```typescript
import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);
```

### Custom client (any project)

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});
```

---

## 7. Deployment to Erik

### One-command deploy (from local Mac)

```bash
bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only
```

### First-time setup on Erik

```bash
# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh
```

### PM2 management

```bash
# Assumes `erik` resolves as an SSH host, e.g. via a Host alias in ~/.ssh/config
ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"
```

---

## 8. Monitoring

### Prometheus metrics

```
GET http://localhost:3100/metrics
```

### Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from `deploy/grafana-dashboard.json` (if present).

### Key metrics to watch

| Metric                        | Alert threshold |
|-------------------------------|-----------------|
| `gateway_request_latency_p99` | > 5 000 ms      |
| `gateway_error_rate`          | > 5%            |
| `ollama_queue_depth`          | > 20            |
| `learning_feedback_lag`       | > 1 h           |

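The thresholds above can also be checked in code, for example in a smoke test after a deploy. This is a minimal sketch, not part of the gateway: `MetricSample` and `breachedMetrics` are hypothetical, and the units are assumptions (latency in milliseconds, error rate as a fraction between 0 and 1, feedback lag in seconds). How the values are obtained from `/metrics` is left out.

```typescript
// Hypothetical snapshot of the four key metrics. Units are assumptions.
interface MetricSample {
  gateway_request_latency_p99: number; // milliseconds
  gateway_error_rate: number; // fraction, 0.05 = 5%
  ollama_queue_depth: number; // queued requests
  learning_feedback_lag: number; // seconds
}

type MetricName = keyof MetricSample;

// Alert thresholds from the table, converted to the assumed units.
const THRESHOLDS: Record<MetricName, number> = {
  gateway_request_latency_p99: 5000,
  gateway_error_rate: 0.05,
  ollama_queue_depth: 20,
  learning_feedback_lag: 3600,
};

// Returns the names of all metrics strictly above their threshold.
function breachedMetrics(sample: MetricSample): MetricName[] {
  return (Object.keys(THRESHOLDS) as MetricName[]).filter(
    name => sample[name] > THRESHOLDS[name],
  );
}
```

A value exactly at the threshold (for example a queue depth of 20) does not count as a breach, matching the `>` comparisons in the table.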
### Log locations (Erik)

```
/var/log/llm-gateway/out.log           # gateway stdout
/var/log/llm-gateway/error.log         # gateway stderr
/var/log/llm-gateway/learning-out.log  # learning engine stdout
/var/log/llm-gateway/learning-error.log  # learning engine stderr
```

---

## 9. Cloudflare Tunnel

See `deploy/cloudflare-tunnel.md` for instructions to expose the gateway via `https://llm-gateway.context-x.org`.

---

## 10. Docker (alternative to PM2)

```bash
# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down
```

---

## Repository structure

```
llm-gateway/
├── packages/
│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/     # Versioned prompt templates
│   ├── learning/        # Self-improving feedback engine
│   │   └── src/
│   └── client/          # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh       # First-time server setup
│   ├── deploy.sh           # One-command local → Erik deploy
│   ├── ecosystem.config.cjs # PM2 config
│   ├── nginx.conf          # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh          # Database initialization
│   └── pull-models.sh      # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json            # npm workspaces root
```