2026-05-12 23:31:02 +02:00

LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

Port: 3100
Production: http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)


Architecture

Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
    ↓  @llm-gateway/client
LLM Gateway :3100
    ├── Prompt Engine   (versioned templates per task_type)
    ├── ShieldX Guard   (prompt injection validation)
    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
    └── Learning Engine (feedback loop, self-improvement)
         ↓
    PostgreSQL (llm_gateway DB)
    Ollama     (Mac Studio :11434)

Prerequisites

Dependency   Version   Notes
Node.js      22+       node --version
PostgreSQL   17        Local or remote
Ollama       latest    Running on Mac Studio .169
PM2          latest    npm install -g pm2 (Erik)

1. Local Development Setup

# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning

Gateway is available at http://localhost:3100.


2. Environment Variables

See .env.example for all variables with descriptions.

Variable          Required  Default              Description
DATABASE_URL      YES       -                    PostgreSQL DSN for llm_gateway
TIP_DATABASE_URL  NO        -                    TIP DB (read-only)
OLLAMA_URL        YES       http://...169:11434  Ollama inference server
SHIELDX_URL       NO        -                    ShieldX endpoint (leave blank to skip)
PORT              NO        3100                 HTTP port
LOG_LEVEL         NO        info                 error / warn / info / debug
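
The table above can be turned into a fail-fast check at startup. The following is a minimal sketch, not the gateway's actual loader: the names `GatewayConfig` and `loadConfig` are hypothetical, and it assumes OLLAMA_URL must be set explicitly because the table marks it as required.

```typescript
// Hypothetical config loader; variable names and defaults are taken from the table above.
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl: string | undefined;
  shieldxUrl: string | undefined; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly string[] = ['error', 'warn', 'info', 'debug'];

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  // Report every missing required variable at once instead of failing one by one.
  const missing = ['DATABASE_URL', 'OLLAMA_URL'].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  const port = Number(env.PORT || '3100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  const logLevel = env.LOG_LEVEL || 'info';
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(' / ')}, got "${logLevel}"`);
  }

  return {
    databaseUrl: env.DATABASE_URL as string,
    ollamaUrl: env.OLLAMA_URL as string,
    tipDatabaseUrl: env.TIP_DATABASE_URL || undefined,
    shieldxUrl: env.SHIELDX_URL || undefined, // blank string is treated as "not set"
    port,
    logLevel: logLevel as LogLevel,
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.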

3. Running Migrations

# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh

Migration files live in:

  • packages/gateway/src/db/migrations/001_initial.sql
  • packages/learning/src/db/migrations/002_learning.sql

4. Pulling Ollama Models

bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh

Required models:

Model            Tier       Use case
qwen2.5:3b       Fast       Low-complexity, sub-second tasks
qwen2.5:14b      Medium     Standard completions
qwen2.5:32b      Large      Complex analysis
deepseek-r1:14b  Reasoning  Step-by-step logic
llama3.3:70b     Premium    Best quality, used sparingly
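
To illustrate how a router might map a request onto these tiers, here is a small sketch. The model names come from the table above; everything else (the `RoutingHints` shape, the `pickTier` function and the character thresholds) is a hypothetical assumption, since the gateway's real routing rules are not described in this README.

```typescript
type Tier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

// Tier-to-model mapping, copied from the "Required models" table.
const MODEL_BY_TIER: Record<Tier, string> = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Hypothetical routing hints; the real gateway may use entirely different signals.
interface RoutingHints {
  inputChars: number;
  needsReasoning?: boolean;
  highStakes?: boolean;
}

function pickTier(hints: RoutingHints): Tier {
  if (hints.highStakes) return 'premium'; // best quality, used sparingly
  if (hints.needsReasoning) return 'reasoning'; // step-by-step logic
  // Illustrative size thresholds only.
  if (hints.inputChars < 500) return 'fast';
  if (hints.inputChars < 8000) return 'medium';
  return 'large';
}

function pickModel(hints: RoutingHints): string {
  return MODEL_BY_TIER[pickTier(hints)];
}
```

Keeping the mapping in one table-like constant means a model swap (for example a newer 14b model) touches a single line.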

5. API Usage

Completion

curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'

Response:

{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
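
Callers that do not use the client library may want to type and check this response themselves. The sketch below mirrors the fields of the example response. The names `CompletionResponse`, `isCompletionResponse` and `usableOutput` are hypothetical, and the 0.8 confidence cutoff is an arbitrary illustration. Only the "approved" status is documented here, so `status` is typed as a plain string.

```typescript
// Shape of the example response above; field names are copied from it.
interface CompletionResponse {
  request_id: string;
  status: string; // only "approved" appears in this README
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Runtime check for data parsed from JSON.
function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const tokens = v.token_count as Record<string, unknown> | null | undefined;
  return (
    typeof v.request_id === 'string' &&
    typeof v.status === 'string' &&
    typeof v.output === 'string' &&
    typeof v.confidence === 'number' &&
    typeof v.model_used === 'string' &&
    typeof v.prompt_version === 'string' &&
    typeof v.latency_ms === 'number' &&
    typeof tokens === 'object' &&
    tokens !== null &&
    typeof tokens.input === 'number' &&
    typeof tokens.output === 'number'
  );
}

// Returns the output only for approved, sufficiently confident responses.
function usableOutput(resp: CompletionResponse, minConfidence = 0.8): string | null {
  return resp.status === 'approved' && resp.confidence >= minConfidence ? resp.output : null;
}
```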

Classify input

curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'

Health

curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
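
A deploy script or integration test can poll the readiness endpoint before sending traffic. This is a minimal sketch under stated assumptions: `waitUntilReady` and the `Probe` type are hypothetical names, and it assumes /health/ready answers with a 2xx status once the gateway is ready, which this README does not spell out.

```typescript
// Anything that resolves to an object with an `ok` flag, e.g. the global fetch.
type Probe = (url: string) => Promise<{ ok: boolean }>;

async function waitUntilReady(
  baseUrl: string,
  probe: Probe,
  attempts = 10,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await probe(`${baseUrl}/health/ready`);
      if (res.ok) return true;
    } catch {
      // Connection refused or similar: the gateway is not up yet, so retry.
    }
    // Wait between attempts, but not after the last one.
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

On Node.js 22 the built-in fetch can serve as the probe: `await waitUntilReady('http://localhost:3100', (url) => fetch(url))`.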

6. Project-specific Client Usage

Install the client in any workspace project:

npm install @llm-gateway/client

TIP (Transceiver Intelligence Platform)

import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}

EO Global Pulse

import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
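
The graceful-degradation pattern behind safeCompletion can be sketched independently of the client. The helper below is an assumption about how such a wrapper might behave, not the library's actual code; `safely` and `summarizeOrFallback` are hypothetical names.

```typescript
// Runs an async call and converts any failure into null instead of throwing.
async function safely<T>(
  call: () => Promise<T>,
  onError?: (err: unknown) => void,
): Promise<T | null> {
  try {
    return await call();
  } catch (err) {
    onError?.(err); // e.g. log the failure without interrupting the caller
    return null;
  }
}

// Example caller: fall back to a truncated transcript when no summary is available.
async function summarizeOrFallback(
  summarize: (text: string) => Promise<string>,
  transcript: string,
): Promise<string> {
  const summary = await safely(() => summarize(transcript));
  return summary ?? transcript.slice(0, 200);
}
```

The point of returning null rather than throwing is that the calling feature keeps working, with reduced quality, while the gateway is unreachable.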

SwitchBlade

import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);

Custom client (any project)

import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});

7. Deployment to Erik

One-command deploy (from local Mac)

bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only

First-time setup on Erik

# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh

PM2 management

ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"

8. Monitoring

Prometheus metrics

GET http://localhost:3100/metrics

Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from deploy/grafana-dashboard.json (if present).

Key metrics to watch

Metric                        Alert threshold
gateway_request_latency_p99   > 5000 ms
gateway_error_rate            > 5%
ollama_queue_depth            > 20
learning_feedback_lag         > 1 h
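
The thresholds above can be written down as data and checked against sampled values, for example in a smoke test. In this sketch the metric names and limits come from the table, while the units of the sampled values (milliseconds, percent, seconds) and the names `THRESHOLDS` and `findBreaches` are assumptions. In production these rules would more likely live in Prometheus alerting configuration.

```typescript
interface Threshold {
  metric: string;
  limit: number;
  unit: string; // assumed unit of the sampled value
}

const THRESHOLDS: Threshold[] = [
  { metric: 'gateway_request_latency_p99', limit: 5000, unit: 'ms' },
  { metric: 'gateway_error_rate', limit: 5, unit: '%' },
  { metric: 'ollama_queue_depth', limit: 20, unit: 'requests' },
  { metric: 'learning_feedback_lag', limit: 3600, unit: 's' }, // 1 h expressed in seconds
];

// Returns one message per metric whose sampled value is strictly above its limit.
function findBreaches(samples: Record<string, number>): string[] {
  const breaches: string[] = [];
  for (const t of THRESHOLDS) {
    const value = samples[t.metric];
    if (value !== undefined && value > t.limit) {
      breaches.push(`${t.metric}: ${value} ${t.unit} exceeds ${t.limit} ${t.unit}`);
    }
  }
  return breaches;
}
```

Metrics missing from the sample are skipped rather than reported, so a partial scrape does not raise false alarms.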

Log locations (Erik)

/var/log/llm-gateway/out.log           # gateway stdout
/var/log/llm-gateway/error.log         # gateway stderr
/var/log/llm-gateway/learning-out.log  # learning engine stdout
/var/log/llm-gateway/learning-error.log

9. Cloudflare Tunnel

See deploy/cloudflare-tunnel.md for instructions to expose the gateway via https://llm-gateway.context-x.org.


10. Docker (alternative to PM2)

# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down

Repository structure

llm-gateway/
├── packages/
│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/     # Versioned prompt templates
│   ├── learning/        # Self-improving feedback engine
│   │   └── src/
│   └── client/          # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh       # First-time server setup
│   ├── deploy.sh           # One-command local → Erik deploy
│   ├── ecosystem.config.cjs # PM2 config
│   ├── nginx.conf          # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh          # Database initialization
│   └── pull-models.sh      # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json            # npm workspaces root