Rene Fichtmueller 200cc7f2dc fix: Correct Cloudflare tunnel and setup script to use port 3103
The LLM Gateway is configured to run on port 3103 in ecosystem.config.cjs,
but the Cloudflare tunnel configuration and setup script were referencing port
3100, causing 502 Bad Gateway errors.

Updates:
- cloudflare-tunnel.md: Changed tunnel ingress from localhost:3100 to localhost:3103
- setup-erik.sh: Updated health check URL and output messages to port 3103
- This fixes the Cloudflare tunnel connection that was causing public HTTPS access to fail

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-26 21:04:36 +02:00

LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

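Port: 3100 (local default; on Erik, deploy/ecosystem.config.cjs runs the gateway on 3103, and the Cloudflare tunnel ingress points at localhost:3103)
Production: https://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)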


Architecture

Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
    ↓  @llm-gateway/client
LLM Gateway :3100
    ├── Prompt Engine   (versioned templates per task_type)
    ├── ShieldX Guard   (prompt injection validation)
    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
    └── Learning Engine (feedback loop, self-improvement)
         ↓
    PostgreSQL (llm_gateway DB)
    Ollama     (Mac Studio :11434)
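The Ollama Router's selection logic is not documented in this README. As a rough illustration only, the sketch below maps the five model tiers listed under "Pulling Ollama Models" to a routing decision. The `RoutingHints` fields and the 500 / 8,000 character cut-offs are hypothetical, not the gateway's actual rules.

```typescript
// Hypothetical routing heuristic. Tier names and model tags come from the
// "Required models" table; the thresholds are illustrative only.
type Tier = "fast" | "medium" | "large" | "reasoning" | "premium";

const MODEL_BY_TIER: Record<Tier, string> = {
  fast: "qwen2.5:3b",
  medium: "qwen2.5:14b",
  large: "qwen2.5:32b",
  reasoning: "deepseek-r1:14b",
  premium: "llama3.3:70b",
};

interface RoutingHints {
  inputChars: number;       // size of the prompt input
  needsReasoning?: boolean; // task needs step-by-step logic
  premium?: boolean;        // caller explicitly opts into the 70b model
}

function selectTier(h: RoutingHints): Tier {
  if (h.premium) return "premium";          // best quality, used sparingly
  if (h.needsReasoning) return "reasoning"; // step-by-step logic
  if (h.inputChars < 500) return "fast";    // low-complexity, sub-second tasks
  if (h.inputChars < 8_000) return "medium";
  return "large";                           // complex analysis
}

function selectModel(h: RoutingHints): string {
  return MODEL_BY_TIER[selectTier(h)];
}

console.log(selectModel({ inputChars: 120 }));    // qwen2.5:3b
console.log(selectModel({ inputChars: 20_000 })); // qwen2.5:32b
```

Keeping tier choice a pure function makes it easy to unit-test independently of Ollama.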

Prerequisites

| Dependency | Version | Notes |
|------------|---------|-------|
| Node.js | 22+ | Check with node --version |
| PostgreSQL | 17 | Local or remote |
| Ollama | latest | Running on Mac Studio (.169) |
| PM2 | latest | Install on Erik: npm install -g pm2 |

1. Local Development Setup

# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning

The gateway is now available at http://localhost:3100 (or the PORT set in .env).


2. Environment Variables

See .env.example for all variables with descriptions.

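| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| DATABASE_URL | yes | (none) | PostgreSQL DSN for the llm_gateway database |
| TIP_DATABASE_URL | no | (none) | TIP database (read-only) |
| OLLAMA_URL | yes | http://...169:11434 | Ollama inference server |
| SHIELDX_URL | no | (none) | ShieldX endpoint (leave blank to skip validation) |
| PORT | no | 3100 | HTTP port |
| LOG_LEVEL | no | info | One of error / warn / info / debug |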
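A minimal sketch of how a service might validate these variables at startup, assuming the required/default semantics in the table above. `loadConfig` and `GatewayConfig` are hypothetical names, not exports of the gateway.

```typescript
// Hypothetical config loader mirroring the environment table; the gateway's
// real startup code may validate differently.
type LogLevel = "error" | "warn" | "info" | "debug";
const LOG_LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

interface GatewayConfig {
  databaseUrl: string;
  tipDatabaseUrl?: string;
  ollamaUrl: string;
  shieldxUrl?: string; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable ${name}`);
    return value;
  };

  // Empty strings fall back to the documented defaults.
  const port = Number(env.PORT || "3100");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const logLevel = (env.LOG_LEVEL || "info") as LogLevel;
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${env.LOG_LEVEL}`);
  }

  return {
    databaseUrl: required("DATABASE_URL"),
    tipDatabaseUrl: env.TIP_DATABASE_URL || undefined,
    ollamaUrl: required("OLLAMA_URL"),
    shieldxUrl: env.SHIELDX_URL || undefined,
    port,
    logLevel,
  };
}
```

In a Node entry point this would be called as loadConfig(process.env), so a missing DATABASE_URL fails fast instead of surfacing later as a connection error.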

3. Running Migrations

# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh

Migration files live in:

  • packages/gateway/src/db/migrations/001_initial.sql
  • packages/learning/src/db/migrations/002_learning.sql
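The two migration files live in different packages, but their numeric prefixes imply a global order (001 before 002). Whether init-db.sh orders them this way is not stated here; the sketch below is a hypothetical helper showing one way to derive that order and reject ambiguous duplicates.

```typescript
// Hypothetical helper: orders migration files from several packages by their
// numeric filename prefix so that 001_* always runs before 002_*.
function migrationNumber(path: string): number {
  const file = path.split("/").pop() ?? path;
  const match = /^(\d+)_/.exec(file);
  if (!match) throw new Error(`Migration file lacks a numeric prefix: ${path}`);
  return Number(match[1]);
}

function orderMigrations(paths: readonly string[]): string[] {
  const sorted = [...paths].sort((a, b) => migrationNumber(a) - migrationNumber(b));
  // Two files with the same number would make the run order ambiguous.
  for (let i = 1; i < sorted.length; i++) {
    if (migrationNumber(sorted[i]) === migrationNumber(sorted[i - 1])) {
      throw new Error(`Duplicate migration number: ${sorted[i - 1]} and ${sorted[i]}`);
    }
  }
  return sorted;
}
```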

4. Pulling Ollama Models

bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh

Required models:

| Model | Tier | Use case |
|-------|------|----------|
| qwen2.5:3b | Fast | Low-complexity, sub-second tasks |
| qwen2.5:14b | Medium | Standard completions |
| qwen2.5:32b | Large | Complex analysis |
| deepseek-r1:14b | Reasoning | Step-by-step logic |
| llama3.3:70b | Premium | Best quality, used sparingly |
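To confirm that pull-models.sh actually left every required model on the Ollama host, one option is to compare the table above against Ollama's `GET /api/tags` endpoint, which lists locally available models. The helper names below are hypothetical; the sketch assumes Node.js 22 (global fetch) and that the model names reported by /api/tags match the table exactly.

```typescript
// Required models, copied from the table above.
const REQUIRED_MODELS: readonly string[] = [
  "qwen2.5:3b",
  "qwen2.5:14b",
  "qwen2.5:32b",
  "deepseek-r1:14b",
  "llama3.3:70b",
];

// Only the field this check needs from the /api/tags response.
interface OllamaTags {
  models: { name: string }[];
}

// Pure comparison: which required models are not installed?
function missingModels(tags: OllamaTags, required: readonly string[] = REQUIRED_MODELS): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  return required.filter((name) => !installed.has(name));
}

// Network wrapper around the comparison (hypothetical helper name).
async function checkOllama(ollamaUrl: string): Promise<string[]> {
  const res = await fetch(new URL("/api/tags", ollamaUrl));
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return missingModels((await res.json()) as OllamaTags);
}
```

For example, checkOllama(process.env.OLLAMA_URL ?? "http://localhost:11434") resolves to an empty array when nothing is missing.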

5. API Usage

Completion

curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'

Response:

{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
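For callers outside the npm workspace that cannot use @llm-gateway/client, the same request can be made with plain fetch. This sketch is typed only from the example request and response above; other fields, and status values other than "approved", are not documented here, so `status` stays a plain string. `complete` and `FetchLike` are hypothetical names, and the injectable fetch exists only to make the function testable.

```typescript
// Minimal fetch signature so a stub can replace the global fetch in tests.
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface CompletionRequest {
  caller: string;
  task_type: string;
  input: string;
  language?: string;
}

// Shape taken from the example response above.
interface CompletionResponse {
  request_id: string;
  status: string; // "approved" in the example; other values are not documented here
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

async function complete(
  baseUrl: string,
  body: CompletionRequest,
  fetchImpl: FetchLike = fetch,
): Promise<CompletionResponse> {
  const res = await fetchImpl(`${baseUrl.replace(/\/+$/, "")}/v1/completion`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  // Surface gateway errors (e.g. 502 from the tunnel) instead of parsing them.
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  return (await res.json()) as CompletionResponse;
}
```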

Classify input

curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'

Health

curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
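After npm run dev, a PM2 restart, or docker compose up -d, scripts may need to wait until the gateway reports ready. A minimal polling sketch, assuming /health/ready answers with a 2xx status once the gateway can serve traffic and a non-2xx status (or no connection at all) before that. `waitUntilReady` is a hypothetical helper.

```typescript
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls /health/ready until it answers 2xx or the deadline passes.
async function waitUntilReady(
  baseUrl: string,
  timeoutMs = 30_000,
  intervalMs = 1_000,
  fetchImpl: FetchLike = fetch,
): Promise<void> {
  const url = `${baseUrl.replace(/\/+$/, "")}/health/ready`;
  const deadline = Date.now() + timeoutMs;
  let lastError = "no response";
  while (Date.now() < deadline) {
    try {
      const res = await fetchImpl(url);
      if (res.ok) return;
      lastError = `HTTP ${res.status}`;
    } catch (err) {
      lastError = String(err); // e.g. connection refused while the process restarts
    }
    await sleep(intervalMs);
  }
  throw new Error(`${url} not ready after ${timeoutMs} ms (last: ${lastError})`);
}
```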

6. Project-specific Client Usage

Install the client in any workspace project:

npm install @llm-gateway/client

TIP (Transceiver Intelligence Platform)

import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}

EO Global Pulse

import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
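How safeCompletion is implemented is not shown in this README. For calls that have no safe variant, a similar degrade-to-null pattern can be sketched as a generic wrapper. `withFallback` is a hypothetical helper, and the timeout is an added assumption beyond the documented "returns null when gateway is down" behavior.

```typescript
// Hypothetical helper: resolves to null instead of throwing when the wrapped
// call fails or takes longer than timeoutMs.
async function withFallback<T>(call: () => Promise<T>, timeoutMs: number): Promise<T | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late rejection from call() is absorbed by race().
    return await Promise.race([call(), timedOut]);
  } catch {
    return null; // gateway down, network error, or any other failure
  } finally {
    clearTimeout(timer);
  }
}
```

For example, withFallback(() => llm.completion({ task_type: 'meeting_summary', input: transcriptText }), 15_000) yields null rather than throwing if the gateway is unreachable or slow, so the caller can continue without a summary.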

SwitchBlade

import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);

Custom client (any project)

import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});

7. Deployment to Erik

One-command deploy (from local Mac)

bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only

First-time setup on Erik

# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh

PM2 management

ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"

8. Monitoring

Prometheus metrics

GET http://localhost:3100/metrics

Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from deploy/grafana-dashboard.json (if present).

Key metrics to watch

| Metric | Alert threshold |
|--------|-----------------|
| gateway_request_latency_p99 | > 5000 ms |
| gateway_error_rate | > 5% |
| ollama_queue_depth | > 20 |
| learning_feedback_lag | > 1 h |
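Alerting normally belongs in Prometheus rules, but for an ad-hoc check against /metrics the thresholds above can be applied to the Prometheus text exposition format. This sketch assumes the metrics are exposed under exactly these names as unlabelled series, with latency in milliseconds, error rate as a 0 to 1 ratio, and feedback lag in seconds; none of that is confirmed here (a p99 is often computed in PromQL rather than exported directly). Labelled series are skipped for brevity, and `parseMetrics` / `breaches` are hypothetical names.

```typescript
// Parses unlabelled "name value" lines from the Prometheus text format.
function parseMetrics(text: string): Map<string, number> {
  const values = new Map<string, number>();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // HELP/TYPE comments
    const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(trimmed);
    if (!m || m[2]) continue; // skip labelled series in this sketch
    values.set(m[1], Number(m[3]));
  }
  return values;
}

// Thresholds from the table; units are assumptions (see lead-in).
const THRESHOLDS: Record<string, number> = {
  gateway_request_latency_p99: 5000, // ms
  gateway_error_rate: 0.05,          // ratio, i.e. 5%
  ollama_queue_depth: 20,            // pending requests
  learning_feedback_lag: 3600,       // seconds, i.e. 1 h
};

// Returns the names of metrics above their threshold; absent metrics are not flagged.
function breaches(values: Map<string, number>): string[] {
  return Object.entries(THRESHOLDS)
    .filter(([name, limit]) => (values.get(name) ?? 0) > limit)
    .map(([name]) => name);
}
```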

Log locations (Erik)

/var/log/llm-gateway/out.log           # gateway stdout
/var/log/llm-gateway/error.log         # gateway stderr
/var/log/llm-gateway/learning-out.log  # learning engine stdout
/var/log/llm-gateway/learning-error.log # learning engine stderr

9. Cloudflare Tunnel

See deploy/cloudflare-tunnel.md for instructions to expose the gateway via https://llm-gateway.context-x.org.


10. Docker (alternative to PM2)

# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down

Repository structure

llm-gateway/
├── packages/
│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/     # Versioned prompt templates
│   ├── learning/        # Self-improving feedback engine
│   │   └── src/
│   └── client/          # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh       # First-time server setup
│   ├── deploy.sh           # One-command local → Erik deploy
│   ├── ecosystem.config.cjs # PM2 config
│   ├── nginx.conf          # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh          # Database initialization
│   └── pull-models.sh      # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json            # npm workspaces root

Description: Unified LLM orchestration layer for TIP, EO Global Pulse, PeerCortex, SwitchBlade, NOGnet, ShieldX