Rene Fichtmueller 200cc7f2dc fix: Correct Cloudflare tunnel and setup script to use port 3103

The LLM Gateway is configured to run on port 3103 in ecosystem.config.cjs,
but the Cloudflare tunnel configuration and setup script were referencing port
3100, causing 502 Bad Gateway errors.

Updates:
- cloudflare-tunnel.md: Changed tunnel ingress from localhost:3100 to localhost:3103
- setup-erik.sh: Updated health check URL and output messages to port 3103
- This fixes the Cloudflare tunnel connection that was causing public HTTPS access to fail

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

2026-04-26 21:04:36 +02:00

LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

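Port: 3100 (default; see PORT below)
Production: https://llm-gateway.context-x.org (Cloudflare Tunnel → Erik, where the PM2 config deploy/ecosystem.config.cjs runs the gateway on port 3103)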

Architecture

Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
    ↓  @llm-gateway/client
LLM Gateway :3100
    ├── Prompt Engine   (versioned templates per task_type)
    ├── ShieldX Guard   (prompt injection validation)
    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
    └── Learning Engine (feedback loop, self-improvement)
         ↓
    PostgreSQL (llm_gateway DB)
    Ollama     (Mac Studio :11434)
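The README does not say how the Ollama Router chooses a tier. The following is a minimal sketch, assuming a hypothetical heuristic based on input size and two caller hints (`needsReasoning`, `premium`). The model tags come from the model table in section 4; the character thresholds and the `RoutingHints` shape are illustrative assumptions, not the gateway's real routing logic.

```typescript
// Hypothetical tier-based model selection. Model tags mirror the README's
// model table; the thresholds below are illustrative assumptions only.
type Tier = "fast" | "medium" | "large" | "reasoning" | "premium";

const MODEL_BY_TIER: Record<Tier, string> = {
  fast: "qwen2.5:3b",
  medium: "qwen2.5:14b",
  large: "qwen2.5:32b",
  reasoning: "deepseek-r1:14b",
  premium: "llama3.3:70b",
};

interface RoutingHints {
  inputChars: number;       // size of the prompt input
  needsReasoning?: boolean; // caller asks for step-by-step logic
  premium?: boolean;        // caller explicitly opts into the 70b tier
}

// Assumed heuristic: explicit hints win, otherwise scale with input size.
function chooseTier(h: RoutingHints): Tier {
  if (h.premium) return "premium";
  if (h.needsReasoning) return "reasoning";
  if (h.inputChars < 500) return "fast";
  if (h.inputChars < 8_000) return "medium";
  return "large";
}

function selectModel(h: RoutingHints): string {
  return MODEL_BY_TIER[chooseTier(h)];
}

console.log(selectModel({ inputChars: 200 }));                       // qwen2.5:3b
console.log(selectModel({ inputChars: 20_000 }));                    // qwen2.5:32b
console.log(selectModel({ inputChars: 200, needsReasoning: true })); // deepseek-r1:14b
```

Keeping the tier-to-model mapping in one table means swapping a model (for example a newer 32b release) touches a single line rather than every routing rule.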

Prerequisites

Dependency	Version	Notes
Node.js	22+	`node --version`
PostgreSQL	17	Local or remote
Ollama	latest	Running on the Mac Studio (192.168.178.169)
PM2	latest	`npm install -g pm2` (Erik)

1. Local Development Setup

# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning

Gateway is available at http://localhost:3100.

2. Environment Variables

See .env.example for all variables with descriptions.

Variable	Required	Default	Description
`DATABASE_URL`	YES	—	PostgreSQL DSN for llm_gateway
`TIP_DATABASE_URL`	NO	—	TIP DB (read-only)
`OLLAMA_URL`	YES	http://...169:11434	Ollama inference server
`SHIELDX_URL`	NO	—	ShieldX endpoint (leave blank to skip)
`PORT`	NO	3100	HTTP port
`LOG_LEVEL`	NO	info	error / warn / info / debug
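A minimal sketch of validating these variables at startup, assuming the names and defaults from the table. `loadConfig` and `GatewayConfig` are hypothetical names. Because the documented OLLAMA_URL default is elided, the sketch treats OLLAMA_URL as strictly required, and it treats an empty string as "not set" so that a blank SHIELDX_URL disables ShieldX validation.

```typescript
// Hypothetical startup config loader for the variables in the table above.
type LogLevel = "error" | "warn" | "info" | "debug";

interface GatewayConfig {
  databaseUrl: string;
  tipDatabaseUrl?: string; // read-only TIP DB, optional
  ollamaUrl: string;
  shieldxUrl?: string;     // undefined => ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ["error", "warn", "info", "debug"];

function loadConfig(env: Record<string, string | undefined>): GatewayConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable ${name}`);
    return value;
  };
  // Empty strings count as "not set".
  const optional = (name: string): string | undefined => env[name] || undefined;

  const port = Number(env.PORT || "3100");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  const logLevel = (env.LOG_LEVEL || "info") as LogLevel;
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`Invalid LOG_LEVEL: ${env.LOG_LEVEL}`);
  }

  return {
    databaseUrl: required("DATABASE_URL"),
    tipDatabaseUrl: optional("TIP_DATABASE_URL"),
    ollamaUrl: required("OLLAMA_URL"),
    shieldxUrl: optional("SHIELDX_URL"),
    port,
    logLevel,
  };
}
```

In a Node process this would be called as `const config = loadConfig(process.env);`, so a missing DATABASE_URL fails fast at boot instead of on the first request.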

3. Running Migrations

# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh

Migration files live in:

packages/gateway/src/db/migrations/001_initial.sql
packages/learning/src/db/migrations/002_learning.sql

4. Pulling Ollama Models

bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh

Required models:

Model	Tier	Use case
`qwen2.5:3b`	Fast	Low-complexity, sub-second tasks
`qwen2.5:14b`	Medium	Standard completions
`qwen2.5:32b`	Large	Complex analysis
`deepseek-r1:14b`	Reasoning	Step-by-step logic
`llama3.3:70b`	Premium	Best quality, used sparingly
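To confirm that pull-models.sh succeeded, one option is to compare the required list against Ollama's `GET /api/tags` endpoint, which lists the models installed on an instance. This is a sketch, assuming only the `name` field of each entry is needed. `missingModels` and `checkOllama` are hypothetical helpers, and the global `fetch` relies on Node 22 from the prerequisites.

```typescript
// Required models from the table above.
const REQUIRED_MODELS = [
  "qwen2.5:3b",
  "qwen2.5:14b",
  "qwen2.5:32b",
  "deepseek-r1:14b",
  "llama3.3:70b",
] as const;

// Subset of Ollama's /api/tags response used here.
interface OllamaTagsResponse {
  models: { name: string }[];
}

// Pure helper: which required models are absent from a /api/tags payload?
function missingModels(tags: OllamaTagsResponse): string[] {
  const installed = new Set(tags.models.map((m) => m.name));
  return REQUIRED_MODELS.filter((name) => !installed.has(name));
}

// Network wrapper around the pure check.
async function checkOllama(ollamaUrl: string): Promise<string[]> {
  const res = await fetch(new URL("/api/tags", ollamaUrl));
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return missingModels((await res.json()) as OllamaTagsResponse);
}
```

Keeping the comparison separate from the HTTP call makes it easy to reuse in a readiness check or a CI step.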

5. API Usage

Completion

curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'

Response:

{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
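For callers that do not use @llm-gateway/client, the request and response above can be typed directly. A minimal sketch, assuming the fields shown in the curl example and response. The optional `context` field is borrowed from the TIP client example further down. Only `"approved"` is documented as a status value, so `status` stays a plain string. `complete` and `isCompletionResponse` are hypothetical names.

```typescript
interface CompletionRequest {
  caller: string;
  task_type: string;
  input: string;
  language?: string;
  context?: Record<string, unknown>; // as passed in the TIP client example
}

interface CompletionResponse {
  request_id: string;
  status: string; // "approved" is the only value documented here
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Shallow structural check before trusting the JSON payload.
function isCompletionResponse(x: unknown): x is CompletionResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r.request_id === "string" &&
    typeof r.status === "string" &&
    typeof r.output === "string" &&
    typeof r.confidence === "number" &&
    typeof r.model_used === "string"
  );
}

async function complete(baseUrl: string, body: CompletionRequest): Promise<CompletionResponse> {
  const res = await fetch(new URL("/v1/completion", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  const json: unknown = await res.json();
  if (!isCompletionResponse(json)) throw new Error("Unexpected response shape");
  return json;
}
```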

Classify input

curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'

Health

curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe
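After a restart (`pm2 restart llm-gateway` or `docker compose up -d`), a script can poll the readiness probe until the gateway accepts traffic. A sketch, assuming `/health/ready` answers with a 2xx status once the gateway is ready. `waitForReady` and its options are hypothetical, and the fetch function is injectable so the logic can be exercised without a running server.

```typescript
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

interface WaitOptions {
  attempts?: number;   // how many probes before giving up
  intervalMs?: number; // pause between probes
  fetchFn?: FetchLike; // defaults to the global fetch (Node 22)
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForReady(baseUrl: string, opts: WaitOptions = {}): Promise<boolean> {
  const { attempts = 30, intervalMs = 1_000, fetchFn = fetch } = opts;
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health/ready`);
      if (res.ok) return true;
    } catch {
      // Connection refused and similar: the gateway is not listening yet.
    }
    if (i < attempts - 1) await sleep(intervalMs);
  }
  return false;
}
```

Polling readiness rather than liveness avoids sending traffic to a process that is up but still connecting to PostgreSQL or Ollama, assuming the readiness probe checks those dependencies.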

6. Project-specific Client Usage

Install the client in any workspace project:

npm install @llm-gateway/client

TIP (Transceiver Intelligence Platform)

import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}

EO Global Pulse

import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});
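The graceful-degradation idea behind safeCompletion can be sketched generically. This is a hypothetical illustration, not the client's actual implementation: any failure becomes `null`, and the caller supplies a non-LLM fallback. `orNull` and `summarizeOrTruncate` are made-up names, and truncating to 200 characters is an arbitrary example fallback.

```typescript
// Any failure (network error, timeout, HTTP error) resolves to null.
async function orNull<T>(
  call: () => Promise<T>,
  onError: (err: unknown) => void = (err) => console.warn("LLM gateway unavailable:", err),
): Promise<T | null> {
  try {
    return await call();
  } catch (err) {
    onError(err);
    return null;
  }
}

// Example: fall back to a truncated transcript when the gateway is down.
async function summarizeOrTruncate(
  transcript: string,
  summarize: (text: string) => Promise<string>,
): Promise<string> {
  const summary = await orNull(() => summarize(transcript));
  return summary ?? transcript.slice(0, 200);
}
```

Returning `null` instead of throwing forces each call site to decide its own fallback, which suits features where an LLM summary is a nice-to-have rather than a hard dependency.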

SwitchBlade

import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);

Custom client (any project)

import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});
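How the client enforces `timeout: 20_000` is not documented here. One common way, shown as a sketch, is an AbortController that cancels the request after the given number of milliseconds (20_000 is assumed to mean 20 seconds). `postJsonWithTimeout` is a hypothetical helper, not part of @llm-gateway/client, and the fetch function is injectable for testing.

```typescript
type FetchInit = { method: string; headers: Record<string, string>; body: string; signal: AbortSignal };
type FetchFn = (url: string, init: FetchInit) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function postJsonWithTimeout(
  url: string,
  payload: unknown,
  timeoutMs: number,
  fetchFn: FetchFn = fetch,
): Promise<unknown> {
  const controller = new AbortController();
  // Abort the in-flight request once the deadline passes.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
    return await res.json();
  } catch (err) {
    if (controller.signal.aborted) throw new Error(`Gateway request timed out after ${timeoutMs} ms`);
    throw err;
  } finally {
    clearTimeout(timer); // do not keep the process alive after completion
  }
}
```

A 20-second budget is generous for the 3b tier but may be tight for the 70b tier, so a per-task timeout is worth considering.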

7. Deployment to Erik

One-command deploy (from local Mac)

bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only

First-time setup on Erik

# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh

PM2 management

ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"

8. Monitoring

Prometheus metrics

GET http://localhost:3100/metrics

Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from deploy/grafana-dashboard.json (if present).

Key metrics to watch

Metric	Alert threshold
`gateway_request_latency_p99`	> 5 000 ms
`gateway_error_rate`	> 5%
`ollama_queue_depth`	> 20
`learning_feedback_lag`	> 1 h
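These thresholds would normally live in Prometheus alert rules, but a quick ad-hoc check against a `/metrics` scrape can be sketched as follows. Assumptions: each metric is exposed as an unlabeled sample under exactly these names, latency is in milliseconds, the error rate is a 0..1 ratio, and feedback lag is in seconds. None of this is confirmed by the README. `parseMetrics` only handles simple `name value` lines of the Prometheus text format and ignores labeled samples.

```typescript
// Thresholds from the table above, in the assumed units.
const THRESHOLDS: Record<string, number> = {
  gateway_request_latency_p99: 5_000, // ms
  gateway_error_rate: 0.05,           // 5 %
  ollama_queue_depth: 20,
  learning_feedback_lag: 3_600,       // 1 h, in seconds
};

// Parse unlabeled `name value` lines; skip blanks and # HELP / # TYPE comments.
function parseMetrics(text: string): Map<string, number> {
  const values = new Map<string, number>();
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const [name, value] = trimmed.split(/\s+/);
    const n = Number(value);
    if (name && !Number.isNaN(n)) values.set(name, n);
  }
  return values;
}

// Return a message for every metric above its threshold (missing metrics count as 0).
function breachedAlerts(text: string): string[] {
  const values = parseMetrics(text);
  return Object.entries(THRESHOLDS)
    .filter(([name, limit]) => (values.get(name) ?? 0) > limit)
    .map(([name, limit]) => `${name}=${values.get(name)} exceeds ${limit}`);
}
```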

Log locations (Erik)

/var/log/llm-gateway/out.log           # gateway stdout
/var/log/llm-gateway/error.log         # gateway stderr
/var/log/llm-gateway/learning-out.log  # learning engine stdout
/var/log/llm-gateway/learning-error.log # learning engine stderr

9. Cloudflare Tunnel

See deploy/cloudflare-tunnel.md for instructions to expose the gateway via https://llm-gateway.context-x.org.

10. Docker (alternative to PM2)

# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down

Repository structure

llm-gateway/
├── packages/
│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/     # Versioned prompt templates
│   ├── learning/        # Self-improving feedback engine
│   │   └── src/
│   └── client/          # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh       # First-time server setup
│   ├── deploy.sh           # One-command local → Erik deploy
│   ├── ecosystem.config.cjs # PM2 config (runs the gateway on port 3103)
│   ├── nginx.conf          # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh          # Database initialization
│   └── pull-models.sh      # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json            # npm workspaces root
