The judge model was hardcoded to qwen2.5:3b. It now reads from
process.env.LLM_JUDGE_MODEL, with qwen2.5:3b as the fallback.
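A minimal sketch of that lookup, not the gateway's actual code. The helper name `resolveJudgeModel` and passing the environment in as an argument are assumptions made for illustration; only the variable name `LLM_JUDGE_MODEL` and the `qwen2.5:3b` fallback come from the note above.

```typescript
// Hypothetical helper: only LLM_JUDGE_MODEL and the fallback value are from the note.
const DEFAULT_JUDGE_MODEL = 'qwen2.5:3b';

type Env = Record<string, string | undefined>;

// Returns the configured judge model, falling back when the variable is
// unset or contains only whitespace.
function resolveJudgeModel(env: Env): string {
  const configured = env.LLM_JUDGE_MODEL?.trim();
  return configured ? configured : DEFAULT_JUDGE_MODEL;
}
```

In the gateway this would presumably be called once at startup as `resolveJudgeModel(process.env)`.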
Production env updated to magatama-coder:judge-r1 — a snapshot of the
magatamallm post-chunk-4 LoRA adapter exported via train.py --export-only.
Chunk-4 picked because it had the best val_loss (0.861) of the 5 balanced
chunks; chunk-5 spiked back to val=2.531.
Sanity test on the new judge model:
- injection prompt -> "INFORMATIONAL" (not the strict INJECTION word we'd want; the judge needs a dedicated Phase-2 fine-tune on the binary classification format)
- safe prompt -> "SAFE" (correct)
Implication: INJECTION_DEFENSE_MODE is staying at 'block' for now —
switching to 'llm_judge' mode with this provisional judge would actually
weaken defense because magatamallm's training tilts toward operator-task
output ("here's the fix") rather than binary INJECTION/SAFE classification.
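If the judge were enabled anyway, off-format answers such as "INFORMATIONAL" would need explicit handling. The sketch below shows one fail-closed option; `parseJudgeVerdict` and `shouldBlock` are hypothetical names, and nothing here is confirmed to match how the gateway treats judge output.

```typescript
type Verdict = 'INJECTION' | 'SAFE' | 'UNKNOWN';

// Maps raw judge output to a verdict. Anything that is not exactly one of the
// two expected words (after trimming and upper-casing) becomes UNKNOWN.
function parseJudgeVerdict(raw: string): Verdict {
  const word = raw.trim().toUpperCase();
  if (word === 'INJECTION' || word === 'SAFE') return word;
  return 'UNKNOWN';
}

// Fail closed: only an explicit SAFE lets the request through, so an
// off-format answer like "INFORMATIONAL" is treated as a block.
function shouldBlock(raw: string): boolean {
  return parseJudgeVerdict(raw) !== 'SAFE';
}
```

With this rule the provisional judge's "INFORMATIONAL" answer still blocks the injection prompt, but only because it is off-format, not because the model classified it correctly.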
Follow-up (Phase 2): train a dedicated `magatama-judge` model — small base
(Qwen 2.5:1.5b or Phi-3-mini), trained purely on injection-classification
SFT pairs extracted from our existing:
- llm-security-prompt-injection-2026-05-12.train.jsonl
- pulso-magatama-injection-guard-2026-05-13.train.jsonl
- guard-exposure-firewall-verified-2026-05-16.train.jsonl
- jailbreak-corpus-candidates.jsonl (L1B3RT4S gaps)
- benign samples from train.jsonl labeled SAFE
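Extracting the classification pairs from those files might look roughly like the sketch below. The record fields (`text`, `isInjection`), the instruction wording, and the `{ prompt, completion }` output shape are all assumptions; the actual JSONL schemas are not described in this note.

```typescript
// Hypothetical input shape: the real JSONL field names are not given here.
interface RawSample {
  text: string;
  isInjection: boolean;
}

// Hypothetical SFT pair shape with a strictly binary completion.
interface SftPair {
  prompt: string;
  completion: 'INJECTION' | 'SAFE';
}

// Placeholder instruction text, not taken from the project.
const JUDGE_INSTRUCTION =
  'Classify the following input as INJECTION or SAFE. Answer with one word.';

function toSftPair(sample: RawSample): SftPair {
  return {
    prompt: `${JUDGE_INSTRUCTION}\n\n${sample.text}`,
    completion: sample.isInjection ? 'INJECTION' : 'SAFE',
  };
}

// Parses JSONL text (one JSON object per line), skipping blank lines.
function jsonlToSftPairs(jsonl: string): SftPair[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => toSftPair(JSON.parse(line) as RawSample));
}
```

Keeping the completion to a single word is what lets a downstream parser compare the judge's answer exactly instead of guessing at free-form text.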
Architecture rationale: separation of concerns. Even if an attacker manipulates
the primary backbone model, the judge stays independent. Roughly 5-10k pairs
should be enough for a focused 1.5B classifier; training should take about
2-3 h on the Mac Studio (MPS).
LLM Gateway
Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.
Port: 3100
Production: http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)
Architecture
Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
↓ @llm-gateway/client
LLM Gateway :3100
├── Prompt Engine (versioned templates per task_type)
├── ShieldX Guard (prompt injection validation)
├── Ollama Router (model tier selection: 3b / 14b / 32b / 70b)
└── Learning Engine (feedback loop, self-improvement)
↓
PostgreSQL (llm_gateway DB)
Ollama (Mac Studio :11434)
Prerequisites
| Dependency | Version | Notes |
|---|---|---|
| Node.js | 22+ | node --version |
| PostgreSQL | 17 | Local or remote |
| Ollama | latest | Running on Mac Studio .169 |
| PM2 | latest | npm install -g pm2 (Erik) |
1. Local Development Setup
# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway
# Install all workspace dependencies
npm install
# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum
# Initialize database
bash scripts/init-db.sh
# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh
# Start gateway
npm run dev
# In a separate terminal: start learning engine
npm run learning
Gateway is available at http://localhost:3100.
2. Environment Variables
See .env.example for all variables with descriptions.
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | YES | - | PostgreSQL DSN for llm_gateway |
| TIP_DATABASE_URL | NO | - | TIP DB (read-only) |
| OLLAMA_URL | YES | http://...169:11434 | Ollama inference server |
| SHIELDX_URL | NO | - | ShieldX endpoint (leave blank to skip) |
| PORT | NO | 3100 | HTTP port |
| LOG_LEVEL | NO | info | error / warn / info / debug |
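As an illustration of how these variables could be validated at startup, here is a minimal sketch. `loadConfig` and `GatewayConfig` are hypothetical names, not the gateway's real code. `OLLAMA_URL` is treated as strictly required because the table marks it so; the sketch does not attempt to reproduce its default.

```typescript
type EnvSource = Record<string, string | undefined>;

const LOG_LEVELS = ['error', 'warn', 'info', 'debug'] as const;

// Hypothetical config shape derived from the table above.
interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl?: string;
  shieldxUrl?: string; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: (typeof LOG_LEVELS)[number];
}

function loadConfig(env: EnvSource): GatewayConfig {
  const optional = (name: string): string | undefined =>
    env[name]?.trim() || undefined;
  const required = (name: string): string => {
    const value = optional(name);
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  const port = Number(optional('PORT') ?? '3100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }

  const rawLevel = optional('LOG_LEVEL') ?? 'info';
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel);
  if (!logLevel) throw new Error(`Invalid LOG_LEVEL: ${rawLevel}`);

  return {
    databaseUrl: required('DATABASE_URL'),
    ollamaUrl: required('OLLAMA_URL'),
    tipDatabaseUrl: optional('TIP_DATABASE_URL'),
    shieldxUrl: optional('SHIELDX_URL'),
    port,
    logLevel,
  };
}
```

Failing fast on a missing `DATABASE_URL` surfaces a broken `.env` at boot rather than on the first request.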
3. Running Migrations
# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh
# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
Migration files live in:
packages/gateway/src/db/migrations/001_initial.sql
packages/learning/src/db/migrations/002_learning.sql
4. Pulling Ollama Models
bash scripts/pull-models.sh
# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
Required models:
| Model | Tier | Use case |
|---|---|---|
| qwen2.5:3b | Fast | Low-complexity, sub-second tasks |
| qwen2.5:14b | Medium | Standard completions |
| qwen2.5:32b | Large | Complex analysis |
| deepseek-r1:14b | Reasoning | Step-by-step logic |
| llama3.3:70b | Premium | Best quality, used sparingly |
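The tier table maps naturally onto a lookup like the one below. The model names come from the table; the `pickTier` rule and its character thresholds are placeholders invented for illustration and are not the Ollama Router's actual selection logic.

```typescript
type Tier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

// Model names taken from the table above.
const MODEL_BY_TIER: Record<Tier, string> = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Hypothetical routing rule with made-up thresholds. The premium tier is
// never chosen automatically, reflecting "used sparingly" in the table.
function pickTier(inputChars: number, needsReasoning: boolean): Tier {
  if (needsReasoning) return 'reasoning';
  if (inputChars < 500) return 'fast';
  if (inputChars < 8000) return 'medium';
  return 'large';
}

function modelFor(inputChars: number, needsReasoning: boolean): string {
  return MODEL_BY_TIER[pickTier(inputChars, needsReasoning)];
}
```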
5. API Usage
Completion
curl -X POST http://localhost:3100/v1/completion \
-H "Content-Type: application/json" \
-d '{
"caller": "my-project",
"task_type": "summarize",
"input": "Long document text here...",
"language": "en"
}'
Response:
{
"request_id": "uuid",
"status": "approved",
"output": "Summary...",
"confidence": 0.92,
"model_used": "qwen2.5:14b",
"prompt_version": "summarize/v2",
"token_count": { "input": 512, "output": 128 },
"latency_ms": 1240
}
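For callers not using the client package, the request and response above can be typed roughly as follows. The field names mirror the example payloads; the type names, the `requestCompletion` helper, and the injected `fetchImpl` parameter are assumptions for this sketch. Only the `approved` status value appears in this README, so `status` is left as a plain string.

```typescript
interface CompletionRequest {
  caller: string;
  task_type: string;
  input: string;
  language?: string;
}

interface CompletionResponse {
  request_id: string;
  status: string; // 'approved' in the example; other values are not documented here
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Minimal fetch-like signature so the sketch needs no DOM or Node typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function requestCompletion(
  fetchImpl: FetchLike,
  baseUrl: string,
  req: CompletionRequest,
): Promise<CompletionResponse> {
  const res = await fetchImpl(`${baseUrl}/v1/completion`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Gateway returned HTTP ${res.status}`);
  // The body is trusted to match the documented shape; no runtime validation.
  return (await res.json()) as CompletionResponse;
}
```

On Node 22 the built-in `fetch` can be passed directly, for example `requestCompletion(fetch, 'http://localhost:3100', { caller: 'my-project', task_type: 'summarize', input: text })`.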
Classify input
curl -X POST http://localhost:3100/v1/classify \
-H "Content-Type: application/json" \
-d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'
Health
curl http://localhost:3100/health
curl http://localhost:3100/health/live # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready # readiness probe
6. Project-specific Client Usage
Install the client in any workspace project:
npm install @llm-gateway/client
TIP (Transceiver Intelligence Platform)
import { createTIPClient } from '@llm-gateway/client';
const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env
const result = await llm.completion({
task_type: 'extract_specs',
input: rawHtml,
context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});
if (result.status === 'approved') {
console.log(result.output);
}
EO Global Pulse
import { createEOPulseClient } from '@llm-gateway/client';
const llm = createEOPulseClient();
// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
task_type: 'meeting_summary',
input: transcriptText,
language: 'de',
});
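The client's internals are not shown in this README. As a rough illustration of the graceful-degradation behavior described in the comment above, a wrapper that turns failures into `null` could be as small as this; `orNull` is a hypothetical helper, not part of `@llm-gateway/client`.

```typescript
// Runs an async call and resolves to null instead of throwing when it fails,
// for example when the gateway is unreachable.
async function orNull<T>(call: () => Promise<T>): Promise<T | null> {
  try {
    return await call();
  } catch {
    return null;
  }
}
```

Callers then branch on `result === null` and fall back to whatever non-LLM behavior they have, instead of wrapping every call site in its own try/catch.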
SwitchBlade
import { createSwitchBladeClient } from '@llm-gateway/client';
const llm = createSwitchBladeClient();
const { batch_id } = await llm.batch(
tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
'http://switchblade.context-x.org/webhooks/llm-batch',
);
Custom client (any project)
import { LLMGatewayClient } from '@llm-gateway/client';
const llm = new LLMGatewayClient({
caller: 'my-service',
baseUrl: process.env.LLM_GATEWAY_URL,
timeout: 20_000,
});
7. Deployment to Erik
One-command deploy (from local Mac)
bash deploy/deploy.sh
# Skip local build (if already built):
bash deploy/deploy.sh --skip-build
# Health check only:
bash deploy/deploy.sh --health-only
First-time setup on Erik
# SSH to Erik
ssh root@217.154.82.179
# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh
PM2 management
ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"
8. Monitoring
Prometheus metrics
GET http://localhost:3100/metrics
Grafana
Metrics are scraped by the existing Prometheus instance. Import the dashboard from deploy/grafana-dashboard.json (if present).
Key metrics to watch
| Metric | Alert threshold |
|---|---|
| gateway_request_latency_p99 | > 5 000 ms |
| gateway_error_rate | > 5% |
| ollama_queue_depth | > 20 |
| learning_feedback_lag | > 1 h |
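The thresholds above can be checked with a few lines of code, sketched here. The metric names and limits come from the table; the units chosen for the snapshot (milliseconds, a 0-1 fraction, seconds) and the `breachedAlerts` helper are assumptions, since the README does not say how the metrics are exported.

```typescript
interface MetricsSnapshot {
  gateway_request_latency_p99: number; // milliseconds (assumed unit)
  gateway_error_rate: number;          // fraction, 0.05 = 5% (assumed unit)
  ollama_queue_depth: number;          // queued requests
  learning_feedback_lag: number;       // seconds (assumed unit)
}

// Alert thresholds from the table, expressed in the units assumed above.
const THRESHOLDS: MetricsSnapshot = {
  gateway_request_latency_p99: 5000,
  gateway_error_rate: 0.05,
  ollama_queue_depth: 20,
  learning_feedback_lag: 3600,
};

// Returns the names of all metrics strictly above their threshold.
function breachedAlerts(snapshot: MetricsSnapshot): (keyof MetricsSnapshot)[] {
  const names = Object.keys(THRESHOLDS) as (keyof MetricsSnapshot)[];
  return names.filter((name) => snapshot[name] > THRESHOLDS[name]);
}
```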
Log locations (Erik)
/var/log/llm-gateway/out.log # gateway stdout
/var/log/llm-gateway/error.log # gateway stderr
/var/log/llm-gateway/learning-out.log # learning engine stdout
/var/log/llm-gateway/learning-error.log
9. Cloudflare Tunnel
See deploy/cloudflare-tunnel.md for instructions to expose the gateway via https://llm-gateway.context-x.org.
10. Docker (alternative to PM2)
# Build and start all services
cp .env.example .env # fill in DATABASE_URL
docker compose up -d
# Check status
docker compose ps
docker compose logs llm-gateway
# Stop
docker compose down
Repository structure
llm-gateway/
├── packages/
│ ├── gateway/ # Core HTTP server (Express + Ollama + ShieldX)
│ │ ├── src/
│ │ │ ├── server.ts
│ │ │ ├── routes/
│ │ │ ├── db/
│ │ │ │ └── migrations/
│ │ │ └── prompts/
│ │ └── prompts/ # Versioned prompt templates
│ ├── learning/ # Self-improving feedback engine
│ │ └── src/
│ └── client/ # @llm-gateway/client TypeScript library
│ └── src/index.ts
├── deploy/
│ ├── setup-erik.sh # First-time server setup
│ ├── deploy.sh # One-command local → Erik deploy
│ ├── ecosystem.config.cjs # PM2 config
│ ├── nginx.conf # Optional nginx reverse proxy
│ └── cloudflare-tunnel.md
├── scripts/
│ ├── init-db.sh # Database initialization
│ └── pull-models.sh # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json # npm workspaces root
Description
Unified LLM orchestration layer for TIP, EO Global Pulse, PeerCortex, SwitchBlade, NOGnet, ShieldX