
Rene Fichtmueller c731900a90 sec(gateway): Layer-3 llm_judge model now configurable via LLM_JUDGE_MODEL env

Was hardcoded to qwen2.5:3b. Now reads from process.env LLM_JUDGE_MODEL
with qwen2.5:3b fallback.

Production env updated to magatama-coder:judge-r1 — a snapshot of the
magatamallm post-chunk-4 LoRA adapter exported via train.py --export-only.
Chunk-4 picked because it had the best val_loss (0.861) of the 5 balanced
chunks; chunk-5 spiked back to val=2.531.

Sanity test on the new judge model:
  injection prompt -> "INFORMATIONAL"  (not the strict INJECTION word
                                         we'd want — judge needs Phase-2
                                         dedicated fine-tune on binary
                                         classification format)
  safe prompt      -> "SAFE"           (correct)

Implication: INJECTION_DEFENSE_MODE is staying at 'block' for now —
switching to 'llm_judge' mode with this provisional judge would actually
weaken defense because magatamallm's training tilts toward operator-task
output ("here's the fix") rather than binary INJECTION/SAFE classification.

Follow-up (Phase 2): train a dedicated `magatama-judge` model — small base
(Qwen 2.5:1.5b or Phi-3-mini), trained purely on injection-classification
SFT pairs extracted from our existing:
  - llm-security-prompt-injection-2026-05-12.train.jsonl
  - pulso-magatama-injection-guard-2026-05-13.train.jsonl
  - guard-exposure-firewall-verified-2026-05-16.train.jsonl
  - jailbreak-corpus-candidates.jsonl (L1B3RT4S gaps)
  - benign samples from train.jsonl labeled SAFE

Architecture rationale: separation of concerns. Even if an attacker manipulates
the primary backbone model, the judge stays independent. ~5-10k pairs should
be enough for a focused 1.5B classifier. Training takes ~2-3h on Mac Studio MPS.
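
The env lookup with fallback, and the reason a non-strict label such as "INFORMATIONAL" keeps the defense in 'block' mode, can be sketched as follows. This is a minimal illustration and not the gateway's actual code: `resolveJudgeModel`, `parseJudgeVerdict`, and the `JudgeEnv` type are hypothetical names, and the environment is passed in as a plain object so the snippet has no Node.js type dependencies.

```typescript
// Hypothetical sketch; function and type names are illustrative assumptions.
type JudgeEnv = Record<string, string | undefined>;
type JudgeVerdict = 'INJECTION' | 'SAFE';

// The model that was hardcoded before this commit, now used as the fallback.
const DEFAULT_JUDGE_MODEL = 'qwen2.5:3b';

// Read LLM_JUDGE_MODEL; fall back when it is unset or blank.
function resolveJudgeModel(env: JudgeEnv): string {
  const configured = env.LLM_JUDGE_MODEL?.trim();
  return configured ? configured : DEFAULT_JUDGE_MODEL;
}

// Accept only the two strict labels. Any other answer (for example
// "INFORMATIONAL") yields null, so the caller cannot treat it as a verdict.
function parseJudgeVerdict(raw: string): JudgeVerdict | null {
  const label = raw.trim().toUpperCase();
  return label === 'INJECTION' || label === 'SAFE' ? label : null;
}

console.log(resolveJudgeModel({ LLM_JUDGE_MODEL: 'magatama-coder:judge-r1' }));
console.log(parseJudgeVerdict('INFORMATIONAL'));
```

With a parser this strict, the provisional judge's answer to the injection prompt is rejected rather than mapped to a verdict, which matches the decision to stay on 'block' until a dedicated classifier exists.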

2026-05-16 23:36:26 +02:00

copilot-bridge

fix: correct copilot-api dependency version to 0.7.0 (published on npm)

2026-04-25 12:41:30 +02:00

deploy

fix: Correct Cloudflare tunnel and setup script to use port 3103

2026-04-26 21:04:36 +02:00

docs/adr

feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation

2026-04-25 05:47:18 +02:00

knowledge

feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)

2026-04-19 21:39:44 +02:00

openai-bridge

feat: integrate GitHub Copilot as third LLM provider via copilot-bridge

2026-04-25 12:38:30 +02:00

packages

sec(gateway): Layer-3 llm_judge model now configurable via LLM_JUDGE_MODEL env

2026-05-16 23:36:26 +02:00

scripts

feat: integrate GitHub Copilot as third LLM provider via copilot-bridge

2026-04-25 12:38:30 +02:00

src

fix: only send HSTS header on HTTPS connections, not HTTP

2026-04-26 19:01:41 +02:00

sync

sync: record gateway final hardening

2026-05-12 23:31:02 +02:00

.gitignore

feat: initial llm-gateway implementation

2026-04-02 22:48:55 +02:00

AI_CONTROL_PLANE_SYSTEM_DESIGN.md

feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00

CHANGELOG_PENDING.md

feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)

2026-04-19 21:39:44 +02:00

DEPLOYMENT_BLOCKED.md

feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation

2026-04-25 05:47:18 +02:00

DEPLOYMENT-BRIDGES.md

feat: integrate GitHub Copilot as third LLM provider via copilot-bridge

2026-04-25 12:38:30 +02:00

docker-compose.yaml

refactor: MAGATAMA pipeline code quality audit — all functions <50 lines

2026-04-25 17:38:11 +02:00

Dockerfile

feat: initial llm-gateway implementation

2026-04-02 22:48:55 +02:00

FINDINGS_DATABASE.json

fix: only send HSTS header on HTTPS connections, not HTTP

2026-04-26 19:01:41 +02:00

FINDINGS_RESOLVED.md

fix: only send HSTS header on HTTPS connections, not HTTP

2026-04-26 19:01:41 +02:00

INTEGRATION-STATUS.md

docs: update integration status to reflect full provider integration (OpenAI + Copilot)

2026-04-25 12:40:17 +02:00

MAGATAMA_DEPLOY_STATE.md

chore: MAGATAMA deployment state — download in progress, Pi-hole bypassed

2026-04-16 16:35:54 +02:00

OPEN_SOURCE_BLUEPRINT.md

feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00

OPEN_SOURCE_FEATURE_MATRIX.md

feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00

OPEN_SOURCE_GAP_ANALYSIS.md

feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00

OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md

feat: publish llm gateway v2 dashboard alongside restored workbench

2026-05-01 17:43:32 +02:00

package-lock.json

fix: Correct Cloudflare tunnel and setup script to use port 3103

2026-04-26 21:04:36 +02:00

package.json

fix: only send HSTS header on HTTPS connections, not HTTP

2026-04-26 19:01:41 +02:00

PHASE_2F_DEPLOYMENT.md

feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)

2026-04-19 21:39:44 +02:00

README.md

feat: initial llm-gateway implementation

2026-04-02 22:48:55 +02:00

README.md

LLM Gateway

Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.

Port: 3100

Production: http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)

Architecture

Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
    ↓  @llm-gateway/client
LLM Gateway :3100
    ├── Prompt Engine   (versioned templates per task_type)
    ├── ShieldX Guard   (prompt injection validation)
    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
    └── Learning Engine (feedback loop, self-improvement)
         ↓
    PostgreSQL (llm_gateway DB)
    Ollama     (Mac Studio :11434)

Prerequisites

| Dependency | Version | Notes |
|---|---|---|
| Node.js | 22+ | `node --version` |
| PostgreSQL | 17 | Local or remote |
| Ollama | latest | Running on Mac Studio .169 |
| PM2 | latest | `npm install -g pm2` (Erik) |

1. Local Development Setup

# Clone
git clone http://gitea.context-x.org/rene/llm-gateway.git
cd llm-gateway

# Install all workspace dependencies
npm install

# Copy and configure environment
cp .env.example .env
# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum

# Initialize database
bash scripts/init-db.sh

# Pull required Ollama models (runs against OLLAMA_URL from .env)
bash scripts/pull-models.sh

# Start gateway
npm run dev

# In a separate terminal: start learning engine
npm run learning

Gateway is available at http://localhost:3100.

2. Environment Variables

See .env.example for all variables with descriptions.

| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | YES | — | PostgreSQL DSN for llm_gateway |
| `TIP_DATABASE_URL` | NO | — | TIP DB (read-only) |
| `OLLAMA_URL` | YES | http://...169:11434 | Ollama inference server |
| `SHIELDX_URL` | NO | — | ShieldX endpoint (leave blank to skip) |
| `PORT` | NO | 3100 | HTTP port |
| `LOG_LEVEL` | NO | info | error / warn / info / debug |
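
As a rough illustration of how the variables above could be validated at startup, here is a minimal sketch. It is not the gateway's real loader: `loadConfig`, `GatewayConfig`, and `GatewayEnv` are hypothetical names, and `OLLAMA_URL` is treated as required with no built-in default because the default shown in the table is abbreviated.

```typescript
// Hypothetical sketch of env validation; names are assumptions, not the real API.
type GatewayEnv = Record<string, string | undefined>;
type LogLevel = 'error' | 'warn' | 'info' | 'debug';

interface GatewayConfig {
  databaseUrl: string;
  ollamaUrl: string;
  tipDatabaseUrl?: string;
  shieldxUrl?: string; // undefined means ShieldX validation is skipped
  port: number;
  logLevel: LogLevel;
}

const LOG_LEVELS: readonly LogLevel[] = ['error', 'warn', 'info', 'debug'];

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).indexOf(value) !== -1;
}

function requireVar(env: GatewayEnv, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable: ${name}`);
  return value;
}

function loadConfig(env: GatewayEnv): GatewayConfig {
  const port = Number(env.PORT ?? '3100');
  // Rejects NaN, fractions, and out-of-range values.
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  const logLevel = env.LOG_LEVEL ?? 'info';
  if (!isLogLevel(logLevel)) throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);

  return {
    databaseUrl: requireVar(env, 'DATABASE_URL'),
    ollamaUrl: requireVar(env, 'OLLAMA_URL'),
    tipDatabaseUrl: env.TIP_DATABASE_URL || undefined,
    shieldxUrl: env.SHIELDX_URL || undefined, // blank is treated as "skip"
    port,
    logLevel,
  };
}
```

Passing the environment in as an argument (for example `loadConfig(process.env)`) keeps the function easy to test without touching the real process environment.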

3. Running Migrations

# Full init (create DB + user + run all migrations)
bash scripts/init-db.sh

# Custom Postgres host (e.g. Erik)
PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh

Migration files live in:

packages/gateway/src/db/migrations/001_initial.sql
packages/learning/src/db/migrations/002_learning.sql

4. Pulling Ollama Models

bash scripts/pull-models.sh

# Against a different Ollama instance:
OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh

Required models:

| Model | Tier | Use case |
|---|---|---|
| `qwen2.5:3b` | Fast | Low-complexity, sub-second tasks |
| `qwen2.5:14b` | Medium | Standard completions |
| `qwen2.5:32b` | Large | Complex analysis |
| `deepseek-r1:14b` | Reasoning | Step-by-step logic |
| `llama3.3:70b` | Premium | Best quality, used sparingly |
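
The tier-to-model mapping in the table can be expressed as a small lookup. The sketch below is illustrative only: the Ollama Router's actual selection logic is not shown in this README, and `selectModel`, `missingModels`, and the lowercase tier names are assumptions made for the example.

```typescript
// Hypothetical sketch; the real router may select models differently.
type ModelTier = 'fast' | 'medium' | 'large' | 'reasoning' | 'premium';

const TIERS: readonly ModelTier[] = ['fast', 'medium', 'large', 'reasoning', 'premium'];

// Model names taken from the "Required models" table.
const MODEL_BY_TIER: Record<ModelTier, string> = {
  fast: 'qwen2.5:3b',
  medium: 'qwen2.5:14b',
  large: 'qwen2.5:32b',
  reasoning: 'deepseek-r1:14b',
  premium: 'llama3.3:70b',
};

// Return the model for a tier, or null when it has not been pulled yet.
function selectModel(tier: ModelTier, pulled: readonly string[]): string | null {
  const model = MODEL_BY_TIER[tier];
  return pulled.indexOf(model) !== -1 ? model : null;
}

// List required models that are absent from the pulled set.
function missingModels(pulled: readonly string[]): string[] {
  return TIERS.map((tier) => MODEL_BY_TIER[tier]).filter((m) => pulled.indexOf(m) === -1);
}

console.log(missingModels(['qwen2.5:3b', 'qwen2.5:14b']));
```

A check like `missingModels` could be run after `scripts/pull-models.sh` to confirm that every tier has a model available before the gateway starts.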

5. API Usage

Completion

curl -X POST http://localhost:3100/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "my-project",
    "task_type": "summarize",
    "input": "Long document text here...",
    "language": "en"
  }'

Response:

{
  "request_id": "uuid",
  "status": "approved",
  "output": "Summary...",
  "confidence": 0.92,
  "model_used": "qwen2.5:14b",
  "prompt_version": "summarize/v2",
  "token_count": { "input": 512, "output": 128 },
  "latency_ms": 1240
}
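
For callers that talk to the endpoint directly instead of using the client library, the response shape above could be typed and checked at runtime along these lines. This is a sketch derived only from the example payload: `CompletionResponse` and `isCompletionResponse` are hypothetical names, and `status` is left as a plain string because "approved" is the only value shown here.

```typescript
// Hypothetical typing derived from the example response; not the official client types.
interface CompletionResponse {
  request_id: string;
  status: string; // "approved" in the example; other values are not documented here
  output: string;
  confidence: number;
  model_used: string;
  prompt_version: string;
  token_count: { input: number; output: number };
  latency_ms: number;
}

// Runtime type guard: checks that every field from the example is present
// with the expected primitive type.
function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const tokens = v.token_count as Record<string, unknown> | null | undefined;
  return (
    typeof v.request_id === 'string' &&
    typeof v.status === 'string' &&
    typeof v.output === 'string' &&
    typeof v.confidence === 'number' &&
    typeof v.model_used === 'string' &&
    typeof v.prompt_version === 'string' &&
    typeof v.latency_ms === 'number' &&
    typeof tokens === 'object' &&
    tokens !== null &&
    typeof tokens.input === 'number' &&
    typeof tokens.output === 'number'
  );
}
```

A caller would typically run the parsed JSON body through the guard before reading `output`, and only use the result when `status` is `"approved"`, as the client examples in section 6 do.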

Classify input

curl -X POST http://localhost:3100/v1/classify \
  -H "Content-Type: application/json" \
  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'

Health

curl http://localhost:3100/health
curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
curl http://localhost:3100/health/ready  # readiness probe

6. Project-specific Client Usage

Install the client in any workspace project:

npm install @llm-gateway/client

TIP (Transceiver Intelligence Platform)

import { createTIPClient } from '@llm-gateway/client';

const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env

const result = await llm.completion({
  task_type: 'extract_specs',
  input: rawHtml,
  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
});

if (result.status === 'approved') {
  console.log(result.output);
}

EO Global Pulse

import { createEOPulseClient } from '@llm-gateway/client';

const llm = createEOPulseClient();

// Safe completion: returns null when gateway is down (graceful degradation)
const result = await llm.safeCompletion({
  task_type: 'meeting_summary',
  input: transcriptText,
  language: 'de',
});

SwitchBlade

import { createSwitchBladeClient } from '@llm-gateway/client';

const llm = createSwitchBladeClient();

const { batch_id } = await llm.batch(
  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
  'http://switchblade.context-x.org/webhooks/llm-batch',
);

Custom client (any project)

import { LLMGatewayClient } from '@llm-gateway/client';

const llm = new LLMGatewayClient({
  caller: 'my-service',
  baseUrl: process.env.LLM_GATEWAY_URL,
  timeout: 20_000,
});

7. Deployment to Erik

One-command deploy (from local Mac)

bash deploy/deploy.sh

# Skip local build (if already built):
bash deploy/deploy.sh --skip-build

# Health check only:
bash deploy/deploy.sh --health-only

First-time setup on Erik

# SSH to Erik
ssh root@217.154.82.179

# Run setup script (idempotent — safe to re-run)
cd /opt/llm-gateway
bash deploy/setup-erik.sh

PM2 management

ssh erik "pm2 status"
ssh erik "pm2 logs llm-gateway"
ssh erik "pm2 logs llm-learning"
ssh erik "pm2 restart llm-gateway"
ssh erik "pm2 monit"

8. Monitoring

Prometheus metrics

GET http://localhost:3100/metrics

Grafana

Metrics are scraped by the existing Prometheus instance. Import the dashboard from deploy/grafana-dashboard.json (if present).

Key metrics to watch

| Metric | Alert threshold |
|---|---|
| `gateway_request_latency_p99` | > 5 000 ms |
| `gateway_error_rate` | > 5% |
| `ollama_queue_depth` | > 20 |
| `learning_feedback_lag` | > 1 h |
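
The thresholds above can be checked programmatically, for example in a smoke test after a deploy. The sketch below is an assumption-laden illustration: `breachedAlerts` and `MetricSample` are hypothetical names, the error rate is assumed to be a fraction between 0 and 1, and the feedback lag is assumed to be in seconds. The units actually exported on `/metrics` are not stated in this README.

```typescript
// Hypothetical sketch; metric units are assumptions (see comments).
interface MetricSample {
  gateway_request_latency_p99: number; // milliseconds
  gateway_error_rate: number;          // fraction 0..1 (assumed)
  ollama_queue_depth: number;          // queued requests
  learning_feedback_lag: number;       // seconds (assumed)
}

// Thresholds from the "Key metrics to watch" table, converted to the units above.
const ALERT_THRESHOLDS: MetricSample = {
  gateway_request_latency_p99: 5000,
  gateway_error_rate: 0.05,
  ollama_queue_depth: 20,
  learning_feedback_lag: 3600,
};

// Return the names of metrics that are strictly above their threshold.
function breachedAlerts(sample: MetricSample): (keyof MetricSample)[] {
  const names = Object.keys(ALERT_THRESHOLDS) as (keyof MetricSample)[];
  return names.filter((name) => sample[name] > ALERT_THRESHOLDS[name]);
}

console.log(
  breachedAlerts({
    gateway_request_latency_p99: 6200,
    gateway_error_rate: 0.01,
    ollama_queue_depth: 20,
    learning_feedback_lag: 7200,
  }),
);
```

The comparison is strict, so a value exactly at the threshold (queue depth 20 in the example call) does not count as a breach.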

Log locations (Erik)

/var/log/llm-gateway/out.log           # gateway stdout
/var/log/llm-gateway/error.log         # gateway stderr
/var/log/llm-gateway/learning-out.log  # learning engine stdout
/var/log/llm-gateway/learning-error.log

9. Cloudflare Tunnel

See deploy/cloudflare-tunnel.md for instructions to expose the gateway via https://llm-gateway.context-x.org.

10. Docker (alternative to PM2)

# Build and start all services
cp .env.example .env   # fill in DATABASE_URL
docker compose up -d

# Check status
docker compose ps
docker compose logs llm-gateway

# Stop
docker compose down

Repository structure

llm-gateway/
├── packages/
│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
│   │   ├── src/
│   │   │   ├── server.ts
│   │   │   ├── routes/
│   │   │   ├── db/
│   │   │   │   └── migrations/
│   │   │   └── prompts/
│   │   └── prompts/     # Versioned prompt templates
│   ├── learning/        # Self-improving feedback engine
│   │   └── src/
│   └── client/          # @llm-gateway/client TypeScript library
│       └── src/index.ts
├── deploy/
│   ├── setup-erik.sh       # First-time server setup
│   ├── deploy.sh           # One-command local → Erik deploy
│   ├── ecosystem.config.cjs # PM2 config
│   ├── nginx.conf          # Optional nginx reverse proxy
│   └── cloudflare-tunnel.md
├── scripts/
│   ├── init-db.sh          # Database initialization
│   └── pull-models.sh      # Pull Ollama models
├── Dockerfile
├── docker-compose.yaml
├── .env.example
└── package.json            # npm workspaces root
