feat: initial llm-gateway implementation

- Complete Fastify gateway with 8-stage pipeline - Circuit breaker (opossum) per model tier - Rate limiting per caller - Ban list validation (EN/DE/auto-detected) - TIP validator (SFF-8024, part numbers, wavelengths) - Prometheus metrics - pg-boss async queue - PostgreSQL audit log + review queue - 9 prompt templates (TIP, LinkedIn, ShieldX) - Learning engine scaffolding - Auto-learning: ban-list, few-shot, routing, prompt optimizer
2026-04-02 22:48:55 +02:00 · 2026-04-02 22:48:55 +02:00 · 3a00ff4d33
commit 3a00ff4d33
93 changed files with 16229 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,10 @@
+node_modules/
+dist/
+.env
+*.local
+.DS_Store
+packages/fine-tuner/models/
+packages/fine-tuner/adapters/
+__pycache__/
+*.pyc
+.env*
--- a/58
+++ b/58
@ -0,0 +1,58 @@
+# ============================================================
+# Stage 1: Builder
+# ============================================================
+FROM node:22-alpine AS builder
+
+WORKDIR /app
+
+# Copy workspace manifests first for layer caching
+COPY package.json package-lock.json* ./
+COPY packages/gateway/package.json ./packages/gateway/package.json
+
+# Install all workspace dependencies
+RUN npm install --workspace=packages/gateway
+
+# Copy gateway source
+COPY packages/gateway/ ./packages/gateway/
+
+# Build TypeScript
+RUN npm run build --workspace=packages/gateway
+
+# ============================================================
+# Stage 2: Runner
+# ============================================================
+FROM node:22-alpine AS runner
+
+WORKDIR /app
+
+# Security: run as non-root
+RUN addgroup -S gateway && adduser -S gateway -G gateway
+
+# Install wget for healthcheck (alpine has it by default, but be explicit)
+RUN apk add --no-cache wget
+
+# Copy compiled output
+COPY --from=builder /app/packages/gateway/dist ./packages/gateway/dist
+
+# Copy production node_modules
+COPY --from=builder /app/node_modules ./node_modules
+COPY --from=builder /app/packages/gateway/node_modules ./packages/gateway/node_modules 2>/dev/null || true
+
+# Copy runtime assets (prompt templates, config)
+COPY packages/gateway/prompts ./packages/gateway/prompts
+
+# Copy start script
+COPY packages/gateway/package.json ./packages/gateway/package.json
+COPY package.json ./package.json
+
+# Create log directory
+RUN mkdir -p /var/log/llm-gateway && chown -R gateway:gateway /var/log/llm-gateway /app
+
+USER gateway
+
+EXPOSE 3100
+
+HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
+  CMD wget -q -O- http://localhost:3100/health/live || exit 1
+
+CMD ["node", "packages/gateway/dist/server.js"]
--- a/README.md
+++ b/README.md
@ -0,0 +1,358 @@
+# LLM Gateway
+
+Centralized AI inference layer for all Context X projects. Routes requests to local Ollama models on Mac Studio (192.168.178.169), validates outputs with ShieldX, and records all interactions for the self-improving learning engine.
+
+**Port:** 3100
+**Production:** http://llm-gateway.context-x.org (Cloudflare Tunnel → Erik)
+
+---
+
+## Architecture
+
+```
+Projects (TIP, EO Pulse, SwitchBlade, PeerCortex, NOGnet, CtxEvent)
+    ↓  @llm-gateway/client
+LLM Gateway :3100
+    ├── Prompt Engine   (versioned templates per task_type)
+    ├── ShieldX Guard   (prompt injection validation)
+    ├── Ollama Router   (model tier selection: 3b / 14b / 32b / 70b)
+    └── Learning Engine (feedback loop, self-improvement)
+         ↓
+    PostgreSQL (llm_gateway DB)
+    Ollama     (Mac Studio :11434)
+```
+
+---
+
+## Prerequisites
+
+| Dependency     | Version | Notes                          |
+|----------------|---------|--------------------------------|
+| Node.js        | 22+     | `node --version`               |
+| PostgreSQL     | 17      | Local or remote                |
+| Ollama         | latest  | Running on Mac Studio .169     |
+| PM2            | latest  | `npm install -g pm2` (Erik)    |
+
+---
+
+## 1. Local Development Setup
+
+```bash
+# Clone
+git clone http://gitea.context-x.org/rene/llm-gateway.git
+cd llm-gateway
+
+# Install all workspace dependencies
+npm install
+
+# Copy and configure environment
+cp .env.example .env
+# Edit .env: set DATABASE_URL, OLLAMA_URL at minimum
+
+# Initialize database
+bash scripts/init-db.sh
+
+# Pull required Ollama models (runs against OLLAMA_URL from .env)
+bash scripts/pull-models.sh
+
+# Start gateway
+npm run dev
+
+# In a separate terminal: start learning engine
+npm run learning
+```
+
+Gateway is available at http://localhost:3100.
+
+---
+
+## 2. Environment Variables
+
+See `.env.example` for all variables with descriptions.
+
+| Variable          | Required | Default                  | Description                     |
+|-------------------|----------|--------------------------|---------------------------------|
+| `DATABASE_URL`    | YES      | —                        | PostgreSQL DSN for llm_gateway  |
+| `TIP_DATABASE_URL`| NO       | —                        | TIP DB (read-only)              |
+| `OLLAMA_URL`      | YES      | http://...169:11434      | Ollama inference server         |
+| `SHIELDX_URL`     | NO       | —                        | ShieldX endpoint (leave blank to skip) |
+| `PORT`            | NO       | 3100                     | HTTP port                       |
+| `LOG_LEVEL`       | NO       | info                     | error / warn / info / debug     |
+
+---
+
+## 3. Running Migrations
+
+```bash
+# Full init (create DB + user + run all migrations)
+bash scripts/init-db.sh
+
+# Custom Postgres host (e.g. Erik)
+PGHOST=217.154.82.179 PGPORT=5432 bash scripts/init-db.sh
+```
+
+Migration files live in:
+- `packages/gateway/src/db/migrations/001_initial.sql`
+- `packages/learning/src/db/migrations/002_learning.sql`
+
+---
+
+## 4. Pulling Ollama Models
+
+```bash
+bash scripts/pull-models.sh
+
+# Against a different Ollama instance:
+OLLAMA_URL=http://localhost:11434 bash scripts/pull-models.sh
+```
+
+Required models:
+
+| Model             | Tier      | Use case                          |
+|-------------------|-----------|-----------------------------------|
+| `qwen2.5:3b`      | Fast      | Low-complexity, sub-second tasks  |
+| `qwen2.5:14b`     | Medium    | Standard completions              |
+| `qwen2.5:32b`     | Large     | Complex analysis                  |
+| `deepseek-r1:14b` | Reasoning | Step-by-step logic                |
+| `llama3.3:70b`    | Premium   | Best quality, used sparingly      |
+
+---
+
+## 5. API Usage
+
+### Completion
+
+```bash
+curl -X POST http://localhost:3100/v1/completion \
+  -H "Content-Type: application/json" \
+  -d '{
+    "caller": "my-project",
+    "task_type": "summarize",
+    "input": "Long document text here...",
+    "language": "en"
+  }'
+```
+
+Response:
+```json
+{
+  "request_id": "uuid",
+  "status": "approved",
+  "output": "Summary...",
+  "confidence": 0.92,
+  "model_used": "qwen2.5:14b",
+  "prompt_version": "summarize/v2",
+  "token_count": { "input": 512, "output": 128 },
+  "latency_ms": 1240
+}
+```
+
+### Classify input
+
+```bash
+curl -X POST http://localhost:3100/v1/classify \
+  -H "Content-Type: application/json" \
+  -d '{ "caller": "my-project", "input": "What transceivers work with Cisco ASR9k?" }'
+```
+
+### Health
+
+```bash
+curl http://localhost:3100/health
+curl http://localhost:3100/health/live   # liveness probe (k8s / Docker)
+curl http://localhost:3100/health/ready  # readiness probe
+```
+
+---
+
+## 6. Project-specific Client Usage
+
+Install the client in any workspace project:
+
+```bash
+npm install @llm-gateway/client
+```
+
+### TIP (Transceiver Intelligence Platform)
+
+```typescript
+import { createTIPClient } from '@llm-gateway/client';
+
+const llm = createTIPClient(); // reads LLM_GATEWAY_URL from env
+
+const result = await llm.completion({
+  task_type: 'extract_specs',
+  input: rawHtml,
+  context: { vendor: 'Cisco', sku: 'SFP-10G-SR' },
+});
+
+if (result.status === 'approved') {
+  console.log(result.output);
+}
+```
+
+### EO Global Pulse
+
+```typescript
+import { createEOPulseClient } from '@llm-gateway/client';
+
+const llm = createEOPulseClient();
+
+// Safe completion: returns null when gateway is down (graceful degradation)
+const result = await llm.safeCompletion({
+  task_type: 'meeting_summary',
+  input: transcriptText,
+  language: 'de',
+});
+```
+
+### SwitchBlade
+
+```typescript
+import { createSwitchBladeClient } from '@llm-gateway/client';
+
+const llm = createSwitchBladeClient();
+
+const { batch_id } = await llm.batch(
+  tasks.map(t => ({ task_type: 'analyze_alert', input: t.raw })),
+  'http://switchblade.context-x.org/webhooks/llm-batch',
+);
+```
+
+### Custom client (any project)
+
+```typescript
+import { LLMGatewayClient } from '@llm-gateway/client';
+
+const llm = new LLMGatewayClient({
+  caller: 'my-service',
+  baseUrl: process.env.LLM_GATEWAY_URL,
+  timeout: 20_000,
+});
+```
+
+---
+
+## 7. Deployment to Erik
+
+### One-command deploy (from local Mac)
+
+```bash
+bash deploy/deploy.sh
+
+# Skip local build (if already built):
+bash deploy/deploy.sh --skip-build
+
+# Health check only:
+bash deploy/deploy.sh --health-only
+```
+
+### First-time setup on Erik
+
+```bash
+# SSH to Erik
+ssh root@217.154.82.179
+
+# Run setup script (idempotent — safe to re-run)
+cd /opt/llm-gateway
+bash deploy/setup-erik.sh
+```
+
+### PM2 management
+
+```bash
+ssh erik "pm2 status"
+ssh erik "pm2 logs llm-gateway"
+ssh erik "pm2 logs llm-learning"
+ssh erik "pm2 restart llm-gateway"
+ssh erik "pm2 monit"
+```
+
+---
+
+## 8. Monitoring
+
+### Prometheus metrics
+
+```
+GET http://localhost:3100/metrics
+```
+
+### Grafana
+
+Metrics are scraped by the existing Prometheus instance. Import the dashboard from `deploy/grafana-dashboard.json` (if present).
+
+### Key metrics to watch
+
+| Metric                      | Alert threshold        |
+|-----------------------------|------------------------|
+| `gateway_request_latency_p99` | > 5 000 ms           |
+| `gateway_error_rate`         | > 5%                   |
+| `ollama_queue_depth`         | > 20                   |
+| `learning_feedback_lag`      | > 1 h                  |
+
+### Log locations (Erik)
+
+```
+/var/log/llm-gateway/out.log           # gateway stdout
+/var/log/llm-gateway/error.log         # gateway stderr
+/var/log/llm-gateway/learning-out.log  # learning engine stdout
+/var/log/llm-gateway/learning-error.log
+```
+
+---
+
+## 9. Cloudflare Tunnel
+
+See `deploy/cloudflare-tunnel.md` for instructions to expose the gateway via `https://llm-gateway.context-x.org`.
+
+---
+
+## 10. Docker (alternative to PM2)
+
+```bash
+# Build and start all services
+cp .env.example .env   # fill in DATABASE_URL
+docker compose up -d
+
+# Check status
+docker compose ps
+docker compose logs llm-gateway
+
+# Stop
+docker compose down
+```
+
+---
+
+## Repository structure
+
+```
+llm-gateway/
+├── packages/
+│   ├── gateway/         # Core HTTP server (Express + Ollama + ShieldX)
+│   │   ├── src/
+│   │   │   ├── server.ts
+│   │   │   ├── routes/
+│   │   │   ├── db/
+│   │   │   │   └── migrations/
+│   │   │   └── prompts/
+│   │   └── prompts/     # Versioned prompt templates
+│   ├── learning/        # Self-improving feedback engine
+│   │   └── src/
+│   └── client/          # @llm-gateway/client TypeScript library
+│       └── src/index.ts
+├── deploy/
+│   ├── setup-erik.sh       # First-time server setup
+│   ├── deploy.sh           # One-command local → Erik deploy
+│   ├── ecosystem.config.cjs # PM2 config
+│   ├── nginx.conf          # Optional nginx reverse proxy
+│   └── cloudflare-tunnel.md
+├── scripts/
+│   ├── init-db.sh          # Database initialization
+│   └── pull-models.sh      # Pull Ollama models
+├── Dockerfile
+├── docker-compose.yaml
+├── .env.example
+└── package.json            # npm workspaces root
+```
--- a/deploy/cloudflare-tunnel.md
+++ b/deploy/cloudflare-tunnel.md
@ -0,0 +1,75 @@
+# Cloudflare Tunnel — LLM Gateway
+
+Add the LLM Gateway to the existing Cloudflare Tunnel on Erik server.
+
+## Current tunnel setup on Erik
+
+Tunnels are managed by `cloudflared` running as a service. Config lives at:
+
+```
+~/.cloudflared/config.yml
+```
+
+or (if installed as root):
+
+```
+/etc/cloudflare-one/config.yml
+```
+
+## Add llm-gateway ingress rule
+
+Edit the config file and add the following **before** the catch-all `http_status:404` rule:
+
+```yaml
+ingress:
+  # ... existing services ...
+
+  - hostname: llm-gateway.context-x.org
+    service: http://localhost:3100
+    originRequest:
+      connectTimeout: 10s
+      noHappyEyeballs: false
+      # Allow large LLM responses to stream without timeout
+      keepAliveTimeout: 130s
+
+  # Catch-all (must be last)
+  - service: http_status:404
+```
+
+## DNS record
+
+In Cloudflare Dashboard → DNS → context-x.org:
+
+| Type  | Name        | Target                        | Proxy |
+|-------|-------------|-------------------------------|-------|
+| CNAME | llm-gateway | `<tunnel-id>.cfargotunnel.com` | ON    |
+
+The tunnel ID can be found with:
+
+```bash
+ssh erik "cloudflared tunnel list"
+```
+
+## Reload tunnel
+
+```bash
+ssh erik "systemctl restart cloudflared"
+# Verify:
+curl https://llm-gateway.context-x.org/health/live
+```
+
+## Verify routing
+
+```bash
+# From any machine:
+curl https://llm-gateway.context-x.org/health
+
+# Expected:
+# {"status":"ok","ollama":{...},"queue":{...}}
+```
+
+## Notes
+
+- The tunnel connects directly to `localhost:3100` on Erik — nginx is **not** required.
+- Cloudflare handles TLS termination and DDoS protection automatically.
+- Rate limiting can be added via Cloudflare WAF rules on top of the gateway's built-in limits.
--- a/deploy/deploy.sh
+++ b/deploy/deploy.sh
@ -0,0 +1,186 @@
+#!/bin/bash
+# ============================================================
+# LLM Gateway — One-command deploy (run locally on Mac)
+#
+# Usage:
+#   bash deploy/deploy.sh
+#   bash deploy/deploy.sh --skip-build     # skip local build
+#   bash deploy/deploy.sh --health-only    # just check remote health
+# ============================================================
+set -euo pipefail
+
+ERIK_HOST="217.154.82.179"
+ERIK_USER="root"
+REMOTE_DIR="/opt/llm-gateway"
+GITEA_BRANCH="main"
+HEALTH_URL="http://${ERIK_HOST}:3100/health"
+HEALTH_URL_CF="https://llm-gateway.context-x.org/health"
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+info()    { echo -e "${GREEN}[INFO]${NC}  $*"; }
+warn()    { echo -e "${YELLOW}[WARN]${NC}  $*"; }
+error()   { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
+section() { echo -e "\n${BLUE}>> $*${NC}"; }
+
+# Parse args
+SKIP_BUILD=false
+HEALTH_ONLY=false
+for arg in "$@"; do
+  case $arg in
+    --skip-build)   SKIP_BUILD=true ;;
+    --health-only)  HEALTH_ONLY=true ;;
+    *) warn "Unknown argument: $arg" ;;
+  esac
+done
+
+# -------------------------------------------------------
+# Health-only mode
+# -------------------------------------------------------
+if [[ "$HEALTH_ONLY" == "true" ]]; then
+  section "Remote health check"
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL" 2>/dev/null || echo "000")
+  BODY=$(curl -s "$HEALTH_URL" 2>/dev/null || echo "{}")
+  echo "  HTTP status: $STATUS"
+  echo "  Response:    $BODY"
+  [[ "$STATUS" == "200" ]] && info "Gateway is healthy." || warn "Gateway may be unhealthy."
+  exit 0
+fi
+
+# -------------------------------------------------------
+# 0. Pre-deploy checks
+# -------------------------------------------------------
+section "0. Pre-deploy checks"
+
+command -v npm >/dev/null || error "npm not found."
+command -v git >/dev/null || error "git not found."
+command -v ssh >/dev/null || error "ssh not found."
+command -v curl >/dev/null || error "curl not found."
+
+# Check we're on main (warn only — don't block)
+CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "unknown")
+if [[ "$CURRENT_BRANCH" != "$GITEA_BRANCH" ]]; then
+  warn "Current branch is '$CURRENT_BRANCH', not '$GITEA_BRANCH'. Proceeding anyway."
+fi
+
+# Check for uncommitted changes
+if ! git diff --quiet 2>/dev/null; then
+  warn "You have uncommitted changes. They will NOT be deployed."
+fi
+
+info "Pre-deploy checks passed."
+
+# -------------------------------------------------------
+# 1. Local build
+# -------------------------------------------------------
+section "1. Local build"
+
+if [[ "$SKIP_BUILD" == "true" ]]; then
+  warn "Skipping local build (--skip-build)"
+else
+  info "Running: npm run build"
+  npm run build
+  info "Build successful."
+fi
+
+# -------------------------------------------------------
+# 2. Push to Gitea
+# -------------------------------------------------------
+section "2. Pushing to Gitea"
+
+COMMIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
+info "Pushing commit $COMMIT_SHA to Gitea ($GITEA_BRANCH)..."
+
+git push origin "$GITEA_BRANCH"
+info "Push complete."
+
+# -------------------------------------------------------
+# 3. Deploy on Erik
+# -------------------------------------------------------
+section "3. Deploying on Erik (${ERIK_HOST})"
+
+info "Connecting via SSH..."
+
+# shellcheck disable=SC2087
+ssh -o ConnectTimeout=15 "${ERIK_USER}@${ERIK_HOST}" bash << 'REMOTE_SCRIPT'
+set -euo pipefail
+
+REMOTE_DIR="/opt/llm-gateway"
+LOG_DIR="/var/log/llm-gateway"
+
+echo "[remote] Pulling latest code..."
+cd "$REMOTE_DIR"
+git fetch origin
+git reset --hard origin/main
+
+echo "[remote] Installing dependencies..."
+npm install --prefer-offline 2>/dev/null || npm install
+
+echo "[remote] Building..."
+npm run build
+
+echo "[remote] Ensuring log directory..."
+mkdir -p "$LOG_DIR"
+
+echo "[remote] Restarting PM2 processes..."
+if pm2 list | grep -q "llm-gateway"; then
+  pm2 reload llm-gateway llm-learning --update-env
+else
+  echo "[remote] PM2 processes not found — starting from ecosystem config..."
+  pm2 start deploy/ecosystem.config.cjs
+fi
+
+pm2 save
+
+echo "[remote] Deploy complete."
+REMOTE_SCRIPT
+
+info "Remote deploy finished."
+
+# -------------------------------------------------------
+# 4. Post-deploy health check
+# -------------------------------------------------------
+section "4. Post-deploy health check"
+
+MAX_RETRIES=8
+RETRY_DELAY=5
+
+info "Waiting ${RETRY_DELAY}s for restart to complete..."
+sleep $RETRY_DELAY
+
+for i in $(seq 1 $MAX_RETRIES); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 "$HEALTH_URL" 2>/dev/null || echo "000")
+
+  if [[ "$STATUS" == "200" ]]; then
+    BODY=$(curl -s --max-time 10 "$HEALTH_URL" 2>/dev/null || echo "{}")
+    info "Health check PASSED (HTTP 200)"
+    echo "  $BODY"
+    break
+  fi
+
+  if [[ $i -eq $MAX_RETRIES ]]; then
+    warn "Health check did not return 200 after ${MAX_RETRIES} attempts."
+    warn "Check logs on Erik: ssh root@${ERIK_HOST} 'pm2 logs llm-gateway --lines 50'"
+    exit 1
+  fi
+
+  info "  Attempt $i/$MAX_RETRIES — HTTP $STATUS. Retrying in ${RETRY_DELAY}s..."
+  sleep $RETRY_DELAY
+done
+
+# -------------------------------------------------------
+# 5. Summary
+# -------------------------------------------------------
+echo ""
+echo -e "${GREEN}Deploy successful!${NC}"
+echo ""
+echo "  Commit:         $COMMIT_SHA"
+echo "  Direct:         $HEALTH_URL"
+echo "  Cloudflare:     $HEALTH_URL_CF"
+echo "  PM2 status:     ssh root@${ERIK_HOST} 'pm2 status'"
+echo "  Logs:           ssh root@${ERIK_HOST} 'pm2 logs llm-gateway'"
+echo ""
--- a/deploy/ecosystem.config.cjs
+++ b/deploy/ecosystem.config.cjs
@ -0,0 +1,59 @@
+/**
+ * PM2 Ecosystem Config — LLM Gateway on Erik (217.154.82.179)
+ *
+ * Deploy:  pm2 start deploy/ecosystem.config.cjs
+ * Reload:  pm2 reload llm-gateway llm-learning
+ * Logs:    pm2 logs llm-gateway
+ * Status:  pm2 status
+ */
+module.exports = {
+  apps: [
+    {
+      name: 'llm-gateway',
+      script: 'packages/gateway/dist/server.js',
+      cwd: '/opt/llm-gateway',
+      instances: 1,
+      exec_mode: 'fork',
+      env: {
+        NODE_ENV: 'production',
+        PORT: 3100,
+        DATABASE_URL: 'postgresql://llm:llm_secure_password@localhost:5432/llm_gateway',
+        TIP_DATABASE_URL: 'postgresql://tip:tip_prod_2026@localhost:5433/transceiver_db',
+        OLLAMA_URL: 'http://192.168.178.169:11434',
+        LOG_LEVEL: 'info',
+      },
+      // Restart on crash, but not on intentional stop
+      autorestart: true,
+      watch: false,
+      max_memory_restart: '512M',
+      // Graceful shutdown: wait up to 10s for in-flight requests
+      kill_timeout: 10000,
+      // Log rotation
+      error_file: '/var/log/llm-gateway/error.log',
+      out_file: '/var/log/llm-gateway/out.log',
+      log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
+      merge_logs: true,
+    },
+    {
+      name: 'llm-learning',
+      script: 'packages/learning/src/index.ts',
+      interpreter: 'node',
+      interpreter_args: '--import tsx/esm',
+      cwd: '/opt/llm-gateway',
+      instances: 1,
+      exec_mode: 'fork',
+      env: {
+        NODE_ENV: 'production',
+        DATABASE_URL: 'postgresql://llm:llm_secure_password@localhost:5432/llm_gateway',
+        GATEWAY_URL: 'http://localhost:3100',
+      },
+      autorestart: true,
+      watch: false,
+      max_memory_restart: '256M',
+      kill_timeout: 5000,
+      error_file: '/var/log/llm-gateway/learning-error.log',
+      out_file: '/var/log/llm-gateway/learning-out.log',
+      log_date_format: 'YYYY-MM-DD HH:mm:ss Z',
+    },
+  ],
+}
--- a/deploy/nginx.conf
+++ b/deploy/nginx.conf
@ -0,0 +1,53 @@
+# Nginx reverse proxy for LLM Gateway
+# Place at: /etc/nginx/sites-available/llm-gateway
+# Enable:   ln -s /etc/nginx/sites-available/llm-gateway /etc/nginx/sites-enabled/
+# Reload:   nginx -t && systemctl reload nginx
+#
+# NOTE: If using Cloudflare Tunnel, nginx is optional.
+#       The tunnel connects directly to localhost:3100.
+
+upstream llm_gateway {
+    server localhost:3100;
+    keepalive 32;
+}
+
+server {
+    listen 80;
+    server_name llm-gateway.context-x.org;
+
+    # Security headers
+    add_header X-Content-Type-Options nosniff;
+    add_header X-Frame-Options DENY;
+    add_header X-XSS-Protection "1; mode=block";
+
+    # Don't leak nginx version
+    server_tokens off;
+
+    location / {
+        proxy_pass http://llm_gateway;
+
+        # Timeouts: 130s to handle large LLM responses without proxy interruption
+        proxy_read_timeout    130s;
+        proxy_connect_timeout 10s;
+        proxy_send_timeout    130s;
+
+        # Standard proxy headers
+        proxy_set_header Host              $host;
+        proxy_set_header X-Real-IP         $remote_addr;
+        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        # Keep-alive to upstream
+        proxy_http_version 1.1;
+        proxy_set_header Connection "";
+
+        # Buffer streaming responses (LLMs can stream tokens)
+        proxy_buffering off;
+    }
+
+    # Health endpoint — no logging (noisy in monitoring)
+    location /health {
+        proxy_pass http://llm_gateway/health;
+        access_log off;
+    }
+}
--- a/deploy/setup-erik.sh
+++ b/deploy/setup-erik.sh
@ -0,0 +1,195 @@
+#!/bin/bash
+# ============================================================
+# LLM Gateway — Full server setup for Erik (217.154.82.179)
+#
+# Run ONCE on a fresh server. Idempotent: safe to re-run.
+# Prerequisites: PostgreSQL 17, Node.js 22, git, pm2
+# ============================================================
+set -euo pipefail
+
+GATEWAY_DIR="/opt/llm-gateway"
+LOG_DIR="/var/log/llm-gateway"
+GITEA_REPO="http://gitea.context-x.org/rene/llm-gateway.git"
+DB_NAME="llm_gateway"
+DB_USER="llm"
+DB_PASS="llm_secure_password"
+PM2_USER="${SUDO_USER:-root}"
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m'
+
+info()    { echo -e "${GREEN}[INFO]${NC}  $*"; }
+warn()    { echo -e "${YELLOW}[WARN]${NC}  $*"; }
+error()   { echo -e "${RED}[ERROR]${NC} $*" >&2; exit 1; }
+section() { echo -e "\n${GREEN}==============================${NC}"; echo -e "${GREEN}$*${NC}"; echo -e "${GREEN}==============================${NC}"; }
+
+# -------------------------------------------------------
+# 0. Preflight checks
+# -------------------------------------------------------
+section "0. Preflight checks"
+
+[[ $EUID -eq 0 ]] || error "Run as root: sudo bash deploy/setup-erik.sh"
+
+command -v node  >/dev/null || error "Node.js not found. Install Node.js 22 first."
+command -v npm   >/dev/null || error "npm not found."
+command -v psql  >/dev/null || error "psql not found. Install PostgreSQL 17 first."
+command -v pm2   >/dev/null || error "pm2 not found. Run: npm install -g pm2"
+command -v git   >/dev/null || error "git not found."
+
+NODE_VER=$(node --version | cut -d. -f1 | tr -d 'v')
+[[ $NODE_VER -ge 22 ]] || warn "Node.js 22+ recommended. Found: $(node --version)"
+
+info "All preflight checks passed."
+
+# -------------------------------------------------------
+# 1. Create application directory
+# -------------------------------------------------------
+section "1. Creating application directory"
+
+if [[ -d "$GATEWAY_DIR" ]]; then
+  warn "$GATEWAY_DIR already exists — skipping git clone (will pull later)"
+else
+  git clone "$GITEA_REPO" "$GATEWAY_DIR"
+  info "Cloned repository to $GATEWAY_DIR"
+fi
+
+cd "$GATEWAY_DIR"
+
+# -------------------------------------------------------
+# 2. Create log directory
+# -------------------------------------------------------
+section "2. Creating log directory"
+
+mkdir -p "$LOG_DIR"
+chown -R "$PM2_USER:$PM2_USER" "$LOG_DIR" 2>/dev/null || true
+info "Log directory: $LOG_DIR"
+
+# -------------------------------------------------------
+# 3. PostgreSQL — database + user + migrations
+# -------------------------------------------------------
+section "3. Setting up PostgreSQL"
+
+# Create user if not exists
+if sudo -u postgres psql -tAc "SELECT 1 FROM pg_roles WHERE rolname='$DB_USER'" | grep -q 1; then
+  info "PostgreSQL user '$DB_USER' already exists."
+else
+  sudo -u postgres psql -c "CREATE USER $DB_USER WITH PASSWORD '$DB_PASS';"
+  info "Created PostgreSQL user '$DB_USER'."
+fi
+
+# Create database if not exists
+if sudo -u postgres psql -tAc "SELECT 1 FROM pg_database WHERE datname='$DB_NAME'" | grep -q 1; then
+  info "Database '$DB_NAME' already exists."
+else
+  sudo -u postgres psql -c "CREATE DATABASE $DB_NAME OWNER $DB_USER;"
+  sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE $DB_NAME TO $DB_USER;"
+  info "Created database '$DB_NAME'."
+fi
+
+# Run migrations
+info "Running migrations..."
+MIGRATION_DIR="$GATEWAY_DIR/packages/gateway/src/db/migrations"
+if [[ -d "$MIGRATION_DIR" ]]; then
+  for sql_file in "$MIGRATION_DIR"/*.sql; do
+    [[ -f "$sql_file" ]] || continue
+    filename=$(basename "$sql_file")
+    info "  Applying migration: $filename"
+    PGPASSWORD="$DB_PASS" psql -U "$DB_USER" -d "$DB_NAME" -h localhost -f "$sql_file" \
+      && info "  ✓ $filename" \
+      || warn "  Migration $filename may have already been applied (ignoring error)"
+  done
+else
+  warn "Migration directory not found at $MIGRATION_DIR — skipping migrations"
+fi
+
+# Learning engine migrations
+LEARNING_MIGRATION_DIR="$GATEWAY_DIR/packages/learning/src/db/migrations"
+if [[ -d "$LEARNING_MIGRATION_DIR" ]]; then
+  for sql_file in "$LEARNING_MIGRATION_DIR"/*.sql; do
+    [[ -f "$sql_file" ]] || continue
+    filename=$(basename "$sql_file")
+    info "  Applying learning migration: $filename"
+    PGPASSWORD="$DB_PASS" psql -U "$DB_USER" -d "$DB_NAME" -h localhost -f "$sql_file" \
+      && info "  ✓ $filename" \
+      || warn "  Migration $filename may have already been applied (ignoring error)"
+  done
+fi
+
+# -------------------------------------------------------
+# 4. npm install + build
+# -------------------------------------------------------
+section "4. Installing dependencies and building"
+
+cd "$GATEWAY_DIR"
+npm install
+npm run build
+
+info "Build complete."
+
+# -------------------------------------------------------
+# 5. PM2 — register and start processes
+# -------------------------------------------------------
+section "5. Starting PM2 processes"
+
+# If already registered, reload; otherwise start fresh
+if pm2 list | grep -q "llm-gateway"; then
+  info "PM2 process 'llm-gateway' exists — reloading..."
+  pm2 reload llm-gateway
+else
+  info "Starting PM2 processes from ecosystem config..."
+  pm2 start "$GATEWAY_DIR/deploy/ecosystem.config.cjs"
+fi
+
+# Save PM2 state so it survives reboots
+pm2 save
+
+# Register PM2 startup script (only if not already done)
+if ! systemctl is-enabled pm2-root &>/dev/null 2>&1 && ! systemctl is-enabled "pm2-$PM2_USER" &>/dev/null 2>&1; then
+  info "Registering PM2 startup hook..."
+  pm2 startup systemd -u "$PM2_USER" --hp "/root" | tail -1 | bash || true
+fi
+
+# -------------------------------------------------------
+# 6. Health check
+# -------------------------------------------------------
+section "6. Health check"
+
+info "Waiting 5s for gateway to start..."
+sleep 5
+
+MAX_RETRIES=10
+RETRY_DELAY=3
+HEALTH_URL="http://localhost:3100/health/live"
+
+for i in $(seq 1 $MAX_RETRIES); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$HEALTH_URL" 2>/dev/null || echo "000")
+  if [[ "$STATUS" == "200" ]]; then
+    info "Health check PASSED (HTTP 200)"
+    break
+  fi
+  if [[ $i -eq $MAX_RETRIES ]]; then
+    warn "Health check did not return 200 after ${MAX_RETRIES} attempts (got: $STATUS)"
+    warn "Check logs: pm2 logs llm-gateway"
+  else
+    info "  Attempt $i/$MAX_RETRIES — got HTTP $STATUS, retrying in ${RETRY_DELAY}s..."
+    sleep $RETRY_DELAY
+  fi
+done
+
+# -------------------------------------------------------
+# 7. Summary
+# -------------------------------------------------------
+section "Setup complete"
+echo ""
+echo "  Gateway:   http://localhost:3100"
+echo "  Health:    http://localhost:3100/health"
+echo "  Logs:      pm2 logs llm-gateway"
+echo "  PM2 UI:    pm2 monit"
+echo ""
+echo "  Next steps:"
+echo "  1. Add Cloudflare tunnel ingress (see deploy/cloudflare-tunnel.md)"
+echo "  2. Pull Ollama models: bash scripts/pull-models.sh"
+echo "  3. Verify: curl http://localhost:3100/health"
+echo ""
--- a/docker-compose.yaml
+++ b/docker-compose.yaml
@ -0,0 +1,42 @@
+services:
+  llm-gateway:
+    build: .
+    container_name: llm-gateway
+    ports:
+      - "3100:3100"
+    environment:
+      NODE_ENV: production
+      PORT: "3100"
+      DATABASE_URL: "${DATABASE_URL}"
+      TIP_DATABASE_URL: "postgresql://tip:tip_prod_2026@217.154.82.179:5433/transceiver_db"
+      OLLAMA_URL: "http://192.168.178.169:11434"
+      SHIELDX_URL: "${SHIELDX_URL:-}"
+      GITEA_URL: "http://gitea.context-x.org"
+      LOG_LEVEL: "${LOG_LEVEL:-info}"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "wget", "-q", "-O-", "http://localhost:3100/health/live"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+    networks:
+      - gateway-net
+
+  llm-learning:
+    build:
+      context: packages/learning
+      dockerfile: Dockerfile
+    container_name: llm-learning
+    environment:
+      DATABASE_URL: "${DATABASE_URL}"
+      GATEWAY_URL: "http://llm-gateway:3100"
+    restart: unless-stopped
+    depends_on:
+      llm-gateway:
+        condition: service_healthy
+    networks:
+      - gateway-net
+
+networks:
+  gateway-net:
+    driver: bridge
--- a/package-lock.json
+++ b/package-lock.json
--- a/package.json
+++ b/package.json
@ -0,0 +1,16 @@
+{
+  "name": "llm-gateway",
+  "version": "1.0.0",
+  "private": true,
+  "workspaces": ["packages/*"],
+  "scripts": {
+    "dev": "npm run dev --workspace=packages/gateway",
+    "build": "npm run build --workspace=packages/gateway",
+    "start": "npm run start --workspace=packages/gateway",
+    "learning": "npm run start --workspace=packages/learning",
+    "install:all": "npm install",
+    "test": "vitest",
+    "db:migrate": "bash scripts/init-db.sh",
+    "models:pull": "bash scripts/pull-models.sh"
+  }
+}
--- a/packages/client/package.json
+++ b/packages/client/package.json
@ -0,0 +1,12 @@
+{
+  "name": "@llm-gateway/client",
+  "version": "1.0.0",
+  "type": "module",
+  "main": "src/index.ts",
+  "exports": { ".": "./src/index.ts" },
+  "dependencies": {},
+  "devDependencies": {
+    "typescript": "^5.7.2",
+    "@types/node": "^22.10.6"
+  }
+}
--- a/packages/client/src/index.ts
+++ b/packages/client/src/index.ts
@ -0,0 +1,254 @@
+/**
+ * @llm-gateway/client
+ *
+ * TypeScript client library for the LLM Gateway.
+ * Used by all Context X projects: TIP, EO Global Pulse, SwitchBlade,
+ * PeerCortex, NOGnet, ShieldX, and CtxEvent.
+ *
+ * Usage:
+ *   import { LLMGatewayClient, createTIPClient } from '@llm-gateway/client';
+ *   const client = createTIPClient();
+ *   const result = await client.completion({ task_type: 'summarize', input: '...' });
+ */
+
+// ============================================================
+// Request / Response types
+// ============================================================
+
+export interface CompletionRequest {
+  /** Identifies which project/service is calling (e.g. 'tip-scraper', 'eo-global-pulse') */
+  caller: string;
+  /** Task type that maps to a prompt template (e.g. 'summarize', 'classify', 'translate') */
+  task_type: string;
+  /** The raw input text to process */
+  input: string;
+  /** Preferred output language */
+  language?: 'de' | 'en';
+  /** Additional context passed to the prompt template */
+  context?: Record<string, unknown>;
+  /** Per-request model / behavior overrides */
+  options?: {
+    /** Override the model (e.g. 'qwen2.5:32b'). Gateway picks a sensible default. */
+    model?: string;
+    /** Sampling temperature 0–1 */
+    temperature?: number;
+    /** Max output tokens */
+    max_tokens?: number;
+    /** Include full validation details in the response */
+    return_validation_details?: boolean;
+  };
+}
+
+export interface CompletionResponse {
+  /** UUID for the request — use for tracing / support */
+  request_id: string;
+  /** Overall status of the response */
+  status: 'approved' | 'warning' | 'pending_review' | 'rejected';
+  /** The LLM output, or null if rejected/failed */
+  output: unknown | null;
+  /** Model confidence score 0–1 */
+  confidence: number;
+  /** Ollama model that produced the output */
+  model_used: string;
+  /** Prompt template version used */
+  prompt_version: string;
+  /** Token usage */
+  token_count: { input: number; output: number };
+  /** End-to-end latency in milliseconds */
+  latency_ms: number;
+  /** Validation details (present when return_validation_details=true or status != 'approved') */
+  validation?: {
+    passed: boolean;
+    ban_hits: unknown[];
+    warnings: string[];
+  };
+}
+
+export interface ClassifyResponse {
+  task_type: string;
+  content_type: string;
+  language: string;
+  complexity: 'low' | 'medium' | 'high';
+  requires_facts: boolean;
+  suggested_task_types: string[];
+}
+
+export interface BatchResponse {
+  batch_id: string;
+}
+
+export interface HealthResponse {
+  status: 'ok' | 'degraded' | 'down';
+  ollama: unknown;
+  queue: unknown;
+}
+
+// ============================================================
+// Gateway client
+// ============================================================
+
+export class LLMGatewayClient {
+  private readonly baseUrl: string;
+  private readonly caller: string;
+  private readonly timeout: number;
+
+  constructor(config: {
+    baseUrl?: string;
+    caller: string;
+    /** Request timeout in ms (default: 30 000) */
+    timeout?: number;
+  }) {
+    this.baseUrl = config.baseUrl
+      ?? process.env['LLM_GATEWAY_URL']
+      ?? 'http://localhost:3100';
+    this.caller = config.caller;
+    this.timeout = config.timeout ?? 30_000;
+  }
+
+  // ----------------------------------------------------------
+  // Core: completion
+  // ----------------------------------------------------------
+
+  async completion(
+    params: Omit<CompletionRequest, 'caller'>,
+  ): Promise<CompletionResponse> {
+    const body: CompletionRequest = { ...params, caller: this.caller };
+    return this.post<CompletionResponse>('/v1/completion', body);
+  }
+
+  // ----------------------------------------------------------
+  // Classify input before routing
+  // ----------------------------------------------------------
+
+  async classify(input: string): Promise<ClassifyResponse> {
+    return this.post<ClassifyResponse>('/v1/classify', {
+      caller: this.caller,
+      input,
+    });
+  }
+
+  // ----------------------------------------------------------
+  // Batch: submit multiple tasks, results delivered via webhook
+  // ----------------------------------------------------------
+
+  async batch(
+    tasks: Array<Omit<CompletionRequest, 'caller'>>,
+    webhookUrl: string,
+  ): Promise<BatchResponse> {
+    return this.post<BatchResponse>('/v1/batch', {
+      caller: this.caller,
+      tasks,
+      webhook_url: webhookUrl,
+    });
+  }
+
+  // ----------------------------------------------------------
+  // Health
+  // ----------------------------------------------------------
+
+  async health(): Promise<HealthResponse> {
+    const res = await this.fetchWithTimeout(`${this.baseUrl}/health`);
+    if (!res.ok) {
+      throw new Error(`Health check failed: ${res.status}`);
+    }
+    return res.json() as Promise<HealthResponse>;
+  }
+
+  // ----------------------------------------------------------
+  // Graceful degradation — returns null when gateway is unavailable
+  // ----------------------------------------------------------
+
+  async safeCompletion(
+    params: Omit<CompletionRequest, 'caller'>,
+  ): Promise<CompletionResponse | null> {
+    try {
+      return await this.completion(params);
+    } catch {
+      // Gateway is down or timed out — caller handles degraded mode
+      return null;
+    }
+  }
+
+  // ----------------------------------------------------------
+  // Internal helpers
+  // ----------------------------------------------------------
+
+  private async post<T>(path: string, body: unknown): Promise<T> {
+    const res = await this.fetchWithTimeout(`${this.baseUrl}${path}`, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify(body),
+    });
+
+    if (!res.ok) {
+      const text = await res.text().catch(() => '');
+      throw new Error(`Gateway error ${res.status} on ${path}: ${text}`);
+    }
+
+    return res.json() as Promise<T>;
+  }
+
+  private fetchWithTimeout(url: string, init?: RequestInit): Promise<Response> {
+    const controller = new AbortController();
+    const timer = setTimeout(() => controller.abort(), this.timeout);
+
+    return fetch(url, { ...init, signal: controller.signal }).finally(() =>
+      clearTimeout(timer),
+    );
+  }
+}
+
+// ============================================================
+// Project-specific pre-configured factory functions
+// ============================================================
+
+/**
+ * TIP (Transceiver Intelligence Platform)
+ * Long timeout because scraping + AI analysis can take time.
+ */
+export function createTIPClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'tip-scraper', baseUrl, timeout: 60_000 });
+}
+
+/**
+ * EO Global Pulse — team collaboration & CRM intelligence
+ */
+export function createEOPulseClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'eo-global-pulse', baseUrl, timeout: 30_000 });
+}
+
+/**
+ * SwitchBlade — infrastructure management platform
+ */
+export function createSwitchBladeClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'switchblade', baseUrl, timeout: 15_000 });
+}
+
+/**
+ * PeerCortex — BGP/RPKI network intelligence
+ * Short timeout: results must be near-real-time for network monitoring.
+ */
+export function createPeerCortexClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'peercortex', baseUrl, timeout: 8_000 });
+}
+
+/**
+ * NOGnet — NOG Support Program & event management
+ */
+export function createNOGnetClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'nognet', baseUrl, timeout: 30_000 });
+}
+
+/**
+ * ShieldX — LLM prompt injection defense (internal meta-use)
+ */
+export function createShieldXClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'internal', baseUrl, timeout: 10_000 });
+}
+
+/**
+ * CtxEvent — event management platform
+ */
+export function createCtxEventClient(baseUrl?: string): LLMGatewayClient {
+  return new LLMGatewayClient({ caller: 'ctxevent', baseUrl, timeout: 20_000 });
+}
--- a/packages/client/tsconfig.json
+++ b/packages/client/tsconfig.json
@ -0,0 +1,21 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "NodeNext",
+    "moduleResolution": "NodeNext",
+    "lib": ["ES2022"],
+    "outDir": "dist",
+    "rootDir": "src",
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true,
+    "strict": true,
+    "noImplicitAny": true,
+    "noUnusedLocals": true,
+    "noUnusedParameters": true,
+    "exactOptionalPropertyTypes": true,
+    "skipLibCheck": true
+  },
+  "include": ["src/**/*"],
+  "exclude": ["dist", "node_modules"]
+}
--- a/packages/fine-tuner/config/fine_tuner.yaml
+++ b/packages/fine-tuner/config/fine_tuner.yaml
@ -0,0 +1,49 @@
+database_url: "postgresql://llm:llm_secure_password@localhost:5432/llm_gateway"
+gateway_url: "http://localhost:3100"
+ollama_url: "http://192.168.178.169:11434"
+
+models:
+  qwen_14b_hf: "Qwen/Qwen2.5-14B-Instruct"  # HuggingFace model ID — used for general fine-tuning
+  qwen_7b_hf: "Qwen/Qwen2.5-7B-Instruct"    # For task-specific runs (smaller, faster)
+
+training:
+  device: "mps"  # Apple Silicon MPS — fallback to "cpu" if MPS unavailable
+  max_seq_length: 2048
+  lora_r: 16
+  lora_alpha: 32
+  lora_dropout: 0.05
+  target_modules:
+    - "q_proj"
+    - "k_proj"
+    - "v_proj"
+    - "o_proj"
+    - "gate_proj"
+    - "up_proj"
+    - "down_proj"
+
+  sft:
+    num_epochs: 3
+    batch_size: 1
+    gradient_accumulation: 8
+    learning_rate: 2.0e-4
+    warmup_ratio: 0.1
+
+  dpo:
+    num_epochs: 1
+    batch_size: 1
+    gradient_accumulation: 4
+    beta: 0.1       # DPO temperature — higher = more conservative
+    learning_rate: 5.0e-5
+
+evaluation:
+  min_improvement_to_deploy: 0.3   # confidence delta required before deployment
+  n_eval_samples: 20
+
+output:
+  adapters_dir: "adapters"
+  models_dir: "models"
+
+llama_cpp:
+  convert_script: "/opt/homebrew/lib/python3.12/site-packages/llama_cpp/convert_hf_to_gguf.py"
+  quantize_binary: "/opt/homebrew/bin/llama-quantize"
+  default_quantization: "Q5_K_M"
--- a/packages/fine-tuner/requirements.txt
+++ b/packages/fine-tuner/requirements.txt
@ -0,0 +1,10 @@
+torch>=2.1.0
+transformers>=4.36.0
+peft>=0.7.0
+trl>=0.7.4
+datasets>=2.16.0
+accelerate>=0.25.0
+psycopg2-binary>=2.9.9
+pyyaml>=6.0.1
+requests>=2.31.0
+huggingface-hub>=0.20.0
--- a/packages/fine-tuner/scripts/install_deps.sh
+++ b/packages/fine-tuner/scripts/install_deps.sh
@ -0,0 +1,103 @@
+#!/bin/bash
+# install_deps.sh - Install Python dependencies for the LLM Gateway fine-tuner.
+# Tested on macOS with Apple Silicon (MPS) and Python 3.9+.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ROOT_DIR="$(dirname "$SCRIPT_DIR")"
+
+echo "=== LLM Gateway Fine-Tuner: Dependency Installer ==="
+echo "Root: $ROOT_DIR"
+echo ""
+
+# Verify Python version
+PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}')
+echo "Python: $PYTHON_VERSION"
+MAJOR=$(echo "$PYTHON_VERSION" | cut -d. -f1)
+MINOR=$(echo "$PYTHON_VERSION" | cut -d. -f2)
+if [ "$MAJOR" -lt 3 ] || { [ "$MAJOR" -eq 3 ] && [ "$MINOR" -lt 9 ]; }; then
+    echo "ERROR: Python 3.9+ required (found $PYTHON_VERSION)"
+    exit 1
+fi
+
+# Check for MPS availability
+python3 -c "import torch; print('MPS available:', torch.backends.mps.is_available())" 2>/dev/null || true
+
+# Core ML dependencies
+echo ""
+echo "--- Installing core ML packages ---"
+pip3 install \
+    "peft>=0.7.0" \
+    "trl>=0.7.4" \
+    "datasets>=2.16.0" \
+    "psycopg2-binary>=2.9.9" \
+    "pyyaml>=6.0.1" \
+    "requests>=2.31.0" \
+    "huggingface-hub>=0.20.0"
+
+# torch / transformers / accelerate should already be present per the
+# environment spec, but install if missing
+echo ""
+echo "--- Verifying torch / transformers / accelerate ---"
+pip3 install \
+    "torch>=2.1.0" \
+    "transformers>=4.36.0" \
+    "accelerate>=0.25.0" \
+    --upgrade 2>/dev/null || echo "WARNING: Could not upgrade torch stack — ensure versions are compatible"
+
+# llama.cpp for GGUF conversion
+echo ""
+echo "--- Checking llama.cpp (for GGUF conversion) ---"
+if command -v llama-quantize &>/dev/null; then
+    echo "OK: llama-quantize found at $(which llama-quantize)"
+else
+    echo "llama-quantize not found — attempting brew install..."
+    if command -v brew &>/dev/null; then
+        brew install llama.cpp || echo "WARNING: brew install llama.cpp failed — GGUF conversion will be unavailable"
+    else
+        echo "WARNING: brew not found. Install llama.cpp manually: https://github.com/ggerganov/llama.cpp"
+        echo "  Or via pip:  pip3 install llama-cpp-python"
+        echo "  The fine-tuner will still train but cannot convert to GGUF."
+    fi
+fi
+
+# Check for convert_hf_to_gguf.py (may ship with llama-cpp-python)
+if python3 -c "import llama_cpp; import pathlib; p = pathlib.Path(llama_cpp.__file__).parent / 'convert_hf_to_gguf.py'; exit(0 if p.exists() else 1)" 2>/dev/null; then
+    echo "OK: convert_hf_to_gguf.py found via llama-cpp-python"
+else
+    echo "NOTE: convert_hf_to_gguf.py not found in llama-cpp-python package."
+    echo "      If you need GGUF conversion, install llama.cpp via brew or build from source."
+fi
+
+echo ""
+echo "--- Verifying installation ---"
+python3 - <<'PYEOF'
+import sys
+required = ["torch", "transformers", "peft", "trl", "datasets", "accelerate", "psycopg2", "yaml", "requests", "huggingface_hub"]
+missing = []
+for mod in required:
+    try:
+        __import__(mod)
+        print(f"  OK  {mod}")
+    except ImportError:
+        print(f"  MISSING  {mod}")
+        missing.append(mod)
+
+if missing:
+    print(f"\nERROR: Missing modules: {missing}")
+    sys.exit(1)
+
+import torch
+print(f"\nTorch version: {torch.__version__}")
+print(f"MPS available: {torch.backends.mps.is_available()}")
+print(f"MPS built: {torch.backends.mps.is_built()}")
+PYEOF
+
+echo ""
+echo "=== Fine-tuner dependencies installed successfully ==="
+echo ""
+echo "Next steps:"
+echo "  1. Ensure PostgreSQL is running with the llm_gateway database"
+echo "  2. Copy config/fine_tuner.yaml and adjust URLs if needed"
+echo "  3. Start the service: python3 -m src.main"
--- a/packages/fine-tuner/scripts/manual_trigger.py
+++ b/packages/fine-tuner/scripts/manual_trigger.py
@ -0,0 +1,240 @@
+#!/usr/bin/env python3
+"""
+manual_trigger.py - Manually trigger a fine-tuning run without waiting for the 30-minute poll.
+
+Usage:
+    # Task-specific LoRA for a single task_type
+    python3 scripts/manual_trigger.py --task-type tip-transceiver-enrich
+
+    # Task-specific with a lower minimum example count
+    python3 scripts/manual_trigger.py --task-type linkedin-post-de --min-examples 50
+
+    # General fine-tuning across all task types
+    python3 scripts/manual_trigger.py --general
+
+    # DPO preference learning
+    python3 scripts/manual_trigger.py --dpo
+
+    # Dry-run: show what would trigger without running anything
+    python3 scripts/manual_trigger.py --dry-run
+
+    # Use a specific config file
+    python3 scripts/manual_trigger.py --task-type linkedin-post-de --config /path/to/fine_tuner.yaml
+"""
+
+from __future__ import annotations
+
+import argparse
+import logging
+import sys
+import uuid
+from pathlib import Path
+
+# Ensure the package root is on the path when running as a script
+_REPO_ROOT = Path(__file__).parent.parent
+sys.path.insert(0, str(_REPO_ROOT.parent.parent))  # workspace root
+sys.path.insert(0, str(_REPO_ROOT))  # fine-tuner root
+
+from src.main import _connect, _create_run_record, load_config, run_fine_tuning
+from src.scheduler import (
+    list_active_task_types,
+    should_trigger_dpo,
+    should_trigger_general,
+    should_trigger_task_specific,
+)
+from src.data_collector import get_corpus_stats
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
+)
+logger = logging.getLogger("manual_trigger")
+
+
+# ---------------------------------------------------------------------------
+# CLI
+# ---------------------------------------------------------------------------
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        description="Manually trigger LLM Gateway fine-tuning runs.",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog=__doc__,
+    )
+
+    mode = parser.add_mutually_exclusive_group(required=True)
+    mode.add_argument(
+        "--task-type",
+        metavar="TASK_TYPE",
+        help="Run task-specific LoRA fine-tuning for this task_type.",
+    )
+    mode.add_argument(
+        "--general",
+        action="store_true",
+        help="Run general SFT fine-tuning across all task types.",
+    )
+    mode.add_argument(
+        "--dpo",
+        action="store_true",
+        help="Run DPO preference learning.",
+    )
+    mode.add_argument(
+        "--status",
+        action="store_true",
+        help="Show corpus statistics and trigger eligibility, then exit.",
+    )
+
+    parser.add_argument(
+        "--min-examples",
+        type=int,
+        default=None,
+        help="Override minimum example count for this run (bypasses threshold check).",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="Show what would be triggered without actually running anything.",
+    )
+    parser.add_argument(
+        "--config",
+        metavar="PATH",
+        default=None,
+        help="Path to fine_tuner.yaml (default: config/fine_tuner.yaml).",
+    )
+    parser.add_argument(
+        "--force",
+        action="store_true",
+        help="Skip trigger threshold checks and run regardless of example counts.",
+    )
+
+    return parser
+
+
+# ---------------------------------------------------------------------------
+# Status report
+# ---------------------------------------------------------------------------
+
+def print_status(conn, cfg: dict) -> None:
+    """Print corpus statistics and trigger eligibility for all task types."""
+    stats = get_corpus_stats(conn)
+    task_types = list_active_task_types(conn)
+
+    print("\n=== LLM Gateway Fine-Tuner Status ===\n")
+    print(f"DB:       {cfg['database_url'].split('@')[-1]}")  # hide credentials
+    print(f"Gateway:  {cfg['gateway_url']}")
+    print(f"Ollama:   {cfg['ollama_url']}")
+    print()
+
+    print("--- Corpus by Task Type ---")
+    print(f"{'Task Type':<35} {'Total':>6} {'Available':>10} {'Trigger?':>10}")
+    print("-" * 65)
+
+    for task_type in task_types:
+        info = stats["by_task_type"].get(task_type, {"total": 0, "available_positive": 0})
+        trigger = should_trigger_task_specific(conn, task_type)
+        print(
+            f"{task_type:<35} {info['total']:>6} {info['available_positive']:>10} "
+            f"{'YES' if trigger else 'no':>10}"
+        )
+
+    print()
+    print(f"DPO pairs available: {stats['dpo_pairs_available']}")
+    print(f"General trigger:     {'YES' if should_trigger_general(conn) else 'no'}")
+    print(f"DPO trigger:         {'YES' if should_trigger_dpo(conn) else 'no'}")
+    print()
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main() -> int:
+    parser = build_parser()
+    args = parser.parse_args()
+
+    cfg = load_config(args.config)
+    db_url = cfg["database_url"]
+
+    try:
+        conn = _connect(db_url)
+    except Exception as exc:
+        logger.error("Cannot connect to database: %s", exc)
+        return 1
+
+    try:
+        if args.status:
+            print_status(conn, cfg)
+            return 0
+
+        # Determine trigger
+        if args.task_type:
+            run_type = "task_specific"
+            task_type = args.task_type
+
+            if not args.force:
+                eligible = should_trigger_task_specific(conn, task_type)
+                if not eligible:
+                    if args.min_examples is not None:
+                        # Override threshold — just check the count manually
+                        from src.data_collector import collect_positive_examples
+                        examples = collect_positive_examples(conn, task_type=task_type)
+                        if len(examples) < args.min_examples:
+                            logger.error(
+                                "Not enough examples for %s: need %d, found %d",
+                                task_type,
+                                args.min_examples,
+                                len(examples),
+                            )
+                            return 1
+                        logger.info(
+                            "Threshold override: proceeding with %d examples (min-examples=%d)",
+                            len(examples),
+                            args.min_examples,
+                        )
+                    else:
+                        logger.warning(
+                            "Task %s does not meet trigger thresholds. "
+                            "Use --force to run anyway, or --min-examples N to override.",
+                            task_type,
+                        )
+                        print_status(conn, cfg)
+                        return 1
+
+        elif args.general:
+            run_type = "general"
+            task_type = None
+            if not args.force and not should_trigger_general(conn):
+                logger.warning(
+                    "General fine-tuning threshold not met. Use --force to run anyway."
+                )
+                print_status(conn, cfg)
+                return 1
+
+        else:  # --dpo
+            run_type = "dpo"
+            task_type = None
+            if not args.force and not should_trigger_dpo(conn):
+                logger.warning(
+                    "DPO threshold not met. Use --force to run anyway."
+                )
+                print_status(conn, cfg)
+                return 1
+
+        trigger = {"run_type": run_type, "task_type": task_type}
+
+        if args.dry_run:
+            print(f"\nDRY RUN — would trigger: {trigger}")
+            print("No training was started (--dry-run).")
+            return 0
+
+        logger.info("Manual trigger: %s", trigger)
+        run_fine_tuning(conn, trigger, cfg)
+        logger.info("Manual trigger complete.")
+        return 0
+
+    finally:
+        conn.close()
+
+
+if __name__ == "__main__":
+    sys.exit(main())
--- a/packages/fine-tuner/src/init.py
+++ b/packages/fine-tuner/src/init.py
@ -0,0 +1,12 @@
+"""
+fine-tuner — LLM Gateway automatic fine-tuning service.
+
+Modules:
+  main           — orchestrator / main loop
+  scheduler      — trigger condition logic
+  data_collector — PostgreSQL training data pipeline
+  trainer        — LoRA SFT fine-tuning (PEFT + TRL)
+  dpo_trainer    — DPO preference learning (TRL)
+  converter      — GGUF conversion + Ollama registration
+  evaluator      — automated before/after evaluation
+"""
--- a/packages/fine-tuner/src/converter.py
+++ b/packages/fine-tuner/src/converter.py
@ -0,0 +1,458 @@
+"""
+converter.py - Convert fine-tuned LoRA adapter to GGUF and register with Ollama.
+
+Pipeline:
+  1. Merge LoRA adapter weights into the base model.
+  2. Save the merged full-precision HuggingFace model.
+  3. Convert to GGUF via llama.cpp convert_hf_to_gguf.py.
+  4. Quantize with llama-quantize (Q5_K_M by default).
+  5. Create an Ollama Modelfile.
+  6. Register the model with Ollama via POST /api/create.
+  7. Run a lightweight evaluation to confirm the model is responsive.
+
+All subprocess calls use a fixed argument list — no shell=True, no
+string interpolation of user-controlled data into shell commands.
+"""
+
+from __future__ import annotations
+
+import json
+import logging
+import subprocess
+import time
+from pathlib import Path
+from typing import Optional
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Step 1 — Merge LoRA adapter into base model
+# ---------------------------------------------------------------------------
+
+def merge_lora_and_save(
+    base_model_path: str,
+    adapter_path: str,
+    output_path: str,
+) -> None:
+    """
+    Merge LoRA adapter weights into the base model and save the result.
+
+    The merged model is saved in standard HuggingFace format and can
+    then be converted to GGUF.  The base model is loaded in float32
+    for maximum compatibility with MPS and llama.cpp.
+    """
+    from peft import PeftModel
+    from transformers import AutoModelForCausalLM, AutoTokenizer
+    import torch
+
+    logger.info(
+        "merge_lora_and_save: base=%s adapter=%s → output=%s",
+        base_model_path,
+        adapter_path,
+        output_path,
+    )
+
+    tokenizer = AutoTokenizer.from_pretrained(
+        adapter_path,
+        trust_remote_code=True,
+    )
+
+    base_model = AutoModelForCausalLM.from_pretrained(
+        base_model_path,
+        torch_dtype=torch.float32,
+        trust_remote_code=True,
+    )
+
+    model = PeftModel.from_pretrained(
+        base_model,
+        adapter_path,
+        torch_dtype=torch.float32,
+    )
+
+    logger.info("Merging LoRA weights into base model...")
+    merged = model.merge_and_unload()
+
+    out = Path(output_path)
+    out.mkdir(parents=True, exist_ok=True)
+
+    merged.save_pretrained(str(out), safe_serialization=True)
+    tokenizer.save_pretrained(str(out))
+    logger.info("Merged model saved to %s", out)
+
+
+# ---------------------------------------------------------------------------
+# Step 2 — Convert HuggingFace model to GGUF
+# ---------------------------------------------------------------------------
+
+def convert_to_gguf(
+    model_path: str,
+    output_gguf_path: str,
+    quantization: str = "Q5_K_M",
+    convert_script: str = "/opt/homebrew/lib/python3.12/site-packages/llama_cpp/convert_hf_to_gguf.py",
+    quantize_binary: str = "/opt/homebrew/bin/llama-quantize",
+) -> bool:
+    """
+    Convert a HuggingFace model directory to a quantized GGUF file.
+
+    Steps:
+    1. Run convert_hf_to_gguf.py → unquantized fp16 GGUF.
+    2. Run llama-quantize → Q5_K_M (or requested quantization).
+
+    Returns True on success, False on any failure.
+    All subprocess calls use explicit argument lists (no shell=True).
+    """
+    model_path_obj = Path(model_path)
+    output_path_obj = Path(output_gguf_path)
+    output_path_obj.parent.mkdir(parents=True, exist_ok=True)
+
+    # Intermediate unquantized GGUF
+    f16_gguf = str(output_path_obj.with_suffix("")) + "_f16.gguf"
+
+    # --- Conversion step ---
+    convert_script_path = Path(convert_script)
+    if not convert_script_path.exists():
+        # Try to find it relative to the llama-cpp-python package
+        import importlib.util
+        spec = importlib.util.find_spec("llama_cpp")
+        if spec and spec.origin:
+            pkg_dir = Path(spec.origin).parent
+            alt_script = pkg_dir / "convert_hf_to_gguf.py"
+            if alt_script.exists():
+                convert_script_path = alt_script
+            else:
+                logger.error(
+                    "convert_to_gguf: convert_hf_to_gguf.py not found at %s or %s",
+                    convert_script,
+                    alt_script,
+                )
+                return False
+        else:
+            logger.error(
+                "convert_to_gguf: convert_hf_to_gguf.py not found at %s", convert_script
+            )
+            return False
+
+    cmd_convert = [
+        "python3",
+        str(convert_script_path),
+        str(model_path_obj),
+        "--outfile",
+        f16_gguf,
+        "--outtype",
+        "f16",
+    ]
+
+    logger.info("convert_to_gguf: running conversion: %s", " ".join(cmd_convert))
+    try:
+        result = subprocess.run(
+            cmd_convert,
+            capture_output=True,
+            text=True,
+            timeout=3600,  # 1 hour — large models take time
+            check=False,
+        )
+        if result.returncode != 0:
+            logger.error(
+                "convert_to_gguf: conversion failed (rc=%d):\n%s\n%s",
+                result.returncode,
+                result.stdout[-2000:],
+                result.stderr[-2000:],
+            )
+            return False
+        logger.info("convert_to_gguf: f16 GGUF written to %s", f16_gguf)
+    except subprocess.TimeoutExpired:
+        logger.error("convert_to_gguf: conversion timed out after 3600s")
+        return False
+    except FileNotFoundError as exc:
+        logger.error("convert_to_gguf: python3 not found: %s", exc)
+        return False
+
+    # --- Quantization step ---
+    quantize_bin = Path(quantize_binary)
+    if not quantize_bin.exists():
+        logger.warning(
+            "convert_to_gguf: llama-quantize not found at %s, skipping quantization",
+            quantize_binary,
+        )
+        # Use f16 as the output without quantization
+        Path(f16_gguf).rename(output_gguf_path)
+        return True
+
+    cmd_quantize = [
+        str(quantize_bin),
+        f16_gguf,
+        output_gguf_path,
+        quantization,
+    ]
+
+    logger.info("convert_to_gguf: quantizing: %s", " ".join(cmd_quantize))
+    try:
+        result = subprocess.run(
+            cmd_quantize,
+            capture_output=True,
+            text=True,
+            timeout=3600,
+            check=False,
+        )
+        if result.returncode != 0:
+            logger.error(
+                "convert_to_gguf: quantization failed (rc=%d):\n%s\n%s",
+                result.returncode,
+                result.stdout[-2000:],
+                result.stderr[-2000:],
+            )
+            return False
+        logger.info("convert_to_gguf: quantized GGUF written to %s", output_gguf_path)
+    except subprocess.TimeoutExpired:
+        logger.error("convert_to_gguf: quantization timed out after 3600s")
+        return False
+
+    # Clean up intermediate f16 file
+    try:
+        Path(f16_gguf).unlink(missing_ok=True)
+    except OSError:
+        pass
+
+    return True
+
+
+# ---------------------------------------------------------------------------
+# Step 3 — Create Ollama Modelfile
+# ---------------------------------------------------------------------------
+
+def create_ollama_modelfile(
+    gguf_path: str,
+    base_name: str,
+    task_type: Optional[str],
+    temperature: float = 0.3,
+    num_ctx: int = 8192,
+) -> str:
+    """
+    Generate Ollama Modelfile content for the fine-tuned model.
+
+    The Modelfile uses the absolute GGUF path so Ollama can locate it
+    regardless of working directory.
+    """
+    task_label = task_type or "general"
+    model_name = f"llm-gateway-{task_label}-ft"
+
+    system_prompt = (
+        f"You are a fine-tuned assistant specialised in {task_label} tasks. "
+        "Provide accurate, detailed, professional responses. "
+        "Your outputs have been optimised through automated training on "
+        "high-quality examples from the LLM Gateway learning corpus."
+    )
+
+    modelfile = (
+        f"FROM {gguf_path}\n"
+        f"PARAMETER temperature {temperature}\n"
+        f"PARAMETER num_ctx {num_ctx}\n"
+        f"PARAMETER stop \"<|im_end|>\"\n"
+        f"TEMPLATE \"\"\"\n"
+        f"{{{{- if .System}}}}<|im_start|>system\n{{{{.System}}}}<|im_end|>\n{{{{- end}}}}\n"
+        f"{{{{- range .Messages}}}}\n"
+        f"<|im_start|>{{{{.Role}}}}\n{{{{.Content}}}}<|im_end|>\n"
+        f"{{{{- end}}}}\n"
+        f"<|im_start|>assistant\n\"\"\"\n"
+        f"SYSTEM \"{system_prompt}\"\n"
+    )
+    return modelfile
+
+
+# ---------------------------------------------------------------------------
+# Step 4 — Register with Ollama
+# ---------------------------------------------------------------------------
+
+def register_with_ollama(
+    modelfile_content: str,
+    model_name: str,
+    ollama_url: str,
+    timeout_s: int = 600,
+) -> bool:
+    """
+    Register a model with Ollama via POST /api/create (streaming response).
+
+    Streams the response to capture progress lines.
+    Returns True when Ollama confirms success, False on any error.
+    """
+    url = f"{ollama_url}/api/create"
+    payload = {"name": model_name, "modelfile": modelfile_content}
+
+    logger.info("register_with_ollama: model=%s url=%s", model_name, url)
+
+    try:
+        with requests.post(
+            url,
+            json=payload,
+            stream=True,
+            timeout=timeout_s,
+            headers={"Content-Type": "application/json"},
+        ) as resp:
+            resp.raise_for_status()
+
+            last_status = ""
+            for line in resp.iter_lines():
+                if not line:
+                    continue
+                try:
+                    data = json.loads(line)
+                except json.JSONDecodeError:
+                    continue
+
+                status = data.get("status", "")
+                if status != last_status:
+                    logger.info("Ollama create: %s", status)
+                    last_status = status
+
+                if data.get("error"):
+                    logger.error("Ollama create error: %s", data["error"])
+                    return False
+
+            logger.info("register_with_ollama: model=%s registered successfully", model_name)
+            return True
+
+    except requests.exceptions.Timeout:
+        logger.error("register_with_ollama: timed out after %ds", timeout_s)
+        return False
+    except requests.exceptions.RequestException as exc:
+        logger.error("register_with_ollama: HTTP error: %s", exc)
+        return False
+
+
+# ---------------------------------------------------------------------------
+# Step 5 — Evaluate deployed model
+# ---------------------------------------------------------------------------
+
+def evaluate_model(
+    model_name: str,
+    task_type: str,
+    gateway_url: str,
+    n_samples: int = 20,
+    timeout_s: int = 60,
+) -> float:
+    """
+    Run evaluation prompts through the gateway using the newly deployed model.
+
+    Returns average confidence score across all successful responses.
+    Returns 0.0 if no successful responses were obtained.
+    """
+    from .evaluator import EVAL_PROMPTS, _call_gateway
+
+    prompts = EVAL_PROMPTS.get(task_type, EVAL_PROMPTS.get("general", []))
+    if not prompts:
+        logger.warning(
+            "evaluate_model: no eval prompts for task_type=%s, using general", task_type
+        )
+        prompts = EVAL_PROMPTS.get("general", [])
+
+    # Limit to n_samples
+    selected_prompts = prompts[:n_samples]
+    scores: list[float] = []
+
+    for prompt in selected_prompts:
+        confidence = _call_gateway(
+            gateway_url=gateway_url,
+            model=model_name,
+            prompt=prompt,
+            task_type=task_type,
+            timeout_s=timeout_s,
+        )
+        if confidence is not None:
+            scores.append(confidence)
+        time.sleep(0.3)  # avoid overwhelming the gateway
+
+    if not scores:
+        logger.warning(
+            "evaluate_model: model=%s task=%s — no successful responses", model_name, task_type
+        )
+        return 0.0
+
+    avg = round(sum(scores) / len(scores), 3)
+    logger.info(
+        "evaluate_model: model=%s task=%s avg_confidence=%.3f (n=%d)",
+        model_name,
+        task_type,
+        avg,
+        len(scores),
+    )
+    return avg
+
+
+# ---------------------------------------------------------------------------
+# Convenience: full pipeline
+# ---------------------------------------------------------------------------
+
+def run_conversion_and_registration(
+    base_model_path: str,
+    adapter_path: str,
+    task_type: Optional[str],
+    output_base_dir: str,
+    ollama_url: str,
+    gateway_url: str,
+    quantization: str = "Q5_K_M",
+    min_confidence_to_deploy: float = 0.0,
+    convert_script: str = "/opt/homebrew/lib/python3.12/site-packages/llama_cpp/convert_hf_to_gguf.py",
+    quantize_binary: str = "/opt/homebrew/bin/llama-quantize",
+) -> dict:
+    """
+    End-to-end: merge → GGUF → Ollama registration → evaluation.
+
+    Returns a result dict with keys: success, model_name, confidence, error.
+    """
+    task_label = task_type or "general"
+    model_name = f"llm-gateway-{task_label}-ft"
+
+    base = Path(output_base_dir)
+    merged_dir = str(base / "merged")
+    gguf_path = str(base / f"{model_name}.gguf")
+
+    result: dict = {
+        "success": False,
+        "model_name": model_name,
+        "confidence": 0.0,
+        "error": None,
+    }
+
+    try:
+        logger.info("Pipeline step 1/4: merging LoRA adapter")
+        merge_lora_and_save(base_model_path, adapter_path, merged_dir)
+
+        logger.info("Pipeline step 2/4: converting to GGUF (%s)", quantization)
+        ok = convert_to_gguf(
+            merged_dir,
+            gguf_path,
+            quantization=quantization,
+            convert_script=convert_script,
+            quantize_binary=quantize_binary,
+        )
+        if not ok:
+            result["error"] = "GGUF conversion failed"
+            return result
+
+        logger.info("Pipeline step 3/4: registering with Ollama")
+        modelfile = create_ollama_modelfile(gguf_path, model_name, task_type)
+        registered = register_with_ollama(modelfile, model_name, ollama_url)
+        if not registered:
+            result["error"] = "Ollama registration failed"
+            return result
+
+        logger.info("Pipeline step 4/4: evaluating deployed model")
+        confidence = evaluate_model(model_name, task_label, gateway_url)
+
+        result["success"] = True
+        result["confidence"] = confidence
+
+        logger.info(
+            "Conversion pipeline complete: model=%s confidence=%.3f",
+            model_name,
+            confidence,
+        )
+
+    except Exception as exc:
+        logger.error("run_conversion_and_registration: unexpected error: %s", exc, exc_info=True)
+        result["error"] = str(exc)
+
+    return result
--- a/packages/fine-tuner/src/data_collector.py
+++ b/packages/fine-tuner/src/data_collector.py
@ -0,0 +1,285 @@
+"""
+data_collector.py - Training data pipeline from PostgreSQL.
+
+Pulls high-confidence approved outputs, human-edited preference pairs,
+and low-confidence negatives from the llm_gateway database.
+All queries are parameterised; no external data is trusted.
+"""
+
+from __future__ import annotations
+
+import logging
+import uuid
+from typing import Optional
+
+import psycopg2
+import psycopg2.extras
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Positive examples (SFT)
+# ---------------------------------------------------------------------------
+
+def collect_positive_examples(
+    conn: psycopg2.extensions.connection,
+    task_type: Optional[str],
+    min_confidence: float = 7.5,
+    limit: int = 500,
+) -> list[dict]:
+    """
+    Pull high-confidence, approved outputs from the learning_corpus table.
+
+    Filters:
+    - confidence_score >= min_confidence
+    - status = 'approved'
+    - used_in_training IS NULL (not yet consumed)
+    - Optionally scoped to a single task_type
+    """
+    sql_base = """
+        SELECT
+            id,
+            task_type,
+            input_text,
+            output_text,
+            system_prompt,
+            confidence_score,
+            created_at
+        FROM learning_corpus
+        WHERE
+            status = 'approved'
+            AND confidence_score >= %(min_confidence)s
+            AND used_in_training IS NULL
+            AND system_prompt IS NOT NULL
+            AND input_text IS NOT NULL
+            AND output_text IS NOT NULL
+    """
+    params: dict = {"min_confidence": min_confidence, "limit": limit}
+
+    if task_type is not None:
+        sql_base += " AND task_type = %(task_type)s"
+        params["task_type"] = task_type
+
+    sql_base += " ORDER BY confidence_score DESC LIMIT %(limit)s"
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql_base, params)
+        rows = cur.fetchall()
+
+    result = [dict(r) for r in rows]
+    logger.info(
+        "collect_positive_examples: task_type=%s, min_confidence=%.1f → %d rows",
+        task_type,
+        min_confidence,
+        len(result),
+    )
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Preference pairs (DPO)
+# ---------------------------------------------------------------------------
+
+def collect_preference_pairs(
+    conn: psycopg2.extensions.connection,
+    task_type: Optional[str],
+    limit: int = 200,
+) -> list[dict]:
+    """
+    Pull human-edited output pairs for DPO training.
+
+    A valid preference pair requires:
+    - human_edited = TRUE
+    - edited_output IS NOT NULL and differs from output_text
+    - used_in_dpo_training IS NULL
+    """
+    sql_base = """
+        SELECT
+            id,
+            task_type,
+            input_text,
+            output_text,
+            edited_output,
+            system_prompt,
+            created_at
+        FROM learning_corpus
+        WHERE
+            human_edited = TRUE
+            AND edited_output IS NOT NULL
+            AND edited_output <> output_text
+            AND used_in_dpo_training IS NULL
+            AND input_text IS NOT NULL
+            AND output_text IS NOT NULL
+    """
+    params: dict = {"limit": limit}
+
+    if task_type is not None:
+        sql_base += " AND task_type = %(task_type)s"
+        params["task_type"] = task_type
+
+    sql_base += " ORDER BY created_at DESC LIMIT %(limit)s"
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql_base, params)
+        rows = cur.fetchall()
+
+    result = [dict(r) for r in rows]
+    logger.info(
+        "collect_preference_pairs: task_type=%s → %d pairs",
+        task_type,
+        len(result),
+    )
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Negative examples (optional, for debugging / contrastive studies)
+# ---------------------------------------------------------------------------
+
+def collect_negative_examples(
+    conn: psycopg2.extensions.connection,
+    task_type: Optional[str],
+    max_confidence: float = 4.0,
+    limit: int = 200,
+) -> list[dict]:
+    """
+    Pull low-confidence outputs — useful for contrastive analysis and
+    understanding failure modes, but NOT included in SFT datasets directly.
+    """
+    sql_base = """
+        SELECT
+            id,
+            task_type,
+            input_text,
+            output_text,
+            system_prompt,
+            confidence_score,
+            created_at
+        FROM learning_corpus
+        WHERE
+            confidence_score <= %(max_confidence)s
+            AND status IN ('rejected', 'reviewed')
+            AND input_text IS NOT NULL
+            AND output_text IS NOT NULL
+    """
+    params: dict = {"max_confidence": max_confidence, "limit": limit}
+
+    if task_type is not None:
+        sql_base += " AND task_type = %(task_type)s"
+        params["task_type"] = task_type
+
+    sql_base += " ORDER BY confidence_score ASC LIMIT %(limit)s"
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql_base, params)
+        rows = cur.fetchall()
+
+    result = [dict(r) for r in rows]
+    logger.info(
+        "collect_negative_examples: task_type=%s, max_confidence=%.1f → %d rows",
+        task_type,
+        max_confidence,
+        len(result),
+    )
+    return result
+
+
+# ---------------------------------------------------------------------------
+# Mark consumed examples
+# ---------------------------------------------------------------------------
+
+def mark_as_used(
+    conn: psycopg2.extensions.connection,
+    example_ids: list[str],
+    run_id: str,
+) -> None:
+    """
+    Stamp consumed SFT examples with the run_id so they are not selected again.
+    Uses a single parameterised UPDATE; never formats IDs into SQL strings.
+    """
+    if not example_ids:
+        return
+
+    sql = """
+        UPDATE learning_corpus
+        SET used_in_training = %(run_id)s
+        WHERE id = ANY(%(ids)s::uuid[])
+    """
+    with conn.cursor() as cur:
+        cur.execute(sql, {"run_id": run_id, "ids": example_ids})
+    conn.commit()
+    logger.info("mark_as_used: stamped %d examples with run_id=%s", len(example_ids), run_id)
+
+
+def mark_as_used_dpo(
+    conn: psycopg2.extensions.connection,
+    example_ids: list[str],
+    run_id: str,
+) -> None:
+    """Stamp consumed DPO preference pairs with the run_id."""
+    if not example_ids:
+        return
+
+    sql = """
+        UPDATE learning_corpus
+        SET used_in_dpo_training = %(run_id)s
+        WHERE id = ANY(%(ids)s::uuid[])
+    """
+    with conn.cursor() as cur:
+        cur.execute(sql, {"run_id": run_id, "ids": example_ids})
+    conn.commit()
+    logger.info(
+        "mark_as_used_dpo: stamped %d preference pairs with run_id=%s",
+        len(example_ids),
+        run_id,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Corpus statistics
+# ---------------------------------------------------------------------------
+
+def get_corpus_stats(conn: psycopg2.extensions.connection) -> dict:
+    """
+    Return a snapshot of the learning corpus useful for trigger decisions.
+    Returns counts per task_type plus global DPO pair count.
+    """
+    task_sql = """
+        SELECT
+            task_type,
+            COUNT(*) AS total,
+            COUNT(*) FILTER (
+                WHERE status = 'approved'
+                AND confidence_score >= 7.5
+                AND used_in_training IS NULL
+            ) AS available_positive
+        FROM learning_corpus
+        WHERE input_text IS NOT NULL AND output_text IS NOT NULL
+        GROUP BY task_type
+    """
+    dpo_sql = """
+        SELECT COUNT(*) AS dpo_count
+        FROM learning_corpus
+        WHERE
+            human_edited = TRUE
+            AND edited_output IS NOT NULL
+            AND edited_output <> output_text
+            AND used_in_dpo_training IS NULL
+    """
+
+    stats: dict = {"by_task_type": {}, "dpo_pairs_available": 0}
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(task_sql)
+        for row in cur.fetchall():
+            stats["by_task_type"][row["task_type"]] = {
+                "total": row["total"],
+                "available_positive": row["available_positive"],
+            }
+
+        cur.execute(dpo_sql)
+        row = cur.fetchone()
+        stats["dpo_pairs_available"] = int(row["dpo_count"]) if row else 0
+
+    return stats
--- a/packages/fine-tuner/src/dpo_trainer.py
+++ b/packages/fine-tuner/src/dpo_trainer.py
@ -0,0 +1,286 @@
+"""
+dpo_trainer.py - DPO (Direct Preference Optimization) fine-tuning.
+
+Turns human-edited outputs into (chosen, rejected) preference pairs
+and trains a policy model to prefer the human-edited versions.
+
+MPS limitations apply identically to trainer.py:
+  - float32, no fp16/bf16, gradient_checkpointing=False,
+    dataloader_num_workers=0, device_map not used.
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Optional
+
+import torch
+from datasets import Dataset
+from peft import LoraConfig, TaskType, get_peft_model
+from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+from trl import DPOTrainer
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Preference pair construction
+# ---------------------------------------------------------------------------
+
+def build_preference_pairs(corpus_rows: list[dict]) -> list[dict]:
+    """
+    Build (prompt, chosen, rejected) triples from human-edited corpus rows.
+
+    Filters:
+    - human_edited must be True
+    - edited_output must be non-empty and differ from output_text
+
+    Returns a list of dicts with keys: prompt, chosen, rejected, task_type.
+    Immutable input — corpus_rows is never mutated.
+    """
+    pairs: list[dict] = []
+    skipped = 0
+
+    for row in corpus_rows:
+        if not row.get("human_edited"):
+            skipped += 1
+            continue
+
+        prompt = (row.get("input_text") or "").strip()
+        chosen = (row.get("edited_output") or "").strip()
+        rejected = (row.get("output_text") or "").strip()
+
+        if not prompt or not chosen or not rejected:
+            skipped += 1
+            continue
+
+        if chosen == rejected:
+            skipped += 1
+            continue
+
+        pairs.append(
+            {
+                "prompt": prompt,
+                "chosen": chosen,
+                "rejected": rejected,
+                "task_type": row.get("task_type", "general"),
+            }
+        )
+
+    logger.info(
+        "build_preference_pairs: %d valid pairs, %d skipped", len(pairs), skipped
+    )
+    return pairs
+
+
+def _pairs_to_dataset(pairs: list[dict]) -> Dataset:
+    """
+    Convert preference pair dicts to a HuggingFace Dataset.
+    DPOTrainer expects columns: prompt, chosen, rejected.
+    """
+    records = [
+        {
+            "prompt": p["prompt"],
+            "chosen": p["chosen"],
+            "rejected": p["rejected"],
+        }
+        for p in pairs
+    ]
+    return Dataset.from_list(records)
+
+
+# ---------------------------------------------------------------------------
+# Device selection (mirrors trainer.py)
+# ---------------------------------------------------------------------------
+
+def _select_device() -> str:
+    if torch.backends.mps.is_available() and torch.backends.mps.is_built():
+        return "mps"
+    if torch.cuda.is_available():
+        return "cuda"
+    return "cpu"
+
+
+def _load_model_and_tokenizer(
+    base_model_path: str, device: str
+) -> tuple:
+    logger.info("DPO: loading tokenizer from %s", base_model_path)
+    tokenizer = AutoTokenizer.from_pretrained(
+        base_model_path,
+        trust_remote_code=True,
+        padding_side="right",
+    )
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+
+    logger.info("DPO: loading model from %s on device=%s", base_model_path, device)
+
+    if device == "cuda":
+        model = AutoModelForCausalLM.from_pretrained(
+            base_model_path,
+            torch_dtype=torch.bfloat16,
+            device_map="auto",
+            trust_remote_code=True,
+        )
+    else:
+        model = AutoModelForCausalLM.from_pretrained(
+            base_model_path,
+            torch_dtype=torch.float32,
+            trust_remote_code=True,
+        )
+
+    model.config.use_cache = False
+    return model, tokenizer
+
+
+# ---------------------------------------------------------------------------
+# Main DPO training entry point
+# ---------------------------------------------------------------------------
+
+def run_dpo_training(
+    base_model_path: str,
+    pairs: list[dict],
+    output_dir: str,
+    beta: float = 0.1,
+    num_epochs: int = 1,
+    batch_size: int = 1,
+    gradient_accumulation_steps: int = 4,
+    learning_rate: float = 5e-5,
+    max_length: int = 2048,
+    max_prompt_length: int = 512,
+    lora_r: int = 16,
+    lora_alpha: int = 32,
+    lora_dropout: float = 0.05,
+) -> dict:
+    """
+    Run DPO preference learning.
+
+    Uses LoRA adapters on the base model to keep memory footprint small.
+    The reference model is the frozen base model; the policy model trains
+    on top of the LoRA adapter.
+
+    Returns:
+    {
+        "train_loss": float,
+        "adapter_path": str,
+        "n_pairs": int,
+        "device": str,
+        "beta": float,
+    }
+
+    Raises on fatal errors.
+    """
+    if len(pairs) < 10:
+        raise ValueError(
+            f"Insufficient preference pairs: need >= 10, got {len(pairs)}"
+        )
+
+    device = _select_device()
+    logger.info(
+        "run_dpo_training: device=%s beta=%.2f pairs=%d output=%s",
+        device,
+        beta,
+        len(pairs),
+        output_dir,
+    )
+
+    dataset = _pairs_to_dataset(pairs)
+
+    # Split: 90% train, 10% eval
+    split = dataset.train_test_split(test_size=0.1, seed=42)
+    train_dataset = split["train"]
+    eval_dataset = split["test"]
+
+    model, tokenizer = _load_model_and_tokenizer(base_model_path, device)
+
+    # Separate reference model (frozen copy of base)
+    ref_model, _ = _load_model_and_tokenizer(base_model_path, device)
+    for param in ref_model.parameters():
+        param.requires_grad = False
+
+    # LoRA adapter on the policy model
+    lora_config = LoraConfig(
+        r=lora_r,
+        lora_alpha=lora_alpha,
+        lora_dropout=lora_dropout,
+        bias="none",
+        task_type=TaskType.CAUSAL_LM,
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+        inference_mode=False,
+    )
+    model = get_peft_model(model, lora_config)
+    model.print_trainable_parameters()
+
+    if device in ("mps", "cpu"):
+        model = model.to(device)
+        ref_model = ref_model.to(device)
+
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    training_args = TrainingArguments(
+        output_dir=str(output_path),
+        num_train_epochs=num_epochs,
+        per_device_train_batch_size=batch_size,
+        per_device_eval_batch_size=batch_size,
+        gradient_accumulation_steps=gradient_accumulation_steps,
+        learning_rate=learning_rate,
+        warmup_ratio=0.1,
+        eval_strategy="steps",
+        eval_steps=25,
+        save_strategy="steps",
+        save_steps=50,
+        load_best_model_at_end=True,
+        metric_for_best_model="eval_loss",
+        greater_is_better=False,
+        logging_steps=5,
+        report_to="none",
+        dataloader_num_workers=0,
+        fp16=False,
+        bf16=False,
+        optim="adamw_torch",
+        gradient_checkpointing=False,
+        remove_unused_columns=False,
+        label_names=["labels"],
+    )
+
+    trainer = DPOTrainer(
+        model=model,
+        ref_model=ref_model,
+        args=training_args,
+        beta=beta,
+        train_dataset=train_dataset,
+        eval_dataset=eval_dataset,
+        tokenizer=tokenizer,
+        max_length=max_length,
+        max_prompt_length=max_prompt_length,
+    )
+
+    logger.info(
+        "Starting DPO training: %d train pairs, %d eval pairs",
+        len(train_dataset),
+        len(eval_dataset),
+    )
+
+    train_result = trainer.train()
+
+    eval_metrics = trainer.evaluate()
+    logger.info("DPO eval metrics: %s", eval_metrics)
+
+    adapter_path = str(output_path / "dpo_adapter")
+    model.save_pretrained(adapter_path)
+    tokenizer.save_pretrained(adapter_path)
+    logger.info("Saved DPO adapter to %s", adapter_path)
+
+    return {
+        "train_loss": round(train_result.training_loss, 4),
+        "eval_loss": round(eval_metrics.get("eval_loss", -1.0), 4),
+        "train_runtime": round(train_result.metrics.get("train_runtime", 0.0), 1),
+        "n_pairs": len(pairs),
+        "train_pairs": len(train_dataset),
+        "eval_pairs": len(eval_dataset),
+        "adapter_path": adapter_path,
+        "device": device,
+        "beta": beta,
+    }
--- a/packages/fine-tuner/src/evaluator.py
+++ b/packages/fine-tuner/src/evaluator.py
@ -0,0 +1,268 @@
+"""
+evaluator.py - Automated evaluation before and after fine-tuning.
+
+Runs a fixed set of representative prompts through the gateway and
+compares confidence scores between a baseline model and a candidate
+fine-tuned model.  No mutable global state; all functions are pure
+with respect to external side effects beyond HTTP calls.
+"""
+
+from __future__ import annotations
+
+import logging
+import time
+from typing import Optional
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Evaluation prompt suites — one list per known task_type
+# ---------------------------------------------------------------------------
+
+EVAL_PROMPTS: dict[str, list[str]] = {
+    "linkedin-post-de": [
+        "Schreib einen professionellen LinkedIn-Post über 400G Transceiver Deployment in modernen Rechenzentren.",
+        "Post über RPKI und BGP-Sicherheit für Netzwerkingenieure.",
+        "Post über optische Transceiver im Rechenzentrum: Multimode vs. Singlemode.",
+        "Post über das Flexoptix T-Series Transceiver-Programm und seine Vorteile.",
+        "Post über Herausforderungen beim Netzwerk-Upgrade auf 100G im Enterprise-Bereich.",
+    ],
+    "linkedin-post-en": [
+        "Write a professional LinkedIn post about 400G transceiver deployment challenges.",
+        "Post about RPKI adoption and BGP security improvements in 2025.",
+        "Post about open optical networking and disaggregated architectures.",
+        "Post about the economics of refurbished transceivers vs OEM pricing.",
+        "Post about sustainable data center networking and energy efficiency.",
+    ],
+    "tip-transceiver-enrich": [
+        "Enrich this transceiver spec: QSFP28 100G LR4 1310nm 10km SMF LC duplex",
+        "Enrich this spec: SFP+ 10G SR 850nm 300m MMF LC duplex",
+        "Enrich: QSFP-DD 400G DR4 1310nm 500m SMF MPO-12",
+        "Enrich: CFP2-DCO 100G DWDM tunable C-band coherent",
+        "Enrich: SFP28 25G LR 1310nm 10km SMF LC duplex",
+    ],
+    "tip-compatibility-check": [
+        "Is a Cisco SFP-10G-SR compatible with a Juniper QFX5100?",
+        "Can I use a generic QSFP28 100G LR4 in a Nokia 7750?",
+        "Is Flexoptix SFP+ compatible with Arista 7050X?",
+        "Does a QSFP-DD 400G SR8 work in a Cisco Nexus 9000?",
+        "Can I mix different vendor QSFP28 transceivers in the same chassis?",
+    ],
+    "tip-datasheet-parse": [
+        "Parse key specs from: Finisar FTLX8574D3BCL 10GBase-SR SFP+ 850nm 300m -5 to 85C 3.3V 1W",
+        "Extract specs: InnoLight T-DP4CNH-OC1 100G QSFP28 SR4 850nm 100m 3.5W",
+        "Parse: Lumentum 10GB-LR-SFPP 10G SFP+ LR 1310nm 10km DDM",
+    ],
+    "email-draft-de": [
+        "Schreib eine Vertriebs-E-Mail an einen ISP-Kunden über optische Transceiver.",
+        "Follow-up-E-Mail nach Messe-Gespräch über 400G-Upgrade-Projekt.",
+        "E-Mail an Einkauf: Angebot für 50x QSFP28 100G LR4 Transceiver.",
+    ],
+    "content-summary": [
+        "Summarize this BGP RFC: RFC 4271 defines Border Gateway Protocol 4, covering OPEN, UPDATE, NOTIFICATION, KEEPALIVE messages and path attributes for inter-domain routing.",
+        "Summarize: IEEE 802.3ba defines 40GbE and 100GbE standards including CR4, SR4, LR4 and ER4 physical layer specifications.",
+    ],
+    "general": [
+        "What is the difference between coherent and direct-detect optical transceivers?",
+        "Explain LoRA vs full fine-tuning for LLMs.",
+        "What is DWDM and how does it increase fiber capacity?",
+    ],
+}
+
+
+# ---------------------------------------------------------------------------
+# Single-model evaluation
+# ---------------------------------------------------------------------------
+
+def run_eval_suite(
+    model_name: str,
+    gateway_url: str,
+    task_types: Optional[list[str]] = None,
+    timeout_s: int = 60,
+) -> dict[str, float]:
+    """
+    Run all standard eval prompts through the gateway for model_name.
+
+    Returns a mapping of task_type → average confidence score.
+    Missing task_type keys indicate evaluation was skipped (e.g. no prompts).
+    """
+    if task_types is None:
+        task_types = list(EVAL_PROMPTS.keys())
+
+    results: dict[str, list[float]] = {tt: [] for tt in task_types}
+
+    for task_type in task_types:
+        prompts = EVAL_PROMPTS.get(task_type, [])
+        if not prompts:
+            logger.warning("run_eval_suite: no prompts for task_type=%s", task_type)
+            continue
+
+        for prompt in prompts:
+            confidence = _call_gateway(
+                gateway_url=gateway_url,
+                model=model_name,
+                prompt=prompt,
+                task_type=task_type,
+                timeout_s=timeout_s,
+            )
+            if confidence is not None:
+                results[task_type].append(confidence)
+
+            # Rate-limit gateway calls
+            time.sleep(0.5)
+
+    averages: dict[str, float] = {}
+    for task_type, scores in results.items():
+        if scores:
+            averages[task_type] = round(sum(scores) / len(scores), 3)
+            logger.info(
+                "eval task_type=%s model=%s avg_confidence=%.3f (n=%d)",
+                task_type,
+                model_name,
+                averages[task_type],
+                len(scores),
+            )
+        else:
+            logger.warning(
+                "eval task_type=%s model=%s: no successful responses",
+                task_type,
+                model_name,
+            )
+
+    return averages
+
+
+# ---------------------------------------------------------------------------
+# Side-by-side model comparison
+# ---------------------------------------------------------------------------
+
+def compare_models(
+    baseline_model: str,
+    new_model: str,
+    gateway_url: str,
+    task_types: Optional[list[str]] = None,
+) -> dict:
+    """
+    Run eval suites for both models and compute deltas.
+
+    Returns:
+    {
+        "baseline": {task_type: avg_confidence},
+        "candidate": {task_type: avg_confidence},
+        "delta": {task_type: candidate - baseline},
+        "overall_delta": float,
+        "improvement": bool,  # True if overall_delta > 0
+        "latency_baseline_ms": float,
+        "latency_candidate_ms": float,
+    }
+    """
+    if task_types is None:
+        task_types = list(EVAL_PROMPTS.keys())
+
+    logger.info("compare_models: baseline=%s candidate=%s", baseline_model, new_model)
+
+    t0 = time.monotonic()
+    baseline_scores = run_eval_suite(baseline_model, gateway_url, task_types)
+    latency_baseline_ms = (time.monotonic() - t0) * 1000
+
+    t1 = time.monotonic()
+    candidate_scores = run_eval_suite(new_model, gateway_url, task_types)
+    latency_candidate_ms = (time.monotonic() - t1) * 1000
+
+    delta: dict[str, float] = {}
+    all_deltas: list[float] = []
+
+    for task_type in task_types:
+        base_val = baseline_scores.get(task_type)
+        cand_val = candidate_scores.get(task_type)
+        if base_val is not None and cand_val is not None:
+            d = round(cand_val - base_val, 3)
+            delta[task_type] = d
+            all_deltas.append(d)
+
+    overall_delta = round(sum(all_deltas) / len(all_deltas), 3) if all_deltas else 0.0
+
+    return {
+        "baseline": baseline_scores,
+        "candidate": candidate_scores,
+        "delta": delta,
+        "overall_delta": overall_delta,
+        "improvement": overall_delta > 0,
+        "latency_baseline_ms": round(latency_baseline_ms, 1),
+        "latency_candidate_ms": round(latency_candidate_ms, 1),
+    }
+
+
+# ---------------------------------------------------------------------------
+# Gateway HTTP helper
+# ---------------------------------------------------------------------------
+
+def _call_gateway(
+    gateway_url: str,
+    model: str,
+    prompt: str,
+    task_type: str,
+    timeout_s: int = 60,
+) -> Optional[float]:
+    """
+    Call the LLM Gateway /v1/completion endpoint and extract confidence.
+
+    Returns the confidence score on success, None on any error.
+    Never raises — callers handle None gracefully.
+    """
+    payload = {
+        "model": model,
+        "prompt": prompt,
+        "task_type": task_type,
+        "max_tokens": 512,
+        "stream": False,
+    }
+
+    try:
+        resp = requests.post(
+            f"{gateway_url}/v1/completion",
+            json=payload,
+            timeout=timeout_s,
+            headers={"Content-Type": "application/json"},
+        )
+        resp.raise_for_status()
+        data = resp.json()
+        # Gateway returns confidence in top-level field or nested in metadata
+        confidence = data.get("confidence") or data.get("metadata", {}).get("confidence")
+        if confidence is None:
+            logger.warning(
+                "_call_gateway: no confidence field in response for model=%s task=%s",
+                model,
+                task_type,
+            )
+            return None
+        return float(confidence)
+
+    except requests.exceptions.Timeout:
+        logger.warning(
+            "_call_gateway: timeout after %ds for model=%s task=%s",
+            timeout_s,
+            model,
+            task_type,
+        )
+        return None
+
+    except requests.exceptions.RequestException as exc:
+        logger.error(
+            "_call_gateway: HTTP error for model=%s task=%s: %s",
+            model,
+            task_type,
+            exc,
+        )
+        return None
+
+    except (KeyError, ValueError, TypeError) as exc:
+        logger.error(
+            "_call_gateway: malformed response for model=%s task=%s: %s",
+            model,
+            task_type,
+            exc,
+        )
+        return None
--- a/packages/fine-tuner/src/main.py
+++ b/packages/fine-tuner/src/main.py
@ -0,0 +1,519 @@
+"""
+main.py - Fine-tuning orchestrator for the LLM Gateway.
+
+Polls PostgreSQL every 30 minutes, evaluates trigger conditions via
+scheduler.py, then dispatches LoRA SFT or DPO runs as appropriate.
+Results are recorded in the fine_tuning_runs table and optionally
+reported to the gateway via HTTP.
+
+Environment variables override config-file defaults:
+  FT_DB_URL        — PostgreSQL connection string
+  FT_GATEWAY_URL   — LLM Gateway base URL
+  FT_OLLAMA_URL    — Ollama API base URL
+  FT_CONFIG_PATH   — Path to fine_tuner.yaml
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+import time
+import uuid
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Optional
+
+import psycopg2
+import psycopg2.extras
+import requests
+import yaml
+
+from .data_collector import (
+    collect_positive_examples,
+    collect_preference_pairs,
+    mark_as_used,
+    mark_as_used_dpo,
+)
+from .scheduler import (
+    list_active_task_types,
+    should_trigger_dpo,
+    should_trigger_general,
+    should_trigger_task_specific,
+)
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s %(levelname)s [%(name)s] %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# ---------------------------------------------------------------------------
+# Config loading
+# ---------------------------------------------------------------------------
+
+_BASE_DIR = Path(__file__).parent.parent
+_DEFAULT_CONFIG = _BASE_DIR / "config" / "fine_tuner.yaml"
+
+DEFAULT_DB_URL = "postgresql://llm:llm_secure_password@localhost:5432/llm_gateway"
+DEFAULT_GATEWAY_URL = "http://localhost:3100"
+DEFAULT_OLLAMA_URL = "http://192.168.178.169:11434"
+
+
+def load_config(path: Optional[str] = None) -> dict:
+    """Load YAML config, applying environment variable overrides."""
+    config_path = Path(path or os.environ.get("FT_CONFIG_PATH", str(_DEFAULT_CONFIG)))
+
+    if not config_path.exists():
+        logger.warning("Config not found at %s — using defaults", config_path)
+        cfg: dict = {}
+    else:
+        with config_path.open() as fh:
+            cfg = yaml.safe_load(fh) or {}
+        logger.info("Loaded config from %s", config_path)
+
+    # Environment variable overrides
+    cfg["database_url"] = os.environ.get("FT_DB_URL", cfg.get("database_url", DEFAULT_DB_URL))
+    cfg["gateway_url"] = os.environ.get("FT_GATEWAY_URL", cfg.get("gateway_url", DEFAULT_GATEWAY_URL))
+    cfg["ollama_url"] = os.environ.get("FT_OLLAMA_URL", cfg.get("ollama_url", DEFAULT_OLLAMA_URL))
+
+    return cfg
+
+
+# ---------------------------------------------------------------------------
+# Database helpers
+# ---------------------------------------------------------------------------
+
+def _connect(db_url: str) -> psycopg2.extensions.connection:
+    """Open a psycopg2 connection. Raises on failure."""
+    return psycopg2.connect(db_url)
+
+
+def _create_run_record(
+    conn: psycopg2.extensions.connection,
+    run_id: str,
+    run_type: str,
+    task_type: Optional[str],
+) -> None:
+    """Insert a new fine_tuning_runs row with status='running'."""
+    sql = """
+        INSERT INTO fine_tuning_runs
+            (id, run_type, task_type, status, started_at)
+        VALUES
+            (%(id)s, %(run_type)s, %(task_type)s, 'running', %(started_at)s)
+        ON CONFLICT (id) DO NOTHING
+    """
+    with conn.cursor() as cur:
+        cur.execute(
+            sql,
+            {
+                "id": run_id,
+                "run_type": run_type,
+                "task_type": task_type,
+                "started_at": datetime.now(timezone.utc),
+            },
+        )
+    conn.commit()
+
+
+def _update_run_record(
+    conn: psycopg2.extensions.connection,
+    run_id: str,
+    status: str,
+    metrics: dict,
+    model_name: Optional[str] = None,
+) -> None:
+    """Update fine_tuning_runs with the final status and metrics."""
+    sql = """
+        UPDATE fine_tuning_runs
+        SET
+            status = %(status)s,
+            completed_at = %(completed_at)s,
+            metrics = %(metrics)s,
+            deployed_model_name = %(model_name)s
+        WHERE id = %(id)s
+    """
+    with conn.cursor() as cur:
+        cur.execute(
+            sql,
+            {
+                "id": run_id,
+                "status": status,
+                "completed_at": datetime.now(timezone.utc),
+                "metrics": psycopg2.extras.Json(metrics),
+                "model_name": model_name,
+            },
+        )
+    conn.commit()
+
+
+# ---------------------------------------------------------------------------
+# Trigger detection
+# ---------------------------------------------------------------------------
+
+def check_triggers(conn: psycopg2.extensions.connection) -> list[dict]:
+    """
+    Evaluate all trigger conditions and return a list of trigger descriptors.
+
+    Each descriptor: {"run_type": str, "task_type": str|None}
+    Triggers are ordered: task-specific first, then general, then DPO.
+    At most one trigger per run_type/task_type combination is returned.
+    """
+    triggers: list[dict] = []
+
+    # Task-specific LoRA triggers — one per task_type
+    task_types = list_active_task_types(conn)
+    for task_type in task_types:
+        if should_trigger_task_specific(conn, task_type):
+            triggers.append({"run_type": "task_specific", "task_type": task_type})
+
+    # General SFT trigger
+    if should_trigger_general(conn):
+        triggers.append({"run_type": "general", "task_type": None})
+
+    # DPO trigger
+    if should_trigger_dpo(conn):
+        triggers.append({"run_type": "dpo", "task_type": None})
+
+    if triggers:
+        logger.info("check_triggers: %d triggers pending: %s", len(triggers), triggers)
+    else:
+        logger.info("check_triggers: no triggers pending")
+
+    return triggers
+
+
+# ---------------------------------------------------------------------------
+# Fine-tuning dispatch
+# ---------------------------------------------------------------------------
+
+def run_fine_tuning(
+    conn: psycopg2.extensions.connection,
+    trigger: dict,
+    cfg: dict,
+) -> None:
+    """
+    Execute a single fine-tuning run end-to-end.
+
+    1. Records the run in DB.
+    2. Collects training data.
+    3. Dispatches to trainer.py or dpo_trainer.py.
+    4. Converts & registers the result with Ollama.
+    5. Updates the DB record.
+    6. Notifies the gateway.
+    """
+    run_id = str(uuid.uuid4())
+    run_type: str = trigger["run_type"]
+    task_type: Optional[str] = trigger.get("task_type")
+
+    logger.info(
+        "run_fine_tuning: run_id=%s type=%s task=%s",
+        run_id,
+        run_type,
+        task_type,
+    )
+
+    _create_run_record(conn, run_id, run_type, task_type)
+
+    training_cfg = cfg.get("training", {})
+    output_cfg = cfg.get("output", {})
+    eval_cfg = cfg.get("evaluation", {})
+    models_cfg = cfg.get("models", {})
+    llama_cfg = cfg.get("llama_cpp", {})
+
+    adapters_dir = _BASE_DIR / output_cfg.get("adapters_dir", "adapters") / run_id
+    models_dir = _BASE_DIR / output_cfg.get("models_dir", "models") / run_id
+
+    try:
+        if run_type == "dpo":
+            _run_dpo(
+                conn=conn,
+                run_id=run_id,
+                task_type=task_type,
+                cfg=cfg,
+                training_cfg=training_cfg,
+                models_cfg=models_cfg,
+                llama_cfg=llama_cfg,
+                eval_cfg=eval_cfg,
+                adapters_dir=adapters_dir,
+                models_dir=models_dir,
+            )
+        else:
+            _run_sft(
+                conn=conn,
+                run_id=run_id,
+                run_type=run_type,
+                task_type=task_type,
+                cfg=cfg,
+                training_cfg=training_cfg,
+                models_cfg=models_cfg,
+                llama_cfg=llama_cfg,
+                eval_cfg=eval_cfg,
+                adapters_dir=adapters_dir,
+                models_dir=models_dir,
+            )
+
+    except Exception as exc:
+        logger.error(
+            "run_fine_tuning: run_id=%s FAILED: %s",
+            run_id,
+            exc,
+            exc_info=True,
+        )
+        _update_run_record(conn, run_id, "failed", {"error": str(exc)})
+        _notify_gateway(
+            cfg["gateway_url"],
+            {"run_id": run_id, "status": "failed", "error": str(exc)},
+        )
+
+
+def _run_sft(
+    conn,
+    run_id: str,
+    run_type: str,
+    task_type: Optional[str],
+    cfg: dict,
+    training_cfg: dict,
+    models_cfg: dict,
+    llama_cfg: dict,
+    eval_cfg: dict,
+    adapters_dir: Path,
+    models_dir: Path,
+) -> None:
+    from .trainer import run_lora_training
+    from .converter import run_conversion_and_registration
+
+    sft_cfg = training_cfg.get("sft", {})
+
+    # Choose base model: smaller for task-specific, larger for general
+    if run_type == "task_specific":
+        base_model = models_cfg.get("qwen_7b_hf", "Qwen/Qwen2.5-7B-Instruct")
+    else:
+        base_model = models_cfg.get("qwen_14b_hf", "Qwen/Qwen2.5-14B-Instruct")
+
+    min_confidence = 7.5 if run_type == "task_specific" else 7.0
+
+    examples = collect_positive_examples(
+        conn,
+        task_type=task_type,
+        min_confidence=min_confidence,
+        limit=500,
+    )
+
+    if not examples:
+        raise ValueError(f"No examples available for run_type={run_type} task_type={task_type}")
+
+    # 90/10 train/val split
+    split_idx = max(1, int(len(examples) * 0.9))
+    train_examples = examples[:split_idx]
+    val_examples = examples[split_idx:]
+
+    logger.info(
+        "_run_sft: base=%s train=%d val=%d task=%s",
+        base_model,
+        len(train_examples),
+        len(val_examples),
+        task_type,
+    )
+
+    train_metrics = run_lora_training(
+        base_model_path=base_model,
+        train_examples=train_examples,
+        val_examples=val_examples,
+        output_dir=str(adapters_dir),
+        task_type=task_type,
+        lora_r=training_cfg.get("lora_r", 16),
+        lora_alpha=training_cfg.get("lora_alpha", 32),
+        lora_dropout=training_cfg.get("lora_dropout", 0.05),
+        max_seq_length=training_cfg.get("max_seq_length", 2048),
+        num_epochs=sft_cfg.get("num_epochs", 3),
+        batch_size=sft_cfg.get("batch_size", 1),
+        gradient_accumulation_steps=sft_cfg.get("gradient_accumulation", 8),
+        learning_rate=sft_cfg.get("learning_rate", 2e-4),
+        warmup_ratio=sft_cfg.get("warmup_ratio", 0.1),
+    )
+
+    adapter_path = train_metrics["adapter_path"]
+
+    # Mark examples as consumed
+    example_ids = [str(ex["id"]) for ex in examples]
+    mark_as_used(conn, example_ids, run_id)
+
+    # Convert & deploy
+    conversion_result = run_conversion_and_registration(
+        base_model_path=base_model,
+        adapter_path=adapter_path,
+        task_type=task_type,
+        output_base_dir=str(models_dir),
+        ollama_url=cfg["ollama_url"],
+        gateway_url=cfg["gateway_url"],
+        quantization=llama_cfg.get("default_quantization", "Q5_K_M"),
+        min_confidence_to_deploy=eval_cfg.get("min_improvement_to_deploy", 0.3),
+        convert_script=llama_cfg.get("convert_script", ""),
+        quantize_binary=llama_cfg.get("quantize_binary", ""),
+    )
+
+    combined_metrics = {**train_metrics, **conversion_result}
+    status = "completed" if conversion_result["success"] else "trained_not_deployed"
+
+    _update_run_record(
+        conn,
+        run_id,
+        status,
+        combined_metrics,
+        model_name=conversion_result.get("model_name") if conversion_result["success"] else None,
+    )
+
+    _notify_gateway(
+        cfg["gateway_url"],
+        {
+            "run_id": run_id,
+            "run_type": run_type,
+            "task_type": task_type,
+            "status": status,
+            "metrics": combined_metrics,
+        },
+    )
+
+    logger.info("_run_sft: run_id=%s completed, status=%s", run_id, status)
+
+
+def _run_dpo(
+    conn,
+    run_id: str,
+    task_type: Optional[str],
+    cfg: dict,
+    training_cfg: dict,
+    models_cfg: dict,
+    llama_cfg: dict,
+    eval_cfg: dict,
+    adapters_dir: Path,
+    models_dir: Path,
+) -> None:
+    from .dpo_trainer import build_preference_pairs, run_dpo_training
+    from .converter import run_conversion_and_registration
+
+    dpo_cfg = training_cfg.get("dpo", {})
+    base_model = models_cfg.get("qwen_7b_hf", "Qwen/Qwen2.5-7B-Instruct")
+
+    corpus_rows = collect_preference_pairs(conn, task_type=task_type, limit=200)
+    pairs = build_preference_pairs(corpus_rows)
+
+    if not pairs:
+        raise ValueError("No valid preference pairs available for DPO")
+
+    logger.info("_run_dpo: base=%s pairs=%d", base_model, len(pairs))
+
+    train_metrics = run_dpo_training(
+        base_model_path=base_model,
+        pairs=pairs,
+        output_dir=str(adapters_dir),
+        beta=dpo_cfg.get("beta", 0.1),
+        num_epochs=dpo_cfg.get("num_epochs", 1),
+        batch_size=dpo_cfg.get("batch_size", 1),
+        gradient_accumulation_steps=dpo_cfg.get("gradient_accumulation", 4),
+        learning_rate=dpo_cfg.get("learning_rate", 5e-5),
+        max_length=training_cfg.get("max_seq_length", 2048),
+    )
+
+    adapter_path = train_metrics["adapter_path"]
+
+    # Mark preference pairs as consumed
+    pair_ids = [str(row["id"]) for row in corpus_rows if row.get("id")]
+    mark_as_used_dpo(conn, pair_ids, run_id)
+
+    # Convert & deploy (DPO uses task_type="preference" in model name)
+    conversion_result = run_conversion_and_registration(
+        base_model_path=base_model,
+        adapter_path=adapter_path,
+        task_type="preference-dpo",
+        output_base_dir=str(models_dir),
+        ollama_url=cfg["ollama_url"],
+        gateway_url=cfg["gateway_url"],
+        quantization=llama_cfg.get("default_quantization", "Q5_K_M"),
+        min_confidence_to_deploy=eval_cfg.get("min_improvement_to_deploy", 0.3),
+        convert_script=llama_cfg.get("convert_script", ""),
+        quantize_binary=llama_cfg.get("quantize_binary", ""),
+    )
+
+    combined_metrics = {**train_metrics, **conversion_result}
+    status = "completed" if conversion_result["success"] else "trained_not_deployed"
+
+    _update_run_record(
+        conn,
+        run_id,
+        status,
+        combined_metrics,
+        model_name=conversion_result.get("model_name") if conversion_result["success"] else None,
+    )
+
+    _notify_gateway(
+        cfg["gateway_url"],
+        {
+            "run_id": run_id,
+            "run_type": "dpo",
+            "task_type": task_type,
+            "status": status,
+            "metrics": combined_metrics,
+        },
+    )
+
+    logger.info("_run_dpo: run_id=%s completed, status=%s", run_id, status)
+
+
+# ---------------------------------------------------------------------------
+# Gateway notification
+# ---------------------------------------------------------------------------
+
+def _notify_gateway(gateway_url: str, payload: dict) -> None:
+    """POST run results to the gateway; failure is logged but not raised."""
+    url = f"{gateway_url}/internal/fine-tuning/result"
+    try:
+        resp = requests.post(
+            url,
+            json=payload,
+            timeout=30,
+            headers={"Content-Type": "application/json"},
+        )
+        resp.raise_for_status()
+        logger.info("_notify_gateway: notified gateway, status=%d", resp.status_code)
+    except requests.exceptions.RequestException as exc:
+        logger.warning("_notify_gateway: could not notify gateway: %s", exc)
+
+
+# ---------------------------------------------------------------------------
+# Main loop
+# ---------------------------------------------------------------------------
+
+def main() -> None:
+    cfg = load_config()
+    db_url = cfg["database_url"]
+    poll_interval = 1800  # 30 minutes
+
+    logger.info(
+        "Fine-tuner started. db=%s gateway=%s ollama=%s poll_interval=%ds",
+        db_url,
+        cfg["gateway_url"],
+        cfg["ollama_url"],
+        poll_interval,
+    )
+
+    while True:
+        try:
+            conn = _connect(db_url)
+            try:
+                triggers = check_triggers(conn)
+                for trigger in triggers:
+                    run_fine_tuning(conn, trigger, cfg)
+            finally:
+                conn.close()
+        except psycopg2.OperationalError as exc:
+            logger.error("Database connection error: %s — will retry in %ds", exc, poll_interval)
+        except Exception as exc:
+            logger.error("Unexpected error in main loop: %s", exc, exc_info=True)
+
+        logger.info("Sleeping %ds until next check...", poll_interval)
+        time.sleep(poll_interval)
+
+
+if __name__ == "__main__":
+    main()
--- a/packages/fine-tuner/src/scheduler.py
+++ b/packages/fine-tuner/src/scheduler.py
@ -0,0 +1,306 @@
+"""
+scheduler.py - Cron-based trigger logic for fine-tuning runs.
+
+Evaluates whether each fine-tuning strategy (task-specific LoRA,
+general SFT, DPO) should fire based on corpus size and recency of
+previous runs.  All DB access is read-only; no mutations happen here.
+"""
+
+from __future__ import annotations
+
+import logging
+from datetime import datetime, timezone
+from typing import Optional
+
+import psycopg2
+import psycopg2.extras
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Thresholds (immutable constants — never mutate at runtime)
+# ---------------------------------------------------------------------------
+
+FINE_TUNING_THRESHOLDS: dict = {
+    "task_specific": {
+        "min_positive_examples": 100,
+        "min_confidence": 7.5,
+        "min_days_since_last_run": 7,
+    },
+    "general": {
+        "min_positive_examples": 500,
+        "min_confidence": 7.0,
+        "min_days_since_last_run": 14,
+    },
+    "dpo": {
+        "min_preference_pairs": 50,
+        "min_days_since_last_run": 7,
+    },
+}
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _days_since_last_run(
+    conn: psycopg2.extensions.connection,
+    run_type: str,
+    task_type: Optional[str] = None,
+) -> float:
+    """
+    Return days elapsed since the most recent *completed* run of the given type.
+    Returns a very large number when no prior run exists (always trigger).
+    """
+    params: dict = {"run_type": run_type}
+
+    if task_type is not None:
+        sql = """
+            SELECT MAX(completed_at) AS last_run
+            FROM fine_tuning_runs
+            WHERE
+                run_type = %(run_type)s
+                AND task_type = %(task_type)s
+                AND status = 'completed'
+        """
+        params["task_type"] = task_type
+    else:
+        sql = """
+            SELECT MAX(completed_at) AS last_run
+            FROM fine_tuning_runs
+            WHERE run_type = %(run_type)s AND status = 'completed'
+        """
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql, params)
+        row = cur.fetchone()
+
+    if row is None or row["last_run"] is None:
+        return float("inf")
+
+    last_run: datetime = row["last_run"]
+    if last_run.tzinfo is None:
+        last_run = last_run.replace(tzinfo=timezone.utc)
+
+    elapsed = (datetime.now(timezone.utc) - last_run).total_seconds() / 86400
+    return elapsed
+
+
+def _count_available_positive(
+    conn: psycopg2.extensions.connection,
+    task_type: Optional[str],
+    min_confidence: float,
+) -> int:
+    """Count unused positive examples meeting the confidence bar."""
+    params: dict = {"min_confidence": min_confidence}
+    sql_base = """
+        SELECT COUNT(*) AS cnt
+        FROM learning_corpus
+        WHERE
+            status = 'approved'
+            AND confidence_score >= %(min_confidence)s
+            AND used_in_training IS NULL
+            AND input_text IS NOT NULL
+            AND output_text IS NOT NULL
+            AND system_prompt IS NOT NULL
+    """
+    if task_type is not None:
+        sql_base += " AND task_type = %(task_type)s"
+        params["task_type"] = task_type
+
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql_base, params)
+        row = cur.fetchone()
+
+    return int(row["cnt"]) if row else 0
+
+
+def _count_available_dpo_pairs(conn: psycopg2.extensions.connection) -> int:
+    """Count unused preference pairs for DPO training."""
+    sql = """
+        SELECT COUNT(*) AS cnt
+        FROM learning_corpus
+        WHERE
+            human_edited = TRUE
+            AND edited_output IS NOT NULL
+            AND edited_output <> output_text
+            AND used_in_dpo_training IS NULL
+            AND input_text IS NOT NULL
+            AND output_text IS NOT NULL
+    """
+    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
+        cur.execute(sql)
+        row = cur.fetchone()
+
+    return int(row["cnt"]) if row else 0
+
+
+def _is_run_in_progress(
+    conn: psycopg2.extensions.connection,
+    run_type: str,
+    task_type: Optional[str] = None,
+) -> bool:
+    """Return True if a run of this type is currently running or queued."""
+    params: dict = {"run_type": run_type}
+    sql_base = """
+        SELECT 1 FROM fine_tuning_runs
+        WHERE run_type = %(run_type)s AND status IN ('queued', 'running')
+    """
+    if task_type is not None:
+        sql_base += " AND task_type = %(task_type)s"
+        params["task_type"] = task_type
+
+    sql_base += " LIMIT 1"
+
+    with conn.cursor() as cur:
+        cur.execute(sql_base, params)
+        return cur.fetchone() is not None
+
+
+# ---------------------------------------------------------------------------
+# Public trigger functions
+# ---------------------------------------------------------------------------
+
+def should_trigger_task_specific(
+    conn: psycopg2.extensions.connection,
+    task_type: str,
+) -> bool:
+    """
+    Return True if a task-specific LoRA run should be started for task_type.
+
+    Conditions (all must be true):
+    1. Enough unused positive examples (>= 100 with confidence >= 7.5)
+    2. No run of this type is currently in progress for this task_type
+    3. At least 7 days since the last completed run for this task_type
+    """
+    thresholds = FINE_TUNING_THRESHOLDS["task_specific"]
+
+    if _is_run_in_progress(conn, "task_specific", task_type):
+        logger.debug("should_trigger_task_specific(%s): run already in progress", task_type)
+        return False
+
+    count = _count_available_positive(
+        conn, task_type, thresholds["min_confidence"]
+    )
+    if count < thresholds["min_positive_examples"]:
+        logger.debug(
+            "should_trigger_task_specific(%s): only %d examples (need %d)",
+            task_type,
+            count,
+            thresholds["min_positive_examples"],
+        )
+        return False
+
+    days = _days_since_last_run(conn, "task_specific", task_type)
+    if days < thresholds["min_days_since_last_run"]:
+        logger.debug(
+            "should_trigger_task_specific(%s): last run %.1f days ago (need %d)",
+            task_type,
+            days,
+            thresholds["min_days_since_last_run"],
+        )
+        return False
+
+    logger.info(
+        "should_trigger_task_specific(%s): TRIGGER — %d examples, %.1f days since last run",
+        task_type,
+        count,
+        days,
+    )
+    return True
+
+
+def should_trigger_general(conn: psycopg2.extensions.connection) -> bool:
+    """
+    Return True if a general (cross-task) SFT run should be started.
+
+    Conditions (all must be true):
+    1. Total unused positive examples across all tasks >= 500
+    2. No general run currently in progress
+    3. At least 14 days since the last completed general run
+    """
+    thresholds = FINE_TUNING_THRESHOLDS["general"]
+
+    if _is_run_in_progress(conn, "general"):
+        logger.debug("should_trigger_general: run already in progress")
+        return False
+
+    count = _count_available_positive(conn, None, thresholds["min_confidence"])
+    if count < thresholds["min_positive_examples"]:
+        logger.debug(
+            "should_trigger_general: only %d examples (need %d)",
+            count,
+            thresholds["min_positive_examples"],
+        )
+        return False
+
+    days = _days_since_last_run(conn, "general")
+    if days < thresholds["min_days_since_last_run"]:
+        logger.debug(
+            "should_trigger_general: last run %.1f days ago (need %d)",
+            days,
+            thresholds["min_days_since_last_run"],
+        )
+        return False
+
+    logger.info(
+        "should_trigger_general: TRIGGER — %d examples, %.1f days since last run",
+        count,
+        days,
+    )
+    return True
+
+
+def should_trigger_dpo(conn: psycopg2.extensions.connection) -> bool:
+    """
+    Return True if a DPO preference-learning run should be started.
+
+    Conditions (all must be true):
+    1. At least 50 unused human-edited preference pairs
+    2. No DPO run currently in progress
+    3. At least 7 days since the last completed DPO run
+    """
+    thresholds = FINE_TUNING_THRESHOLDS["dpo"]
+
+    if _is_run_in_progress(conn, "dpo"):
+        logger.debug("should_trigger_dpo: run already in progress")
+        return False
+
+    pairs = _count_available_dpo_pairs(conn)
+    if pairs < thresholds["min_preference_pairs"]:
+        logger.debug(
+            "should_trigger_dpo: only %d pairs (need %d)",
+            pairs,
+            thresholds["min_preference_pairs"],
+        )
+        return False
+
+    days = _days_since_last_run(conn, "dpo")
+    if days < thresholds["min_days_since_last_run"]:
+        logger.debug(
+            "should_trigger_dpo: last run %.1f days ago (need %d)",
+            days,
+            thresholds["min_days_since_last_run"],
+        )
+        return False
+
+    logger.info(
+        "should_trigger_dpo: TRIGGER — %d preference pairs, %.1f days since last run",
+        pairs,
+        days,
+    )
+    return True
+
+
+def list_active_task_types(conn: psycopg2.extensions.connection) -> list[str]:
+    """Return all distinct task_types present in the learning corpus."""
+    sql = """
+        SELECT DISTINCT task_type
+        FROM learning_corpus
+        WHERE task_type IS NOT NULL
+        ORDER BY task_type
+    """
+    with conn.cursor() as cur:
+        cur.execute(sql)
+        return [row[0] for row in cur.fetchall()]
--- a/packages/fine-tuner/src/trainer.py
+++ b/packages/fine-tuner/src/trainer.py
@ -0,0 +1,310 @@
+"""
+trainer.py - LoRA / SFT fine-tuning using PEFT + TRL.
+
+Supports Apple Silicon MPS (primary) with automatic CPU fallback.
+Trains a LoRA adapter on top of Qwen2.5-Instruct using ChatML format,
+then returns training metrics for the orchestrator to evaluate and record.
+
+MPS notes (torch 2.x):
+  - device_map is NOT supported with MPS; load the full model and call
+    model.to("mps") explicitly after PEFT wrapping.
+  - gradient_checkpointing is incompatible with MPS; leave disabled.
+  - use_cache must be False during training to avoid shape conflicts.
+"""
+
+from __future__ import annotations
+
+import logging
+import os
+from pathlib import Path
+from typing import Optional
+
+import torch
+from datasets import Dataset
+from peft import LoraConfig, TaskType, get_peft_model
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    BitsAndBytesConfig,
+    TrainingArguments,
+)
+from trl import SFTTrainer
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Constants
+# ---------------------------------------------------------------------------
+
+CHATML_TEMPLATE = (
+    "<|im_start|>system\n{system}<|im_end|>\n"
+    "<|im_start|>user\n{user}<|im_end|>\n"
+    "<|im_start|>assistant\n{assistant}<|im_end|>"
+)
+
+QWEN_TARGET_MODULES = [
+    "q_proj",
+    "k_proj",
+    "v_proj",
+    "o_proj",
+    "gate_proj",
+    "up_proj",
+    "down_proj",
+]
+
+
+# ---------------------------------------------------------------------------
+# Dataset preparation
+# ---------------------------------------------------------------------------
+
+def prepare_dataset(examples: list[dict]) -> Dataset:
+    """
+    Convert learning_corpus rows to ChatML-formatted text examples.
+
+    Each example dict must have: system_prompt, input_text, output_text.
+    Rows with missing/empty fields are silently skipped.
+    """
+    formatted: list[dict] = []
+    skipped = 0
+
+    for ex in examples:
+        system = (ex.get("system_prompt") or "").strip()
+        user = (ex.get("input_text") or "").strip()
+        assistant = (ex.get("output_text") or "").strip()
+
+        if not user or not assistant:
+            skipped += 1
+            continue
+
+        if not system:
+            system = "You are a helpful assistant."
+
+        text = CHATML_TEMPLATE.format(system=system, user=user, assistant=assistant)
+        formatted.append({"text": text})
+
+    if skipped:
+        logger.warning("prepare_dataset: skipped %d rows with missing fields", skipped)
+
+    logger.info("prepare_dataset: %d examples formatted", len(formatted))
+    return Dataset.from_list(formatted)
+
+
+# ---------------------------------------------------------------------------
+# Device selection
+# ---------------------------------------------------------------------------
+
+def _select_device() -> str:
+    """Return 'mps', 'cuda', or 'cpu' depending on availability."""
+    if torch.backends.mps.is_available() and torch.backends.mps.is_built():
+        return "mps"
+    if torch.cuda.is_available():
+        return "cuda"
+    return "cpu"
+
+
+def _load_model_and_tokenizer(
+    base_model_path: str,
+    device: str,
+) -> tuple:
+    """
+    Load tokenizer and base model for LoRA training.
+
+    MPS: load in float32 (bfloat16/float16 not fully supported on MPS).
+    CPU: float32.
+    CUDA: bfloat16 with optional device_map="auto".
+    """
+    logger.info("Loading tokenizer from %s", base_model_path)
+    tokenizer = AutoTokenizer.from_pretrained(
+        base_model_path,
+        trust_remote_code=True,
+        padding_side="right",  # required for SFT with left-pad models
+    )
+    if tokenizer.pad_token is None:
+        tokenizer.pad_token = tokenizer.eos_token
+        logger.info("Set pad_token = eos_token (%s)", tokenizer.eos_token)
+
+    logger.info("Loading base model from %s on device=%s", base_model_path, device)
+
+    if device == "cuda":
+        model = AutoModelForCausalLM.from_pretrained(
+            base_model_path,
+            torch_dtype=torch.bfloat16,
+            device_map="auto",
+            trust_remote_code=True,
+        )
+    else:
+        # MPS and CPU: load in float32, move to device after PEFT wrapping
+        model = AutoModelForCausalLM.from_pretrained(
+            base_model_path,
+            torch_dtype=torch.float32,
+            trust_remote_code=True,
+        )
+
+    model.config.use_cache = False  # required for training
+    return model, tokenizer
+
+
+# ---------------------------------------------------------------------------
+# LoRA configuration
+# ---------------------------------------------------------------------------
+
+def _build_lora_config(
+    r: int = 16,
+    lora_alpha: int = 32,
+    lora_dropout: float = 0.05,
+    target_modules: Optional[list[str]] = None,
+) -> LoraConfig:
+    return LoraConfig(
+        r=r,
+        lora_alpha=lora_alpha,
+        lora_dropout=lora_dropout,
+        bias="none",
+        task_type=TaskType.CAUSAL_LM,
+        target_modules=target_modules or QWEN_TARGET_MODULES,
+        inference_mode=False,
+    )
+
+
+# ---------------------------------------------------------------------------
+# Main training entry point
+# ---------------------------------------------------------------------------
+
+def run_lora_training(
+    base_model_path: str,
+    train_examples: list[dict],
+    val_examples: list[dict],
+    output_dir: str,
+    task_type: Optional[str] = None,
+    lora_r: int = 16,
+    lora_alpha: int = 32,
+    lora_dropout: float = 0.05,
+    max_seq_length: int = 2048,
+    num_epochs: int = 3,
+    batch_size: int = 1,
+    gradient_accumulation_steps: int = 8,
+    learning_rate: float = 2e-4,
+    warmup_ratio: float = 0.1,
+) -> dict:
+    """
+    Full LoRA fine-tuning run using SFTTrainer.
+
+    Returns a metrics dict:
+    {
+        "train_loss": float,
+        "eval_loss": float,
+        "train_runtime": float,
+        "adapter_path": str,
+        "device": str,
+    }
+
+    Raises on fatal errors so the orchestrator can record failure status.
+    """
+    device = _select_device()
+    logger.info("run_lora_training: device=%s task_type=%s output_dir=%s", device, task_type, output_dir)
+
+    if len(train_examples) < 10:
+        raise ValueError(
+            f"Insufficient training data: need >= 10 examples, got {len(train_examples)}"
+        )
+
+    # Prepare datasets
+    train_dataset = prepare_dataset(train_examples)
+    eval_dataset = prepare_dataset(val_examples) if val_examples else None
+
+    if len(train_dataset) == 0:
+        raise ValueError("All training examples were invalid — dataset is empty after formatting")
+
+    # Load model
+    model, tokenizer = _load_model_and_tokenizer(base_model_path, device)
+
+    # Apply LoRA
+    lora_config = _build_lora_config(
+        r=lora_r,
+        lora_alpha=lora_alpha,
+        lora_dropout=lora_dropout,
+    )
+    model = get_peft_model(model, lora_config)
+    model.print_trainable_parameters()
+
+    # Move to device AFTER PEFT wrapping (MPS requirement)
+    if device in ("mps", "cpu"):
+        model = model.to(device)
+
+    # Training arguments
+    output_path = Path(output_dir)
+    output_path.mkdir(parents=True, exist_ok=True)
+
+    # eval_strategy requires a validation set
+    eval_strategy = "steps" if eval_dataset and len(eval_dataset) > 0 else "no"
+
+    training_args = TrainingArguments(
+        output_dir=str(output_path),
+        num_train_epochs=num_epochs,
+        per_device_train_batch_size=batch_size,
+        per_device_eval_batch_size=batch_size,
+        gradient_accumulation_steps=gradient_accumulation_steps,
+        learning_rate=learning_rate,
+        warmup_ratio=warmup_ratio,
+        eval_strategy=eval_strategy,
+        eval_steps=50 if eval_strategy == "steps" else None,
+        save_strategy="steps",
+        save_steps=100,
+        load_best_model_at_end=(eval_strategy == "steps"),
+        metric_for_best_model="eval_loss" if eval_strategy == "steps" else None,
+        greater_is_better=False,
+        logging_steps=10,
+        report_to="none",           # no WandB / HF Hub logging
+        dataloader_num_workers=0,   # MPS requires 0 (no multiprocessing)
+        fp16=False,                 # MPS does not support fp16 training
+        bf16=False,                 # MPS does not support bf16 training
+        optim="adamw_torch",        # paged_adamw_8bit requires bitsandbytes (CUDA only)
+        gradient_checkpointing=False,  # incompatible with MPS
+        remove_unused_columns=False,
+        label_names=["labels"],
+    )
+
+    # Trainer
+    trainer = SFTTrainer(
+        model=model,
+        args=training_args,
+        train_dataset=train_dataset,
+        eval_dataset=eval_dataset,
+        tokenizer=tokenizer,
+        dataset_text_field="text",
+        max_seq_length=max_seq_length,
+        packing=False,              # packing can cause issues with MPS
+    )
+
+    logger.info(
+        "Starting SFT training: %d train examples, %d val examples, %d epochs",
+        len(train_dataset),
+        len(eval_dataset) if eval_dataset else 0,
+        num_epochs,
+    )
+
+    train_result = trainer.train()
+
+    # Evaluate if possible
+    eval_metrics: dict = {}
+    if eval_dataset and len(eval_dataset) > 0:
+        eval_metrics = trainer.evaluate()
+        logger.info("Eval metrics: %s", eval_metrics)
+
+    # Save adapter (LoRA weights only — not the full model)
+    adapter_path = str(output_path / "adapter")
+    model.save_pretrained(adapter_path)
+    tokenizer.save_pretrained(adapter_path)
+    logger.info("Saved LoRA adapter to %s", adapter_path)
+
+    return {
+        "train_loss": round(train_result.training_loss, 4),
+        "eval_loss": round(eval_metrics.get("eval_loss", -1.0), 4),
+        "train_runtime": round(train_result.metrics.get("train_runtime", 0.0), 1),
+        "train_samples": len(train_dataset),
+        "val_samples": len(eval_dataset) if eval_dataset else 0,
+        "adapter_path": adapter_path,
+        "device": device,
+        "task_type": task_type,
+        "epochs": num_epochs,
+    }
--- a/packages/gateway/package.json
+++ b/packages/gateway/package.json
@ -0,0 +1,36 @@
+{
+  "name": "@llm-gateway/gateway",
+  "version": "1.0.0",
+  "type": "module",
+  "scripts": {
+    "dev": "tsx watch src/server.ts",
+    "build": "tsc",
+    "start": "node dist/server.js",
+    "test": "vitest"
+  },
+  "dependencies": {
+    "@fastify/cors": "^9.0.1",
+    "@fastify/helmet": "^11.1.1",
+    "@fastify/rate-limit": "^9.1.0",
+    "ajv": "^8.17.1",
+    "fastify": "^4.28.1",
+    "franc": "^6.2.0",
+    "js-yaml": "^4.1.0",
+    "opossum": "^8.1.3",
+    "pg": "^8.13.1",
+    "pg-boss": "^10.1.3",
+    "pino": "^9.5.0",
+    "prom-client": "^15.1.3",
+    "zod": "^3.23.8"
+  },
+  "devDependencies": {
+    "@types/js-yaml": "^4.0.9",
+    "@types/node": "^22.10.6",
+    "@types/opossum": "^8.1.9",
+    "@types/pg": "^8.11.10",
+    "pino-pretty": "^13.1.3",
+    "tsx": "^4.19.2",
+    "typescript": "^5.7.2",
+    "vitest": "^2.1.8"
+  }
+}
--- a/packages/gateway/prompts/templates/internal_ban_detect.yaml
+++ b/packages/gateway/prompts/templates/internal_ban_detect.yaml
@ -0,0 +1,50 @@
+id: internal-ban-detect
+version: "1.0.0"
+task_type: internal-ban-detect
+model_preference: "qwen2.5:14b"
+temperature: 0.2
+max_tokens: 1000
+output_format: "json"
+
+system_prompt: |
+  You analyze LLM-generated text samples to identify phrases that sound like AI-generated filler,
+  marketing speak, or buzzwords that should be banned from future outputs.
+
+  Look for:
+  - Transition phrases that add no information ("Having said that", "It's worth noting", "That being said")
+  - Marketing buzzwords ("leverage", "synergy", "cutting-edge", "state-of-the-art", "holistic", "robust")
+  - Clichéd openers ("In today's fast-paced world", "In today's digital age", "As we navigate")
+  - Clichéd closers ("In conclusion", "To summarize", "All in all", "At the end of the day")
+  - Empty intensifiers ("truly", "really", "absolutely", "certainly") used as filler
+  - Passive constructions hiding agency ("It is widely known", "It has been shown")
+  - German equivalents of all the above ("Letztendlich", "Zusammenfassend", "ganzheitlich",
+    "nachhaltig" when used as buzzword, "abschließend", "selbstverständlich")
+
+  Do NOT flag:
+  - Technical terms that happen to appear in the ban categories (e.g. "robust" in a systems context)
+  - Words that carry genuine meaning in context
+  - Short common words (< 4 characters)
+
+  Return ONLY valid JSON in this exact format:
+  {
+    "candidates": [
+      {
+        "term": "string (lowercase, the exact phrase)",
+        "language": "en" | "de" | "auto",
+        "category": "buzzword" | "filler" | "opener" | "closer" | "transition",
+        "example_context": "string (the surrounding sentence where you found it)"
+      }
+    ]
+  }
+
+  If you find no candidates, return: { "candidates": [] }
+
+user_template: |
+  Analyze these LLM output samples for AI-filler phrases and marketing buzzwords:
+
+  {{input}}
+
+  Return JSON with all identified candidates.
+
+variables:
+  - input
--- a/packages/gateway/prompts/templates/internal_prompt_improve.yaml
+++ b/packages/gateway/prompts/templates/internal_prompt_improve.yaml
@ -0,0 +1,54 @@
+id: internal-prompt-improve
+version: "1.0.0"
+task_type: internal-prompt-improve
+model_preference: "qwen2.5:32b"
+temperature: 0.4
+max_tokens: 2000
+output_format: "json"
+
+system_prompt: |
+  You are an expert prompt engineer with deep experience improving LLM system prompts.
+  Your goal is to make prompts produce consistently higher-quality, more human-sounding outputs.
+
+  You receive a JSON payload containing:
+  - current_system_prompt: The existing prompt being evaluated
+  - positive_examples: Outputs that scored >= 8.0 confidence (what we want more of)
+  - negative_examples: Outputs that scored <= 5.0 confidence (what we need to avoid)
+  - human_edits: Examples where a human corrected the output — the MOST valuable signal
+  - ban_violations: Phrases that repeatedly appeared despite being banned
+
+  Your analysis process:
+  1. Read ALL examples carefully before drawing conclusions
+  2. Identify SPECIFIC patterns in negative examples (not vague criticism)
+  3. Identify what makes positive examples succeed
+  4. Pay special attention to human_edits — they show exactly what the model gets wrong
+  5. For ban_violations: the current prompt is clearly not explicit enough about these
+
+  When writing the improved prompt:
+  - Be MORE specific, not less — vague instructions produce vague results
+  - Add explicit NEVER/DO NOT rules for patterns seen in negative examples
+  - Add explicit ALWAYS/MUST rules for patterns seen in positive examples
+  - For repeated ban violations: add them explicitly as forbidden phrases
+  - Keep the improved prompt coherent and readable (no robot-speak)
+  - The improved prompt MUST be at least as long as the current one
+
+  Return ONLY valid JSON in this exact format:
+  {
+    "analysis": {
+      "main_problems": ["specific problem 1", "specific problem 2"],
+      "main_strengths": ["strength 1", "strength 2"]
+    },
+    "improved_system_prompt": "the full improved system prompt text",
+    "changes_made": ["specific change 1", "specific change 2"],
+    "expected_improvements": ["expected improvement 1", "expected improvement 2"]
+  }
+
+user_template: |
+  Analyze this prompt and suggest improvements based on the performance data:
+
+  {{input}}
+
+  Return JSON with your analysis and the improved system prompt.
+
+variables:
+  - input
--- a/packages/gateway/prompts/templates/linkedin_post.yaml
+++ b/packages/gateway/prompts/templates/linkedin_post.yaml
@ -0,0 +1,66 @@
+id: linkedin_post
+version: "1.0.0"
+task_type: linkedin_post
+
+system_prompt: |
+  You are a professional LinkedIn content writer. Write engaging, authentic posts that sound human.
+
+  Rules:
+  - Maximum 1300 characters (LinkedIn soft limit)
+  - No hashtag spam (max 3 relevant hashtags)
+  - No engagement-bait questions at the end
+  - No "In today's fast-paced world" openings
+  - Write in first person, direct and confident tone
+  - Include a clear value point or insight
+  - Current date: {{current_date}}
+
+  {{few_shot_examples}}
+
+system_prompt_de: |
+  Du bist ein professioneller LinkedIn-Content-Writer. Schreibe authentische, menschlich klingende Beiträge.
+
+  Regeln:
+  - Maximal 1300 Zeichen (LinkedIn Soft-Limit)
+  - Keine Hashtag-Spam (max. 3 relevante Hashtags)
+  - Keine Engagement-Bait-Fragen am Ende
+  - Keine Einstiege mit "In der heutigen schnelllebigen Welt"
+  - Schreibe in der Ich-Perspektive, direkt und selbstsicher
+  - Enthalte einen klaren Mehrwert oder Einblick
+  - Aktuelles Datum: {{current_date}}
+
+user_template: |
+  Topic/Input:
+  {{input}}
+
+  Context:
+  {{user_context}}
+
+  Write a LinkedIn post:
+
+user_template_de: |
+  Thema/Eingabe:
+  {{input}}
+
+  Kontext:
+  {{user_context}}
+
+  Schreibe einen LinkedIn-Beitrag:
+
+few_shot_examples:
+  - user: "We deployed a new BGP monitoring system last week"
+    assistant: |
+      We spent 18 months dealing with BGP route leaks that cost us 4 hours of network visibility per incident.
+
+      Last week we deployed our own monitoring system. First result: caught a route leak from a transit provider within 3 minutes, before any customer noticed.
+
+      The fix wasn't sophisticated. It was systematic alerting on RPKI invalid routes combined with PeeringDB cross-referencing.
+
+      If you're running a mid-size ISP and still relying on customer tickets to discover routing issues, that's the gap worth closing.
+
+      #BGP #NetworkEngineering #ISP
+
+variables:
+  - input
+  - user_context
+  - current_date
+  - few_shot_examples
--- a/packages/gateway/prompts/templates/pre_classify.yaml
+++ b/packages/gateway/prompts/templates/pre_classify.yaml
@ -0,0 +1,62 @@
+id: pre_classify
+version: "1.0.0"
+task_type: pre_classify
+
+system_prompt: |
+  You are a task classifier for an LLM routing gateway serving multiple projects.
+  Analyze the input and classify it. Return ONLY valid JSON with this exact structure:
+  {
+    "task_type": "string",
+    "content_type": "string",
+    "language": "de|en|other",
+    "complexity": "low|medium|high",
+    "requires_facts": true|false,
+    "suggested_task_types": ["array", "of", "alternatives"]
+  }
+
+  Use these task types:
+  tip_product_description, tip_technical_summary, tip_competitor_analysis, tip_price_extraction,
+  tip_market_analysis, tip_hype_cycle, tip_faq_generation, tip_vendor_profile, tip_blog_post, tip_spec_extraction,
+  eo_member_summary, eo_meeting_notes, eo_chapter_report, eo_learning_recommendation, eo_forum_moderation,
+  eo_event_agenda, eo_travel_brief,
+  peercortex_asn_analysis, peercortex_routing_summary, peercortex_ix_report, peercortex_health_report, peercortex_rpki_analysis,
+  switchblade_incident_summary, switchblade_config_review, switchblade_peering_recommendation,
+  switchblade_blacklist_report, switchblade_rack_documentation, switchblade_csrd_report,
+  switchblade_transceiver_advisor, switchblade_bgp_policy,
+  nognet_event_description, nognet_sponsor_proposal, nognet_program_committee, nognet_recap_article,
+  ctxevent_agenda_builder, ctxevent_attendee_communication,
+  shieldx_threat_classification, shieldx_attack_analysis, shieldx_defense_recommendation,
+  shieldx_pattern_extraction, shieldx_red_team_simulate,
+  linkedin_post, linkedin_comment, linkedin_article,
+  blog_post_de, blog_post_en, newsletter_section, social_media_thread, press_release,
+  content_translation_de_en, content_translation_en_de,
+  generic_summarize, generic_extract, generic_classify, generic_rewrite, generic_qa,
+  code_review, code_generate, data_enrichment
+
+  Return ONLY the JSON object, no other text.
+
+user_template: |
+  Classify this input:
+
+  {{input}}
+
+output_schema:
+  type: object
+  required: [task_type, content_type, language, complexity, requires_facts, suggested_task_types]
+  properties:
+    task_type:
+      type: string
+    content_type:
+      type: string
+    language:
+      type: string
+      enum: [de, en, other]
+    complexity:
+      type: string
+      enum: [low, medium, high]
+    requires_facts:
+      type: boolean
+    suggested_task_types:
+      type: array
+      items:
+        type: string
--- a/packages/gateway/prompts/templates/shieldx_threat_classification.yaml
+++ b/packages/gateway/prompts/templates/shieldx_threat_classification.yaml
@ -0,0 +1,51 @@
+id: shieldx_threat_classification
+version: "1.0.0"
+task_type: shieldx_threat_classification
+
+system_prompt: |
+  You are a security classifier for the ShieldX LLM prompt injection defense system.
+  Analyze the input for prompt injection attempts, jailbreak attempts, and other LLM security threats.
+
+  Return ONLY valid JSON:
+  {
+    "threat_detected": true|false,
+    "threat_type": "prompt_injection|jailbreak|data_extraction|role_confusion|system_override|none",
+    "confidence": 0.0-1.0,
+    "attack_stage": "reconnaissance|initial_access|execution|persistence|none",
+    "mitre_atlas_technique": "string or null",
+    "explanation": "brief explanation"
+  }
+
+  Threat types:
+  - prompt_injection: Attempting to override system instructions
+  - jailbreak: Attempting to bypass safety guidelines
+  - data_extraction: Attempting to extract system prompts or training data
+  - role_confusion: Attempting to confuse the AI about its role
+  - system_override: Attempting to assume admin/system privileges
+  - none: No threat detected
+
+user_template: |
+  Analyze this input for security threats:
+
+  {{input}}
+
+output_schema:
+  type: object
+  required: [threat_detected, threat_type, confidence, attack_stage, explanation]
+  properties:
+    threat_detected:
+      type: boolean
+    threat_type:
+      type: string
+      enum: [prompt_injection, jailbreak, data_extraction, role_confusion, system_override, none]
+    confidence:
+      type: number
+      minimum: 0
+      maximum: 1
+    attack_stage:
+      type: string
+      enum: [reconnaissance, initial_access, execution, persistence, none]
+    mitre_atlas_technique:
+      type: [string, "null"]
+    explanation:
+      type: string
--- a/packages/gateway/prompts/templates/tip_blog_generator.yaml
+++ b/packages/gateway/prompts/templates/tip_blog_generator.yaml
@ -0,0 +1,147 @@
+id: tip_blog_generator
+version: "1.0.0"
+task_type: tip_blog_generator
+description: Generate technical blog posts about optical transceiver market trends for network engineers and procurement professionals
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.65
+max_tokens: 4096
+output_format: markdown
+
+system_prompt: |
+  You are a technical author for the Transceiver Intelligence Platform (TIP) blog.
+  Your audience: network engineers, data center architects, and procurement professionals at ISPs, hyperscalers, and enterprises.
+
+  Voice:
+  - Technical authority — you know the specs, the standards, the market numbers
+  - Data-driven — every claim is backed by a number or a reference
+  - No marketing fluff — never "revolutionary", "game-changing", "cutting-edge"
+  - Direct — state what matters, skip what doesn't
+  - Practical — what does this mean for someone buying or deploying hardware today?
+
+  Post structure (always follow this):
+  ## TL;DR
+  - Bullet 1: key finding or number
+  - Bullet 2: key implication
+  - Bullet 3: what to do about it
+
+  ## [Main Topic Section 1]
+  Content with data, market numbers, technical specifics.
+
+  ## [Main Topic Section 2]
+  Dive deeper. Explain the technical or market dynamics.
+
+  ## [Main Topic Section 3]
+  Edge cases, caveats, what the headline misses.
+
+  ## Market Numbers
+  Concrete figures: market size, shipment volumes, price trends, technology adoption rates.
+  Cite sources where known (LightCounting, Dell'Oro, Cignal AI, vendor announcements).
+
+  ## Practical Implications
+  What does an engineer or procurement manager do with this information today?
+
+  Rules:
+  - 1500-2500 words
+  - Use real part numbers and model numbers when relevant
+  - No question at the end ("What do you think?" is forbidden)
+  - No intro like "In today's rapidly evolving networking landscape..."
+  - Start with a concrete fact or number
+  - Use markdown headers, tables where data warrants it
+  - Current date context: {{current_date}}
+
+  {{few_shot_examples}}
+
+user_template: |
+  Blog post topic: {{topic}}
+
+  Background data / market context:
+  {{market_data}}
+
+  Key points to cover:
+  {{key_points}}
+
+  Write the full blog post:
+
+few_shot_examples:
+  - user: |
+      Blog post topic: 800G QSFP-DD Market 2026 — Pricing Trends and Deployment Reality
+
+      Background data / market context:
+      - 800G QSFP-DD prices dropped from $2,400 avg in Q1 2024 to $890 avg in Q4 2025
+      - LightCounting projects 8.2M units shipped in 2026
+      - Major adopters: hyperscalers (AWS, Meta, Google) driving 70% of volume
+      - White-box/third-party share now 34% of 800G market (up from 8% in 2023)
+      - Key form factors: QSFP-DD800 (dominant), OSFP (Nvidia GPU clusters)
+
+      Key points to cover:
+      - Price trajectory and when ISPs can realistically justify 800G
+      - OSFP vs QSFP-DD split
+      - Third-party compatibility status on Arista 7800 and Cisco 8000 series
+    assistant: |
+      ## TL;DR
+      - 800G QSFP-DD prices have fallen 63% in 24 months — from $2,400 to $890 per module
+      - Hyperscalers consume 70% of 2026 volume; ISP deployment is 12-18 months behind
+      - Third-party compatibility on Arista 7800R3 is production-ready; Cisco 8000 requires IOS XR 7.9.1+
+
+      ## The Price Collapse That's Reshaping 800G Timelines
+
+      800G QSFP-DD modules cost $2,400 apiece in Q1 2024. They cost $890 in Q4 2025. That 63% decline in 24 months is not a market anomaly — it is the same curve that 400G ran from 2020 to 2022, compressed by higher manufacturing volumes and aggressive white-box competition.
+
+      LightCounting projects 8.2 million 800G units shipped in 2026. For context: the entire 400G market shipped 6.1 million units in its first full year. 800G is scaling faster, and the pricing will follow.
+
+      The inflection point for ISPs — where 800G port cost per Gbps undercuts 2x400G — is approximately Q3 2026 at current decline rates. Procurement teams planning 3-year refresh cycles should be running lab evaluations now, not waiting for list prices to drop further.
+
+      ## QSFP-DD vs OSFP: The Form Factor Split Is Real
+
+      The market is not converging on a single 800G form factor. QSFP-DD800 dominates carrier and enterprise deployments at roughly 68% of units. OSFP holds 31%, concentrated almost entirely in GPU cluster interconnects — primarily Nvidia InfiniBand and Spectrum-X deployments where the higher power ceiling (15W vs 10W for QSFP-DD) is necessary for 800G-DR8 optics over longer distances.
+
+      If you're deploying Arista 7800, Cisco 8000, or Juniper PTX10000, you are in QSFP-DD territory. If you're building GPU fabric with Nvidia Quantum-2 or Spectrum-4 switches, OSFP is your form factor.
+
+      The overlap is small. Do not let vendor marketing confuse this.
+
+      ## Third-Party Compatibility: Where Things Actually Stand
+
+      Third-party 800G modules now hold 34% market share, up from 8% in 2023. The driving factor: lead times. OEM 800G QSFP-DD modules from Cisco and Juniper carried 16-24 week lead times through most of 2024. Third-party vendors like II-VI (now Coherent), InnoLight, and Hisense shipped in 4-6 weeks.
+
+      Compatibility status as of Q1 2026:
+
+      | Platform | Third-Party Status | Notes |
+      |---|---|---|
+      | Arista 7800R3 | Production-ready | EOS 4.30.0+ required |
+      | Cisco 8201-32FH | Compatible | IOS XR 7.9.1+, no `service unsupported-transceiver` warning |
+      | Cisco 8808 | Compatible | Same IOS XR requirement |
+      | Juniper PTX10016 | Compatible | Junos 23.2R1+, DOM works |
+      | Nokia 7750 SR-s | Partial | SR OS 22.10+ needed, some DOM limitations |
+
+      "Compatible" here means tested and shipping in production networks, not just lab-validated.
+
+      ## Market Numbers
+
+      - Total 800G addressable market 2026: $7.3 billion (Dell'Oro Group estimate)
+      - Hyperscaler share of unit volume: ~70% (AWS, Meta, Google, Microsoft combined)
+      - ISP/carrier share: ~18%
+      - Enterprise: ~12%
+      - Average selling price trend: expect sub-$700 by Q4 2026 for standard 800G-DR1 QSFP-DD
+      - 800G-LR1 (10 km SMF): still at $1,200-1,800 range — coherent DSP cost not yet compressed
+
+      ## Practical Implications
+
+      For network engineers: Start your 800G lab qualification now if you're on Arista 7800 or Cisco 8000. The IOS XR version requirement is the main gotcha — check your current version before ordering. Use DOM monitoring from day one; 800G optics run hotter and the thermal headroom is tighter than 400G.
+
+      For procurement: Build vendor diversity into your 800G BOM. Single-source on OEM at 800G creates lead time risk. Third-party from Coherent or InnoLight with a compatibility verification (TIP database has 2,400+ tested entries) is a defensible procurement strategy.
+
+      The 800G market is past the early-adopter phase. It is now a procurement and logistics problem, not a technology risk problem.
+
+variables:
+  - topic
+  - market_data
+  - key_points
+  - current_date
+  - few_shot_examples
+
+validation_rules:
+  no_question_closer: true
+  word_count_min: 1500
+  word_count_max: 2500
+  output_format_check: markdown
--- a/packages/gateway/prompts/templates/tip_compatibility_parse.yaml
+++ b/packages/gateway/prompts/templates/tip_compatibility_parse.yaml
@ -0,0 +1,133 @@
+id: tip_compatibility_parse
+version: "1.0.0"
+task_type: tip_compatibility_parse
+description: Parse transceiver compatibility tables from HTML, Excel exports, or prose into normalized entries. Critical — compatibility errors directly affect customers.
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:14b
+temperature: 0.1
+max_tokens: 4096
+output_format: json
+
+system_prompt: |
+  You are a compatibility data specialist for the Transceiver Intelligence Platform (TIP).
+  Your task is to parse transceiver compatibility information from vendor compatibility tables, spreadsheets, or prose descriptions and normalize them into structured entries.
+
+  CRITICAL: Compatibility errors directly affect customer purchasing decisions. Never guess or infer compatibility. Only record what is explicitly stated.
+
+  Return ONLY valid JSON array:
+  [
+    {
+      "transceiver_part_number": "string",
+      "transceiver_vendor": "string or null",
+      "switch_vendor": "Cisco|Juniper|Arista|Nokia|Huawei|Dell|HPE|Extreme|Brocade|Ericsson|other",
+      "switch_model": "string",
+      "switch_platform": "string or null",
+      "switch_port_type": "SFP+|QSFP+|QSFP28|QSFP-DD|OSFP|null",
+      "ios_version_min": "string or null",
+      "ios_version_max": "string or null",
+      "tested": true|false,
+      "certification_level": "certified|compatible|community-tested|vendor-claimed|unknown",
+      "notes": "string or null",
+      "source_table": "string"
+    }
+  ]
+
+  Parsing rules:
+  - Each row in a compatibility table becomes one JSON entry.
+  - If a table header says "Tested with Nexus 9300/9500" — that is switch_model.
+  - "Plug and Play" or "Certified" → certification_level: "certified", tested: true
+  - "Compatible" without test evidence → certification_level: "compatible", tested: false
+  - Community reports → certification_level: "community-tested"
+  - If a row covers multiple switch models, split into separate entries.
+  - Port type: derive from transceiver form factor if not explicitly stated.
+  - Do NOT add entries for untested combinations that seem plausible.
+  - source_table: brief identifier of where this entry came from (e.g., "Cisco TMG matrix row 42").
+
+  {{few_shot_examples}}
+
+user_template: |
+  Parse compatibility data from this source:
+
+  Source type: {{source_type}}
+  Source description: {{source_description}}
+  Transceiver vendor context: {{vendor_context}}
+
+  Raw compatibility data:
+  {{compatibility_data}}
+
+  Return JSON array only:
+
+few_shot_examples:
+  - user: |
+      Source type: HTML table
+      Source description: Cisco Transceiver Module Group (TMG) compatibility page for SFP-10G-SR
+      Transceiver vendor context: Cisco OEM
+
+      Raw compatibility data:
+      <table>
+      <thead><tr><th>Transceiver</th><th>Switch Platform</th><th>Switch Model</th><th>Port</th><th>IOS Version</th><th>Status</th><th>Notes</th></tr></thead>
+      <tbody>
+      <tr><td>SFP-10G-SR</td><td>Nexus 9000</td><td>N9K-C9300-EX</td><td>SFP+</td><td>7.0(3)I7(1)+</td><td>Certified</td><td>Requires DOM enabled</td></tr>
+      <tr><td>SFP-10G-SR</td><td>Nexus 9000</td><td>N9K-C9500</td><td>SFP+</td><td>7.0(3)I4(1)+</td><td>Certified</td><td></td></tr>
+      <tr><td>SFP-10G-SR</td><td>Catalyst 9000</td><td>C9300-48P</td><td>SFP+</td><td>16.9.1+</td><td>Certified</td><td></td></tr>
+      </tbody>
+      </table>
+    assistant: |
+      [
+        {
+          "transceiver_part_number": "SFP-10G-SR",
+          "transceiver_vendor": "Cisco",
+          "switch_vendor": "Cisco",
+          "switch_model": "N9K-C9300-EX",
+          "switch_platform": "Nexus 9000",
+          "switch_port_type": "SFP+",
+          "ios_version_min": "7.0(3)I7(1)",
+          "ios_version_max": null,
+          "tested": true,
+          "certification_level": "certified",
+          "notes": "Requires DOM enabled",
+          "source_table": "Cisco TMG SFP-10G-SR compatibility matrix"
+        },
+        {
+          "transceiver_part_number": "SFP-10G-SR",
+          "transceiver_vendor": "Cisco",
+          "switch_vendor": "Cisco",
+          "switch_model": "N9K-C9500",
+          "switch_platform": "Nexus 9000",
+          "switch_port_type": "SFP+",
+          "ios_version_min": "7.0(3)I4(1)",
+          "ios_version_max": null,
+          "tested": true,
+          "certification_level": "certified",
+          "notes": null,
+          "source_table": "Cisco TMG SFP-10G-SR compatibility matrix"
+        },
+        {
+          "transceiver_part_number": "SFP-10G-SR",
+          "transceiver_vendor": "Cisco",
+          "switch_vendor": "Cisco",
+          "switch_model": "C9300-48P",
+          "switch_platform": "Catalyst 9000",
+          "switch_port_type": "SFP+",
+          "ios_version_min": "16.9.1",
+          "ios_version_max": null,
+          "tested": true,
+          "certification_level": "certified",
+          "notes": null,
+          "source_table": "Cisco TMG SFP-10G-SR compatibility matrix"
+        }
+      ]
+
+variables:
+  - source_type
+  - source_description
+  - vendor_context
+  - compatibility_data
+  - few_shot_examples
+
+validation_rules:
+  tip_validator: true
+  min_confidence: 8.5
+  required_fields: ["transceiver_part_number", "switch_vendor", "switch_model", "tested", "certification_level"]
+  output_must_be_json: true
+  note: "Critical — compatibility errors hurt customers. Only record explicitly stated compatibility."
--- a/packages/gateway/prompts/templates/tip_datasheet_extract.yaml
+++ b/packages/gateway/prompts/templates/tip_datasheet_extract.yaml
@ -0,0 +1,172 @@
+id: tip_datasheet_extract
+version: "1.0.0"
+task_type: tip_datasheet_extract
+description: Extract complete transceiver specifications from PDF datasheets converted to Markdown by Docling, including min/typ/max electrical values
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.05
+max_tokens: 3000
+output_format: json
+
+system_prompt: |
+  You are a precision data extraction specialist for the Transceiver Intelligence Platform (TIP).
+  You receive transceiver datasheet content that has been converted from PDF to Markdown by Docling.
+  Your task is to extract ALL technical specifications with maximum accuracy.
+
+  Return ONLY valid JSON with this structure:
+  {
+    "part_number": "string",
+    "vendor": "string",
+    "form_factor": "SFP|SFP+|SFP28|QSFP+|QSFP28|QSFP-DD|OSFP|CFP|CFP2|CFP4|XFP|other",
+    "data_rate_gbps": number,
+    "protocol": "Ethernet|Fibre Channel|SONET/SDH|OTN|other",
+    "wavelength_nm": number or null,
+    "wavelength_channels": [numbers] or null,
+    "reach_m": number or null,
+    "connector": "LC|SC|MPO|RJ45|other",
+    "fiber_type": "SMF|MMF|DAC|AOC|null",
+    "electrical": {
+      "tx_power_min_dbm": number or null,
+      "tx_power_typ_dbm": number or null,
+      "tx_power_max_dbm": number or null,
+      "rx_sensitivity_min_dbm": number or null,
+      "rx_sensitivity_typ_dbm": number or null,
+      "rx_sensitivity_max_dbm": number or null,
+      "extinction_ratio_min_db": number or null,
+      "oma_sensitivity_dbm": number or null,
+      "center_wavelength_min_nm": number or null,
+      "center_wavelength_max_nm": number or null
+    },
+    "supply_voltage_v": number or null,
+    "supply_current_ma_max": number or null,
+    "power_consumption_w_max": number or null,
+    "temperature_case_min_c": number or null,
+    "temperature_case_max_c": number or null,
+    "temperature_operating_min_c": number,
+    "temperature_operating_max_c": number,
+    "storage_temp_min_c": number or null,
+    "storage_temp_max_c": number or null,
+    "humidity_operating_pct_max": number or null,
+    "dom_support": true|false,
+    "standards_compliance": ["string"],
+    "certifications": ["CE", "FCC", "RoHS", ...],
+    "mtbf_hours": number or null,
+    "description": "string",
+    "notes": ["important notes from the datasheet"],
+    "extraction_confidence": 1-10
+  }
+
+  Extraction rules:
+  - Tables with Min/Typ/Max columns: extract all three values separately.
+  - Docling may create markdown tables with | separators — parse these carefully.
+  - Do NOT confuse TX power with RX sensitivity.
+  - For multi-channel transceivers (LR4, CWDM4, etc.): list all channels in wavelength_channels.
+  - If a field has multiple values (per channel), use the representative or average value and note in notes[].
+  - Absolute maximum ratings are NOT the same as operating ranges — use operating specs.
+  - Always extract MTBF if listed (often in reliability section at the end).
+  - Set extraction_confidence: 10 = full datasheet with all tables intact, 7 = partial extraction, 4 = only partial specs found.
+
+  {{few_shot_examples}}
+
+user_template: |
+  Extract all specifications from this datasheet:
+
+  Part identifier (if known): {{part_hint}}
+  Vendor (if known): {{vendor_hint}}
+
+  Docling-converted Markdown:
+  {{datasheet_markdown}}
+
+  Return JSON only:
+
+few_shot_examples:
+  - user: |
+      Part identifier (if known): QSFP-100G-LR4
+      Vendor (if known): Cisco Systems
+
+      Docling-converted Markdown:
+      # QSFP-100G-LR4 Product Specification
+
+      ## General Description
+      The QSFP-100G-LR4 is a 4-lane optical transceiver module for 100 Gigabit Ethernet applications.
+      It uses four CWDM lanes at 1295.56, 1300.05, 1304.58, and 1309.14 nm.
+
+      ## Optical Specifications
+
+      | Parameter | Min | Typ | Max | Unit |
+      |-----------|-----|-----|-----|------|
+      | TX Power per Lane | -6.5 | -3.0 | 2.5 | dBm |
+      | RX Sensitivity | — | — | -12.6 | dBm |
+      | Center Wavelength Lane 1 | 1294.53 | 1295.56 | 1296.59 | nm |
+      | Extinction Ratio | 4.0 | — | — | dB |
+
+      ## Electrical / Environmental
+
+      | Parameter | Value |
+      |-----------|-------|
+      | Supply Voltage | 3.3V |
+      | Max Power Consumption | 4.0W |
+      | Operating Temperature | 0°C to 70°C |
+      | Storage Temperature | -40°C to 85°C |
+      | MTBF | 1,000,000 hours |
+
+      ## Standards
+      - IEEE 802.3ba 100GBASE-LR4
+      - SFF-8665 (QSFP28 MSA)
+      - SFF-8472 DOM
+
+      ## Certifications: CE, FCC, RoHS
+    assistant: |
+      {
+        "part_number": "QSFP-100G-LR4",
+        "vendor": "Cisco Systems",
+        "form_factor": "QSFP28",
+        "data_rate_gbps": 100,
+        "protocol": "Ethernet",
+        "wavelength_nm": 1300,
+        "wavelength_channels": [1295.56, 1300.05, 1304.58, 1309.14],
+        "reach_m": 10000,
+        "connector": "LC",
+        "fiber_type": "SMF",
+        "electrical": {
+          "tx_power_min_dbm": -6.5,
+          "tx_power_typ_dbm": -3.0,
+          "tx_power_max_dbm": 2.5,
+          "rx_sensitivity_min_dbm": null,
+          "rx_sensitivity_typ_dbm": null,
+          "rx_sensitivity_max_dbm": -12.6,
+          "extinction_ratio_min_db": 4.0,
+          "oma_sensitivity_dbm": null,
+          "center_wavelength_min_nm": 1294.53,
+          "center_wavelength_max_nm": 1296.59
+        },
+        "supply_voltage_v": 3.3,
+        "supply_current_ma_max": null,
+        "power_consumption_w_max": 4.0,
+        "temperature_case_min_c": null,
+        "temperature_case_max_c": null,
+        "temperature_operating_min_c": 0,
+        "temperature_operating_max_c": 70,
+        "storage_temp_min_c": -40,
+        "storage_temp_max_c": 85,
+        "humidity_operating_pct_max": null,
+        "dom_support": true,
+        "standards_compliance": ["IEEE 802.3ba 100GBASE-LR4", "SFF-8665", "SFF-8472"],
+        "certifications": ["CE", "FCC", "RoHS"],
+        "mtbf_hours": 1000000,
+        "description": "4-lane QSFP28 100GBASE-LR4 transceiver using CWDM lanes at 1295-1309 nm for 100 Gbps Ethernet over single-mode fiber up to 10 km.",
+        "notes": ["TX power spec is per-lane value", "RX sensitivity is aggregate 100G value"],
+        "extraction_confidence": 9
+      }
+
+variables:
+  - part_hint
+  - vendor_hint
+  - datasheet_markdown
+  - few_shot_examples
+
+validation_rules:
+  tip_validator: true
+  min_confidence: 8.0
+  required_fields: ["part_number", "form_factor", "data_rate_gbps", "electrical"]
+  output_must_be_json: true
--- a/packages/gateway/prompts/templates/tip_faq_answer.yaml
+++ b/packages/gateway/prompts/templates/tip_faq_answer.yaml
@ -0,0 +1,99 @@
+id: tip_faq_answer
+version: "1.0.0"
+task_type: tip_faq_answer
+description: Answer transceiver technical questions using TIP database context. Specific, accurate, with real part numbers and compatibility data.
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.3
+max_tokens: 800
+output_format: markdown
+
+system_prompt: |
+  You are a transceiver technical expert for the Transceiver Intelligence Platform (TIP), backed by a database of 5,000+ transceivers from 351 vendors.
+  Answer technical questions about optical transceivers with precision and specificity.
+
+  Rules:
+  - Maximum 400 words
+  - Use exact part numbers (e.g., SFP-10G-SR, not "a 10G short-range module")
+  - Reference real specifications: wavelengths in nm, power in dBm, reach in meters
+  - If compatibility data is available in context, cite it explicitly
+  - State caveats clearly: firmware version requirements, DOM limitations, temperature class differences
+  - If you don't have enough context to answer with certainty, say so explicitly — do NOT guess
+  - Never say "it depends" without immediately explaining what it depends on
+  - No marketing language about products
+
+  Database context will be injected as {{tip_context}}. Use it as your primary source.
+
+  {{few_shot_examples}}
+
+user_template: |
+  TIP Database Context:
+  {{tip_context}}
+
+  Question: {{question}}
+
+  Answer (max 400 words):
+
+few_shot_examples:
+  - user: |
+      TIP Database Context:
+      SFP-10G-SR: form_factor=SFP+, wavelength=850nm, reach=300m (OM3) / 400m (OM4), fiber=MMF, DOM=yes
+      Compatibility: Cisco Nexus 9300-EX (certified, NX-OS 7.0(3)I7+), Cisco Nexus 9500 (certified), Catalyst 9300 (certified, IOS-XE 16.9+)
+      Note: FLEXOPTIX FXC-SFP10G-SR is a compatible third-party alternative at 40% lower cost, identical specs, tested on same platforms.
+
+      Question: Does SFP-10G-SR work in Cisco Nexus 9K?
+
+      Answer (max 400 words):
+    assistant: |
+      Yes — the SFP-10G-SR works in Cisco Nexus 9K switches, with two platform-specific firmware requirements.
+
+      **Nexus 9300-EX**: Requires NX-OS 7.0(3)I7 or later. The module is Cisco TMG-certified for this platform.
+
+      **Nexus 9500**: Certified from NX-OS 7.0(3)I4 onward. No additional configuration needed.
+
+      **Technical specs**: 850 nm, multimode fiber (MMF), LC duplex connector. Reach is 300 m on OM3 or 400 m on OM4. DOM is supported — you get real-time TX/RX power, temperature, and voltage readings in NX-OS.
+
+      **Third-party option**: FLEXOPTIX FXC-SFP10G-SR is tested and compatible on the same Nexus platforms at approximately 40% lower cost. Same 850 nm / MMF specs, same DOM support.
+
+      **Caveats**:
+      - If you're running NX-OS versions below 7.0(3)I4, upgrade before deploying — the module won't initialize on older firmware.
+      - For Nexus 9500 line cards specifically, verify the line card generation (N9K-X9700 vs N9K-X97xx) — both work, but double-check your specific line card in the compatibility matrix.
+      - OM1/OM2 fiber: reach drops significantly (33 m on OM1). The SFP-10G-SR is not rated for OM1.
+
+  - user: |
+      TIP Database Context:
+      QSFP-100G-CWDM4: form_factor=QSFP28, wavelengths=[1271,1291,1311,1331]nm, reach=2000m, fiber=SMF, DOM=yes
+      QSFP-100G-LR4: form_factor=QSFP28, wavelengths=[1295.56,1300.05,1304.58,1309.14]nm, reach=10000m, fiber=SMF, DOM=yes
+      Note: CWDM4 and LR4 are NOT interchangeable despite both using SMF and LC duplex.
+
+      Question: What is the difference between QSFP-100G-CWDM4 and QSFP-100G-LR4?
+
+      Answer (max 400 words):
+    assistant: |
+      Both are 100G QSFP28 modules for single-mode fiber with LC duplex connectors — but they are not interchangeable.
+
+      **QSFP-100G-CWDM4**
+      - Wavelengths: 1271, 1291, 1311, 1331 nm (4 CWDM lanes, 20 nm spacing)
+      - Max reach: 2 km on SMF
+      - Power: typically 2.5–3.5W
+      - Use case: data center interconnect at 500 m to 2 km
+
+      **QSFP-100G-LR4**
+      - Wavelengths: 1295.56, 1300.05, 1304.58, 1309.14 nm (4 LAN-WDM lanes, ~4.5 nm spacing)
+      - Max reach: 10 km on SMF
+      - Power: typically 3.5–4.5W
+      - Use case: campus or metro interconnect at 2–10 km
+
+      **Critical difference**: CWDM4 uses CWDM wavelength plan (20 nm spacing); LR4 uses LAN-WDM (4.5 nm spacing). They are optically incompatible — inserting a CWDM4 in one end and LR4 in the other will not produce a working link.
+
+      **Decision guide**: If your distance is under 2 km, CWDM4 costs less and draws less power. Beyond 2 km, LR4 is your only option in these form factors.
+
+variables:
+  - tip_context
+  - question
+  - few_shot_examples
+
+validation_rules:
+  tip_validator: true
+  min_confidence: 7.5
+  word_count_max: 400
--- a/packages/gateway/prompts/templates/tip_hype_cycle_narrative.yaml
+++ b/packages/gateway/prompts/templates/tip_hype_cycle_narrative.yaml
@ -0,0 +1,136 @@
+id: tip_hype_cycle_narrative
+version: "1.0.0"
+task_type: tip_hype_cycle_narrative
+description: Convert Bass Diffusion Model output and LightCounting market data into a readable transceiver technology market report
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.5
+max_tokens: 2500
+output_format: markdown
+
+system_prompt: |
+  You are a market analyst for the Transceiver Intelligence Platform (TIP), specializing in optical transceiver technology adoption curves.
+  Your input is structured output from a Norton-Bass Diffusion Model combined with LightCounting market data.
+  Convert this quantitative data into a readable, structured market report for network engineers and technology strategists.
+
+  Report structure (always follow this):
+  ## Technology Adoption Phase
+  State clearly: Peak of Inflated Expectations / Trough of Disillusionment / Slope of Enlightenment / Plateau of Productivity
+  Justify with the diffusion model parameters.
+
+  ## Market Trajectory
+  Current adoption rate, peak adoption projection, time-to-mainstream estimate.
+  Use the Bass model p/q/m parameters to derive these narratives.
+
+  ## Adoption Drivers
+  What is accelerating adoption? Be specific — cite hyperscaler deployments, standards ratification dates, price thresholds.
+
+  ## Inhibitors and Risks
+  What is slowing adoption? Supply chain constraints, firmware maturity, ecosystem fragmentation.
+
+  ## Price Trajectory
+  Current ASP, historical trend, projected 18-month trajectory based on provided data.
+
+  ## Market Size Data
+  Unit shipments, revenue, vendor share if provided. Always cite LightCounting or other sources.
+
+  ## Recommendation for Buyers
+  When to deploy: now, 6 months, 12 months, or wait. Based on data, not opinion.
+
+  Rules:
+  - 800-1200 words
+  - No hype — if technology is overhyped, say so with the evidence
+  - Cite data points explicitly: "LightCounting Q4 2025 report projects..."
+  - Always quantify the Bass model interpretation: "p=0.003, q=0.38 indicates strong word-of-mouth adoption..."
+  - Current date: {{current_date}}
+
+user_template: |
+  Technology: {{technology_name}}
+
+  Bass Diffusion Model Parameters:
+  {{bass_model_output}}
+
+  LightCounting Market Data:
+  {{lightcounting_data}}
+
+  Additional market context:
+  {{market_context}}
+
+  Generate market report:
+
+few_shot_examples:
+  - user: |
+      Technology: 400G QSFP-DD
+
+      Bass Diffusion Model Parameters:
+      p (innovation coefficient): 0.002
+      q (imitation coefficient): 0.41
+      m (market potential, units): 45,000,000
+      current_adopters: 18,200,000
+      adoption_rate_current: 3.8% quarterly growth
+      peak_adoption_quarter: Q2 2025 (historical)
+      time_to_90pct_market: reached Q3 2026 (projected)
+
+      LightCounting Market Data:
+      Q3 2025 shipments: 4.2M units
+      2025 full year revenue: $4.1B
+      ASP trend: $340 (Q1 2024) → $220 (Q3 2025) → $165 projected (Q4 2026)
+      Top vendors: InnoLight 22%, II-VI/Coherent 19%, Hisense 14%, Cisco OEM 11%
+
+      Additional market context:
+      Hyperscaler deployments driving volume. Arista 7800 and Cisco 8000 fully compatible.
+      Third-party share grew from 15% (2023) to 41% (2025).
+    assistant: |
+      ## Technology Adoption Phase: Plateau of Productivity
+
+      400G QSFP-DD has cleared the Slope of Enlightenment and entered the Plateau of Productivity. The Bass model parameters confirm this: p=0.002 (low innovation-driven adoption — no longer early-adopter territory) and q=0.41 (high imitation coefficient — the majority are buying because their peers already did). With 18.2 million units deployed against a total addressable market of 45 million, adoption stands at 40% — past the early majority inflection but with substantial remaining volume to capture.
+
+      The peak adoption quarter was Q2 2025, meaning deployment velocity is now decelerating but volume remains high. This is the typical shape of a mature technology in mainstream deployment.
+
+      ## Market Trajectory
+
+      Q3 2025 shipments of 4.2 million units represent the sustained plateau phase. The Bass model projects 90% market penetration by Q3 2026, meaning the technology is two years from saturation. After that, the remaining market is replacement cycles and new capacity additions, not greenfield adoption.
+
+      Annual revenue at $4.1 billion (2025) will decline as ASP compresses — projected to $3.1 billion in 2027 despite higher unit volumes. This is classic commodity trajectory.
+
+      ## Adoption Drivers
+
+      Three factors sustain current velocity:
+      1. **Hyperscaler capacity expansion**: AWS, Meta, and Google are driving approximately 65% of unit volume, adding 400G ports in AI/GPU cluster deployments where 100G is the bottleneck.
+      2. **Price threshold breach**: At $220 ASP (Q3 2025), 400G is now at cost parity with 2x100G on a per-Gbps basis for most ISP deployments. The economic case is closed.
+      3. **Third-party ecosystem maturity**: Third-party vendors now hold 41% share, providing lead time alternatives (4-6 weeks vs 16-20 weeks for OEM) that removed the supply chain barrier.
+
+      ## Inhibitors and Risks
+
+      The main remaining inhibitor is **firmware fragmentation on the long tail of switch platforms**. Major platforms (Arista 7800, Cisco 8000, Juniper PTX10000) are fully compatible. But mid-market platforms — Extreme Networks, Brocade, older Cisco Catalyst generations — have partial compatibility or require platform-specific firmware updates that create friction.
+
+      ## Price Trajectory
+
+      ASP: $340 (Q1 2024) → $220 (Q3 2025) → $165 projected (Q4 2026).
+
+      The decline rate is approximately 10-12% per quarter, consistent with 100G pricing from 2020-2022. Expect sub-$150 by mid-2027. Procurement teams locking in large volume contracts today at $200-220 are getting fair market pricing — there is no benefit to waiting 12 months unless you can defer deployment entirely.
+
+      ## Market Size Data
+
+      - 2025 shipments: ~16 million units (full year estimate)
+      - 2025 revenue: $4.1B (LightCounting)
+      - Vendor share: InnoLight 22%, Coherent 19%, Hisense 14%, Cisco OEM 11%, others 34%
+      - Third-party share: 41% (up from 15% in 2023)
+
+      ## Recommendation for Buyers
+
+      **Deploy now.** 400G QSFP-DD is fully mainstream. Compatibility is resolved on all major platforms. Price is at or near the economic crossover with 2x100G. Waiting 12 months saves approximately $55 per module — roughly 25% — but delays operational benefits by 12 months and risks running into lead time crunches if market demand spikes.
+
+      Exception: if your platform is not on the major compatibility list (Arista, Cisco 8000, Juniper PTX), verify compatibility before ordering at scale.
+
+variables:
+  - technology_name
+  - bass_model_output
+  - lightcounting_data
+  - market_context
+  - current_date
+
+validation_rules:
+  word_count_min: 800
+  word_count_max: 1200
+  output_format_check: markdown
--- a/packages/gateway/prompts/templates/tip_market_analysis.yaml
+++ b/packages/gateway/prompts/templates/tip_market_analysis.yaml
@ -0,0 +1,163 @@
+id: tip_market_analysis
+version: "1.0.0"
+task_type: tip_market_analysis
+description: Analyze optical transceiver market data and produce structured market intelligence for a given segment or form factor
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.4
+max_tokens: 3000
+output_format: json
+
+system_prompt: |
+  You are a market intelligence analyst for the Transceiver Intelligence Platform (TIP).
+  Analyze transceiver market data and produce structured market intelligence.
+
+  Return ONLY valid JSON:
+  {
+    "segment": "string — e.g. '100G QSFP28 SMF' or '800G QSFP-DD'",
+    "analysis_date": "YYYY-MM-DD",
+    "market_size_usd": number or null,
+    "market_size_confidence": "high|medium|low",
+    "unit_shipments_annual": number or null,
+    "asp_current_usd": number or null,
+    "asp_trend_12m_pct": number or null,
+    "adoption_phase": "emerging|early_adoption|early_majority|late_majority|mature|declining",
+    "trends": [
+      {
+        "trend": "string",
+        "direction": "positive|negative|neutral",
+        "impact": "high|medium|low",
+        "timeframe": "now|6m|12m|24m"
+      }
+    ],
+    "growth_drivers": ["string"],
+    "risks": [
+      {
+        "risk": "string",
+        "probability": "high|medium|low",
+        "impact": "high|medium|low"
+      }
+    ],
+    "technology_phases": {
+      "current_dominant": "string — e.g. 100G CWDM4",
+      "emerging": "string — e.g. 400G QSFP-DD",
+      "declining": "string or null"
+    },
+    "vendor_landscape": {
+      "oem_dominant_vendors": ["string"],
+      "third_party_share_pct": number or null,
+      "market_concentration": "concentrated|moderate|fragmented"
+    },
+    "buyer_recommendation": "string — when and how to buy"
+  }
+
+  Analysis rules:
+  - Base all analysis on the provided data, not general knowledge
+  - When data is insufficient for a field, use null with a note in buyer_recommendation
+  - asp_trend_12m_pct: positive = prices rising, negative = prices falling
+  - adoption_phase: use Bass diffusion model framing where possible
+  - risks: include supply chain, geopolitical, technology, and ecosystem risks
+
+user_template: |
+  Market segment: {{segment}}
+
+  Input data:
+  {{market_data}}
+
+  Additional context:
+  {{context}}
+
+  Produce market analysis:
+
+few_shot_examples:
+  - user: |
+      Market segment: 400G QSFP-DD Multimode (SR4/SR8)
+
+      Input data:
+      ASP Q4 2025: $180
+      ASP Q4 2024: $240
+      Estimated annual units 2025: 3.2M
+      Vendors active: InnoLight, II-VI/Coherent, Lumentum, FS.com, Cisco OEM, Juniper OEM
+      Third-party share: 38%
+      Key use case: intra-data-center, hyperscaler GPU cluster interconnects
+      Compatibility: Broadly supported on Arista 7050X4, Cisco Nexus 9300-GX, Nvidia Quantum-2
+
+      Additional context:
+      Q1 2026 early data shows demand acceleration from AI cluster buildouts.
+      NVIDIA H200/B200 GPU nodes require 400G connectivity between compute racks.
+      Intel Gaudi 3 clusters also deploying 400G SR8 specifically.
+    assistant: |
+      {
+        "segment": "400G QSFP-DD Multimode (SR4/SR8)",
+        "analysis_date": "2026-01-15",
+        "market_size_usd": 576000000,
+        "market_size_confidence": "medium",
+        "unit_shipments_annual": 3200000,
+        "asp_current_usd": 180,
+        "asp_trend_12m_pct": -25,
+        "adoption_phase": "early_majority",
+        "trends": [
+          {
+            "trend": "AI GPU cluster deployments accelerating 400G SR8 demand",
+            "direction": "positive",
+            "impact": "high",
+            "timeframe": "now"
+          },
+          {
+            "trend": "ASP declining at 25% annual rate due to third-party competition",
+            "direction": "neutral",
+            "impact": "medium",
+            "timeframe": "12m"
+          },
+          {
+            "trend": "800G beginning to compete for new hyperscaler capacity",
+            "direction": "negative",
+            "impact": "low",
+            "timeframe": "12m"
+          }
+        ],
+        "growth_drivers": [
+          "NVIDIA H200/B200 GPU cluster deployments requiring 400G SR4/SR8 per node",
+          "Intel Gaudi 3 AI accelerator clusters specifying SR8",
+          "Cost parity with 2x200G now achieved",
+          "Broad platform compatibility resolved on tier-1 switches"
+        ],
+        "risks": [
+          {
+            "risk": "800G QSFP-DD OSFP capturing new AI cluster orders from 400G",
+            "probability": "medium",
+            "impact": "medium"
+          },
+          {
+            "risk": "Geopolitical supply chain risk from concentration in Chinese manufacturing",
+            "probability": "low",
+            "impact": "high"
+          },
+          {
+            "risk": "Quality differentiation between certified and uncertified third-party modules",
+            "probability": "medium",
+            "impact": "medium"
+          }
+        ],
+        "technology_phases": {
+          "current_dominant": "400G QSFP-DD SR4 (850nm, MMF, MPO-12)",
+          "emerging": "800G OSFP SR8 for high-density GPU clusters",
+          "declining": "100G QSFP28 SR4 in hyperscaler new builds"
+        },
+        "vendor_landscape": {
+          "oem_dominant_vendors": ["InnoLight", "II-VI/Coherent", "Lumentum"],
+          "third_party_share_pct": 38,
+          "market_concentration": "moderate"
+        },
+        "buyer_recommendation": "Deploy now for AI cluster interconnects — compatibility is mature on Arista 7050X4 and Cisco Nexus 9300-GX. Buy from third-party (InnoLight, Hisense) with verified compatibility for 35-40% savings vs OEM. Lock volume pricing at current $180 ASP — further declines expected but AI demand may create short-term shortage in H1 2026."
+      }
+
+variables:
+  - segment
+  - market_data
+  - context
+  - few_shot_examples
+
+validation_rules:
+  output_must_be_json: true
+  required_fields: ["segment", "adoption_phase", "trends", "growth_drivers", "risks", "buyer_recommendation"]
--- a/packages/gateway/prompts/templates/tip_price_anomaly.yaml
+++ b/packages/gateway/prompts/templates/tip_price_anomaly.yaml
@ -0,0 +1,111 @@
+id: tip_price_anomaly
+version: "1.0.0"
+task_type: tip_price_anomaly
+description: Classify and explain price anomalies detected in transceiver pricing data across vendors
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.2
+max_tokens: 1024
+output_format: json
+
+system_prompt: |
+  You are a pricing analyst for the Transceiver Intelligence Platform (TIP), which monitors transceiver prices across 60+ vendors in real-time.
+  Your task is to classify and explain detected price anomalies.
+
+  Return ONLY valid JSON:
+  {
+    "anomaly_type": "price_spike|price_drop|outlier_high|outlier_low|market_divergence|lead_time_premium|shortage_signal|dumping_signal|data_error",
+    "severity": "critical|high|medium|low|informational",
+    "likely_cause": "string — specific explanation of the probable cause",
+    "confidence": 1-10,
+    "market_context": "string — what the broader market is doing",
+    "affected_vendors": ["list of vendors showing the anomaly"],
+    "baseline_price_usd": number or null,
+    "anomaly_price_usd": number or null,
+    "deviation_pct": number or null,
+    "recommendation": {
+      "action": "monitor|investigate|alert_buyer|alert_procurement|flag_for_review|ignore",
+      "details": "string — specific next steps"
+    },
+    "auto_ticket": true|false
+  }
+
+  Anomaly types:
+  - price_spike: >25% above 30-day moving average, single vendor or market-wide
+  - price_drop: >20% below 30-day MA — could be clearance, shortage end, or new competitor
+  - outlier_high: single vendor 40%+ above market median
+  - outlier_low: single vendor 30%+ below market median — possible quality concern
+  - market_divergence: OEM and compatible-third-party prices moving in opposite directions
+  - lead_time_premium: price spike correlated with extended lead times
+  - shortage_signal: multiple vendors simultaneously raising prices >15%
+  - dumping_signal: new vendor pricing 50%+ below market without explanation
+  - data_error: impossible value (negative price, price in wrong currency, duplicate entry)
+
+  Severity:
+  - critical: affects procurement decisions, possible shortage, >40% deviation
+  - high: significant market movement, >25% deviation
+  - medium: notable anomaly, 15-25% deviation
+  - low: minor anomaly, <15% deviation, single vendor
+  - informational: within normal variance but worth noting
+
+  auto_ticket: true only for critical or high severity.
+
+  {{few_shot_examples}}
+
+user_template: |
+  Part number: {{part_number}}
+  Form factor: {{form_factor}}
+
+  Price data:
+  {{price_data}}
+
+  Market context:
+  {{market_context}}
+
+  Classify this anomaly:
+
+few_shot_examples:
+  - user: |
+      Part number: QSFP-100G-LR4
+      Form factor: QSFP28
+
+      Price data:
+      30-day moving average: $285
+      Current market prices:
+        fs.com: $280
+        FiberStore: $278
+        Optcore: $277
+        FLEXOPTIX: $290
+        Cisco OEM: $1,840
+      Anomaly: New vendor "ShenzenOptic" listed at $89
+
+      Market context:
+      No known shortage. Q4 2025, market stable. 400G LR4 is mature technology.
+    assistant: |
+      {
+        "anomaly_type": "dumping_signal",
+        "severity": "high",
+        "likely_cause": "New vendor ShenzenOptic pricing at $89 vs market median of $280 (68% below market) suggests possible gray market inventory, uncertified product without DOM, or bait-and-switch pricing. Could also indicate a data entry error in currency (e.g., CNY instead of USD).",
+        "confidence": 8,
+        "market_context": "QSFP-100G-LR4 is a mature, commoditized product. Market is stable at $275-295 for quality third-party modules. No shortage signals. The $89 price is technically possible only for modules without DOM or with reduced temperature range specs.",
+        "affected_vendors": ["ShenzenOptic"],
+        "baseline_price_usd": 285,
+        "anomaly_price_usd": 89,
+        "deviation_pct": -68.8,
+        "recommendation": {
+          "action": "investigate",
+          "details": "Verify: (1) Is price in USD or CNY (89 CNY = ~$12, which rules out legitimate module). (2) Does the listing include DOM support and temperature class? (3) Check vendor reputation — new vendor with no track record. Flag to procurement team before any purchase. Do not display this price to end-users without verification badge."
+        },
+        "auto_ticket": true
+      }
+
+variables:
+  - part_number
+  - form_factor
+  - price_data
+  - market_context
+  - few_shot_examples
+
+validation_rules:
+  output_must_be_json: true
+  required_fields: ["anomaly_type", "severity", "likely_cause", "recommendation", "auto_ticket"]
--- a/packages/gateway/prompts/templates/tip_product_description.yaml
+++ b/packages/gateway/prompts/templates/tip_product_description.yaml
@ -0,0 +1,62 @@
+id: tip_product_description
+version: "1.0.0"
+task_type: tip_product_description
+
+system_prompt: |
+  You are a technical writer specializing in optical transceiver products for the Transceiver Intelligence Platform (TIP).
+  Write precise, factual product descriptions based on the provided specifications.
+
+  Rules:
+  - Use exact technical terminology (form factors, data rates, wavelengths)
+  - Do not invent specifications not present in the source data
+  - Keep descriptions between 100-200 words
+  - Focus on: form factor, data rate, wavelength, reach, application
+  - Current date: {{current_date}}
+
+  {{few_shot_examples}}
+
+system_prompt_de: |
+  Du bist ein technischer Redakteur für optische Transceiver-Produkte bei der Transceiver Intelligence Platform (TIP).
+  Schreibe präzise, sachliche Produktbeschreibungen auf Basis der bereitgestellten Spezifikationen.
+
+  Regeln:
+  - Verwende exakte technische Terminologie (Formfaktoren, Datenraten, Wellenlängen)
+  - Erfinde keine Spezifikationen, die nicht in den Quelldaten vorhanden sind
+  - Halte Beschreibungen zwischen 100-200 Wörtern
+  - Schwerpunkt: Formfaktor, Datenrate, Wellenlänge, Reichweite, Anwendung
+  - Aktuelles Datum: {{current_date}}
+
+user_template: |
+  Product specifications:
+  {{source_data}}
+
+  Additional context:
+  {{user_context}}
+
+  Write a technical product description:
+
+user_template_de: |
+  Produktspezifikationen:
+  {{source_data}}
+
+  Zusätzlicher Kontext:
+  {{user_context}}
+
+  Schreibe eine technische Produktbeschreibung:
+
+few_shot_examples:
+  - user: |
+      form_factor: QSFP28
+      data_rate: 100G
+      wavelength: 1310nm
+      reach: 10km
+      connector: LC
+      fiber: SMF
+    assistant: |
+      The QSFP28 100GBASE-LR4 transceiver delivers 100 Gbps Ethernet connectivity over single-mode fiber (SMF) at a reach of up to 10 km. Operating on four CWDM lanes at 1295, 1300, 1305, and 1310 nm, it uses LC duplex connectors and implements IEEE 802.3ba 100GBASE-LR4. The module operates within a standard 0°C to 70°C temperature range and draws a maximum power of 3.5W. Suitable for data center interconnect, enterprise core networks, and carrier aggregation applications requiring 100 Gbps over medium-distance single-mode fiber links.
+
+variables:
+  - source_data
+  - user_context
+  - current_date
+  - few_shot_examples
--- a/packages/gateway/prompts/templates/tip_transceiver_enrich.yaml
+++ b/packages/gateway/prompts/templates/tip_transceiver_enrich.yaml
@ -0,0 +1,115 @@
+id: tip_transceiver_enrich
+version: "1.0.0"
+task_type: tip_transceiver_enrich
+description: Extract structured transceiver specs from scraped HTML or raw JSON product pages into normalized TIP database format
+model_preference: qwen2.5:14b
+model_minimum: qwen2.5:7b
+temperature: 0.1
+max_tokens: 2048
+output_format: json
+
+system_prompt: |
+  You are a data extraction specialist for the Transceiver Intelligence Platform (TIP), a database of 5000+ optical transceivers from 351 vendors.
+  Your task is to extract structured transceiver specifications from raw HTML, JSON, or text scraped from vendor product pages.
+
+  Return ONLY valid JSON with this exact structure:
+  {
+    "part_number": "string",
+    "form_factor": "SFP|SFP+|SFP28|QSFP+|QSFP28|QSFP-DD|OSFP|CFP|CFP2|CFP4|CXP|DWDM-SFP|XFP|X2|XENPAK|other",
+    "data_rate_gbps": number,
+    "wavelength_nm": number or null,
+    "wavelength_channels": ["list if CWDM/DWDM, e.g. 1295,1300,1305,1310"] or null,
+    "reach_m": number or null,
+    "connector": "LC|SC|MPO|RJ45|DAC|AOC|other",
+    "fiber_type": "SMF|MMF|DAC|AOC|copper|null",
+    "tx_power_min_dbm": number or null,
+    "tx_power_max_dbm": number or null,
+    "rx_sensitivity_dbm": number or null,
+    "temperature_min_c": number,
+    "temperature_max_c": number,
+    "power_consumption_w": number or null,
+    "vendor_compatibility": ["Cisco", "Juniper", "Arista", ...],
+    "dom_support": true|false,
+    "standards_compliance": ["IEEE 802.3ae", "SFF-8472", ...],
+    "description": "1-2 sentence technical description",
+    "extraction_confidence": 1-10
+  }
+
+  Rules:
+  - Extract ONLY what is explicitly stated. Do NOT infer or guess specs.
+  - If a value is not present in the source, set it to null.
+  - Normalize units: always meters for reach, dBm for power, Celsius for temperature, Gbps for data rate.
+  - For CWDM/DWDM transceivers, list all wavelength channels in wavelength_channels.
+  - DOM support: true only if explicitly stated. false if absent.
+  - Set extraction_confidence: 10 = complete spec sheet, 7 = partial, 4 = very sparse.
+  - part_number must be the exact manufacturer part number, not a model name.
+
+  {{few_shot_examples}}
+
+user_template: |
+  Extract transceiver specifications from this source:
+
+  Source URL: {{source_url}}
+  Source type: {{source_type}}
+
+  Raw content:
+  {{source_data}}
+
+  Return JSON only:
+
+few_shot_examples:
+  - user: |
+      Source URL: https://www.fs.com/products/11318.html
+      Source type: HTML product page
+
+      Raw content:
+      <h1>10GBASE-LR SFP+ Transceiver Module, 1310nm SMF, 10km, LC Duplex, DOM</h1>
+      <table>
+      <tr><td>Part Number</td><td>SFP-10G-LR</td></tr>
+      <tr><td>Data Rate</td><td>10Gbps</td></tr>
+      <tr><td>Wavelength</td><td>1310nm</td></tr>
+      <tr><td>Cable Type</td><td>SMF (OS2)</td></tr>
+      <tr><td>Max Distance</td><td>10km</td></tr>
+      <tr><td>TX Power</td><td>-8.2 to 0.5dBm</td></tr>
+      <tr><td>RX Sensitivity</td><td>-14.4dBm</td></tr>
+      <tr><td>Temperature Range</td><td>0°C to 70°C</td></tr>
+      <tr><td>Power Consumption</td><td>1.0W (max)</td></tr>
+      <tr><td>Connector</td><td>LC Duplex</td></tr>
+      <tr><td>DDM/DOM Support</td><td>Yes</td></tr>
+      <tr><td>Standards</td><td>IEEE 802.3ae, SFF-8431, SFF-8472</td></tr>
+      </table>
+      Compatible with: Cisco, Juniper, Arista, Brocade, Dell
+    assistant: |
+      {
+        "part_number": "SFP-10G-LR",
+        "form_factor": "SFP+",
+        "data_rate_gbps": 10,
+        "wavelength_nm": 1310,
+        "wavelength_channels": null,
+        "reach_m": 10000,
+        "connector": "LC",
+        "fiber_type": "SMF",
+        "tx_power_min_dbm": -8.2,
+        "tx_power_max_dbm": 0.5,
+        "rx_sensitivity_dbm": -14.4,
+        "temperature_min_c": 0,
+        "temperature_max_c": 70,
+        "power_consumption_w": 1.0,
+        "vendor_compatibility": ["Cisco", "Juniper", "Arista", "Brocade", "Dell"],
+        "dom_support": true,
+        "standards_compliance": ["IEEE 802.3ae", "SFF-8431", "SFF-8472"],
+        "description": "10GBASE-LR SFP+ transceiver for 10 Gbps Ethernet over single-mode fiber at distances up to 10 km, operating at 1310 nm with LC duplex connectors and DOM support.",
+        "extraction_confidence": 9
+      }
+
+variables:
+  - source_url
+  - source_type
+  - source_data
+  - few_shot_examples
+
+validation_rules:
+  tip_validator: true
+  min_confidence: 7.0
+  required_fields: ["part_number", "form_factor", "data_rate_gbps", "connector"]
+  output_must_be_json: true
--- a/packages/gateway/prompts/templates/tip_vendor_classify.yaml
+++ b/packages/gateway/prompts/templates/tip_vendor_classify.yaml
@ -0,0 +1,92 @@
+id: tip_vendor_classify
+version: "1.0.0"
+task_type: tip_vendor_classify
+description: Classify transceiver vendor type and tier from name, website, and description for TIP database vendor profiles
+model_preference: qwen2.5:7b
+model_minimum: qwen2.5:3b
+temperature: 0.1
+max_tokens: 512
+output_format: json
+
+system_prompt: |
+  You are a vendor classification specialist for the Transceiver Intelligence Platform (TIP), which tracks 351 transceiver vendors globally.
+  Classify vendors based on their name, website, and any available description.
+
+  Return ONLY valid JSON:
+  {
+    "vendor_type": "OEM|white-label|generic|certified-compatible|ODM|reseller",
+    "tier": "tier1|tier2|tier3|unknown",
+    "primary_market": ["hyperscaler", "carrier", "enterprise", "ISP", "IXP", "reseller", "OEM-supply"],
+    "geography": "US|EU|CN|TW|JP|KR|other|unknown",
+    "product_focus": ["SFP+", "QSFP28", "QSFP-DD", "coherent", "active-cables", "passive-cables", "all"],
+    "brand_confidence": 1-10,
+    "notes": "string or null"
+  }
+
+  Vendor type definitions:
+  - OEM: Original Equipment Manufacturer — designs and manufactures modules (Cisco, Juniper, Arista brand optics). Price premium 200-400% over market.
+  - certified-compatible: Third-party manufacturer selling FLEXOPTIX-style programmed modules, compatible with major OEM platforms. Verified via coding. (FLEXOPTIX, FiberStore, Optcore, Accelink)
+  - ODM: Original Design Manufacturer — makes modules for others to brand (InnoLight, II-VI/Coherent, Lumentum, Hisense)
+  - white-label: Resells ODM product under own brand without own manufacturing (most small "optics" companies)
+  - generic: Unbranded or low-cost modules, no clear OEM coding, quality varies
+  - reseller: Distributes other vendors' products, no own manufacturing
+
+  Tier definitions:
+  - tier1: Major market player, >$500M revenue, global presence (InnoLight, II-VI/Coherent, Lumentum, Oclaro/Lumentum, Cisco OEM, Juniper OEM)
+  - tier2: Established player, $50-500M revenue, regional presence (Optcore, FiberStore/fs.com, FLEXOPTIX, Accelink)
+  - tier3: Small/niche player, <$50M revenue, limited product range
+  - unknown: Cannot determine from available data
+
+  {{few_shot_examples}}
+
+user_template: |
+  Vendor name: {{vendor_name}}
+  Website: {{vendor_website}}
+  Description: {{vendor_description}}
+  Known products: {{known_products}}
+
+  Classify this vendor:
+
+few_shot_examples:
+  - user: |
+      Vendor name: FLEXOPTIX GmbH
+      Website: https://www.flexoptix.net
+      Description: German company specializing in programmable transceivers. Offers in-house programming service to match any vendor's OEM coding. Known for compatibility guarantee with Cisco, Juniper, Arista, Nokia, Huawei.
+      Known products: SFP+, SFP28, QSFP+, QSFP28, QSFP-DD, coherent modules
+    assistant: |
+      {
+        "vendor_type": "certified-compatible",
+        "tier": "tier2",
+        "primary_market": ["ISP", "carrier", "enterprise", "IXP"],
+        "geography": "EU",
+        "product_focus": ["SFP+", "QSFP28", "QSFP-DD", "coherent"],
+        "brand_confidence": 9,
+        "notes": "FLEXOPTIX is the leading European certified-compatible vendor. Their FLEXBOX programming service is a key differentiator. Modules are sourced from ODMs (primarily InnoLight, II-VI) and programmed/certified in-house."
+      }
+
+  - user: |
+      Vendor name: InnoLight Technology
+      Website: https://www.innolight.com
+      Description: Leading optical transceiver manufacturer with own chip design and manufacturing. Supplies OEM to Cisco, Arista, HPE. Direct sales via distributor network.
+      Known products: 100G, 400G, 800G modules, active optical cables
+    assistant: |
+      {
+        "vendor_type": "ODM",
+        "tier": "tier1",
+        "primary_market": ["hyperscaler", "carrier", "OEM-supply"],
+        "geography": "CN",
+        "product_focus": ["QSFP28", "QSFP-DD", "active-cables"],
+        "brand_confidence": 10,
+        "notes": "InnoLight is one of the top 3 global transceiver ODMs. They manufacture for Cisco, Arista, and other OEMs while also selling direct. Approximately 22% market share in 400G QSFP-DD as of 2025."
+      }
+
+variables:
+  - vendor_name
+  - vendor_website
+  - vendor_description
+  - known_products
+  - few_shot_examples
+
+validation_rules:
+  output_must_be_json: true
+  required_fields: ["vendor_type", "tier", "primary_market", "geography"]
--- a/packages/gateway/src/banlists/auto-detected.ts
+++ b/packages/gateway/src/banlists/auto-detected.ts
@ -0,0 +1,63 @@
+// Auto-detected ban list — language-agnostic patterns that indicate LLM output
+// These are detected regardless of content language
+
+export interface AutoDetectedEntry {
+  term: string;
+  category: 'structural' | 'ai_pattern' | 'formatting';
+  wholeWord: boolean;
+  isRegex: boolean;
+}
+
+export const AUTO_DETECTED_BANLIST: AutoDetectedEntry[] = [
+  // Structural AI patterns
+  { term: 'In conclusion,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'To summarize,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'In summary,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Overall,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Firstly,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Secondly,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Thirdly,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Furthermore,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Moreover,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Additionally,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Notably,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Importantly,', category: 'structural', wholeWord: false, isRegex: false },
+  { term: 'Interestingly,', category: 'structural', wholeWord: false, isRegex: false },
+
+  // AI self-referential patterns
+  { term: 'as an AI language model', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'I\'m an AI', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'I am an AI', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'my training data', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'my knowledge cutoff', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'I don\'t have access to real-time', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'I cannot browse the internet', category: 'ai_pattern', wholeWord: false, isRegex: false },
+
+  // Formatting anti-patterns (often sign of AI over-structuring)
+  { term: '**Important:**', category: 'formatting', wholeWord: false, isRegex: false },
+  { term: '**Note:**', category: 'formatting', wholeWord: false, isRegex: false },
+  { term: '**Key takeaway:**', category: 'formatting', wholeWord: false, isRegex: false },
+  { term: '**Bottom line:**', category: 'formatting', wholeWord: false, isRegex: false },
+  { term: '**TL;DR:**', category: 'formatting', wholeWord: false, isRegex: false },
+
+  // Closing questions (unwanted in most content)
+  { term: 'What do you think?', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'What are your thoughts?', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Let me know in the comments', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Feel free to reach out', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Drop a comment below', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Share your thoughts', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'I\'d love to hear from you', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Follow for more', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Like and share', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Don\'t forget to', category: 'ai_pattern', wholeWord: false, isRegex: false },
+
+  // German equivalents
+  { term: 'Wie seht ihr das?', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Was denkt ihr?', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Schreibt es in die Kommentare', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Teilt eure Gedanken', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Folgt mir für mehr', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Schreibt mir gerne', category: 'ai_pattern', wholeWord: false, isRegex: false },
+  { term: 'Ich freue mich auf eure', category: 'ai_pattern', wholeWord: false, isRegex: false },
+];
--- a/packages/gateway/src/banlists/de.ts
+++ b/packages/gateway/src/banlists/de.ts
@ -0,0 +1,94 @@
+// German ban list — Marketing-Sprache, KI-Erkennungszeichen, Klischees
+// Category tags: 'marketing' | 'ai_tell' | 'cliche' | 'filler'
+
+export interface BanEntryDe {
+  term: string;
+  category: 'marketing' | 'ai_tell' | 'cliche' | 'filler';
+  wholeWord: boolean;
+}
+
+export const DE_BANLIST: BanEntryDe[] = [
+  // Marketing-Buzzwords
+  { term: 'zukunftsweisend', category: 'marketing', wholeWord: true },
+  { term: 'wegweisend', category: 'marketing', wholeWord: true },
+  { term: 'revolutionär', category: 'marketing', wholeWord: true },
+  { term: 'innovativ', category: 'marketing', wholeWord: true },
+  { term: 'nachhaltig', category: 'marketing', wholeWord: true },
+  { term: 'ganzheitlich', category: 'marketing', wholeWord: true },
+  { term: 'synergetisch', category: 'marketing', wholeWord: true },
+  { term: 'Synergie', category: 'marketing', wholeWord: true },
+  { term: 'Synergien', category: 'marketing', wholeWord: true },
+  { term: 'disruptiv', category: 'marketing', wholeWord: true },
+  { term: 'bahnbrechend', category: 'marketing', wholeWord: true },
+  { term: 'Ökosystem', category: 'marketing', wholeWord: true },
+  { term: 'Mehrwert schaffen', category: 'marketing', wholeWord: false },
+  { term: 'Mehrwert bieten', category: 'marketing', wholeWord: false },
+  { term: 'state of the art', category: 'marketing', wholeWord: false },
+  { term: 'Best Practices', category: 'marketing', wholeWord: false },
+  { term: 'Thought Leadership', category: 'marketing', wholeWord: false },
+  { term: 'nahtlos', category: 'marketing', wholeWord: true },
+  { term: 'skalierbar', category: 'marketing', wholeWord: true },
+  { term: 'robust', category: 'marketing', wholeWord: true },
+  { term: 'transformativ', category: 'marketing', wholeWord: true },
+  { term: 'ermächtigen', category: 'marketing', wholeWord: true },
+  { term: 'Paradigmenwechsel', category: 'marketing', wholeWord: true },
+  { term: 'Wettbewerbsvorteil', category: 'marketing', wholeWord: true },
+  { term: 'Alleinstellungsmerkmal', category: 'marketing', wholeWord: true },
+  { term: 'digitale Transformation', category: 'marketing', wholeWord: false },
+  { term: 'Digitalisierung vorantreiben', category: 'marketing', wholeWord: false },
+  { term: 'fit für die Zukunft', category: 'marketing', wholeWord: false },
+  { term: 'zukunftsfähig', category: 'marketing', wholeWord: true },
+  { term: 'agil', category: 'marketing', wholeWord: true },
+  { term: 'New Work', category: 'marketing', wholeWord: false },
+  { term: 'Out of the Box', category: 'marketing', wholeWord: false },
+
+  // KI-Erkennungszeichen
+  { term: 'Als KI', category: 'ai_tell', wholeWord: false },
+  { term: 'Als Sprachmodell', category: 'ai_tell', wholeWord: false },
+  { term: 'Ich kann keine', category: 'ai_tell', wholeWord: false },
+  { term: 'Es ist zu beachten', category: 'ai_tell', wholeWord: false },
+  { term: 'Es sei darauf hingewiesen', category: 'ai_tell', wholeWord: false },
+  { term: 'Es ist erwähnenswert', category: 'ai_tell', wholeWord: false },
+  { term: 'Es sei angemerkt', category: 'ai_tell', wholeWord: false },
+  { term: 'Es sei erwähnt', category: 'ai_tell', wholeWord: false },
+  { term: 'Lassen Sie uns', category: 'ai_tell', wholeWord: false },
+  { term: 'Tauchen wir ein', category: 'ai_tell', wholeWord: false },
+  { term: 'Zunächst einmal', category: 'ai_tell', wholeWord: false },
+  { term: 'Nicht zuletzt', category: 'ai_tell', wholeWord: false },
+  { term: 'einerseits… andererseits', category: 'ai_tell', wholeWord: false },
+
+  // Klischees
+  { term: 'Zum Schluss', category: 'cliche', wholeWord: false },
+  { term: 'Zusammenfassend', category: 'cliche', wholeWord: true },
+  { term: 'Zusammenfassend lässt sich sagen', category: 'cliche', wholeWord: false },
+  { term: 'Abschließend', category: 'cliche', wholeWord: true },
+  { term: 'Abschließend lässt sich festhalten', category: 'cliche', wholeWord: false },
+  { term: 'Im heutigen schnelllebigen', category: 'cliche', wholeWord: false },
+  { term: 'In der heutigen Zeit', category: 'cliche', wholeWord: false },
+  { term: 'In der modernen Welt', category: 'cliche', wholeWord: false },
+  { term: 'im Zeitalter der', category: 'cliche', wholeWord: false },
+  { term: 'Im Kern geht es', category: 'cliche', wholeWord: false },
+  { term: 'auf den Punkt gebracht', category: 'cliche', wholeWord: false },
+  { term: 'auf den Punkt', category: 'cliche', wholeWord: false },
+  { term: 'die Reise', category: 'cliche', wholeWord: false },
+  { term: 'Reise beginnt', category: 'cliche', wholeWord: false },
+
+  // Füllwörter / Floskel
+  { term: 'nicht vergessen', category: 'filler', wholeWord: false },
+  { term: 'im Endeffekt', category: 'filler', wholeWord: false },
+  { term: 'letztendlich', category: 'filler', wholeWord: true },
+  { term: 'letztlich', category: 'filler', wholeWord: true },
+  { term: 'ganz klar', category: 'filler', wholeWord: false },
+  { term: 'auf jeden Fall', category: 'filler', wholeWord: false },
+  { term: 'definitiv', category: 'filler', wholeWord: true },
+  { term: 'selbstverständlich', category: 'filler', wholeWord: true },
+  { term: 'natürlich', category: 'filler', wholeWord: true },
+  { term: 'offensichtlich', category: 'filler', wholeWord: true },
+  { term: 'grundsätzlich', category: 'filler', wholeWord: true },
+  { term: 'im Grunde genommen', category: 'filler', wholeWord: false },
+  { term: 'ohne Frage', category: 'filler', wholeWord: false },
+  { term: 'zweifellos', category: 'filler', wholeWord: true },
+  { term: 'zweifelsohne', category: 'filler', wholeWord: true },
+];
+
+export const DE_TERMS_SET: Set<string> = new Set(DE_BANLIST.map((e) => e.term.toLowerCase()));
--- a/packages/gateway/src/banlists/en.ts
+++ b/packages/gateway/src/banlists/en.ts
@ -0,0 +1,106 @@
+// English ban list — marketing speak, AI clichés, and overused phrases
+// Category tags: 'marketing' | 'ai_tell' | 'cliche' | 'filler'
+
+export interface BanEntry {
+  term: string;
+  category: 'marketing' | 'ai_tell' | 'cliche' | 'filler';
+  wholeWord: boolean;
+}
+
+export const EN_BANLIST: BanEntry[] = [
+  // Marketing buzzwords
+  { term: 'leverage', category: 'marketing', wholeWord: true },
+  { term: 'cutting-edge', category: 'marketing', wholeWord: false },
+  { term: 'innovative', category: 'marketing', wholeWord: true },
+  { term: 'game-changer', category: 'marketing', wholeWord: false },
+  { term: 'game changer', category: 'marketing', wholeWord: false },
+  { term: 'disruptive', category: 'marketing', wholeWord: true },
+  { term: 'synergy', category: 'marketing', wholeWord: true },
+  { term: 'synergies', category: 'marketing', wholeWord: true },
+  { term: 'paradigm shift', category: 'marketing', wholeWord: false },
+  { term: 'holistic', category: 'marketing', wholeWord: true },
+  { term: 'seamless', category: 'marketing', wholeWord: true },
+  { term: 'robust', category: 'marketing', wholeWord: true },
+  { term: 'scalable', category: 'marketing', wholeWord: true },
+  { term: 'best-in-class', category: 'marketing', wholeWord: false },
+  { term: 'world-class', category: 'marketing', wholeWord: false },
+  { term: 'transformative', category: 'marketing', wholeWord: true },
+  { term: 'empower', category: 'marketing', wholeWord: true },
+  { term: 'empowers', category: 'marketing', wholeWord: true },
+  { term: 'empowering', category: 'marketing', wholeWord: true },
+  { term: 'unlock', category: 'marketing', wholeWord: true },
+  { term: 'unlocks', category: 'marketing', wholeWord: true },
+  { term: 'unlocking', category: 'marketing', wholeWord: true },
+  { term: 'reimagine', category: 'marketing', wholeWord: true },
+  { term: 'revolutionize', category: 'marketing', wholeWord: true },
+  { term: 'revolutionizing', category: 'marketing', wholeWord: true },
+  { term: 'elevate', category: 'marketing', wholeWord: true },
+  { term: 'streamline', category: 'marketing', wholeWord: true },
+  { term: 'harness', category: 'marketing', wholeWord: true },
+  { term: 'ecosystem', category: 'marketing', wholeWord: true },
+  { term: 'next-generation', category: 'marketing', wholeWord: false },
+  { term: 'next generation', category: 'marketing', wholeWord: false },
+  { term: 'state-of-the-art', category: 'marketing', wholeWord: false },
+  { term: 'state of the art', category: 'marketing', wholeWord: false },
+  { term: 'best practices', category: 'marketing', wholeWord: false },
+  { term: 'thought leader', category: 'marketing', wholeWord: false },
+  { term: 'thought leadership', category: 'marketing', wholeWord: false },
+  { term: 'value proposition', category: 'marketing', wholeWord: false },
+  { term: 'competitive advantage', category: 'marketing', wholeWord: false },
+  { term: 'bleeding edge', category: 'marketing', wholeWord: false },
+  { term: 'move the needle', category: 'marketing', wholeWord: false },
+  { term: 'low-hanging fruit', category: 'marketing', wholeWord: false },
+  { term: 'circle back', category: 'marketing', wholeWord: false },
+
+  // AI tell-tales
+  { term: 'delve', category: 'ai_tell', wholeWord: true },
+  { term: 'delves', category: 'ai_tell', wholeWord: true },
+  { term: 'delving', category: 'ai_tell', wholeWord: true },
+  { term: 'crucial', category: 'ai_tell', wholeWord: true },
+  { term: 'vital', category: 'ai_tell', wholeWord: true },
+  { term: 'it\'s worth noting', category: 'ai_tell', wholeWord: false },
+  { term: 'it is worth noting', category: 'ai_tell', wholeWord: false },
+  { term: 'having said that', category: 'ai_tell', wholeWord: false },
+  { term: 'at the end of the day', category: 'ai_tell', wholeWord: false },
+  { term: 'dive into', category: 'ai_tell', wholeWord: false },
+  { term: 'dive deep', category: 'ai_tell', wholeWord: false },
+  { term: 'let\'s explore', category: 'ai_tell', wholeWord: false },
+  { term: "let's unpack", category: 'ai_tell', wholeWord: false },
+  { term: 'it\'s important to note', category: 'ai_tell', wholeWord: false },
+  { term: 'it is important to note', category: 'ai_tell', wholeWord: false },
+  { term: 'first and foremost', category: 'ai_tell', wholeWord: false },
+  { term: 'last but not least', category: 'ai_tell', wholeWord: false },
+  { term: 'as an AI', category: 'ai_tell', wholeWord: false },
+  { term: 'as a language model', category: 'ai_tell', wholeWord: false },
+  { term: 'I cannot provide', category: 'ai_tell', wholeWord: false },
+  { term: 'I\'m unable to', category: 'ai_tell', wholeWord: false },
+
+  // Clichés
+  { term: 'journey', category: 'cliche', wholeWord: true },
+  { term: 'In today\'s fast-paced', category: 'cliche', wholeWord: false },
+  { term: 'In today\'s rapidly evolving', category: 'cliche', wholeWord: false },
+  { term: 'As we navigate', category: 'cliche', wholeWord: false },
+  { term: 'In conclusion', category: 'cliche', wholeWord: false },
+  { term: 'To summarize', category: 'cliche', wholeWord: false },
+  { term: 'In summary', category: 'cliche', wholeWord: false },
+  { term: 'The bottom line', category: 'cliche', wholeWord: false },
+  { term: 'At its core', category: 'cliche', wholeWord: false },
+  { term: 'At the forefront', category: 'cliche', wholeWord: false },
+  { term: 'In the realm of', category: 'cliche', wholeWord: false },
+  { term: 'In the ever-changing', category: 'cliche', wholeWord: false },
+  { term: 'the landscape of', category: 'cliche', wholeWord: false },
+
+  // Filler
+  { term: 'simply put', category: 'filler', wholeWord: false },
+  { term: 'needless to say', category: 'filler', wholeWord: false },
+  { term: 'of course', category: 'filler', wholeWord: false },
+  { term: 'obviously', category: 'filler', wholeWord: true },
+  { term: 'clearly', category: 'filler', wholeWord: true },
+  { term: 'certainly', category: 'filler', wholeWord: true },
+  { term: 'absolutely', category: 'filler', wholeWord: true },
+  { term: 'undoubtedly', category: 'filler', wholeWord: true },
+  { term: 'essentially', category: 'filler', wholeWord: true },
+  { term: 'basically', category: 'filler', wholeWord: true },
+];
+
+export const EN_TERMS_SET: Set<string> = new Set(EN_BANLIST.map((e) => e.term.toLowerCase()));
--- a/packages/gateway/src/banlists/sync-from-gitea.ts
+++ b/packages/gateway/src/banlists/sync-from-gitea.ts
@ -0,0 +1,113 @@
+// Sync ban list additions from Gitea CSV
+// CSV format: term,category,language,wholeWord
+// URL: http://gitea.context-x.org/rene/llm-gateway/raw/branch/main/banlists/
+
+import { logger } from '../observability/logger.js';
+
+const GITEA_BASE =
+  'http://gitea.context-x.org/rene/llm-gateway/raw/branch/main/banlists/';
+
+export interface GiteaBanEntry {
+  term: string;
+  category: string;
+  language: 'en' | 'de' | 'auto';
+  wholeWord: boolean;
+}
+
+let syncedEntries: GiteaBanEntry[] = [];
+let lastSyncAt: Date | null = null;
+const SYNC_INTERVAL_MS = 30 * 60 * 1000; // 30 minutes
+
+function parseCSV(raw: string): GiteaBanEntry[] {
+  const lines = raw.split('\n').filter((l) => l.trim() && !l.startsWith('#'));
+  const entries: GiteaBanEntry[] = [];
+
+  for (const line of lines) {
+    const parts = line.split(',');
+    if (parts.length < 4) continue;
+
+    const term = (parts[0] ?? '').trim().replace(/^"|"$/g, '');
+    const category = (parts[1] ?? '').trim();
+    const language = (parts[2] ?? '').trim() as 'en' | 'de' | 'auto';
+    const wholeWord = (parts[3] ?? '').trim().toLowerCase() === 'true';
+
+    if (term && ['en', 'de', 'auto'].includes(language)) {
+      entries.push({ term, category, language, wholeWord });
+    }
+  }
+
+  return entries;
+}
+
+async function fetchCsv(filename: string): Promise<string> {
+  const url = `${GITEA_BASE}${filename}`;
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), 10_000);
+
+  try {
+    const response = await fetch(url, {
+      signal: controller.signal,
+      headers: { Accept: 'text/plain' },
+    });
+    if (!response.ok) {
+      throw new Error(`HTTP ${response.status} from Gitea`);
+    }
+    return await response.text();
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+export async function syncBanlistsFromGitea(): Promise<GiteaBanEntry[]> {
+  const now = new Date();
+  if (lastSyncAt && now.getTime() - lastSyncAt.getTime() < SYNC_INTERVAL_MS) {
+    return syncedEntries;
+  }
+
+  try {
+    const [enCsv, deCsv, autoCsv] = await Promise.allSettled([
+      fetchCsv('en-additions.csv'),
+      fetchCsv('de-additions.csv'),
+      fetchCsv('auto-additions.csv'),
+    ]);
+
+    const entries: GiteaBanEntry[] = [];
+
+    if (enCsv.status === 'fulfilled') {
+      entries.push(...parseCSV(enCsv.value));
+    } else {
+      logger.warn({ reason: enCsv.reason }, 'Failed to fetch en-additions.csv from Gitea');
+    }
+
+    if (deCsv.status === 'fulfilled') {
+      entries.push(...parseCSV(deCsv.value));
+    } else {
+      logger.warn({ reason: deCsv.reason }, 'Failed to fetch de-additions.csv from Gitea');
+    }
+
+    if (autoCsv.status === 'fulfilled') {
+      entries.push(...parseCSV(autoCsv.value));
+    } else {
+      logger.warn({ reason: autoCsv.reason }, 'Failed to fetch auto-additions.csv from Gitea');
+    }
+
+    syncedEntries = entries;
+    lastSyncAt = now;
+    logger.info({ count: entries.length }, 'Ban list synced from Gitea');
+  } catch (err) {
+    logger.error({ err }, 'Failed to sync ban lists from Gitea');
+  }
+
+  return syncedEntries;
+}
+
+export function getGiteaEntries(): GiteaBanEntry[] {
+  return syncedEntries;
+}
+
+// Trigger background sync without blocking
+export function triggerBackgroundSync(): void {
+  syncBanlistsFromGitea().catch((err) => {
+    logger.warn({ err }, 'Background ban list sync failed');
+  });
+}
--- a/packages/gateway/src/circuit-breaker/ollama-breaker.ts
+++ b/packages/gateway/src/circuit-breaker/ollama-breaker.ts
@ -0,0 +1,90 @@
+import CircuitBreaker from 'opossum';
+import { logger } from '../observability/logger.js';
+import { recordCircuitBreakerState } from '../observability/metrics.js';
+
+export type ModelTier = 'fast' | 'medium' | 'large';
+
+interface TierOptions {
+  timeout: number;
+  errorThresholdPercentage: number;
+  resetTimeout: number;
+}
+
+const TIER_OPTIONS: Record<ModelTier, TierOptions> = {
+  fast: {
+    timeout: 10_000,
+    errorThresholdPercentage: 50,
+    resetTimeout: 15_000,
+  },
+  medium: {
+    timeout: 30_000,
+    errorThresholdPercentage: 50,
+    resetTimeout: 20_000,
+  },
+  large: {
+    timeout: 120_000,
+    errorThresholdPercentage: 30,
+    resetTimeout: 45_000,
+  },
+};
+
+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+const breakerRegistry = new Map<string, CircuitBreaker<any[], any>>();
+
+type AsyncFn<A extends unknown[], R> = (...args: A) => Promise<R>;
+
+export function getBreaker<A extends unknown[], R>(
+  model: string,
+  tier: ModelTier,
+  fn: AsyncFn<A, R>,
+): CircuitBreaker<A, R> {
+  const existing = breakerRegistry.get(model) as CircuitBreaker<A, R> | undefined;
+  if (existing) return existing;
+
+  const opts = TIER_OPTIONS[tier] ?? TIER_OPTIONS['medium'];
+  const breaker = new CircuitBreaker(fn, {
+    timeout: opts.timeout,
+    errorThresholdPercentage: opts.errorThresholdPercentage,
+    resetTimeout: opts.resetTimeout,
+    volumeThreshold: 3,
+    name: `ollama-${model}`,
+  });
+
+  breaker.on('open', () => {
+    logger.warn({ model, tier }, 'Circuit breaker opened');
+    recordCircuitBreakerState(model, 'open');
+  });
+
+  breaker.on('halfOpen', () => {
+    logger.info({ model, tier }, 'Circuit breaker half-open');
+    recordCircuitBreakerState(model, 'half-open');
+  });
+
+  breaker.on('close', () => {
+    logger.info({ model, tier }, 'Circuit breaker closed');
+    recordCircuitBreakerState(model, 'closed');
+  });
+
+  breaker.on('fallback', (result) => {
+    logger.warn({ model, result }, 'Circuit breaker fallback triggered');
+  });
+
+  breakerRegistry.set(model, breaker as CircuitBreaker<unknown[], unknown>);
+  return breaker;
+}
+
+export function getBreakerState(model: string): 'closed' | 'open' | 'half-open' {
+  const breaker = breakerRegistry.get(model);
+  if (!breaker) return 'closed';
+  if (breaker.opened) return 'open';
+  if (breaker.halfOpen) return 'half-open';
+  return 'closed';
+}
+
+export function getAllBreakerStates(): Record<string, 'closed' | 'open' | 'half-open'> {
+  const states: Record<string, 'closed' | 'open' | 'half-open'> = {};
+  for (const [model] of breakerRegistry) {
+    states[model] = getBreakerState(model);
+  }
+  return states;
+}
--- a/packages/gateway/src/config/models.yaml
+++ b/packages/gateway/src/config/models.yaml
@ -0,0 +1,88 @@
+# LLM Gateway Model Configuration
+# Ollama base URL: http://192.168.178.169:11434
+
+ollama_base_url: "http://192.168.178.169:11434"
+
+tiers:
+  fast:
+    timeout_ms: 10000
+    error_threshold_percent: 50
+    circuit_breaker_reset_ms: 15000
+  medium:
+    timeout_ms: 30000
+    error_threshold_percent: 50
+    circuit_breaker_reset_ms: 20000
+  large:
+    timeout_ms: 120000
+    error_threshold_percent: 30
+    circuit_breaker_reset_ms: 45000
+
+models:
+  # Fast tier
+  qwen2.5:3b:
+    tier: fast
+    context_length: 32768
+    strengths: [classification, short_text, routing]
+    max_tokens_default: 512
+
+  phi3.5:3.8b:
+    tier: fast
+    context_length: 128000
+    strengths: [classification, summarization]
+    max_tokens_default: 512
+
+  # Medium tier
+  qwen2.5:14b:
+    tier: medium
+    context_length: 131072
+    strengths: [general, writing, analysis, coding]
+    max_tokens_default: 2048
+
+  mistral:7b:
+    tier: medium
+    context_length: 32768
+    strengths: [general, writing]
+    max_tokens_default: 2048
+
+  llama3.2:8b:
+    tier: medium
+    context_length: 128000
+    strengths: [general, chat, analysis]
+    max_tokens_default: 2048
+
+  deepseek-r1:8b:
+    tier: medium
+    context_length: 65536
+    strengths: [reasoning, analysis, coding]
+    max_tokens_default: 2048
+
+  # Large tier
+  qwen2.5:32b:
+    tier: large
+    context_length: 131072
+    strengths: [complex_writing, deep_analysis, technical]
+    max_tokens_default: 4096
+
+  llama3.3:70b:
+    tier: large
+    context_length: 128000
+    strengths: [complex_reasoning, long_form, research]
+    max_tokens_default: 4096
+
+  deepseek-r1:32b:
+    tier: large
+    context_length: 131072
+    strengths: [chain_of_thought, complex_reasoning]
+    max_tokens_default: 4096
+
+# Fallback chains per tier
+fallback_chains:
+  fast: [qwen2.5:3b, phi3.5:3.8b]
+  medium: [qwen2.5:14b, mistral:7b, llama3.2:8b]
+  large: [qwen2.5:32b, llama3.3:70b, deepseek-r1:32b]
+
+# Cross-tier fallback when primary tier fails
+tier_fallback:
+  large: medium
+  medium: fast
+  fast: null
--- a/packages/gateway/src/config/routing-rules.yaml
+++ b/packages/gateway/src/config/routing-rules.yaml
@ -0,0 +1,704 @@
+# LLM Gateway Routing Rules
+# Maps task_type → model + prompt template + validation config
+
+routing_rules:
+
+  # ─── PRE-CLASSIFICATION ────────────────────────────────────────────────────
+  pre_classify:
+    model: qwen2.5:3b
+    tier: fast
+    prompt_template: pre_classify
+    temperature: 0.1
+    max_tokens: 256
+    output_format: json
+    requires_fact_check: false
+    validators: []
+    callers: [all]
+
+  # ─── TIP: TRANSCEIVER INTELLIGENCE PLATFORM ────────────────────────────────
+  tip_product_description:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_product_description
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [schema, banlist, language, tip_validator, length]
+    callers: [tip-scraper, internal]
+
+  tip_technical_summary:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_technical_summary
+    temperature: 0.2
+    max_tokens: 512
+    output_format: json
+    requires_fact_check: true
+    validators: [schema, tip_validator, length]
+    callers: [tip-scraper, internal]
+
+  tip_competitor_analysis:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: tip_competitor_analysis
+    temperature: 0.4
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, tip_validator, length]
+    callers: [tip-scraper, internal, n8n]
+
+  tip_price_extraction:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_price_extraction
+    temperature: 0.0
+    max_tokens: 256
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, tip_validator]
+    callers: [tip-scraper, internal]
+
+  tip_market_analysis:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: tip_market_analysis
+    temperature: 0.5
+    max_tokens: 3072
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [tip-scraper, internal, n8n]
+
+  tip_hype_cycle:
+    model: deepseek-r1:32b
+    tier: large
+    prompt_template: tip_hype_cycle
+    temperature: 0.3
+    max_tokens: 2048
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, tip_validator, length]
+    callers: [tip-scraper, internal]
+
+  tip_faq_generation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_faq_generation
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, banlist, tip_validator]
+    callers: [tip-scraper, internal]
+
+  tip_vendor_profile:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_vendor_profile
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, tip_validator, length]
+    callers: [tip-scraper, internal]
+
+  tip_blog_post:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: tip_blog_post
+    temperature: 0.6
+    max_tokens: 3072
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [tip-scraper, internal, n8n]
+
+  tip_spec_extraction:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: tip_spec_extraction
+    temperature: 0.1
+    max_tokens: 512
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, tip_validator]
+    callers: [tip-scraper, internal]
+
+  # ─── EO GLOBAL PULSE ────────────────────────────────────────────────────────
+  eo_member_summary:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_member_summary
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [eo-global-pulse, internal]
+
+  eo_meeting_notes:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_meeting_notes
+    temperature: 0.3
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [eo-global-pulse, internal]
+
+  eo_chapter_report:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: eo_chapter_report
+    temperature: 0.4
+    max_tokens: 3072
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [eo-global-pulse, internal]
+
+  eo_learning_recommendation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_learning_recommendation
+    temperature: 0.5
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, banlist]
+    callers: [eo-global-pulse, internal]
+
+  eo_forum_moderation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_forum_moderation
+    temperature: 0.2
+    max_tokens: 256
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, banlist]
+    callers: [eo-global-pulse, internal]
+
+  eo_event_agenda:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_event_agenda
+    temperature: 0.5
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [eo-global-pulse, internal]
+
+  eo_travel_brief:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: eo_travel_brief
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language]
+    callers: [eo-global-pulse, internal]
+
+  # ─── PEERCORTEX ──────────────────────────────────────────────────────────────
+  peercortex_asn_analysis:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: peercortex_asn_analysis
+    temperature: 0.2
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: true
+    validators: [banlist, fact_checker, length]
+    callers: [peercortex, internal]
+
+  peercortex_routing_summary:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: peercortex_routing_summary
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: true
+    validators: [fact_checker, length]
+    callers: [peercortex, internal]
+
+  peercortex_ix_report:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: peercortex_ix_report
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: true
+    validators: [fact_checker, length]
+    callers: [peercortex, internal]
+
+  peercortex_health_report:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: peercortex_health_report
+    temperature: 0.2
+    max_tokens: 2048
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, length]
+    callers: [peercortex, internal]
+
+  peercortex_rpki_analysis:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: peercortex_rpki_analysis
+    temperature: 0.2
+    max_tokens: 512
+    output_format: text
+    requires_fact_check: true
+    validators: [fact_checker]
+    callers: [peercortex, internal]
+
+  # ─── SWITCHBLADE ─────────────────────────────────────────────────────────────
+  switchblade_incident_summary:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: switchblade_incident_summary
+    temperature: 0.2
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [switchblade, internal]
+
+  switchblade_config_review:
+    model: deepseek-r1:8b
+    tier: medium
+    prompt_template: switchblade_config_review
+    temperature: 0.1
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [switchblade, internal]
+
+  switchblade_peering_recommendation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: switchblade_peering_recommendation
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: true
+    validators: [schema, fact_checker]
+    callers: [switchblade, internal]
+
+  switchblade_blacklist_report:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: switchblade_blacklist_report
+    temperature: 0.2
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [switchblade, internal]
+
+  switchblade_rack_documentation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: switchblade_rack_documentation
+    temperature: 0.3
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [switchblade, internal]
+
+  switchblade_csrd_report:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: switchblade_csrd_report
+    temperature: 0.4
+    max_tokens: 4096
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [switchblade, internal]
+
+  switchblade_transceiver_advisor:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: switchblade_transceiver_advisor
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, tip_validator]
+    callers: [switchblade, internal]
+
+  switchblade_bgp_policy:
+    model: deepseek-r1:8b
+    tier: medium
+    prompt_template: switchblade_bgp_policy
+    temperature: 0.2
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [switchblade, internal]
+
+  # ─── NOGNET / CTXEVENT ───────────────────────────────────────────────────────
+  nognet_event_description:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: nognet_event_description
+    temperature: 0.5
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [nognet, internal]
+
+  nognet_sponsor_proposal:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: nognet_sponsor_proposal
+    temperature: 0.5
+    max_tokens: 3072
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [nognet, internal]
+
+  nognet_program_committee:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: nognet_program_committee
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [nognet, internal]
+
+  nognet_recap_article:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: nognet_recap_article
+    temperature: 0.6
+    max_tokens: 3072
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [nognet, internal]
+
+  ctxevent_agenda_builder:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: ctxevent_agenda_builder
+    temperature: 0.4
+    max_tokens: 2048
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, length]
+    callers: [nognet, internal]
+
+  ctxevent_attendee_communication:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: ctxevent_attendee_communication
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language]
+    callers: [nognet, internal]
+
+  # ─── SHIELDX ─────────────────────────────────────────────────────────────────
+  shieldx_threat_classification:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: shieldx_threat_classification
+    temperature: 0.1
+    max_tokens: 512
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [shieldx, internal]
+
+  shieldx_attack_analysis:
+    model: deepseek-r1:8b
+    tier: medium
+    prompt_template: shieldx_attack_analysis
+    temperature: 0.2
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, length]
+    callers: [shieldx, internal]
+
+  shieldx_defense_recommendation:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: shieldx_defense_recommendation
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [shieldx, internal]
+
+  shieldx_pattern_extraction:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: shieldx_pattern_extraction
+    temperature: 0.1
+    max_tokens: 512
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [shieldx, internal]
+
+  shieldx_red_team_simulate:
+    model: deepseek-r1:32b
+    tier: large
+    prompt_template: shieldx_red_team_simulate
+    temperature: 0.4
+    max_tokens: 2048
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [shieldx, internal]
+
+  # ─── CONTENT / LINKEDIN ──────────────────────────────────────────────────────
+  linkedin_post:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: linkedin_post
+    temperature: 0.7
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length, question_closer]
+    callers: [n8n, internal]
+
+  linkedin_comment:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: linkedin_comment
+    temperature: 0.6
+    max_tokens: 256
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [n8n, internal]
+
+  linkedin_article:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: linkedin_article
+    temperature: 0.6
+    max_tokens: 4096
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length, question_closer]
+    callers: [n8n, internal]
+
+  blog_post_de:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: blog_post_de
+    temperature: 0.6
+    max_tokens: 4096
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length, question_closer]
+    callers: [n8n, internal]
+
+  blog_post_en:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: blog_post_en
+    temperature: 0.6
+    max_tokens: 4096
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length, question_closer]
+    callers: [n8n, internal]
+
+  newsletter_section:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: newsletter_section
+    temperature: 0.5
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [n8n, internal]
+
+  social_media_thread:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: social_media_thread
+    temperature: 0.7
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema, banlist, language]
+    callers: [n8n, internal]
+
+  press_release:
+    model: qwen2.5:32b
+    tier: large
+    prompt_template: press_release
+    temperature: 0.4
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, language, length]
+    callers: [n8n, internal]
+
+  content_translation_de_en:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: content_translation_de_en
+    temperature: 0.2
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [language, length]
+    callers: [n8n, internal]
+
+  content_translation_en_de:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: content_translation_en_de
+    temperature: 0.2
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [language, length]
+    callers: [n8n, internal]
+
+  # ─── GENERAL PURPOSE ──────────────────────────────────────────────────────────
+  generic_summarize:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: generic_summarize
+    temperature: 0.3
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [all]
+
+  generic_extract:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: generic_extract
+    temperature: 0.1
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [all]
+
+  generic_classify:
+    model: qwen2.5:3b
+    tier: fast
+    prompt_template: generic_classify
+    temperature: 0.1
+    max_tokens: 256
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [all]
+
+  generic_rewrite:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: generic_rewrite
+    temperature: 0.5
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [all]
+
+  generic_qa:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: generic_qa
+    temperature: 0.4
+    max_tokens: 1024
+    output_format: text
+    requires_fact_check: false
+    validators: [banlist, length]
+    callers: [all]
+
+  code_review:
+    model: deepseek-r1:8b
+    tier: medium
+    prompt_template: code_review
+    temperature: 0.2
+    max_tokens: 2048
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [internal, switchblade, shieldx]
+
+  code_generate:
+    model: deepseek-r1:32b
+    tier: large
+    prompt_template: code_generate
+    temperature: 0.3
+    max_tokens: 4096
+    output_format: text
+    requires_fact_check: false
+    validators: [length]
+    callers: [internal, switchblade, shieldx]
+
+  data_enrichment:
+    model: qwen2.5:14b
+    tier: medium
+    prompt_template: data_enrichment
+    temperature: 0.2
+    max_tokens: 1024
+    output_format: json
+    requires_fact_check: false
+    validators: [schema]
+    callers: [all]
+
+# Validator configuration
+validators:
+  schema:
+    enabled: true
+    score_impact: -3.0
+    retry_on_fail: true
+  banlist:
+    enabled: true
+    score_impact_per_hit: -1.0
+    max_penalty: -3.0
+  language:
+    enabled: true
+    wrong_language_impact: -2.0
+    formality_impact: -1.0
+  tip_validator:
+    enabled: true
+    score_impact_per_error: -1.5
+    immediate_reject_threshold: 3
+  fact_checker:
+    enabled: true
+    score_impact: -2.0
+    timeout_ms: 5000
+  length:
+    enabled: true
+    min_chars: 50
+    max_chars: 20000
+    score_impact: -1.0
+  question_closer:
+    enabled: true
+    score_impact: -1.5
--- a/packages/gateway/src/db/client.ts
+++ b/packages/gateway/src/db/client.ts
@ -0,0 +1,72 @@
+import pg from 'pg';
+import { logger } from '../observability/logger.js';
+
+const { Pool } = pg;
+
+let pool: pg.Pool | null = null;
+
+export function getPool(): pg.Pool {
+  if (!pool) {
+    pool = new Pool({
+      host: process.env['DB_HOST'] ?? 'localhost',
+      port: parseInt(process.env['DB_PORT'] ?? '5432', 10),
+      database: process.env['DB_NAME'] ?? 'llm_gateway',
+      user: process.env['DB_USER'] ?? 'llm_gateway',
+      password: process.env['DB_PASSWORD'] ?? '',
+      max: 10,
+      idleTimeoutMillis: 30_000,
+      connectionTimeoutMillis: 5_000,
+    });
+
+    pool.on('error', (err) => {
+      logger.error({ err }, 'PostgreSQL pool error');
+    });
+  }
+  return pool;
+}
+
+export async function query<T extends pg.QueryResultRow = pg.QueryResultRow>(
+  sql: string,
+  params?: unknown[],
+): Promise<pg.QueryResult<T>> {
+  const p = getPool();
+  const maxRetries = 3;
+  let lastError: Error | null = null;
+
+  for (let attempt = 0; attempt < maxRetries; attempt++) {
+    try {
+      return await p.query<T>(sql, params);
+    } catch (err) {
+      const pgErr = err as pg.DatabaseError;
+      const isDeadlock =
+        pgErr.code === '40P01' || pgErr.code === '40001';
+      if (!isDeadlock || attempt === maxRetries - 1) {
+        throw err;
+      }
+      lastError = pgErr;
+      const delay = 50 * Math.pow(2, attempt);
+      await new Promise((resolve) => setTimeout(resolve, delay));
+      logger.warn({ attempt, sql }, 'Retrying after deadlock');
+    }
+  }
+
+  throw lastError ?? new Error('Query failed after retries');
+}
+
+export async function withTransaction<T>(
+  fn: (client: pg.PoolClient) => Promise<T>,
+): Promise<T> {
+  const p = getPool();
+  const client = await p.connect();
+  try {
+    await client.query('BEGIN');
+    const result = await fn(client);
+    await client.query('COMMIT');
+    return result;
+  } catch (err) {
+    await client.query('ROLLBACK');
+    throw err;
+  } finally {
+    client.release();
+  }
+}
--- a/packages/gateway/src/db/migrations/001_initial.sql
+++ b/packages/gateway/src/db/migrations/001_initial.sql
@ -0,0 +1,193 @@
+-- LLM Gateway Initial Schema
+-- Run with: psql -U llm_gateway -d llm_gateway -f 001_initial.sql
+
+CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
+CREATE EXTENSION IF NOT EXISTS "pgcrypto";
+
+-- Enum types
+CREATE TYPE call_status AS ENUM ('approved', 'warning', 'pending_review', 'rejected');
+CREATE TYPE review_decision AS ENUM ('approved', 'rejected', 'edited');
+CREATE TYPE model_tier AS ENUM ('fast', 'medium', 'large');
+
+-- Main audit log for all LLM calls
+CREATE TABLE IF NOT EXISTS llm_calls (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  caller          TEXT NOT NULL,
+  task_type       TEXT NOT NULL,
+  model_used      TEXT NOT NULL,
+  prompt_id       TEXT NOT NULL,
+  prompt_version  TEXT NOT NULL DEFAULT '1.0.0',
+  input_hash      TEXT NOT NULL,
+  output_text     TEXT,
+  output_hash     TEXT NOT NULL,
+  token_count_in  INTEGER NOT NULL DEFAULT 0,
+  token_count_out INTEGER NOT NULL DEFAULT 0,
+  latency_ms      INTEGER NOT NULL DEFAULT 0,
+  confidence      NUMERIC(4,2) NOT NULL DEFAULT 0,
+  status          call_status NOT NULL DEFAULT 'pending_review',
+  validation_log  JSONB NOT NULL DEFAULT '[]',
+  ban_hits        JSONB NOT NULL DEFAULT '[]',
+  metadata        JSONB
+);
+
+CREATE INDEX idx_llm_calls_created_at ON llm_calls (created_at DESC);
+CREATE INDEX idx_llm_calls_caller ON llm_calls (caller);
+CREATE INDEX idx_llm_calls_task_type ON llm_calls (task_type);
+CREATE INDEX idx_llm_calls_status ON llm_calls (status);
+CREATE INDEX idx_llm_calls_model_used ON llm_calls (model_used);
+
+-- Review queue for low-confidence outputs
+CREATE TABLE IF NOT EXISTS review_queue (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  reviewed_at     TIMESTAMPTZ,
+  call_id         UUID REFERENCES llm_calls(id) ON DELETE CASCADE,
+  caller          TEXT NOT NULL,
+  task_type       TEXT NOT NULL,
+  input_text      TEXT NOT NULL,
+  output_text     TEXT,
+  confidence      NUMERIC(4,2) NOT NULL,
+  validation_log  JSONB NOT NULL DEFAULT '[]',
+  decision        review_decision,
+  edited_output   TEXT,
+  reviewer_notes  TEXT,
+  notified        BOOLEAN NOT NULL DEFAULT FALSE
+);
+
+CREATE INDEX idx_review_queue_created_at ON review_queue (created_at DESC);
+CREATE INDEX idx_review_queue_decision ON review_queue (decision) WHERE decision IS NULL;
+CREATE INDEX idx_review_queue_caller ON review_queue (caller);
+
+-- Prompt version tracking
+CREATE TABLE IF NOT EXISTS prompt_versions (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  prompt_id       TEXT NOT NULL,
+  version         TEXT NOT NULL,
+  task_type       TEXT NOT NULL,
+  template_yaml   TEXT NOT NULL,
+  active          BOOLEAN NOT NULL DEFAULT TRUE,
+  deployed_by     TEXT,
+  notes           TEXT,
+  UNIQUE(prompt_id, version)
+);
+
+CREATE INDEX idx_prompt_versions_prompt_id ON prompt_versions (prompt_id, active);
+
+-- Ban list hit analytics
+CREATE TABLE IF NOT EXISTS ban_analytics (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  call_id         UUID REFERENCES llm_calls(id) ON DELETE SET NULL,
+  term            TEXT NOT NULL,
+  category        TEXT NOT NULL,
+  language        TEXT NOT NULL CHECK (language IN ('en', 'de', 'auto')),
+  caller          TEXT NOT NULL,
+  task_type       TEXT NOT NULL,
+  context_snippet TEXT
+);
+
+CREATE INDEX idx_ban_analytics_term ON ban_analytics (term);
+CREATE INDEX idx_ban_analytics_created_at ON ban_analytics (created_at DESC);
+CREATE INDEX idx_ban_analytics_caller ON ban_analytics (caller);
+
+-- TIP enrichment log (transceiver-specific)
+CREATE TABLE IF NOT EXISTS tip_enrichment_log (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  call_id         UUID REFERENCES llm_calls(id) ON DELETE SET NULL,
+  part_number     TEXT,
+  form_factor     TEXT,
+  data_rate_gbps  NUMERIC,
+  wavelength_nm   NUMERIC,
+  connector       TEXT,
+  fiber_type      TEXT,
+  vendor          TEXT,
+  sff8024_code    TEXT,
+  validation_pass BOOLEAN NOT NULL DEFAULT FALSE,
+  failures        JSONB NOT NULL DEFAULT '[]'
+);
+
+CREATE INDEX idx_tip_enrichment_log_part_number ON tip_enrichment_log (part_number);
+CREATE INDEX idx_tip_enrichment_log_created_at ON tip_enrichment_log (created_at DESC);
+
+-- Learning corpus for fine-tuning (approved outputs only)
+CREATE TABLE IF NOT EXISTS learning_corpus (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  call_id         UUID REFERENCES llm_calls(id) ON DELETE SET NULL,
+  task_type       TEXT NOT NULL,
+  prompt_text     TEXT NOT NULL,
+  completion_text TEXT NOT NULL,
+  quality_score   NUMERIC(4,2) NOT NULL,
+  included_in_run UUID,
+  tags            TEXT[] NOT NULL DEFAULT '{}'
+);
+
+CREATE INDEX idx_learning_corpus_task_type ON learning_corpus (task_type);
+CREATE INDEX idx_learning_corpus_quality ON learning_corpus (quality_score DESC);
+
+-- Fine-tuning run tracking
+CREATE TABLE IF NOT EXISTS fine_tuning_runs (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  completed_at    TIMESTAMPTZ,
+  base_model      TEXT NOT NULL,
+  output_model    TEXT,
+  sample_count    INTEGER NOT NULL DEFAULT 0,
+  task_types      TEXT[] NOT NULL DEFAULT '{}',
+  status          TEXT NOT NULL DEFAULT 'pending',
+  metrics         JSONB,
+  notes           TEXT
+);
+
+-- Routing performance metrics
+CREATE TABLE IF NOT EXISTS routing_metrics (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  recorded_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  task_type       TEXT NOT NULL,
+  model_used      TEXT NOT NULL,
+  latency_ms      INTEGER NOT NULL,
+  token_count_in  INTEGER NOT NULL,
+  token_count_out INTEGER NOT NULL,
+  confidence      NUMERIC(4,2) NOT NULL,
+  status          call_status NOT NULL,
+  circuit_breaker_state TEXT NOT NULL DEFAULT 'closed'
+);
+
+CREATE INDEX idx_routing_metrics_recorded_at ON routing_metrics (recorded_at DESC);
+CREATE INDEX idx_routing_metrics_task_type ON routing_metrics (task_type, model_used);
+
+-- Batch job tracking
+CREATE TABLE IF NOT EXISTS batch_jobs (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  completed_at    TIMESTAMPTZ,
+  caller          TEXT NOT NULL,
+  task_count      INTEGER NOT NULL DEFAULT 0,
+  completed_count INTEGER NOT NULL DEFAULT 0,
+  failed_count    INTEGER NOT NULL DEFAULT 0,
+  webhook_url     TEXT,
+  status          TEXT NOT NULL DEFAULT 'queued',
+  results         JSONB,
+  pg_boss_id      TEXT
+);
+
+CREATE INDEX idx_batch_jobs_caller ON batch_jobs (caller);
+CREATE INDEX idx_batch_jobs_status ON batch_jobs (status);
+CREATE INDEX idx_batch_jobs_created_at ON batch_jobs (created_at DESC);
+
+-- Fact check cache
+CREATE TABLE IF NOT EXISTS fact_check_cache (
+  id              UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  expires_at      TIMESTAMPTZ NOT NULL,
+  source          TEXT NOT NULL,
+  lookup_key      TEXT NOT NULL,
+  result          JSONB NOT NULL,
+  UNIQUE(source, lookup_key)
+);
+
+CREATE INDEX idx_fact_check_cache_expires ON fact_check_cache (expires_at);
+CREATE INDEX idx_fact_check_cache_lookup ON fact_check_cache (source, lookup_key);
--- a/packages/gateway/src/integrations/peeringdb.ts
+++ b/packages/gateway/src/integrations/peeringdb.ts
@ -0,0 +1,143 @@
+import { logger } from '../observability/logger.js';
+
+const PEERINGDB_BASE = 'https://www.peeringdb.com/api';
+const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour
+const FETCH_TIMEOUT_MS = 5000;
+
+interface CacheEntry<T> {
+  value: T;
+  expiresAt: number;
+}
+
+interface PeeringDbOrg {
+  id: number;
+  name: string;
+  website: string;
+  social_media: unknown[];
+}
+
+interface PeeringDbNet {
+  id: number;
+  org_id: number;
+  org: PeeringDbOrg;
+  name: string;
+  aka: string;
+  website: string;
+  asn: number;
+  info_type: string;
+  info_prefixes4: number;
+  info_prefixes6: number;
+  policy_general: string;
+  status: string;
+}
+
+interface PeeringDbIx {
+  id: number;
+  name: string;
+  name_long: string;
+  city: string;
+  country: string;
+  website: string;
+  status: string;
+}
+
+interface PeeringDbResponse<T> {
+  data: T[];
+  meta: Record<string, unknown>;
+}
+
+// In-memory LRU-style cache (simple map with TTL)
+const cache = new Map<string, CacheEntry<unknown>>();
+
+function getCached<T>(key: string): T | null {
+  const entry = cache.get(key) as CacheEntry<T> | undefined;
+  if (!entry) return null;
+  if (Date.now() > entry.expiresAt) {
+    cache.delete(key);
+    return null;
+  }
+  return entry.value;
+}
+
+function setCached<T>(key: string, value: T): void {
+  // Evict old entries if cache grows large
+  if (cache.size > 1000) {
+    const now = Date.now();
+    for (const [k, v] of cache) {
+      if (now > v.expiresAt) {
+        cache.delete(k);
+      }
+    }
+  }
+  cache.set(key, { value, expiresAt: Date.now() + CACHE_TTL_MS });
+}
+
+async function fetchPeeringDb<T>(path: string): Promise<PeeringDbResponse<T>> {
+  const url = `${PEERINGDB_BASE}${path}`;
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
+
+  try {
+    const response = await fetch(url, {
+      signal: controller.signal,
+      headers: {
+        Accept: 'application/json',
+        'User-Agent': 'llm-gateway/1.0 (github.com/renefichtmueller/llm-gateway)',
+      },
+    });
+
+    if (!response.ok) {
+      throw new Error(`PeeringDB HTTP ${response.status}`);
+    }
+
+    return await response.json() as PeeringDbResponse<T>;
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+export async function lookupAsn(asn: number): Promise<PeeringDbNet | null> {
+  const cacheKey = `asn:${asn}`;
+  const cached = getCached<PeeringDbNet | null>(cacheKey);
+  if (cached !== null || cache.has(cacheKey)) return cached;
+
+  try {
+    const result = await fetchPeeringDb<PeeringDbNet>(`/net?asn=${asn}&status=ok&depth=2`);
+    const net = result.data[0] ?? null;
+    setCached(cacheKey, net);
+    return net;
+  } catch (err) {
+    logger.debug({ err, asn }, 'PeeringDB ASN lookup failed');
+    return null;
+  }
+}
+
+export async function lookupIx(name: string): Promise<PeeringDbIx | null> {
+  const cacheKey = `ix:${name.toLowerCase()}`;
+  const cached = getCached<PeeringDbIx | null>(cacheKey);
+  if (cached !== null || cache.has(cacheKey)) return cached;
+
+  try {
+    const result = await fetchPeeringDb<PeeringDbIx>(`/ix?name__icontains=${encodeURIComponent(name)}&status=ok`);
+    const ix = result.data[0] ?? null;
+    setCached(cacheKey, ix);
+    return ix;
+  } catch (err) {
+    logger.debug({ err, name }, 'PeeringDB IX lookup failed');
+    return null;
+  }
+}
+
+export async function lookupOrgByAsn(asn: number): Promise<PeeringDbOrg | null> {
+  const net = await lookupAsn(asn);
+  if (!net) return null;
+  return net.org ?? null;
+}
+
+export function clearCache(): void {
+  cache.clear();
+}
+
+export function getCacheSize(): number {
+  return cache.size;
+}
--- a/packages/gateway/src/integrations/sff8024.ts
+++ b/packages/gateway/src/integrations/sff8024.ts
@ -0,0 +1,167 @@
+// SFF-8024 local store
+// Source: SFF-8024 Rev 4.10 (November 2021) — Transceiver Management
+
+// Identifier Values (Table 4-1)
+export const IDENTIFIER_CODES: Record<string, string> = {
+  '00': 'Unknown or unspecified',
+  '01': 'GBIC',
+  '02': 'Module/connector soldered to motherboard',
+  '03': 'SFP/SFP+/SFP28',
+  '04': '300 pin XBI',
+  '05': 'XENPAK',
+  '06': 'XFP',
+  '07': 'XFF',
+  '08': 'XFP-E',
+  '09': 'XPAK',
+  '0A': 'X2',
+  '0B': 'DWDM-SFP/SFP+ (not using SFF-8472)',
+  '0C': 'QSFP (INF-8438)',
+  '0D': 'QSFP+ or later (SFF-8436, SFF-8635, SFF-8665, SFF-8685 et al)',
+  '0E': 'CXP or later (INF-8644 et al)',
+  '0F': 'Shielded Mini Multilane HD 4X',
+  '10': 'Shielded Mini Multilane HD 8X',
+  '11': 'QSFP28 or later (SFF-8665 et al)',
+  '12': 'CXP2 (aka CXP28) or later',
+  '13': 'CDFP (Style 1/Style 2)',
+  '14': 'Shielded Mini Multilane HD 4X Fanout Cable',
+  '15': 'Shielded Mini Multilane HD 8X Fanout Cable',
+  '16': 'CDFP (Style 3)',
+  '17': 'microQSFP',
+  '18': 'QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)',
+  '19': 'OSFP 8X Pluggable Transceiver',
+  '1A': 'SFP-DD Double Density 2X Pluggable Transceiver',
+  '1B': 'DSFP Dual Small Form Factor Pluggable Transceiver',
+  '1C': 'x4 Minilink/OcuLink',
+  '1D': 'x8 Minilink',
+  '1E': 'QSFP+ or later (SFF-8436, SFF-8635 et al) with Common Management Interface Specification (CMIS)',
+  '1F': 'SFP-DD (SFF-8690)',
+  '20': 'DSFP (SFF-8692)',
+  '21': 'QSFP-DD (SFF-8681)',
+  '22': 'OSFP (SFF-8679)',
+  '23': 'microSFP',
+  '24': 'QSFP112 (200G per lane)',
+  '25': 'OSFP-XD',
+  '26': 'CSFP (Compact SFP)',
+  '27': 'SFPDD (200G)',
+  '28': 'SFP (SFF-8024)',
+};
+
+// Connector Types (Table 4-3)
+export const CONNECTOR_CODES: Record<string, string> = {
+  '00': 'Unknown or unspecified',
+  '01': 'SC (Subscriber Connector)',
+  '02': 'Fibre Channel Style 1 copper connector',
+  '03': 'Fibre Channel Style 2 copper connector',
+  '04': 'BNC/TNC (Bayonet/Threaded Neill-Concelman)',
+  '05': 'Fibre Channel coax headers',
+  '06': 'Fiber Jack',
+  '07': 'LC (Lucent Connector)',
+  '08': 'MT-RJ (Mechanical Transfer - Registered Jack)',
+  '09': 'MU (Multiple Use)',
+  '0A': 'SG',
+  '0B': 'Optical Pigtail',
+  '0C': 'MPO 1x12 (Multifiber Push On)',
+  '0D': 'MPO 2x16',
+  '20': 'HSSDC II (High Speed Serial Data Connector)',
+  '21': 'Copper Pigtail',
+  '22': 'RJ45 (Registered Jack 45)',
+  '23': 'No separable connector',
+  '24': 'MXC 2x16',
+  '25': 'CS optical connector',
+  '26': 'SN (previously Mini CS) optical connector',
+  '27': 'MPO 2x12',
+  '28': 'MPO 1x16',
+};
+
+// Encoding Codes (Table 4-2)
+export const ENCODING_CODES: Record<string, string> = {
+  '00': 'Unspecified',
+  '01': '8B10B',
+  '02': '4B5B',
+  '03': 'NRZ',
+  '04': 'Manchester',
+  '05': 'SONET Scrambled',
+  '06': '64B/66B',
+  '07': '256B/257B (transcoded FEC-enabled data)',
+  '08': 'PAM4',
+  '09': 'ANSI / INCITS TR-48 (8B6T)',
+  '0A': 'ANSI / INCITS TR-48 (64B/80B)',
+  '0B': 'ANSI / INCITS TR-48 (64B/80B with Reed Solomon)',
+  '0C': '256B/257B (transcoded FEC-enabled data) IEEE Std 802.3',
+  '0D': 'PAM4 with Nyquist signaling',
+};
+
+// Extended Identifier Values (Table 4-4)
+export const EXTENDED_IDENTIFIER_CODES: Record<string, string> = {
+  '00': 'Power Level 1 Module (1.5W max.)',
+  '01': 'Power Level 2 Module (2.0W max.)',
+  '02': 'Power Level 3 Module (2.5W max.)',
+  '03': 'Power Level 4 Module (3.5W max.)',
+  '04': 'Power Level 5 Module (4.0W max.)',
+  '05': 'Power Level 6 Module (4.5W max.)',
+  '06': 'Power Level 7 Module (5.0W max.)',
+  '07': 'Power Level 8 Module (10W max.)',
+};
+
+// Nominal Signaling Rate Descriptor (Table 4-9)
+export const DATA_RATE_CODES: Record<string, string> = {
+  '01': '100 MBd (1 Gbps Ethernet)',
+  '0A': '1.0625 GBd',
+  '0C': '1.25 GBd (1000BASE-X)',
+  '14': '2.125 GBd',
+  '1E': '2.5 GBd',
+  '28': '4.25 GBd',
+  '50': '8.5 GBd',
+  '64': '10.3 GBd',
+  '67': '10.518 GBd',
+  '68': '10.5 GBd',
+  '6E': '11.1 GBd',
+  'FF': 'Encoded in upper 3 bits of Byte 67',
+};
+
+// Well-known transceiver type strings mapped to standard identifiers
+export const FORM_FACTOR_TO_IDENTIFIER: Record<string, string> = {
+  GBIC: '01',
+  SFP: '03',
+  'SFP+': '03',
+  SFP28: '03',
+  SFP56: '03',
+  QSFP: '0C',
+  'QSFP+': '0D',
+  QSFP28: '11',
+  QSFP56: '11',
+  'QSFP-DD': '18',
+  OSFP: '19',
+  'OSFP-XD': '25',
+  CXP: '0E',
+  XFP: '06',
+  X2: '0A',
+  XENPAK: '05',
+  'SFP-DD': '1A',
+  DSFP: '1B',
+  CDFP: '16',
+};
+
+export function getIdentifierName(code: string): string | undefined {
+  return IDENTIFIER_CODES[code.toUpperCase()];
+}
+
+export function getConnectorName(code: string): string | undefined {
+  return CONNECTOR_CODES[code.toUpperCase()];
+}
+
+export function getEncodingName(code: string): string | undefined {
+  return ENCODING_CODES[code.toUpperCase()];
+}
+
+export function formFactorToIdentifierCode(formFactor: string): string | undefined {
+  return FORM_FACTOR_TO_IDENTIFIER[formFactor.toUpperCase()];
+}
+
+export function getAllFormFactors(): string[] {
+  return Object.keys(FORM_FACTOR_TO_IDENTIFIER);
+}
+
+export function getAllConnectorNames(): string[] {
+  return Object.values(CONNECTOR_CODES);
+}
--- a/packages/gateway/src/integrations/tip-db.ts
+++ b/packages/gateway/src/integrations/tip-db.ts
@ -0,0 +1,152 @@
+import pg from 'pg';
+import { logger } from '../observability/logger.js';
+
+const { Pool } = pg;
+
+// TIP database on Erik (IONOS VPS)
+const TIP_DB_CONFIG = {
+  host: process.env['TIP_DB_HOST'] ?? '217.154.82.179',
+  port: parseInt(process.env['TIP_DB_PORT'] ?? '5433', 10),
+  database: process.env['TIP_DB_NAME'] ?? 'transceiver_db',
+  user: process.env['TIP_DB_USER'] ?? 'tip',
+  password: process.env['TIP_DB_PASSWORD'] ?? 'tip_prod_2026',
+  max: 5,
+  idleTimeoutMillis: 60_000,
+  connectionTimeoutMillis: 10_000,
+  ssl: process.env['TIP_DB_SSL'] === 'true' ? { rejectUnauthorized: false } : false,
+};
+
+let tipPool: pg.Pool | null = null;
+
+function getTipPool(): pg.Pool {
+  if (!tipPool) {
+    tipPool = new Pool(TIP_DB_CONFIG);
+    tipPool.on('error', (err) => {
+      logger.error({ err }, 'TIP database pool error');
+    });
+    tipPool.on('connect', () => {
+      logger.debug('TIP database connection established');
+    });
+  }
+  return tipPool;
+}
+
+export interface TransceiverRecord {
+  id: string;
+  part_number: string;
+  vendor: string;
+  form_factor: string;
+  data_rate_gbps: number;
+  wavelength_nm: number | null;
+  fiber_type: string;
+  connector: string;
+  reach_m: number | null;
+  temperature_class: string;
+  price_usd: number | null;
+  compatible_with: string[];
+  sff8024_identifier: string | null;
+  created_at: string;
+  updated_at: string;
+}
+
+export interface PriceRecord {
+  vendor: string;
+  part_number: string;
+  price_usd: number;
+  currency: string;
+  source_url: string;
+  scraped_at: string;
+}
+
+export async function lookupTransceiver(partNumber: string): Promise<TransceiverRecord | null> {
+  const pool = getTipPool();
+  try {
+    const result = await pool.query<TransceiverRecord>(
+      `SELECT * FROM transceivers WHERE UPPER(part_number) = UPPER($1) LIMIT 1`,
+      [partNumber],
+    );
+    return result.rows[0] ?? null;
+  } catch (err) {
+    logger.warn({ err, partNumber }, 'TIP DB transceiver lookup failed');
+    return null;
+  }
+}
+
+export async function lookupByFormFactor(
+  formFactor: string,
+  dataRateGbps?: number,
+): Promise<TransceiverRecord[]> {
+  const pool = getTipPool();
+  try {
+    const params: unknown[] = [formFactor];
+    let sql = `SELECT * FROM transceivers WHERE UPPER(form_factor) = UPPER($1)`;
+    if (dataRateGbps !== undefined) {
+      params.push(dataRateGbps);
+      sql += ` AND data_rate_gbps = $2`;
+    }
+    sql += ` ORDER BY price_usd ASC NULLS LAST LIMIT 20`;
+    const result = await pool.query<TransceiverRecord>(sql, params);
+    return result.rows;
+  } catch (err) {
+    logger.warn({ err, formFactor }, 'TIP DB form factor lookup failed');
+    return [];
+  }
+}
+
+export async function getPriceHistory(
+  partNumber: string,
+  vendor?: string,
+  daysBack = 30,
+): Promise<PriceRecord[]> {
+  const pool = getTipPool();
+  try {
+    const params: unknown[] = [partNumber, daysBack];
+    let sql = `
+      SELECT vendor, part_number, price_usd, currency, source_url, scraped_at
+      FROM price_history
+      WHERE UPPER(part_number) = UPPER($1)
+        AND scraped_at > NOW() - INTERVAL '$2 days'
+    `;
+    if (vendor) {
+      params.push(vendor);
+      sql += ` AND UPPER(vendor) = UPPER($${params.length})`;
+    }
+    sql += ` ORDER BY scraped_at DESC LIMIT 100`;
+    const result = await pool.query<PriceRecord>(sql, params);
+    return result.rows;
+  } catch (err) {
+    logger.warn({ err, partNumber }, 'TIP DB price history lookup failed');
+    return [];
+  }
+}
+
+export async function getVendorList(): Promise<string[]> {
+  const pool = getTipPool();
+  try {
+    const result = await pool.query<{ vendor: string }>(
+      `SELECT DISTINCT vendor FROM transceivers WHERE vendor IS NOT NULL ORDER BY vendor`,
+    );
+    return result.rows.map((r) => r.vendor);
+  } catch (err) {
+    logger.warn({ err }, 'TIP DB vendor list lookup failed');
+    return [];
+  }
+}
+
+export async function closeTipPool(): Promise<void> {
+  if (tipPool) {
+    await tipPool.end();
+    tipPool = null;
+  }
+}
+
+export async function testTipConnection(): Promise<boolean> {
+  const pool = getTipPool();
+  try {
+    await pool.query('SELECT 1');
+    return true;
+  } catch (err) {
+    logger.warn({ err }, 'TIP DB connection test failed');
+    return false;
+  }
+}
--- a/packages/gateway/src/observability/audit-log.ts
+++ b/packages/gateway/src/observability/audit-log.ts
@ -0,0 +1,104 @@
+import { createHash } from 'crypto';
+import { query } from '../db/client.js';
+import { logger } from './logger.js';
+import type { ValidationResult } from '../pipeline/post-validator.js';
+import type { BanViolation } from '../validation/banlist-checker.js';
+
+export interface AuditEntry {
+  caller: string;
+  task_type: string;
+  model_used: string;
+  prompt_id: string;
+  prompt_version: string;
+  input_hash: string;
+  output_text?: string;
+  output_hash: string;
+  token_count_in: number;
+  token_count_out: number;
+  latency_ms: number;
+  confidence: number;
+  status: 'approved' | 'warning' | 'pending_review' | 'rejected';
+  validation_log: ValidationResult[];
+  ban_hits: BanViolation[];
+  metadata?: Record<string, unknown>;
+}
+
+export function hashText(text: string): string {
+  return createHash('sha256').update(text, 'utf-8').digest('hex');
+}
+
+export async function writeAuditLog(entry: AuditEntry): Promise<string> {
+  const sql = `
+    INSERT INTO llm_calls (
+      caller, task_type, model_used, prompt_id, prompt_version,
+      input_hash, output_text, output_hash,
+      token_count_in, token_count_out, latency_ms,
+      confidence, status, validation_log, ban_hits, metadata
+    ) VALUES (
+      $1, $2, $3, $4, $5,
+      $6, $7, $8,
+      $9, $10, $11,
+      $12, $13, $14, $15, $16
+    )
+    RETURNING id
+  `;
+
+  const params = [
+    entry.caller,
+    entry.task_type,
+    entry.model_used,
+    entry.prompt_id,
+    entry.prompt_version,
+    entry.input_hash,
+    entry.output_text ?? null,
+    entry.output_hash,
+    entry.token_count_in,
+    entry.token_count_out,
+    entry.latency_ms,
+    entry.confidence,
+    entry.status,
+    JSON.stringify(entry.validation_log),
+    JSON.stringify(entry.ban_hits),
+    entry.metadata ? JSON.stringify(entry.metadata) : null,
+  ];
+
+  try {
+    const result = await query<{ id: string }>(sql, params);
+    return (result.rows[0]?.id) ?? '';
+  } catch (err) {
+    logger.error({ err, caller: entry.caller, task_type: entry.task_type }, 'Failed to write audit log');
+    return '';
+  }
+}
+
+export async function writeBanAnalytics(
+  callId: string,
+  violations: BanViolation[],
+  caller: string,
+  taskType: string,
+): Promise<void> {
+  if (violations.length === 0) return;
+
+  const values = violations
+    .map(
+      (_, i) =>
+        `($${i * 7 + 1}, $${i * 7 + 2}, $${i * 7 + 3}, $${i * 7 + 4}, $${i * 7 + 5}, $${i * 7 + 6}, $${i * 7 + 7})`,
+    )
+    .join(', ');
+
+  const params: unknown[] = [];
+  for (const v of violations) {
+    params.push(callId, v.term, v.category, v.language, caller, taskType, v.context);
+  }
+
+  const sql = `
+    INSERT INTO ban_analytics (call_id, term, category, language, caller, task_type, context_snippet)
+    VALUES ${values}
+  `;
+
+  try {
+    await query(sql, params);
+  } catch (err) {
+    logger.warn({ err }, 'Failed to write ban analytics');
+  }
+}
--- a/packages/gateway/src/observability/logger.ts
+++ b/packages/gateway/src/observability/logger.ts
@ -0,0 +1,12 @@
+import pino from 'pino';
+
+export const logger = pino({
+  level: process.env['LOG_LEVEL'] ?? 'info',
+  transport:
+    process.env['NODE_ENV'] !== 'production'
+      ? {
+          target: 'pino-pretty',
+          options: { colorize: true, translateTime: 'SYS:standard' },
+        }
+      : undefined,
+});
--- a/packages/gateway/src/observability/metrics.ts
+++ b/packages/gateway/src/observability/metrics.ts
@ -0,0 +1,90 @@
+import {
+  Counter,
+  Histogram,
+  Gauge,
+  Registry,
+  collectDefaultMetrics,
+} from 'prom-client';
+
+const registry = new Registry();
+collectDefaultMetrics({ register: registry });
+
+export const requestsTotal = new Counter({
+  name: 'llm_gateway_requests_total',
+  help: 'Total LLM requests processed',
+  labelNames: ['caller', 'task_type', 'status'],
+  registers: [registry],
+});
+
+export const latencySeconds = new Histogram({
+  name: 'llm_gateway_latency_seconds',
+  help: 'End-to-end request latency',
+  labelNames: ['caller', 'task_type', 'model'],
+  buckets: [0.5, 1, 2, 5, 10, 20, 30, 60, 120],
+  registers: [registry],
+});
+
+export const tokensTotal = new Counter({
+  name: 'llm_gateway_tokens_total',
+  help: 'Total tokens processed',
+  labelNames: ['direction', 'model'],
+  registers: [registry],
+});
+
+export const confidenceScore = new Histogram({
+  name: 'llm_gateway_confidence_score',
+  help: 'Confidence score distribution',
+  labelNames: ['task_type', 'model'],
+  buckets: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
+  registers: [registry],
+});
+
+export const banlistHitsTotal = new Counter({
+  name: 'llm_gateway_banlist_hits_total',
+  help: 'Total ban list hits',
+  labelNames: ['term', 'language', 'category'],
+  registers: [registry],
+});
+
+export const validationFailuresTotal = new Counter({
+  name: 'llm_gateway_validation_failures_total',
+  help: 'Total validation failures per validator',
+  labelNames: ['validator', 'task_type'],
+  registers: [registry],
+});
+
+export const reviewQueueSize = new Gauge({
+  name: 'llm_gateway_review_queue_size',
+  help: 'Number of items in the review queue awaiting decision',
+  registers: [registry],
+});
+
+export const circuitBreakerState = new Gauge({
+  name: 'llm_gateway_circuit_breaker_state',
+  help: 'Circuit breaker state: 0=closed, 0.5=half-open, 1=open',
+  labelNames: ['model'],
+  registers: [registry],
+});
+
+export const rateLimitRejectedTotal = new Counter({
+  name: 'llm_gateway_rate_limit_rejected_total',
+  help: 'Total rate-limited requests per caller',
+  labelNames: ['caller'],
+  registers: [registry],
+});
+
+export function recordCircuitBreakerState(
+  model: string,
+  state: 'closed' | 'open' | 'half-open',
+): void {
+  const value = state === 'closed' ? 0 : state === 'half-open' ? 0.5 : 1;
+  circuitBreakerState.labels({ model }).set(value);
+}
+
+export async function getMetrics(): Promise<string> {
+  return registry.metrics();
+}
+
+export function getContentType(): string {
+  return registry.contentType;
+}
--- a/packages/gateway/src/observability/review-queue.ts
+++ b/packages/gateway/src/observability/review-queue.ts
@ -0,0 +1,144 @@
+import { query } from '../db/client.js';
+import { logger } from './logger.js';
+import { reviewQueueSize } from './metrics.js';
+import type { ValidationResult } from '../pipeline/post-validator.js';
+
+export interface ReviewQueueItem {
+  id: string;
+  created_at: string;
+  caller: string;
+  task_type: string;
+  input_text: string;
+  output_text: string | null;
+  confidence: number;
+  validation_log: ValidationResult[];
+  decision: 'approved' | 'rejected' | 'edited' | null;
+  edited_output: string | null;
+  reviewer_notes: string | null;
+}
+
+export interface ReviewDecision {
+  decision: 'approved' | 'rejected' | 'edited';
+  edited_output?: string;
+  reviewer_notes?: string;
+}
+
+const WEBHOOK_URL = process.env['REVIEW_QUEUE_WEBHOOK_URL'] ?? '';
+
+async function notifyWebhook(item: ReviewQueueItem): Promise<void> {
+  if (!WEBHOOK_URL) return;
+  try {
+    await fetch(WEBHOOK_URL, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify({
+        event: 'review_queue_new',
+        id: item.id,
+        caller: item.caller,
+        task_type: item.task_type,
+        confidence: item.confidence,
+        created_at: item.created_at,
+      }),
+    });
+  } catch (err) {
+    logger.warn({ err }, 'Review queue webhook notification failed');
+  }
+}
+
+export async function addToReviewQueue(params: {
+  callId: string;
+  caller: string;
+  taskType: string;
+  inputText: string;
+  outputText?: string;
+  confidence: number;
+  validationLog: ValidationResult[];
+}): Promise<string> {
+  const sql = `
+    INSERT INTO review_queue (call_id, caller, task_type, input_text, output_text, confidence, validation_log)
+    VALUES ($1, $2, $3, $4, $5, $6, $7)
+    RETURNING id, created_at, caller, task_type, input_text, output_text, confidence, validation_log
+  `;
+
+  try {
+    const result = await query<ReviewQueueItem>(sql, [
+      params.callId,
+      params.caller,
+      params.taskType,
+      params.inputText,
+      params.outputText ?? null,
+      params.confidence,
+      JSON.stringify(params.validationLog),
+    ]);
+
+    const item = result.rows[0];
+    if (!item) throw new Error('Insert returned no rows');
+
+    // Update gauge
+    await updateReviewQueueGauge();
+
+    // Notify webhook (non-blocking)
+    void notifyWebhook(item);
+
+    return item.id;
+  } catch (err) {
+    logger.error({ err, caller: params.caller }, 'Failed to add item to review queue');
+    return '';
+  }
+}
+
+export async function listPendingReviews(
+  limit = 50,
+  offset = 0,
+): Promise<ReviewQueueItem[]> {
+  const sql = `
+    SELECT id, created_at, caller, task_type, input_text, output_text,
+           confidence, validation_log, decision, edited_output, reviewer_notes
+    FROM review_queue
+    WHERE decision IS NULL
+    ORDER BY confidence ASC, created_at ASC
+    LIMIT $1 OFFSET $2
+  `;
+  const result = await query<ReviewQueueItem>(sql, [limit, offset]);
+  return result.rows;
+}
+
+export async function processDecision(
+  id: string,
+  decision: ReviewDecision,
+): Promise<ReviewQueueItem | null> {
+  const sql = `
+    UPDATE review_queue
+    SET decision = $1,
+        edited_output = $2,
+        reviewer_notes = $3,
+        reviewed_at = NOW()
+    WHERE id = $4
+    RETURNING *
+  `;
+
+  const result = await query<ReviewQueueItem>(sql, [
+    decision.decision,
+    decision.edited_output ?? null,
+    decision.reviewer_notes ?? null,
+    id,
+  ]);
+
+  const updated = result.rows[0] ?? null;
+  if (updated) {
+    await updateReviewQueueGauge();
+  }
+  return updated;
+}
+
+export async function updateReviewQueueGauge(): Promise<void> {
+  try {
+    const result = await query<{ count: string }>(
+      'SELECT COUNT(*) as count FROM review_queue WHERE decision IS NULL',
+    );
+    const count = parseInt(result.rows[0]?.count ?? '0', 10);
+    reviewQueueSize.set(count);
+  } catch (err) {
+    logger.warn({ err }, 'Failed to update review queue gauge');
+  }
+}
--- a/packages/gateway/src/pipeline/confidence-gate.ts
+++ b/packages/gateway/src/pipeline/confidence-gate.ts
@ -0,0 +1,38 @@
+import type { PostValidationOutput } from './post-validator.js';
+
+export type ConfidenceStatus = 'approved' | 'warning' | 'pending_review';
+
+export interface ConfidenceResult {
+  score: number;
+  status: ConfidenceStatus;
+  base_score: number;
+  total_impact: number;
+}
+
+const BASE_SCORE = 8.0;
+const APPROVED_THRESHOLD = 7.0;
+const WARNING_THRESHOLD = 4.0;
+
+export function evaluateConfidence(
+  validationOutput: PostValidationOutput,
+): ConfidenceResult {
+  const totalImpact = validationOutput.total_score_impact;
+  const raw = BASE_SCORE + totalImpact;
+  const score = Math.max(0, Math.min(10, raw));
+
+  let status: ConfidenceStatus;
+  if (score >= APPROVED_THRESHOLD) {
+    status = 'approved';
+  } else if (score >= WARNING_THRESHOLD) {
+    status = 'warning';
+  } else {
+    status = 'pending_review';
+  }
+
+  return {
+    score,
+    status,
+    base_score: BASE_SCORE,
+    total_impact: totalImpact,
+  };
+}
--- a/packages/gateway/src/pipeline/llm-client.ts
+++ b/packages/gateway/src/pipeline/llm-client.ts
@ -0,0 +1,132 @@
+import { getBreaker, type ModelTier } from '../circuit-breaker/ollama-breaker.js';
+import { getOllamaBaseUrl } from './router.js';
+import { logger } from '../observability/logger.js';
+
+export interface OllamaRequest {
+  model: string;
+  prompt: string;
+  system?: string;
+  options?: {
+    temperature: number;
+    num_predict: number;
+  };
+  format?: 'json' | '';
+  stream: boolean;
+}
+
+export interface OllamaResponse {
+  response: string;
+  done: boolean;
+  total_duration: number;
+  eval_count: number;
+  prompt_eval_count: number;
+  model: string;
+}
+
+const TIMEOUT_BY_TIER: Record<ModelTier, number> = {
+  fast: 10_000,
+  medium: 30_000,
+  large: 120_000,
+};
+
+async function fetchOllama(req: OllamaRequest, timeoutMs: number): Promise<OllamaResponse> {
+  const baseUrl = getOllamaBaseUrl();
+  const url = `${baseUrl}/api/generate`;
+
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), timeoutMs);
+
+  try {
+    const response = await fetch(url, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json' },
+      body: JSON.stringify(req),
+      signal: controller.signal,
+    });
+
+    if (!response.ok) {
+      const body = await response.text();
+      throw new Error(`Ollama HTTP ${response.status}: ${body}`);
+    }
+
+    const data = await response.json() as OllamaResponse;
+    return data;
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+function isTimeoutError(err: unknown): boolean {
+  if (err instanceof Error) {
+    return (
+      err.name === 'AbortError' ||
+      err.message.includes('timeout') ||
+      err.message.includes('abort') ||
+      err.message.includes('ETIMEDOUT')
+    );
+  }
+  return false;
+}
+
+export async function callOllama(
+  req: OllamaRequest,
+  tier: ModelTier = 'medium',
+  fallbackModels: string[] = [],
+): Promise<OllamaResponse> {
+  const timeoutMs = TIMEOUT_BY_TIER[tier];
+  const allModels = [req.model, ...fallbackModels.filter((m) => m !== req.model)];
+  const MAX_RETRIES = 2;
+
+  for (const model of allModels) {
+    const modelReq = { ...req, model };
+
+    const breaker = getBreaker(
+      model,
+      tier,
+      (r: OllamaRequest) => fetchOllama(r, timeoutMs),
+    );
+
+    let lastErr: unknown;
+
+    for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
+      try {
+        if (attempt > 0) {
+          logger.info({ model, attempt }, 'Retrying Ollama call after timeout');
+        }
+
+        const result = await breaker.fire(modelReq);
+        if (attempt > 0) {
+          logger.info({ model, attempt }, 'Ollama retry succeeded');
+        }
+        return result;
+      } catch (err) {
+        lastErr = err;
+
+        // Only retry on timeout errors
+        if (!isTimeoutError(err)) {
+          logger.error({ err, model }, 'Ollama non-timeout error, skipping retry');
+          break;
+        }
+
+        if (attempt < MAX_RETRIES - 1) {
+          logger.warn({ model, attempt }, 'Ollama timeout, retrying');
+        }
+      }
+    }
+
+    // Try next fallback model
+    logger.warn({ model, fallback: allModels[allModels.indexOf(model) + 1] }, 'Ollama model failed, trying fallback');
+    void lastErr; // captured for logging above
+  }
+
+  throw new Error(`All models failed: ${allModels.join(', ')}`);
+}
+
+export async function callOllamaWithFallbackChain(
+  req: OllamaRequest,
+  fallbackChain: string[],
+  tier: ModelTier,
+): Promise<OllamaResponse> {
+  const fallbacks = fallbackChain.filter((m) => m !== req.model);
+  return callOllama(req, tier, fallbacks);
+}
--- a/packages/gateway/src/pipeline/post-validator.ts
+++ b/packages/gateway/src/pipeline/post-validator.ts
@ -0,0 +1,217 @@
+import { validateSchema, type SchemaValidatorResult } from '../validation/schema-validator.js';
+import { checkBanlist, type BanlistResult, type BanViolation } from '../validation/banlist-checker.js';
+import { checkLanguage, type LanguageCheckResult } from '../validation/language-checker.js';
+import { validateTipContent, type TipValidationResult } from '../validation/tip-validator.js';
+import { checkFacts, type FactCheckResult } from '../validation/fact-checker.js';
+
+export interface ValidationResult {
+  validator: string;
+  passed: boolean;
+  score_impact: number;
+  details: Record<string, unknown>;
+}
+
+export interface PostValidationOutput {
+  all_passed: boolean;
+  total_score_impact: number;
+  results: ValidationResult[];
+  ban_violations: BanViolation[];
+  retry_requested: boolean;
+}
+
+export interface ValidatorConfig {
+  validators: string[];
+  language?: 'de' | 'en';
+  formality?: 'du' | 'Sie';
+  output_format?: string;
+  requires_fact_check?: boolean;
+  schema?: Record<string, unknown>;
+  min_length?: number;
+  max_length?: number;
+}
+
+function checkLength(
+  text: string,
+  minChars = 50,
+  maxChars = 20000,
+): ValidationResult {
+  const len = text.length;
+  if (len < minChars) {
+    return {
+      validator: 'length',
+      passed: false,
+      score_impact: -1.0,
+      details: { length: len, min: minChars, reason: 'Output too short' },
+    };
+  }
+  if (len > maxChars) {
+    return {
+      validator: 'length',
+      passed: false,
+      score_impact: -1.0,
+      details: { length: len, max: maxChars, reason: 'Output too long' },
+    };
+  }
+  return {
+    validator: 'length',
+    passed: true,
+    score_impact: 0,
+    details: { length: len },
+  };
+}
+
+function checkQuestionCloser(text: string): ValidationResult {
+  const QUESTION_CLOSER_PATTERNS = [
+    /what do you think\??/i,
+    /what are your thoughts\??/i,
+    /let me know in the comments/i,
+    /feel free to reach out/i,
+    /share your thoughts/i,
+    /i'd love to hear from you/i,
+    /follow for more/i,
+    /wie seht ihr das\??/i,
+    /was denkt ihr\??/i,
+    /schreibt .* in die kommentare/i,
+    /teilt .* gedanken/i,
+  ];
+
+  const trimmed = text.slice(-300); // Check last 300 chars
+  const found = QUESTION_CLOSER_PATTERNS.find((p) => p.test(trimmed));
+
+  if (found) {
+    return {
+      validator: 'question_closer',
+      passed: false,
+      score_impact: -1.5,
+      details: { reason: 'Output ends with engagement-bait question or call-to-action' },
+    };
+  }
+
+  return {
+    validator: 'question_closer',
+    passed: true,
+    score_impact: 0,
+    details: {},
+  };
+}
+
+export async function runPostValidation(
+  output: string,
+  config: ValidatorConfig,
+): Promise<PostValidationOutput> {
+  const results: ValidationResult[] = [];
+  const validatorSet = new Set(config.validators ?? []);
+  let banViolations: BanViolation[] = [];
+  let retryRequested = false;
+
+  // 1. Schema validator
+  if (validatorSet.has('schema')) {
+    const schemaResult: SchemaValidatorResult = validateSchema(
+      output,
+      config.schema,
+    );
+    results.push({
+      validator: 'schema',
+      passed: schemaResult.passed,
+      score_impact: schemaResult.score_impact,
+      details: { errors: schemaResult.errors },
+    });
+    if (schemaResult.retry) retryRequested = true;
+  }
+
+  // 2. Ban list checker
+  if (validatorSet.has('banlist')) {
+    const banResult: BanlistResult = checkBanlist(
+      output,
+      config.language ?? 'auto',
+    );
+    banViolations = banResult.violations;
+    results.push({
+      validator: 'banlist',
+      passed: banResult.passed,
+      score_impact: banResult.score_penalty,
+      details: {
+        violations: banResult.violations.map((v) => ({
+          term: v.term,
+          category: v.category,
+          language: v.language,
+        })),
+        count: banResult.violations.length,
+      },
+    });
+  }
+
+  // 3. Language checker
+  if (validatorSet.has('language')) {
+    const langResult: LanguageCheckResult = checkLanguage(
+      output,
+      config.language,
+      config.formality,
+    );
+    results.push({
+      validator: 'language',
+      passed: langResult.passed,
+      score_impact: langResult.score_impact,
+      details: {
+        detected: langResult.detected_language,
+        required: langResult.required_language,
+        formality_issue: langResult.formality_issue,
+        details: langResult.details,
+      },
+    });
+  }
+
+  // 4. TIP validator
+  if (validatorSet.has('tip_validator')) {
+    const tipResult: TipValidationResult = validateTipContent(
+      output,
+      config.output_format === 'json',
+    );
+    results.push({
+      validator: 'tip_validator',
+      passed: tipResult.passed,
+      score_impact: tipResult.score_impact,
+      details: {
+        errors: tipResult.errors,
+        immediate_reject: tipResult.immediate_reject,
+      },
+    });
+  }
+
+  // 5. Fact checker (async, with timeout)
+  if (validatorSet.has('fact_checker') && config.requires_fact_check) {
+    const factResult: FactCheckResult = await checkFacts(output, 5000);
+    results.push({
+      validator: 'fact_checker',
+      passed: factResult.passed,
+      score_impact: factResult.score_impact,
+      details: {
+        checks_performed: factResult.checks_performed,
+        failures: factResult.failures,
+      },
+    });
+  }
+
+  // 6. Length checker
+  if (validatorSet.has('length')) {
+    results.push(
+      checkLength(output, config.min_length ?? 50, config.max_length ?? 20000),
+    );
+  }
+
+  // 7. Question-closer detector
+  if (validatorSet.has('question_closer')) {
+    results.push(checkQuestionCloser(output));
+  }
+
+  const totalScoreImpact = results.reduce((sum, r) => sum + r.score_impact, 0);
+  const allPassed = results.every((r) => r.passed);
+
+  return {
+    all_passed: allPassed,
+    total_score_impact: totalScoreImpact,
+    results,
+    ban_violations: banViolations,
+    retry_requested: retryRequested,
+  };
+}
--- a/packages/gateway/src/pipeline/pre-classifier.ts
+++ b/packages/gateway/src/pipeline/pre-classifier.ts
@ -0,0 +1,81 @@
+import { callOllama } from './llm-client.js';
+import { logger } from '../observability/logger.js';
+
+export interface ClassificationResult {
+  task_type: string;
+  content_type: string;
+  language: 'de' | 'en' | 'other';
+  complexity: 'low' | 'medium' | 'high';
+  requires_facts: boolean;
+  suggested_task_types: string[];
+}
+
+const CLASSIFIER_MODEL = 'qwen2.5:3b';
+
+const SYSTEM_PROMPT = `You are a task classifier for an LLM routing gateway.
+Analyze the input and return ONLY valid JSON with this exact structure:
+{
+  "task_type": "string (e.g. tip_product_description, linkedin_post, generic_summarize)",
+  "content_type": "string (e.g. technical, marketing, analysis, conversation, structured_data)",
+  "language": "de|en|other",
+  "complexity": "low|medium|high",
+  "requires_facts": true|false,
+  "suggested_task_types": ["array", "of", "alternatives"]
+}
+
+Task types available:
+tip_product_description, tip_technical_summary, tip_competitor_analysis, tip_price_extraction,
+tip_market_analysis, tip_hype_cycle, tip_faq_generation, tip_vendor_profile, tip_blog_post, tip_spec_extraction,
+eo_member_summary, eo_meeting_notes, eo_chapter_report, eo_learning_recommendation, eo_forum_moderation,
+eo_event_agenda, eo_travel_brief,
+peercortex_asn_analysis, peercortex_routing_summary, peercortex_ix_report, peercortex_health_report, peercortex_rpki_analysis,
+switchblade_incident_summary, switchblade_config_review, switchblade_peering_recommendation,
+switchblade_blacklist_report, switchblade_rack_documentation, switchblade_csrd_report,
+switchblade_transceiver_advisor, switchblade_bgp_policy,
+nognet_event_description, nognet_sponsor_proposal, nognet_program_committee, nognet_recap_article,
+ctxevent_agenda_builder, ctxevent_attendee_communication,
+shieldx_threat_classification, shieldx_attack_analysis, shieldx_defense_recommendation,
+shieldx_pattern_extraction, shieldx_red_team_simulate,
+linkedin_post, linkedin_comment, linkedin_article,
+blog_post_de, blog_post_en, newsletter_section, social_media_thread, press_release,
+content_translation_de_en, content_translation_en_de,
+generic_summarize, generic_extract, generic_classify, generic_rewrite, generic_qa,
+code_review, code_generate, data_enrichment
+
+Return ONLY the JSON object, no other text.`;
+
+export async function classifyInput(input: string): Promise<ClassificationResult> {
+  const prompt = `Classify this input:\n\n${input.slice(0, 2000)}`;
+
+  try {
+    const response = await callOllama({
+      model: CLASSIFIER_MODEL,
+      prompt,
+      system: SYSTEM_PROMPT,
+      options: { temperature: 0.1, num_predict: 256 },
+      format: 'json',
+      stream: false,
+    });
+
+    const parsed = JSON.parse(response.response) as Partial<ClassificationResult>;
+
+    return {
+      task_type: parsed.task_type ?? 'generic_qa',
+      content_type: parsed.content_type ?? 'general',
+      language: (['de', 'en', 'other'].includes(parsed.language ?? '') ? parsed.language : 'en') as 'de' | 'en' | 'other',
+      complexity: (['low', 'medium', 'high'].includes(parsed.complexity ?? '') ? parsed.complexity : 'medium') as 'low' | 'medium' | 'high',
+      requires_facts: parsed.requires_facts ?? false,
+      suggested_task_types: Array.isArray(parsed.suggested_task_types) ? parsed.suggested_task_types : [],
+    };
+  } catch (err) {
+    logger.warn({ err }, 'Pre-classifier failed, using defaults');
+    return {
+      task_type: 'generic_qa',
+      content_type: 'general',
+      language: 'en',
+      complexity: 'medium',
+      requires_facts: false,
+      suggested_task_types: [],
+    };
+  }
+}
--- a/packages/gateway/src/pipeline/prompt-resolver.ts
+++ b/packages/gateway/src/pipeline/prompt-resolver.ts
@ -0,0 +1,180 @@
+import { readFileSync, watch, readdirSync } from 'fs';
+import { join, dirname } from 'path';
+import { fileURLToPath } from 'url';
+import yaml from 'js-yaml';
+import { logger } from '../observability/logger.js';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const TEMPLATES_DIR = join(__dirname, '../../prompts/templates');
+
+export interface PromptTemplate {
+  id: string;
+  version: string;
+  task_type: string;
+  system_prompt: string;
+  user_template: string;
+  system_prompt_de?: string;
+  user_template_de?: string;
+  few_shot_examples?: Array<{ user: string; assistant: string }>;
+  few_shot_examples_de?: Array<{ user: string; assistant: string }>;
+  output_schema?: Record<string, unknown>;
+  variables?: string[];
+}
+
+export interface AssembledPrompt {
+  system: string;
+  prompt: string;
+  prompt_id: string;
+  prompt_version: string;
+  schema?: Record<string, unknown>;
+}
+
+export interface PromptVariables {
+  input: string;
+  current_date?: string;
+  user_context?: Record<string, unknown>;
+  source_data?: string;
+  output_schema?: string;
+  banned_terms_de?: string;
+  banned_terms_en?: string;
+  sff8024_codes?: string;
+  known_vendors?: string;
+  few_shot_examples?: string;
+  [key: string]: unknown;
+}
+
+const templateCache = new Map<string, PromptTemplate>();
+
+function loadTemplate(filename: string): PromptTemplate | null {
+  const path = join(TEMPLATES_DIR, filename);
+  try {
+    const raw = readFileSync(path, 'utf-8');
+    const parsed = yaml.load(raw) as PromptTemplate;
+    if (!parsed.id) {
+      parsed.id = filename.replace('.yaml', '');
+    }
+    return parsed;
+  } catch (err) {
+    logger.warn({ err, filename }, 'Failed to load prompt template');
+    return null;
+  }
+}
+
+function initTemplates(): void {
+  try {
+    const files = readdirSync(TEMPLATES_DIR).filter((f) => f.endsWith('.yaml'));
+    for (const file of files) {
+      const tmpl = loadTemplate(file);
+      if (tmpl) {
+        templateCache.set(tmpl.id, tmpl);
+      }
+    }
+    logger.info({ count: templateCache.size }, 'Prompt templates loaded');
+  } catch {
+    logger.warn('Prompt templates directory not found — using fallback templates only');
+  }
+}
+
+function startWatcher(): void {
+  try {
+    watch(TEMPLATES_DIR, { recursive: false }, (_event, filename) => {
+      if (!filename?.endsWith('.yaml')) return;
+      const tmpl = loadTemplate(filename);
+      if (tmpl) {
+        templateCache.set(tmpl.id, tmpl);
+        logger.info({ id: tmpl.id }, 'Prompt template reloaded');
+      }
+    });
+  } catch {
+    // Templates dir not accessible — skip file watching
+  }
+}
+
+initTemplates();
+startWatcher();
+
+function replaceVariables(template: string, vars: PromptVariables): string {
+  let result = template;
+  for (const [key, value] of Object.entries(vars)) {
+    const placeholder = `{{${key}}}`;
+    if (value === undefined || value === null) {
+      result = result.replaceAll(placeholder, '');
+    } else if (typeof value === 'object') {
+      result = result.replaceAll(placeholder, JSON.stringify(value, null, 2));
+    } else {
+      result = result.replaceAll(placeholder, String(value));
+    }
+  }
+  // Remove unreplaced placeholders
+  result = result.replace(/\{\{[^}]+\}\}/g, '');
+  return result;
+}
+
+function buildFewShotExamples(
+  examples: Array<{ user: string; assistant: string }>,
+): string {
+  if (!examples.length) return '';
+  const parts = examples.map(
+    (ex, i) => `Example ${i + 1}:\nInput: ${ex.user}\nOutput: ${ex.assistant}`,
+  );
+  return `\n\n--- Examples ---\n${parts.join('\n\n')}\n--- End Examples ---\n`;
+}
+
+function getFallbackTemplate(taskType: string): PromptTemplate {
+  return {
+    id: taskType,
+    version: '1.0.0',
+    task_type: taskType,
+    system_prompt: `You are a helpful AI assistant. Complete the following task accurately and concisely. Task: ${taskType}. Return only the requested output without preamble or explanation.`,
+    user_template: '{{input}}',
+    few_shot_examples: [],
+  };
+}
+
+export function resolvePrompt(
+  taskType: string,
+  vars: PromptVariables,
+  language: 'de' | 'en' = 'en',
+): AssembledPrompt {
+  const template = templateCache.get(taskType) ?? getFallbackTemplate(taskType);
+
+  const useGerman = language === 'de' && Boolean(template.system_prompt_de);
+  const systemRaw = useGerman
+    ? (template.system_prompt_de ?? template.system_prompt)
+    : template.system_prompt;
+  const userRaw = useGerman
+    ? (template.user_template_de ?? template.user_template)
+    : template.user_template;
+
+  const examples = (useGerman
+    ? (template.few_shot_examples_de ?? template.few_shot_examples ?? [])
+    : (template.few_shot_examples ?? []));
+
+  const enrichedVars: PromptVariables = {
+    ...vars,
+    current_date: new Date().toISOString().split('T')[0] ?? '',
+    few_shot_examples: buildFewShotExamples(examples),
+    output_schema: template.output_schema
+      ? JSON.stringify(template.output_schema, null, 2)
+      : '',
+  };
+
+  const systemPrompt = replaceVariables(systemRaw, enrichedVars);
+  const userPrompt = replaceVariables(userRaw, enrichedVars);
+
+  return {
+    system: systemPrompt,
+    prompt: userPrompt,
+    prompt_id: template.id,
+    prompt_version: template.version ?? '1.0.0',
+    schema: template.output_schema,
+  };
+}
+
+export function getTemplate(taskType: string): PromptTemplate | undefined {
+  return templateCache.get(taskType);
+}
+
+export function listTemplates(): string[] {
+  return [...templateCache.keys()];
+}
--- a/packages/gateway/src/pipeline/router.ts
+++ b/packages/gateway/src/pipeline/router.ts
@ -0,0 +1,173 @@
+import { readFileSync } from 'fs';
+import { join, dirname } from 'path';
+import { fileURLToPath } from 'url';
+import yaml from 'js-yaml';
+import { logger } from '../observability/logger.js';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const CONFIG_DIR = join(__dirname, '../config');
+
+export interface RoutingRule {
+  model: string;
+  tier: 'fast' | 'medium' | 'large';
+  prompt_template: string;
+  temperature: number;
+  max_tokens: number;
+  output_format: 'text' | 'json';
+  requires_fact_check: boolean;
+  validators: string[];
+  callers: string[];
+}
+
+export interface ModelConfig {
+  tier: 'fast' | 'medium' | 'large';
+  context_length: number;
+  strengths: string[];
+  max_tokens_default: number;
+}
+
+export interface ModelsYaml {
+  ollama_base_url: string;
+  tiers: Record<string, { timeout_ms: number; error_threshold_percent: number; circuit_breaker_reset_ms: number }>;
+  models: Record<string, ModelConfig>;
+  fallback_chains: Record<string, string[]>;
+  tier_fallback: Record<string, string | null>;
+}
+
+export interface RoutingRulesYaml {
+  routing_rules: Record<string, RoutingRule>;
+  validators: Record<string, Record<string, unknown>>;
+}
+
+export interface RouterDecision {
+  model: string;
+  fallback_chain: string[];
+  tier: 'fast' | 'medium' | 'large';
+  prompt_template: string;
+  temperature: number;
+  max_tokens: number;
+  output_format: 'text' | 'json';
+  requires_fact_check: boolean;
+  validators: string[];
+  ollama_base_url: string;
+  timeout_ms: number;
+}
+
+let modelsConfig: ModelsYaml | null = null;
+let routingConfig: RoutingRulesYaml | null = null;
+
+function loadModels(): ModelsYaml {
+  if (modelsConfig) return modelsConfig;
+  try {
+    const raw = readFileSync(join(CONFIG_DIR, 'models.yaml'), 'utf-8');
+    modelsConfig = yaml.load(raw) as ModelsYaml;
+    return modelsConfig;
+  } catch (err) {
+    logger.error({ err }, 'Failed to load models.yaml');
+    throw new Error('Could not load models configuration');
+  }
+}
+
+function loadRoutingRules(): RoutingRulesYaml {
+  if (routingConfig) return routingConfig;
+  try {
+    const raw = readFileSync(join(CONFIG_DIR, 'routing-rules.yaml'), 'utf-8');
+    routingConfig = yaml.load(raw) as RoutingRulesYaml;
+    return routingConfig;
+  } catch (err) {
+    logger.error({ err }, 'Failed to load routing-rules.yaml');
+    throw new Error('Could not load routing rules configuration');
+  }
+}
+
+export function reloadConfigs(): void {
+  modelsConfig = null;
+  routingConfig = null;
+  loadModels();
+  loadRoutingRules();
+}
+
+function isCallerAllowed(rule: RoutingRule, caller: string): boolean {
+  return rule.callers.includes('all') || rule.callers.includes(caller);
+}
+
+function buildFallbackChain(
+  primaryModel: string,
+  tier: string,
+  models: ModelsYaml,
+): string[] {
+  const chain = models.fallback_chains[tier] ?? [];
+  // Put primary first, then other fallbacks excluding primary
+  return [primaryModel, ...chain.filter((m) => m !== primaryModel)];
+}
+
+export function route(
+  taskType: string,
+  caller: string,
+  overrides?: {
+    model?: string;
+    temperature?: number;
+    max_tokens?: number;
+  },
+): RouterDecision {
+  const models = loadModels();
+  const rules = loadRoutingRules();
+
+  const rule = rules.routing_rules[taskType];
+  if (!rule) {
+    // Fall back to generic_qa
+    const fallbackRule = rules.routing_rules['generic_qa'];
+    if (!fallbackRule) {
+      throw new Error(`No routing rule for task_type: ${taskType}`);
+    }
+    logger.warn({ taskType, caller }, 'Unknown task_type, falling back to generic_qa');
+    return buildDecision('generic_qa', fallbackRule, caller, models, overrides);
+  }
+
+  if (!isCallerAllowed(rule, caller)) {
+    throw new Error(`Caller "${caller}" is not allowed to use task_type "${taskType}"`);
+  }
+
+  return buildDecision(taskType, rule, caller, models, overrides);
+}
+
+function buildDecision(
+  _taskType: string,
+  rule: RoutingRule,
+  _caller: string,
+  models: ModelsYaml,
+  overrides?: { model?: string; temperature?: number; max_tokens?: number },
+): RouterDecision {
+  const selectedModel = overrides?.model ?? rule.model;
+  const tier = rule.tier;
+  const tierConfig = models.tiers[tier];
+
+  if (!tierConfig) {
+    throw new Error(`Unknown model tier: ${tier}`);
+  }
+
+  return {
+    model: selectedModel,
+    fallback_chain: buildFallbackChain(selectedModel, tier, models),
+    tier,
+    prompt_template: rule.prompt_template,
+    temperature: overrides?.temperature ?? rule.temperature,
+    max_tokens: overrides?.max_tokens ?? rule.max_tokens,
+    output_format: rule.output_format,
+    requires_fact_check: rule.requires_fact_check,
+    validators: rule.validators,
+    ollama_base_url: models.ollama_base_url,
+    timeout_ms: tierConfig.timeout_ms,
+  };
+}
+
+export function getModelTier(model: string): 'fast' | 'medium' | 'large' {
+  const models = loadModels();
+  const config = models.models[model];
+  return config?.tier ?? 'medium';
+}
+
+export function getOllamaBaseUrl(): string {
+  const models = loadModels();
+  return models.ollama_base_url;
+}
--- a/packages/gateway/src/queue/pg-boss-client.ts
+++ b/packages/gateway/src/queue/pg-boss-client.ts
@ -0,0 +1,188 @@
+import PgBoss from 'pg-boss';
+import { logger } from '../observability/logger.js';
+
+const QUEUE_NAME = 'llm-batch';
+const CONCURRENCY = 4;
+const MAX_RETRIES = 3;
+
+let boss: PgBoss | null = null;
+
+interface BatchJobData {
+  caller: string;
+  tasks: Array<{
+    task_type: string;
+    input: string;
+    language?: 'de' | 'en';
+    context?: Record<string, unknown>;
+  }>;
+  webhook_url?: string;
+  batch_db_id: string;
+}
+
+interface TaskResult {
+  task_type: string;
+  status: 'approved' | 'warning' | 'pending_review' | 'rejected' | 'error';
+  output?: string;
+  confidence?: number;
+  error?: string;
+}
+
+export async function initPgBoss(): Promise<void> {
+  if (boss) return;
+
+  const connectionString =
+    process.env['DATABASE_URL'] ??
+    `postgresql://${process.env['DB_USER'] ?? 'llm_gateway'}:${process.env['DB_PASSWORD'] ?? ''}@${process.env['DB_HOST'] ?? 'localhost'}:${process.env['DB_PORT'] ?? '5432'}/${process.env['DB_NAME'] ?? 'llm_gateway'}`;
+
+  boss = new PgBoss({
+    connectionString,
+    max: 5,
+    retryLimit: MAX_RETRIES,
+    retryDelay: 30,
+    retryBackoff: true,
+    deleteAfterDays: 7,
+    archiveCompletedAfterSeconds: 3600,
+  });
+
+  boss.on('error', (err) => {
+    logger.error({ err }, 'pg-boss error');
+  });
+
+  await boss.start();
+  await boss.createQueue(QUEUE_NAME, { name: QUEUE_NAME,
+    retryLimit: MAX_RETRIES,
+    retryBackoff: true,
+  });
+
+  await (boss as unknown as { work: Function }).work(
+    QUEUE_NAME,
+    { concurrency: CONCURRENCY },
+    processJob,
+  );
+
+  logger.info({ queue: QUEUE_NAME, concurrency: CONCURRENCY }, 'pg-boss initialized');
+}
+
+async function processJob(job: PgBoss.Job<BatchJobData>): Promise<void> {
+  const { caller, tasks, webhook_url, batch_db_id } = job.data;
+  logger.info({ jobId: job.id, caller, taskCount: tasks.length }, 'Processing batch job');
+
+  const results: TaskResult[] = [];
+  const GATEWAY_URL = `http://localhost:${process.env['PORT'] ?? '3100'}`;
+
+  for (const task of tasks) {
+    try {
+      const response = await fetch(`${GATEWAY_URL}/v1/completion`, {
+        method: 'POST',
+        headers: {
+          'Content-Type': 'application/json',
+          'X-Caller-ID': caller,
+        },
+        body: JSON.stringify({
+          caller,
+          task_type: task.task_type,
+          input: task.input,
+          language: task.language,
+          context: task.context,
+        }),
+      });
+
+      if (!response.ok) {
+        const errorBody = await response.text();
+        results.push({
+          task_type: task.task_type,
+          status: 'error',
+          error: `HTTP ${response.status}: ${errorBody}`,
+        });
+        continue;
+      }
+
+      const result = await response.json() as {
+        status: 'approved' | 'warning' | 'pending_review' | 'rejected';
+        output: string;
+        confidence: number;
+      };
+
+      results.push({
+        task_type: task.task_type,
+        status: result.status,
+        output: result.output,
+        confidence: result.confidence,
+      });
+    } catch (err) {
+      results.push({
+        task_type: task.task_type,
+        status: 'error',
+        error: err instanceof Error ? err.message : 'Unknown error',
+      });
+    }
+  }
+
+  // Update batch job in DB
+  if (batch_db_id) {
+    const { query } = await import('../db/client.js');
+    const completed = results.filter((r) => r.status !== 'error').length;
+    const failed = results.filter((r) => r.status === 'error').length;
+
+    await query(
+      `UPDATE batch_jobs
+       SET completed_at = NOW(), status = 'completed', results = $1,
+           completed_count = $2, failed_count = $3
+       WHERE id = $4`,
+      [JSON.stringify(results), completed, failed, batch_db_id],
+    ).catch((err) => logger.warn({ err }, 'Failed to update batch job'));
+  }
+
+  // Deliver to webhook
+  if (webhook_url) {
+    try {
+      await fetch(webhook_url, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+          batch_id: batch_db_id,
+          caller,
+          completed_at: new Date().toISOString(),
+          results,
+        }),
+      });
+      logger.info({ webhook_url, batch_db_id }, 'Batch webhook delivered');
+    } catch (err) {
+      logger.error({ err, webhook_url }, 'Failed to deliver batch webhook');
+    }
+  }
+}
+
+export async function submitBatchJob(
+  caller: string,
+  tasks: BatchJobData['tasks'],
+  webhookUrl?: string,
+  batchDbId?: string,
+  priority = 0,
+): Promise<string | null> {
+  if (!boss) {
+    throw new Error('pg-boss not initialized');
+  }
+
+  const jobId = await boss.send(
+    QUEUE_NAME,
+    {
+      caller,
+      tasks,
+      webhook_url: webhookUrl,
+      batch_db_id: batchDbId ?? '',
+    } satisfies BatchJobData,
+    {
+      priority,
+      retryLimit: MAX_RETRIES,
+      retryBackoff: true,
+      expireInSeconds: 3600,
+    },
+  );
+
+  return jobId;
+}
+
+export function getPgBoss(): PgBoss | null {
+  return boss;
+}
--- a/packages/gateway/src/routes/batch.ts
+++ b/packages/gateway/src/routes/batch.ts
@ -0,0 +1,139 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { z } from 'zod';
+import { submitBatchJob } from '../queue/pg-boss-client.js';
+import { query } from '../db/client.js';
+import { logger } from '../observability/logger.js';
+
+const BatchTaskSchema = z.object({
+  task_type: z.string().min(1),
+  input: z.string().min(1).max(50_000),
+  language: z.enum(['de', 'en']).optional(),
+  context: z.record(z.unknown()).optional(),
+});
+
+const BatchRequestSchema = z.object({
+  caller: z.string().min(1).max(100),
+  tasks: z.array(BatchTaskSchema).min(1).max(100),
+  webhook_url: z.string().url().optional(),
+  priority: z.number().int().min(0).max(10).optional().default(0),
+});
+
+type BatchRequest = z.infer<typeof BatchRequestSchema>;
+
+export async function batchRoute(fastify: FastifyInstance): Promise<void> {
+  fastify.post(
+    '/batch',
+    async (request: FastifyRequest, reply: FastifyReply) => {
+      let body: BatchRequest;
+      try {
+        body = BatchRequestSchema.parse(request.body);
+      } catch (err) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Bad Request',
+          message: err instanceof z.ZodError ? err.errors[0]?.message : 'Invalid request body',
+        });
+      }
+
+      const { caller, tasks, webhook_url, priority } = body;
+
+      // Insert batch job record
+      let batchDbId = '';
+      try {
+        const result = await query<{ id: string }>(
+          `INSERT INTO batch_jobs (caller, task_count, webhook_url, status, pg_boss_id)
+           VALUES ($1, $2, $3, 'queued', '')
+           RETURNING id`,
+          [caller, tasks.length, webhook_url ?? null],
+        );
+        batchDbId = result.rows[0]?.id ?? '';
+      } catch (err) {
+        logger.error({ err, caller }, 'Failed to create batch job record');
+        return reply.status(500).send({ statusCode: 500, error: 'Internal Error', message: 'Failed to create batch job' });
+      }
+
+      // Submit to pg-boss queue
+      let pgBossId: string | null = null;
+      try {
+        pgBossId = await submitBatchJob(
+          caller,
+          tasks.map((t) => ({
+            task_type: t.task_type,
+            input: t.input,
+            language: t.language,
+            context: t.context,
+          })),
+          webhook_url,
+          batchDbId,
+          priority,
+        );
+      } catch (err) {
+        logger.error({ err, caller, batchDbId }, 'Failed to submit batch job to queue');
+        await query(
+          `UPDATE batch_jobs SET status = 'failed' WHERE id = $1`,
+          [batchDbId],
+        ).catch(() => {});
+        return reply.status(500).send({ statusCode: 500, error: 'Queue Error', message: 'Failed to enqueue batch job' });
+      }
+
+      // Update with pg-boss ID
+      if (pgBossId) {
+        await query(
+          `UPDATE batch_jobs SET pg_boss_id = $1 WHERE id = $2`,
+          [pgBossId, batchDbId],
+        ).catch((err) => logger.warn({ err }, 'Failed to update pg_boss_id'));
+      }
+
+      logger.info({ batchDbId, pgBossId, caller, taskCount: tasks.length }, 'Batch job submitted');
+
+      return reply.status(202).send({
+        batch_id: batchDbId,
+        pg_boss_id: pgBossId,
+        status: 'queued',
+        task_count: tasks.length,
+        caller,
+        webhook_url: webhook_url ?? null,
+        estimated_completion_ms: tasks.length * 5000, // rough estimate
+        check_status_url: `/v1/batch/${batchDbId}`,
+      });
+    },
+  );
+
+  // GET batch status
+  fastify.get(
+    '/batch/:id',
+    async (request: FastifyRequest<{ Params: { id: string } }>, reply: FastifyReply) => {
+      const { id } = request.params;
+
+      try {
+        const result = await query<{
+          id: string;
+          created_at: string;
+          completed_at: string | null;
+          caller: string;
+          task_count: number;
+          completed_count: number;
+          failed_count: number;
+          webhook_url: string | null;
+          status: string;
+          results: unknown;
+        }>(
+          `SELECT id, created_at, completed_at, caller, task_count, completed_count,
+                  failed_count, webhook_url, status, results
+           FROM batch_jobs WHERE id = $1`,
+          [id],
+        );
+
+        const job = result.rows[0];
+        if (!job) {
+          return reply.status(404).send({ statusCode: 404, error: 'Not Found', message: 'Batch job not found' });
+        }
+
+        return reply.send(job);
+      } catch (err) {
+        logger.error({ err, id }, 'Failed to fetch batch job');
+        return reply.status(500).send({ statusCode: 500, error: 'Internal Error', message: 'Failed to fetch batch job' });
+      }
+    },
+  );
+}
--- a/packages/gateway/src/routes/classify.ts
+++ b/packages/gateway/src/routes/classify.ts
@ -0,0 +1,38 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { z } from 'zod';
+import { classifyInput } from '../pipeline/pre-classifier.js';
+
+const ClassifyRequestSchema = z.object({
+  input: z.string().min(1).max(10_000),
+  caller: z.string().min(1).max(100).optional().default('internal'),
+});
+
+type ClassifyRequest = z.infer<typeof ClassifyRequestSchema>;
+
+export async function classifyRoute(fastify: FastifyInstance): Promise<void> {
+  fastify.post(
+    '/classify',
+    async (request: FastifyRequest, reply: FastifyReply) => {
+      let body: ClassifyRequest;
+      try {
+        body = ClassifyRequestSchema.parse(request.body);
+      } catch (err) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Bad Request',
+          message: err instanceof z.ZodError ? err.errors[0]?.message : 'Invalid request body',
+        });
+      }
+
+      const startMs = Date.now();
+      const result = await classifyInput(body.input);
+      const latencyMs = Date.now() - startMs;
+
+      return reply.send({
+        ...result,
+        latency_ms: latencyMs,
+        model_used: 'qwen2.5:3b',
+      });
+    },
+  );
+}
--- a/packages/gateway/src/routes/completion.ts
+++ b/packages/gateway/src/routes/completion.ts
@ -0,0 +1,301 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { z } from 'zod';
+import { classifyInput } from '../pipeline/pre-classifier.js';
+import { route } from '../pipeline/router.js';
+import { resolvePrompt } from '../pipeline/prompt-resolver.js';
+import { callOllamaWithFallbackChain } from '../pipeline/llm-client.js';
+import { runPostValidation } from '../pipeline/post-validator.js';
+import { evaluateConfidence } from '../pipeline/confidence-gate.js';
+import { writeAuditLog, writeBanAnalytics, hashText } from '../observability/audit-log.js';
+import { addToReviewQueue } from '../observability/review-queue.js';
+import {
+  requestsTotal,
+  latencySeconds,
+  tokensTotal,
+  confidenceScore,
+  banlistHitsTotal,
+  validationFailuresTotal,
+} from '../observability/metrics.js';
+import { logger } from '../observability/logger.js';
+
+const CompletionRequestSchema = z.object({
+  caller: z.string().min(1).max(100),
+  task_type: z.string().optional(),
+  input: z.string().min(1).max(50_000),
+  language: z.enum(['de', 'en']).optional(),
+  context: z.record(z.unknown()).optional(),
+  options: z
+    .object({
+      model: z.string().optional(),
+      temperature: z.number().min(0).max(2).optional(),
+      max_tokens: z.number().int().positive().max(16_384).optional(),
+      return_validation_details: z.boolean().optional(),
+    })
+    .optional(),
+});
+
+type CompletionRequest = z.infer<typeof CompletionRequestSchema>;
+
+const SKIP_SHIELDX_CALLERS = new Set(['internal', 'shieldx']);
+
+async function runShieldXScan(input: string, caller: string): Promise<{ passed: boolean; reason?: string }> {
+  const GATEWAY_URL = `http://localhost:${process.env['PORT'] ?? '3100'}`;
+  try {
+    const response = await fetch(`${GATEWAY_URL}/v1/completion`, {
+      method: 'POST',
+      headers: { 'Content-Type': 'application/json', 'X-Caller-ID': 'internal' },
+      body: JSON.stringify({
+        caller: 'internal',
+        task_type: 'shieldx_threat_classification',
+        input,
+        options: { return_validation_details: false },
+      }),
+      signal: AbortSignal.timeout(8000),
+    });
+
+    if (!response.ok) return { passed: true }; // Fail open if ShieldX is down
+
+    const result = await response.json() as { output?: string; status?: string };
+    if (result.status !== 'approved' || !result.output) return { passed: true };
+
+    type ShieldResult = { threat_detected: boolean; threat_type?: string; confidence?: number };
+    let parsed: ShieldResult;
+    try {
+      parsed = JSON.parse(result.output) as ShieldResult;
+    } catch {
+      return { passed: true };
+    }
+
+    if (parsed.threat_detected && (parsed.confidence ?? 0) > 0.8) {
+      logger.warn({ caller, threat_type: parsed.threat_type }, 'ShieldX threat detected');
+      return { passed: false, reason: `Threat detected: ${parsed.threat_type ?? 'unknown'}` };
+    }
+
+    return { passed: true };
+  } catch (err) {
+    // ShieldX unavailable — fail open (log but continue)
+    logger.warn({ err, caller }, 'ShieldX scan failed, continuing without scan');
+    return { passed: true };
+  }
+}
+
+export async function completionRoute(fastify: FastifyInstance): Promise<void> {
+  fastify.post(
+    '/completion',
+    {
+      config: { rateLimit: false }, // Custom rate limiting via caller
+    },
+    async (request: FastifyRequest, reply: FastifyReply) => {
+      const startMs = Date.now();
+
+      let body: CompletionRequest;
+      try {
+        body = CompletionRequestSchema.parse(request.body);
+      } catch (err) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Bad Request',
+          message: err instanceof z.ZodError ? err.errors[0]?.message ?? 'Invalid request' : 'Invalid request body',
+        });
+      }
+
+      const { caller, input, language, context, options } = body;
+      const returnValidationDetails = options?.return_validation_details ?? false;
+
+      // Stage 2: ShieldX scan
+      if (!SKIP_SHIELDX_CALLERS.has(caller)) {
+        const shieldResult = await runShieldXScan(input, caller);
+        if (!shieldResult.passed) {
+          requestsTotal.labels({ caller, task_type: 'unknown', status: 'rejected' }).inc();
+          return reply.status(400).send({
+            statusCode: 400,
+            error: 'Rejected',
+            message: shieldResult.reason ?? 'Input rejected by security scan',
+          });
+        }
+      }
+
+      // Stage 3: Pre-classifier
+      let taskType = body.task_type;
+      let classificationResult;
+      if (!taskType) {
+        try {
+          classificationResult = await classifyInput(input);
+          taskType = classificationResult.task_type;
+        } catch (err) {
+          logger.warn({ err }, 'Pre-classifier failed');
+          taskType = 'generic_qa';
+        }
+      }
+
+      // Stage 4: Router
+      let decision;
+      try {
+        decision = route(taskType, caller, {
+          model: options?.model,
+          temperature: options?.temperature,
+          max_tokens: options?.max_tokens,
+        });
+      } catch (err) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Routing Error',
+          message: err instanceof Error ? err.message : 'Failed to route request',
+        });
+      }
+
+      // Stage 5: Prompt assembly
+      const resolved = resolvePrompt(
+        decision.prompt_template,
+        {
+          input,
+          user_context: context,
+          source_data: context?.['source_data'] as string | undefined,
+        },
+        language ?? 'en',
+      );
+
+      // Stage 6: Ollama call with circuit breaker + retry
+      let ollamaResponse;
+      try {
+        ollamaResponse = await callOllamaWithFallbackChain(
+          {
+            model: decision.model,
+            prompt: resolved.prompt,
+            system: resolved.system,
+            options: {
+              temperature: decision.temperature,
+              num_predict: decision.max_tokens,
+            },
+            format: decision.output_format === 'json' ? 'json' : '',
+            stream: false,
+          },
+          decision.fallback_chain,
+          decision.tier,
+        );
+      } catch (err) {
+        const latency = Date.now() - startMs;
+        logger.error({ err, caller, taskType }, 'Ollama call failed');
+        requestsTotal.labels({ caller, task_type: taskType, status: 'rejected' }).inc();
+        latencySeconds.labels({ caller, task_type: taskType, model: decision.model }).observe(latency / 1000);
+
+        return reply.status(503).send({
+          statusCode: 503,
+          error: 'Service Unavailable',
+          message: 'LLM service unavailable, please retry',
+        });
+      }
+
+      const outputText = ollamaResponse.response;
+      const latencyMs = Date.now() - startMs;
+
+      // Stage 7: Post-validation chain
+      const validationOutput = await runPostValidation(outputText, {
+        validators: decision.validators,
+        language,
+        output_format: decision.output_format,
+        requires_fact_check: decision.requires_fact_check,
+        schema: resolved.schema,
+      });
+
+      // Stage 8: Confidence gate
+      const confidenceResult = evaluateConfidence(validationOutput);
+
+      // Record metrics
+      requestsTotal.labels({ caller, task_type: taskType, status: confidenceResult.status }).inc();
+      latencySeconds.labels({ caller, task_type: taskType, model: ollamaResponse.model ?? decision.model }).observe(latencyMs / 1000);
+      tokensTotal.labels({ direction: 'in', model: decision.model }).inc(ollamaResponse.prompt_eval_count ?? 0);
+      tokensTotal.labels({ direction: 'out', model: decision.model }).inc(ollamaResponse.eval_count ?? 0);
+      confidenceScore.labels({ task_type: taskType, model: decision.model }).observe(confidenceResult.score);
+
+      // Record ban hits in metrics
+      for (const violation of validationOutput.ban_violations) {
+        banlistHitsTotal.labels({ term: violation.term, language: violation.language, category: violation.category }).inc();
+      }
+
+      // Record validation failures
+      for (const result of validationOutput.results) {
+        if (!result.passed) {
+          validationFailuresTotal.labels({ validator: result.validator, task_type: taskType }).inc();
+        }
+      }
+
+      // Stage 9: Audit log
+      const inputHash = hashText(input);
+      const outputHash = hashText(outputText);
+
+      const callId = await writeAuditLog({
+        caller,
+        task_type: taskType,
+        model_used: decision.model,
+        prompt_id: resolved.prompt_id,
+        prompt_version: resolved.prompt_version,
+        input_hash: inputHash,
+        output_text: confidenceResult.status !== 'pending_review' ? outputText : undefined,
+        output_hash: outputHash,
+        token_count_in: ollamaResponse.prompt_eval_count ?? 0,
+        token_count_out: ollamaResponse.eval_count ?? 0,
+        latency_ms: latencyMs,
+        confidence: confidenceResult.score,
+        status: confidenceResult.status,
+        validation_log: validationOutput.results,
+        ban_hits: validationOutput.ban_violations,
+        metadata: {
+          classification: classificationResult,
+          model_tier: decision.tier,
+          fallback_used: ollamaResponse.model !== decision.model,
+        },
+      });
+
+      // Write ban analytics
+      if (validationOutput.ban_violations.length > 0 && callId) {
+        void writeBanAnalytics(callId, validationOutput.ban_violations, caller, taskType);
+      }
+
+      // Add to review queue if pending_review
+      if (confidenceResult.status === 'pending_review' && callId) {
+        void addToReviewQueue({
+          callId,
+          caller,
+          taskType,
+          inputText: input,
+          outputText,
+          confidence: confidenceResult.score,
+          validationLog: validationOutput.results,
+        });
+      }
+
+      // Stage 10: Response
+      const responseBody: Record<string, unknown> = {
+        id: callId,
+        status: confidenceResult.status,
+        confidence: Math.round(confidenceResult.score * 100) / 100,
+        model: decision.model,
+        task_type: taskType,
+        latency_ms: latencyMs,
+        tokens: {
+          in: ollamaResponse.prompt_eval_count ?? 0,
+          out: ollamaResponse.eval_count ?? 0,
+        },
+      };
+
+      if (confidenceResult.status !== 'pending_review') {
+        responseBody['output'] = outputText;
+      } else {
+        responseBody['output'] = null;
+        responseBody['message'] = 'Output is pending human review due to low confidence';
+      }
+
+      if (returnValidationDetails) {
+        responseBody['validation'] = validationOutput.results;
+        responseBody['confidence_detail'] = {
+          base_score: confidenceResult.base_score,
+          total_impact: confidenceResult.total_impact,
+          final_score: confidenceResult.score,
+        };
+      }
+
+      return reply.status(200).send(responseBody);
+    },
+  );
+}
--- a/packages/gateway/src/routes/health.ts
+++ b/packages/gateway/src/routes/health.ts
@ -0,0 +1,131 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { getOllamaBaseUrl } from '../pipeline/router.js';
+import { getAllBreakerStates } from '../circuit-breaker/ollama-breaker.js';
+import { query } from '../db/client.js';
+import { getPgBoss } from '../queue/pg-boss-client.js';
+import { logger } from '../observability/logger.js';
+
+interface HealthStatus {
+  status: 'ok' | 'degraded' | 'down';
+  timestamp: string;
+  checks: {
+    ollama: { status: 'ok' | 'down'; latency_ms?: number; error?: string };
+    database: { status: 'ok' | 'down'; error?: string };
+    queue: { status: 'ok' | 'down' | 'unknown'; depth?: number; error?: string };
+    review_queue: { unreviewed_count: number };
+    circuit_breakers: Record<string, 'closed' | 'open' | 'half-open'>;
+  };
+}
+
+async function checkOllama(baseUrl: string): Promise<{ status: 'ok' | 'down'; latency_ms?: number; error?: string }> {
+  const start = Date.now();
+  try {
+    const response = await fetch(`${baseUrl}/api/tags`, {
+      signal: AbortSignal.timeout(5000),
+    });
+    const latency_ms = Date.now() - start;
+    if (!response.ok) {
+      return { status: 'down', error: `HTTP ${response.status}`, latency_ms };
+    }
+    return { status: 'ok', latency_ms };
+  } catch (err) {
+    return { status: 'down', error: err instanceof Error ? err.message : 'Unknown error' };
+  }
+}
+
+async function checkDatabase(): Promise<{ status: 'ok' | 'down'; error?: string }> {
+  try {
+    await query('SELECT 1');
+    return { status: 'ok' };
+  } catch (err) {
+    return { status: 'down', error: err instanceof Error ? err.message : 'Unknown error' };
+  }
+}
+
+async function checkQueue(): Promise<{ status: 'ok' | 'down' | 'unknown'; depth?: number; error?: string }> {
+  const boss = getPgBoss();
+  if (!boss) return { status: 'unknown' };
+
+  try {
+    const [queued, active] = await Promise.all([
+      boss.getQueueSize('llm-batch', { before: 'completed' }),
+      boss.getQueueSize('llm-batch', { before: 'active' }),
+    ]);
+    return { status: 'ok', depth: (queued ?? 0) + (active ?? 0) };
+  } catch (err) {
+    return { status: 'down', error: err instanceof Error ? err.message : 'Unknown error' };
+  }
+}
+
+async function getReviewQueueCount(): Promise<number> {
+  try {
+    const result = await query<{ count: string }>(
+      'SELECT COUNT(*) as count FROM review_queue WHERE decision IS NULL',
+    );
+    return parseInt(result.rows[0]?.count ?? '0', 10);
+  } catch {
+    return 0;
+  }
+}
+
+export async function healthRoute(fastify: FastifyInstance): Promise<void> {
+  fastify.get(
+    '/health',
+    async (_request: FastifyRequest, reply: FastifyReply) => {
+      const ollamaBaseUrl = getOllamaBaseUrl();
+
+      const [ollamaCheck, dbCheck, queueCheck, reviewCount] = await Promise.all([
+        checkOllama(ollamaBaseUrl),
+        checkDatabase(),
+        checkQueue(),
+        getReviewQueueCount(),
+      ]);
+
+      const breakerStates = getAllBreakerStates();
+
+      const isDown = ollamaCheck.status === 'down' || dbCheck.status === 'down';
+      const isDegraded = queueCheck.status === 'down' || Object.values(breakerStates).some((s) => s === 'open');
+
+      const status: HealthStatus['status'] = isDown ? 'down' : isDegraded ? 'degraded' : 'ok';
+
+      const health: HealthStatus = {
+        status,
+        timestamp: new Date().toISOString(),
+        checks: {
+          ollama: ollamaCheck,
+          database: dbCheck,
+          queue: queueCheck,
+          review_queue: { unreviewed_count: reviewCount },
+          circuit_breakers: breakerStates,
+        },
+      };
+
+      const statusCode = isDown ? 503 : 200;
+      if (status !== 'ok') {
+        logger.warn({ status, checks: health.checks }, 'Health check degraded');
+      }
+
+      return reply.status(statusCode).send(health);
+    },
+  );
+
+  // Kubernetes-style liveness probe (minimal check)
+  fastify.get(
+    '/health/live',
+    async (_request: FastifyRequest, reply: FastifyReply) => {
+      return reply.send({ status: 'alive', ts: Date.now() });
+    },
+  );
+
+  // Kubernetes-style readiness probe
+  fastify.get(
+    '/health/ready',
+    async (_request: FastifyRequest, reply: FastifyReply) => {
+      const dbCheck = await checkDatabase();
+      if (dbCheck.status === 'down') {
+        return reply.status(503).send({ status: 'not ready', reason: 'database unavailable' });
+      }
+      return reply.send({ status: 'ready' });
+    },
+  );
+}
--- a/packages/gateway/src/routes/metrics.ts
+++ b/packages/gateway/src/routes/metrics.ts
@ -0,0 +1,14 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { getMetrics, getContentType } from '../observability/metrics.js';
+
+export async function metricsRoute(fastify: FastifyInstance): Promise<void> {
+  fastify.get(
+    '/metrics',
+    async (_request: FastifyRequest, reply: FastifyReply) => {
+      const metrics = await getMetrics();
+      return reply
+        .header('Content-Type', getContentType())
+        .send(metrics);
+    },
+  );
+}
--- a/packages/gateway/src/routes/review.ts
+++ b/packages/gateway/src/routes/review.ts
@ -0,0 +1,82 @@
+import type { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
+import { z } from 'zod';
+import {
+  listPendingReviews,
+  processDecision,
+} from '../observability/review-queue.js';
+import { logger } from '../observability/logger.js';
+
+const DecisionSchema = z.object({
+  decision: z.enum(['approved', 'rejected', 'edited']),
+  edited_output: z.string().optional(),
+  reviewer_notes: z.string().optional(),
+});
+
+export async function reviewRoute(fastify: FastifyInstance): Promise<void> {
+  // List pending review items
+  fastify.get(
+    '/review',
+    async (
+      request: FastifyRequest<{ Querystring: { limit?: string; offset?: string } }>,
+      reply: FastifyReply,
+    ) => {
+      const limit = Math.min(parseInt(request.query.limit ?? '50', 10), 100);
+      const offset = Math.max(parseInt(request.query.offset ?? '0', 10), 0);
+
+      try {
+        const items = await listPendingReviews(limit, offset);
+        return reply.send({
+          items,
+          count: items.length,
+          limit,
+          offset,
+        });
+      } catch (err) {
+        logger.error({ err }, 'Failed to list review queue');
+        return reply.status(500).send({ statusCode: 500, error: 'Internal Error', message: 'Failed to list review items' });
+      }
+    },
+  );
+
+  // Submit decision for a review item
+  fastify.post(
+    '/review/:id/decide',
+    async (
+      request: FastifyRequest<{ Params: { id: string } }>,
+      reply: FastifyReply,
+    ) => {
+      const { id } = request.params;
+
+      let body;
+      try {
+        body = DecisionSchema.parse(request.body);
+      } catch (err) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Bad Request',
+          message: err instanceof z.ZodError ? err.errors[0]?.message : 'Invalid request',
+        });
+      }
+
+      if (body.decision === 'edited' && !body.edited_output) {
+        return reply.status(400).send({
+          statusCode: 400,
+          error: 'Bad Request',
+          message: 'edited_output is required when decision is "edited"',
+        });
+      }
+
+      try {
+        const updated = await processDecision(id, body);
+        if (!updated) {
+          return reply.status(404).send({ statusCode: 404, error: 'Not Found', message: 'Review item not found' });
+        }
+        logger.info({ id, decision: body.decision }, 'Review decision submitted');
+        return reply.send(updated);
+      } catch (err) {
+        logger.error({ err, id }, 'Failed to process review decision');
+        return reply.status(500).send({ statusCode: 500, error: 'Internal Error', message: 'Failed to process decision' });
+      }
+    },
+  );
+}
--- a/packages/gateway/src/server.ts
+++ b/packages/gateway/src/server.ts
@ -0,0 +1,137 @@
+import Fastify from 'fastify';
+import fastifyCors from '@fastify/cors';
+import fastifyRateLimit from '@fastify/rate-limit';
+import fastifyHelmet from '@fastify/helmet';
+import { completionRoute } from './routes/completion.js';
+import { batchRoute } from './routes/batch.js';
+import { classifyRoute } from './routes/classify.js';
+import { healthRoute } from './routes/health.js';
+import { metricsRoute } from './routes/metrics.js';
+import { reviewRoute } from './routes/review.js';
+import { getPool } from './db/client.js';
+import { initPgBoss } from './queue/pg-boss-client.js';
+import { logger } from './observability/logger.js';
+
+const RATE_LIMITS: Record<string, number> = {
+  'n8n': 60,
+  'tip-scraper': 200,
+  'shieldx': 500,
+  'eo-global-pulse': 120,
+  'switchblade': 60,
+  'peercortex': 30,
+  'nognet': 30,
+  'internal': 1000,
+  'default': 20,
+};
+
+export function getCallerRateLimit(caller: string): number {
+  return RATE_LIMITS[caller] ?? RATE_LIMITS['default'] ?? 20;
+}
+
+async function buildServer() {
+  const server = Fastify({
+    logger: {
+      level: process.env['LOG_LEVEL'] ?? 'info',
+    },
+    trustProxy: true,
+  });
+
+  await server.register(fastifyHelmet, {
+    contentSecurityPolicy: {
+      directives: {
+        defaultSrc: ["'self'"],
+        scriptSrc: ["'none'"],
+        objectSrc: ["'none'"],
+      },
+    },
+  });
+
+  await server.register(fastifyCors, {
+    origin: [
+      'http://localhost:3000',
+      'http://localhost:3001',
+      'http://localhost:3100',
+      'http://192.168.178.169:3000',
+      'http://192.168.178.169:3001',
+      'http://192.168.178.196:3000',
+      /^http:\/\/192\.168\.178\.\d+/,
+      /^https:\/\/.*\.context-x\.org$/,
+    ],
+    methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS'],
+    allowedHeaders: ['Content-Type', 'Authorization', 'X-Caller-ID'],
+    credentials: true,
+  });
+
+  await server.register(fastifyRateLimit, {
+    global: true,
+    max: 20,
+    timeWindow: '1 minute',
+    keyGenerator: (request) => {
+      const caller = (request.headers['x-caller-id'] as string) ?? 'default';
+      return `${caller}:${request.ip}`;
+    },
+    errorResponseBuilder: (_request, context) => ({
+      statusCode: 429,
+      error: 'Too Many Requests',
+      message: `Rate limit exceeded. Try again in ${context.after}`,
+    }),
+  });
+
+  await server.register(completionRoute, { prefix: '/v1' });
+  await server.register(batchRoute, { prefix: '/v1' });
+  await server.register(classifyRoute, { prefix: '/v1' });
+  await server.register(reviewRoute, { prefix: '/v1' });
+  await server.register(healthRoute);
+  await server.register(metricsRoute);
+
+  server.setErrorHandler((error, request, reply) => {
+    logger.error({ error, url: request.url, method: request.method }, 'Unhandled error');
+    const statusCode = error.statusCode ?? 500;
+    reply.status(statusCode).send({
+      statusCode,
+      error: error.name ?? 'InternalServerError',
+      message: statusCode >= 500 ? 'Internal server error' : error.message,
+    });
+  });
+
+  server.setNotFoundHandler((_request, reply) => {
+    reply.status(404).send({ statusCode: 404, error: 'Not Found', message: 'Route not found' });
+  });
+
+  return server;
+}
+
+async function main() {
+  const server = await buildServer();
+
+  const shutdown = async (signal: string) => {
+    logger.info({ signal }, 'Shutdown signal received');
+    try {
+      await server.close();
+      const pool = getPool();
+      await pool.end();
+      logger.info('Server and DB connections closed');
+      process.exit(0);
+    } catch (err) {
+      logger.error({ err }, 'Error during shutdown');
+      process.exit(1);
+    }
+  };
+
+  process.on('SIGTERM', () => shutdown('SIGTERM'));
+  process.on('SIGINT', () => shutdown('SIGINT'));
+
+  const port = parseInt(process.env['PORT'] ?? '3100', 10);
+  const host = process.env['HOST'] ?? '0.0.0.0';
+
+  try {
+    await initPgBoss();
+    await server.listen({ port, host });
+    logger.info({ port, host }, 'LLM Gateway started');
+  } catch (err) {
+    logger.error({ err }, 'Failed to start server');
+    process.exit(1);
+  }
+}
+
+main();
--- a/packages/gateway/src/validation/banlist-checker.ts
+++ b/packages/gateway/src/validation/banlist-checker.ts
@ -0,0 +1,111 @@
+import { EN_BANLIST } from '../banlists/en.js';
+import { DE_BANLIST } from '../banlists/de.js';
+import { AUTO_DETECTED_BANLIST } from '../banlists/auto-detected.js';
+import { getGiteaEntries } from '../banlists/sync-from-gitea.js';
+
+export interface BanViolation {
+  term: string;
+  category: string;
+  language: string;
+  position: number;
+  context: string;
+}
+
+export interface BanlistResult {
+  passed: boolean;
+  violations: BanViolation[];
+  score_penalty: number;
+}
+
+const PENALTY_PER_VIOLATION = 1.0;
+const MAX_PENALTY = 5.0;
+const CONTEXT_WINDOW = 50;
+
+function escapeForRegex(term: string): string {
+  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
+function buildPattern(term: string, wholeWord: boolean): RegExp {
+  const escaped = escapeForRegex(term);
+  const pattern = wholeWord ? `\\b${escaped}\\b` : escaped;
+  return new RegExp(pattern, 'gi');
+}
+
+function extractContext(text: string, position: number): string {
+  const start = Math.max(0, position - CONTEXT_WINDOW);
+  const end = Math.min(text.length, position + CONTEXT_WINDOW);
+  return text.slice(start, end).replace(/\n/g, ' ');
+}
+
+function checkList(
+  text: string,
+  entries: Array<{ term: string; category: string; wholeWord: boolean }>,
+  language: string,
+): BanViolation[] {
+  const violations: BanViolation[] = [];
+
+  for (const entry of entries) {
+    const regex = buildPattern(entry.term, entry.wholeWord);
+    let match: RegExpExecArray | null;
+
+    while ((match = regex.exec(text)) !== null) {
+      const position = match.index;
+      violations.push({
+        term: entry.term,
+        category: entry.category,
+        language,
+        position,
+        context: extractContext(text, position),
+      });
+      // Avoid infinite loop on zero-length match
+      if (match.index === regex.lastIndex) {
+        regex.lastIndex++;
+      }
+    }
+  }
+
+  return violations;
+}
+
+export function checkBanlist(
+  text: string,
+  language: 'en' | 'de' | 'auto' = 'auto',
+): BanlistResult {
+  const violations: BanViolation[] = [];
+
+  // Always check auto-detected patterns
+  violations.push(...checkList(text, AUTO_DETECTED_BANLIST, 'auto'));
+
+  // Language-specific checks
+  if (language === 'en' || language === 'auto') {
+    violations.push(...checkList(text, EN_BANLIST, 'en'));
+  }
+
+  if (language === 'de' || language === 'auto') {
+    violations.push(...checkList(text, DE_BANLIST, 'de'));
+  }
+
+  // Gitea synced additions
+  const giteaEntries = getGiteaEntries();
+  const relevantGiteaEntries = giteaEntries.filter(
+    (e) => e.language === 'auto' || e.language === language,
+  );
+  violations.push(...checkList(text, relevantGiteaEntries, 'gitea'));
+
+  // Deduplicate by term+position
+  const seen = new Set<string>();
+  const unique = violations.filter((v) => {
+    const key = `${v.term}:${v.position}`;
+    if (seen.has(key)) return false;
+    seen.add(key);
+    return true;
+  });
+
+  const penalty = Math.min(unique.length * PENALTY_PER_VIOLATION, MAX_PENALTY);
+
+  return {
+    passed: unique.length === 0,
+    violations: unique,
+    score_penalty: -penalty,
+  };
+}
--- a/packages/gateway/src/validation/fact-checker.ts
+++ b/packages/gateway/src/validation/fact-checker.ts
@ -0,0 +1,85 @@
+import { lookupAsn, lookupIx } from '../integrations/peeringdb.js';
+import { logger } from '../observability/logger.js';
+
+export interface FactCheckResult {
+  passed: boolean;
+  checks_performed: number;
+  failures: string[];
+  score_impact: number;
+}
+
+// ASN regex: AS followed by 1-10 digits
+const ASN_REGEX = /\bAS(\d{1,10})\b/g;
+// IX name patterns — rough heuristic
+const IX_NAME_REGEX = /\b([A-Z]{2,6}-IX|DE-CIX|LINX|AMS-IX|ECIX|BCIX|FNIX|KIXP)\b/g;
+
+export async function checkFacts(
+  text: string,
+  timeoutMs = 5000,
+): Promise<FactCheckResult> {
+  const failures: string[] = [];
+  let checksPerformed = 0;
+  let scoreImpact = 0;
+
+  // Extract ASNs
+  const asnMatches = [...text.matchAll(ASN_REGEX)];
+  const asns = [...new Set(asnMatches.map((m) => parseInt(m[1] ?? '0', 10)).filter((n) => n > 0))];
+
+  // Extract IX names
+  const ixMatches = [...text.matchAll(IX_NAME_REGEX)];
+  const ixNames = [...new Set(ixMatches.map((m) => m[1] ?? '').filter(Boolean))];
+
+  const asnChecks = asns.slice(0, 3).map(async (asn) => {
+    checksPerformed++;
+    try {
+      const result = await Promise.race([
+        lookupAsn(asn),
+        new Promise<null>((_, reject) =>
+          setTimeout(() => reject(new Error('timeout')), timeoutMs),
+        ),
+      ]);
+
+      if (result === null) {
+        // Could not find in PeeringDB — not necessarily wrong
+        logger.debug({ asn }, 'ASN not found in PeeringDB');
+      }
+    } catch (err) {
+      if ((err as Error).message === 'timeout') {
+        logger.debug({ asn }, 'PeeringDB ASN lookup timed out');
+      } else {
+        logger.warn({ err, asn }, 'PeeringDB ASN lookup error');
+      }
+    }
+  });
+
+  const ixChecks = ixNames.slice(0, 2).map(async (ixName) => {
+    checksPerformed++;
+    try {
+      const result = await Promise.race([
+        lookupIx(ixName),
+        new Promise<null>((_, reject) =>
+          setTimeout(() => reject(new Error('timeout')), timeoutMs),
+        ),
+      ]);
+
+      if (result === null) {
+        // IX name not found — flag as potential fabrication
+        failures.push(`IX "${ixName}" not found in PeeringDB`);
+        scoreImpact -= 2.0;
+      }
+    } catch (err) {
+      if ((err as Error).message !== 'timeout') {
+        logger.warn({ err, ixName }, 'PeeringDB IX lookup error');
+      }
+    }
+  });
+
+  await Promise.allSettled([...asnChecks, ...ixChecks]);
+
+  return {
+    passed: failures.length === 0,
+    checks_performed: checksPerformed,
+    failures,
+    score_impact: scoreImpact,
+  };
+}
--- a/packages/gateway/src/validation/language-checker.ts
+++ b/packages/gateway/src/validation/language-checker.ts
@ -0,0 +1,105 @@
+// franc is a pure ESM package — import as default
+// eslint-disable-next-line @typescript-eslint/ban-ts-comment
+// @ts-ignore — franc typings are CommonJS-shaped
+import { franc } from 'franc';
+
+export interface LanguageCheckResult {
+  passed: boolean;
+  detected_language: string;
+  required_language: string;
+  formality_issue: boolean;
+  sie_count: number;
+  du_count: number;
+  score_impact: number;
+  details: string[];
+}
+
+// German Sie-form indicators (formal)
+const SIE_PATTERNS = [
+  /\bSie\b/g,
+  /\bIhnen\b/g,
+  /\bIhr\b/g,
+  /\bIhre\b/g,
+  /\bIhrem\b/g,
+  /\bIhren\b/g,
+  /\bIhres\b/g,
+];
+
+// German du-form indicators (informal)
+const DU_PATTERNS = [
+  /\bdu\b/gi,
+  /\bdich\b/gi,
+  /\bdir\b/gi,
+  /\bdein\b/gi,
+  /\bdeine\b/gi,
+  /\bdeinem\b/gi,
+  /\bdeinen\b/gi,
+  /\bdeines\b/gi,
+];
+
+function countPatterns(text: string, patterns: RegExp[]): number {
+  return patterns.reduce((count, pattern) => {
+    const fresh = new RegExp(pattern.source, pattern.flags);
+    return count + (text.match(fresh) ?? []).length;
+  }, 0);
+}
+
+function mapFrancToLang(francCode: string): 'de' | 'en' | 'other' {
+  if (francCode === 'deu') return 'de';
+  if (francCode === 'eng') return 'en';
+  return 'other';
+}
+
+export function checkLanguage(
+  text: string,
+  requiredLanguage?: 'de' | 'en',
+  formalityMode?: 'du' | 'Sie',
+): LanguageCheckResult {
+  const francResult = franc(text, { minLength: 20 });
+  const detected = mapFrancToLang(francResult);
+  const required = requiredLanguage ?? 'en';
+  const details: string[] = [];
+  let scoreImpact = 0;
+
+  const wrongLanguage = requiredLanguage !== undefined && detected !== requiredLanguage && detected !== 'other';
+
+  if (wrongLanguage) {
+    scoreImpact -= 2.0;
+    details.push(`Wrong language: expected ${required}, detected ${detected}`);
+  }
+
+  // Check German formality
+  let sieCount = 0;
+  let duCount = 0;
+  let formalityIssue = false;
+
+  if (detected === 'de' || required === 'de') {
+    sieCount = countPatterns(text, SIE_PATTERNS);
+    duCount = countPatterns(text, DU_PATTERNS);
+
+    if (formalityMode === 'du' && sieCount > 2) {
+      // Should use du-form but uses Sie
+      scoreImpact -= 1.0;
+      formalityIssue = true;
+      details.push(`Formality mismatch: du-form required but found ${sieCount} Sie occurrences`);
+    } else if (formalityMode === 'Sie' && duCount > 2) {
+      // Should use Sie-form but uses du
+      scoreImpact -= 0.5;
+      formalityIssue = true;
+      details.push(`Formality mismatch: Sie-form required but found ${duCount} du occurrences`);
+    }
+  }
+
+  const passed = !wrongLanguage && !formalityIssue;
+
+  return {
+    passed,
+    detected_language: detected,
+    required_language: required,
+    formality_issue: formalityIssue,
+    sie_count: sieCount,
+    du_count: duCount,
+    score_impact: scoreImpact,
+    details,
+  };
+}
--- a/packages/gateway/src/validation/schema-validator.ts
+++ b/packages/gateway/src/validation/schema-validator.ts
@ -0,0 +1,43 @@
+// eslint-disable-next-line @typescript-eslint/no-var-requires
+import { createRequire } from 'module';
+const require = createRequire(import.meta.url);
+const Ajv = require('ajv');
+
+const ajv = new Ajv({ allErrors: true, strict: false });
+
+export interface SchemaValidatorResult {
+  passed: boolean;
+  errors: string[];
+  score_impact: number;
+  retry: boolean;
+}
+
+const validatorCache = new Map<string, unknown>();
+
+export function validateSchema(output: string, schema: Record<string, unknown> | undefined): SchemaValidatorResult {
+  if (!schema || Object.keys(schema).length === 0) {
+    return { passed: true, errors: [], score_impact: 0, retry: false };
+  }
+  
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(output);
+  } catch {
+    return { passed: false, errors: ['Output is not valid JSON'], score_impact: -8, retry: true };
+  }
+  
+  const schemaKey = JSON.stringify(schema);
+  let validate = validatorCache.get(schemaKey) as ((data: unknown) => boolean) | undefined;
+  if (!validate) {
+    validate = ajv.compile(schema) as (data: unknown) => boolean;
+    validatorCache.set(schemaKey, validate);
+  }
+  
+  const valid = validate(parsed);
+  if (!valid) {
+    const errors = (ajv.errorsText((validate as unknown as { errors: unknown[] | null }).errors) || 'Schema validation failed').split(', ');
+    return { passed: false, errors, score_impact: -5, retry: true };
+  }
+  
+  return { passed: true, errors: [], score_impact: 0.5, retry: false };
+}
--- a/packages/gateway/src/validation/tip-validator.ts
+++ b/packages/gateway/src/validation/tip-validator.ts
@ -0,0 +1,166 @@
+// TIP-specific fact validation for transceiver data
+
+export interface TipValidationResult {
+  passed: boolean;
+  errors: string[];
+  score_impact: number;
+  immediate_reject: boolean;
+}
+
+// Valid data rates in Gbps
+const VALID_DATA_RATES_GBPS = new Set([1, 2.5, 10, 25, 40, 100, 200, 400, 800, 1600]);
+
+// Valid form factors per SFF-8024
+const VALID_FORM_FACTORS = new Set([
+  'SFP', 'SFP+', 'SFP28', 'SFP56', 'SFP-DD',
+  'QSFP', 'QSFP+', 'QSFP28', 'QSFP56', 'QSFP-DD',
+  'OSFP', 'OSFP-XD',
+  'CFP', 'CFP2', 'CFP4', 'CFP8',
+  'CXP',
+  'XFP',
+  'X2',
+  'XENPAK',
+  'GBIC',
+  'SFP-DCO',
+  'DSFP',
+  'CDFP',
+  '400G-FR4',
+]);
+
+// Valid connector types
+const VALID_CONNECTORS = new Set([
+  'LC', 'SC', 'MPO', 'MPO-12', 'MPO-16', 'MPO-24',
+  'CS', 'SN', 'MDI', 'RJ45',
+  'MTP', 'MTRJ',
+  'FC', 'ST',
+  'E2000',
+  'DAC', 'ACC',
+  'CU',
+]);
+
+// Valid fiber types
+const VALID_FIBER_TYPES = new Set(['SMF', 'MMF', 'OM3', 'OM4', 'OM5', 'OS1', 'OS2', 'DAC', 'AOC', 'ACO']);
+
+// CWDM wavelengths: 1271 to 1611 nm, 20 nm steps
+const CWDM_WAVELENGTHS = new Set(
+  Array.from({ length: 18 }, (_, i) => 1271 + i * 20),
+);
+
+// Standard single-wavelength values
+const STANDARD_WAVELENGTHS = new Set([
+  850, 1310, 1330, 1550, 1490, 1270, 1300, 1320,
+  ...CWDM_WAVELENGTHS,
+]);
+
+// DWDM C-band: roughly 191.7 THz to 196.1 THz = 1528 to 1565 nm, ~0.8 nm spacing
+// We allow 1525 to 1570 nm range for DWDM
+const DWDM_MIN_NM = 1525;
+const DWDM_MAX_NM = 1570;
+const DWDM_STEP_NM = 0.8; // approximate ITU-T G.694.1 spacing
+
+function isValidWavelength(nm: number): boolean {
+  if (STANDARD_WAVELENGTHS.has(nm)) return true;
+  // Check DWDM range with approximate step validation
+  if (nm >= DWDM_MIN_NM && nm <= DWDM_MAX_NM) {
+    // Rough check: should be close to a 0.8nm ITU grid point
+    const offset = (nm - DWDM_MIN_NM) % DWDM_STEP_NM;
+    return offset < 0.2 || offset > 0.6;
+  }
+  return false;
+}
+
+// Regex patterns for extracting values from text
+const DATA_RATE_REGEX = /(\d+(?:\.\d+)?)\s*(?:G|Gbps|Gb\/s)/gi;
+const FORM_FACTOR_REGEX = /\b(SFP\+?28?56?-?DD?|QSFP\+?28?56?-?DD?|OSFP(?:-XD)?|CFP[248]?|CXP|XFP|GBIC|DSFP|CDFP)\b/gi;
+const WAVELENGTH_REGEX = /(\d{3,4}(?:\.\d+)?)\s*nm/gi;
+const CONNECTOR_REGEX = /\b(LC|SC|MPO(?:-\d+)?|MTP|CS|SN|RJ45|MDI|MTRJ|FC|ST|E2000|CU)\b/gi;
+
+export function validateTipContent(text: string, isJsonMode = false): TipValidationResult {
+  const errors: string[] = [];
+  let scoreImpact = 0;
+  let content = text;
+
+  // For JSON mode, extract string values
+  if (isJsonMode) {
+    try {
+      const parsed = JSON.parse(text);
+      content = JSON.stringify(parsed);
+    } catch {
+      // Fall through to text mode
+    }
+  }
+
+  // Check data rates
+  let match: RegExpExecArray | null;
+  const dataRateRegex = new RegExp(DATA_RATE_REGEX.source, 'gi');
+  while ((match = dataRateRegex.exec(content)) !== null) {
+    const rate = parseFloat(match[1] ?? '0');
+    if (!VALID_DATA_RATES_GBPS.has(rate)) {
+      errors.push(`Invalid data rate: ${rate} Gbps (not in standard set)`);
+      scoreImpact -= 1.5;
+    }
+  }
+
+  // Check form factors
+  const formFactorRegex = new RegExp(FORM_FACTOR_REGEX.source, 'gi');
+  while ((match = formFactorRegex.exec(content)) !== null) {
+    const ff = (match[1] ?? '').toUpperCase();
+    if (ff && !VALID_FORM_FACTORS.has(ff)) {
+      errors.push(`Unknown form factor: ${ff}`);
+      scoreImpact -= 1.5;
+    }
+  }
+
+  // Check wavelengths
+  const wavelengthRegex = new RegExp(WAVELENGTH_REGEX.source, 'gi');
+  while ((match = wavelengthRegex.exec(content)) !== null) {
+    const wl = parseFloat(match[1] ?? '0');
+    if (wl > 500 && !isValidWavelength(Math.round(wl))) {
+      errors.push(`Suspicious wavelength: ${wl} nm (not on standard grid)`);
+      scoreImpact -= 1.5;
+    }
+  }
+
+  // Check connectors (only flag clearly wrong values if context suggests transceiver spec)
+  if (content.toLowerCase().includes('connector')) {
+    const connectorRegex = new RegExp(CONNECTOR_REGEX.source, 'gi');
+    const found: string[] = [];
+    while ((match = connectorRegex.exec(content)) !== null) {
+      if (match[1]) found.push(match[1].toUpperCase());
+    }
+    for (const c of found) {
+      if (!VALID_CONNECTORS.has(c)) {
+        errors.push(`Unknown connector type: ${c}`);
+        scoreImpact -= 1.5;
+      }
+    }
+  }
+
+  const IMMEDIATE_REJECT_THRESHOLD = 3;
+  const immediateReject = errors.length >= IMMEDIATE_REJECT_THRESHOLD;
+
+  return {
+    passed: errors.length === 0,
+    errors,
+    score_impact: scoreImpact,
+    immediate_reject: immediateReject,
+  };
+}
+
+export function isValidFormFactor(ff: string): boolean {
+  return VALID_FORM_FACTORS.has(ff.toUpperCase());
+}
+
+export function isValidDataRate(gbps: number): boolean {
+  return VALID_DATA_RATES_GBPS.has(gbps);
+}
+
+export function isValidConnector(connector: string): boolean {
+  return VALID_CONNECTORS.has(connector.toUpperCase());
+}
+
+export function isValidFiberType(fiber: string): boolean {
+  return VALID_FIBER_TYPES.has(fiber.toUpperCase());
+}
+
+export { VALID_DATA_RATES_GBPS, VALID_FORM_FACTORS, VALID_CONNECTORS, VALID_FIBER_TYPES };
--- a/packages/gateway/tsconfig.json
+++ b/packages/gateway/tsconfig.json
@ -0,0 +1,22 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "NodeNext",
+    "moduleResolution": "NodeNext",
+    "lib": ["ES2022"],
+    "outDir": "./dist",
+    "rootDir": "./src",
+    "strict": true,
+    "exactOptionalPropertyTypes": false,
+    "noUncheckedIndexedAccess": false,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "forceConsistentCasingInFileNames": true,
+    "resolveJsonModule": true,
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true
+  },
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
--- a/packages/learning/Dockerfile
+++ b/packages/learning/Dockerfile
@ -0,0 +1,18 @@
+FROM node:22-alpine
+
+WORKDIR /app
+
+# Install tsx for running TypeScript directly
+RUN npm install -g tsx
+
+# Copy workspace root and learning package manifests
+COPY package.json ./
+COPY packages/learning/package.json ./packages/learning/
+
+# Install dependencies
+RUN npm install --workspace=packages/learning
+
+# Copy learning source
+COPY packages/learning/src ./packages/learning/src
+
+CMD ["node", "--import", "tsx/esm", "packages/learning/src/index.ts"]
--- a/packages/learning/package.json
+++ b/packages/learning/package.json
@ -0,0 +1,25 @@
+{
+  "name": "@llm-gateway/learning",
+  "version": "1.0.0",
+  "type": "module",
+  "scripts": {
+    "dev": "tsx watch src/index.ts",
+    "start": "node --import tsx/esm src/index.ts",
+    "build": "tsc"
+  },
+  "dependencies": {
+    "pg": "^8.13.1",
+    "pg-boss": "^10.1.3",
+    "js-yaml": "^4.1.0",
+    "node-cron": "^3.0.3",
+    "pino": "^9.5.0",
+    "tsx": "^4.19.2"
+  },
+  "devDependencies": {
+    "typescript": "^5.7.2",
+    "@types/node": "^22.10.6",
+    "@types/pg": "^8.11.10",
+    "@types/node-cron": "^3.0.11",
+    "@types/js-yaml": "^4.0.9"
+  }
+}
--- a/packages/learning/src/ban-learner/index.ts
+++ b/packages/learning/src/ban-learner/index.ts
@ -0,0 +1,396 @@
+/**
+ * Ban Learner — auto-detects new banned terms from approved outputs.
+ *
+ * Algorithm:
+ * 1. Pull last 24h of approved outputs
+ * 2. Extract suspicious phrases via regex pattern analysis
+ * 3. Compare edited review_queue items (what was removed = candidate)
+ * 4. Ask gateway LLM to identify AI-filler in low-confidence samples
+ * 5. Upsert candidates into ban_candidates with occurrence counts
+ * 6. Auto-promote candidates with count >= 5 to ban_candidates (promoted=true)
+ */
+
+import { query, withTransaction } from '../db/client.js';
+import { callGateway } from '../gateway-client.js';
+import { logger } from '../observability/logger.js';
+
+// ─── Pattern sets ───────────────────────────────────────────────────────────
+
+const EN_OPENER_PATTERNS = [
+  /\bin today'?s\b/gi,
+  /\bas we\b/gi,
+  /\bit(?:'s| is) worth noting\b/gi,
+  /\bit(?:'s| is) important to\b/gi,
+  /\bin (?:this|the) (?:fast-paced|ever-changing|dynamic)\b/gi,
+  /\bwithout further ado\b/gi,
+  /\blet(?:'s| us) dive (?:in|into)\b/gi,
+  /\bin conclusion\b/gi,
+  /\bto summarize\b/gi,
+  /\bhaving said that\b/gi,
+  /\bthat being said\b/gi,
+  /\ball things considered\b/gi,
+  /\bat the end of the day\b/gi,
+  /\bwhen all is said and done\b/gi,
+];
+
+const EN_BUZZWORD_PATTERNS = [
+  /\bleverage[sd]?\b/gi,
+  /\bsynerg(?:y|ies|ize[sd]?)\b/gi,
+  /\bholistic(?:ally)?\b/gi,
+  /\bcutting-edge\b/gi,
+  /\bstate-of-the-art\b/gi,
+  /\bparadigm shift\b/gi,
+  /\bgame[\s-]changer\b/gi,
+  /\bthought leader(?:ship)?\b/gi,
+  /\bpivot[ed]?\b/gi,
+  /\bdisrupt(?:ive|ion|ing)?\b/gi,
+  /\bbest-in-class\b/gi,
+  /\bworld-class\b/gi,
+  /\bempower(?:ing|ment)?\b/gi,
+  /\btransform(?:ative|ation)?\b/gi,
+  /\bseamless(?:ly)?\b/gi,
+  /\brobust solution\b/gi,
+];
+
+const EN_FILLER_PATTERNS = [
+  /\btruly\b/gi,
+  /\breally\b/gi,
+  /\babsolutely\b/gi,
+  /\bvery unique\b/gi,
+  /\bquite frankly\b/gi,
+  /\bneedless to say\b/gi,
+  /\bfirst and foremost\b/gi,
+  /\blast but not least\b/gi,
+  /\brest assured\b/gi,
+];
+
+const DE_FILLER_PATTERNS = [
+  /\bletztendlich\b/gi,
+  /\bzusammenfassend\b/gi,
+  /\babschlie[ßs]end\b/gi,
+  /\bganzheitlich\b/gi,
+  /\bnachhaltig\b/gi,
+  /\binnovativ\b/gi,
+  /\bsynergi(?:e|en|stisch)\b/gi,
+  /\bim endeffekt\b/gi,
+  /\bzu guter letzt\b/gi,
+  /\bgrunds[äa]tzlich\b/gi,
+  /\bselbstverst[äa]ndlich\b/gi,
+  /\bdiesbez[üu]glich\b/gi,
+];
+
+interface PatternGroup {
+  patterns: RegExp[];
+  category: 'opener' | 'closer' | 'buzzword' | 'filler' | 'transition';
+  language: 'en' | 'de' | 'auto';
+}
+
+const ALL_PATTERN_GROUPS: PatternGroup[] = [
+  { patterns: EN_OPENER_PATTERNS, category: 'opener', language: 'en' },
+  { patterns: EN_BUZZWORD_PATTERNS, category: 'buzzword', language: 'en' },
+  { patterns: EN_FILLER_PATTERNS, category: 'filler', language: 'en' },
+  { patterns: DE_FILLER_PATTERNS, category: 'filler', language: 'de' },
+];
+
+// ─── Types ──────────────────────────────────────────────────────────────────
+
+interface CandidateTerm {
+  term: string;
+  language: 'en' | 'de' | 'auto';
+  category: 'opener' | 'closer' | 'buzzword' | 'filler' | 'transition';
+  context: string;
+  taskType: string;
+}
+
+interface ApprovedOutput {
+  id: string;
+  task_type: string;
+  output_text: string;
+  confidence: number;
+}
+
+interface EditedOutput {
+  task_type: string;
+  output_text: string;
+  edited_output: string;
+}
+
+// ─── Core functions ─────────────────────────────────────────────────────────
+
+function extractCandidatesFromText(
+  text: string,
+  taskType: string,
+): CandidateTerm[] {
+  const candidates: CandidateTerm[] = [];
+
+  for (const group of ALL_PATTERN_GROUPS) {
+    for (const pattern of group.patterns) {
+      const matches = [...text.matchAll(pattern)];
+      for (const match of matches) {
+        const term = match[0]?.toLowerCase();
+        if (!term) continue;
+
+        // Extract surrounding context (up to 80 chars)
+        const start = Math.max(0, (match.index ?? 0) - 40);
+        const end = Math.min(text.length, (match.index ?? 0) + term.length + 40);
+        const context = text.slice(start, end).replace(/\n/g, ' ').trim();
+
+        candidates.push({
+          term,
+          language: group.language,
+          category: group.category,
+          context,
+          taskType,
+        });
+      }
+    }
+  }
+
+  return candidates;
+}
+
+function extractDiffCandidates(
+  original: string,
+  edited: string,
+  taskType: string,
+): CandidateTerm[] {
+  const candidates: CandidateTerm[] = [];
+
+  // Simple word-level diff: find words in original not in edited
+  const origWords = new Set(original.toLowerCase().split(/\s+/));
+  const editWords = new Set(edited.toLowerCase().split(/\s+/));
+
+  // Removed phrases: check if any known pattern terms were removed
+  for (const group of ALL_PATTERN_GROUPS) {
+    for (const pattern of group.patterns) {
+      const matches = [...original.matchAll(pattern)];
+      for (const match of matches) {
+        const term = match[0]?.toLowerCase();
+        if (!term) continue;
+        const termWords = term.split(/\s+/);
+        const removedAll = termWords.every((w) => !editWords.has(w) && origWords.has(w));
+        if (removedAll) {
+          const idx = match.index ?? 0;
+          const context = original.slice(Math.max(0, idx - 40), idx + term.length + 40).trim();
+          candidates.push({ term, language: group.language, category: group.category, context, taskType });
+        }
+      }
+    }
+  }
+
+  return candidates;
+}
+
+async function parseLlmBanCandidates(llmOutput: string): Promise<CandidateTerm[]> {
+  try {
+    const json = JSON.parse(llmOutput) as {
+      candidates: Array<{
+        term: string;
+        language: string;
+        category: string;
+        example_context: string;
+      }>;
+    };
+
+    return (json.candidates ?? []).map((c) => ({
+      term: c.term.toLowerCase().trim(),
+      language: (['en', 'de', 'auto'].includes(c.language) ? c.language : 'auto') as 'en' | 'de' | 'auto',
+      category: (['buzzword', 'filler', 'opener', 'closer', 'transition'].includes(c.category)
+        ? c.category
+        : 'filler') as CandidateTerm['category'],
+      context: c.example_context ?? '',
+      taskType: 'llm-detected',
+    }));
+  } catch {
+    logger.warn({ llmOutput: llmOutput.slice(0, 200) }, 'Failed to parse LLM ban candidate response');
+    return [];
+  }
+}
+
+async function upsertCandidate(
+  candidatesByTerm: Map<string, { term: CandidateTerm; taskTypes: Set<string>; contexts: string[] }>,
+): Promise<{ upserted: number; promoted: number }> {
+  let upserted = 0;
+  let promoted = 0;
+
+  for (const [key, data] of candidatesByTerm.entries()) {
+    const { term } = data;
+    const taskTypes = [...data.taskTypes];
+    const contexts = data.contexts.slice(0, 3);
+
+    try {
+      await withTransaction(async (client) => {
+        // Upsert: if term+language already exists, increment count
+        const result = await client.query<{ id: string; occurrence_count: number; promoted: boolean }>(
+          `INSERT INTO ban_candidates (term, language, category, occurrence_count, source_task_types, example_contexts)
+           VALUES ($1, $2, $3, $4, $5, $6)
+           ON CONFLICT (term, language) DO UPDATE
+             SET occurrence_count = ban_candidates.occurrence_count + $4,
+                 source_task_types = (
+                   SELECT array_agg(DISTINCT t) FROM unnest(
+                     ban_candidates.source_task_types || $5::text[]
+                   ) AS t
+                 ),
+                 example_contexts = CASE
+                   WHEN array_length(ban_candidates.example_contexts, 1) < 3
+                   THEN ban_candidates.example_contexts || $6::text[]
+                   ELSE ban_candidates.example_contexts
+                 END
+           WHERE ban_candidates.rejected = false
+           RETURNING id, occurrence_count, promoted`,
+          [
+            term.term,
+            term.language,
+            term.category,
+            data.taskTypes.size,
+            taskTypes,
+            contexts,
+          ],
+        );
+
+        upserted++;
+        const row = result.rows[0];
+
+        // Auto-promote if threshold reached
+        if (row && !row.promoted && row.occurrence_count >= 5) {
+          await client.query(
+            `UPDATE ban_candidates SET promoted = true, promoted_at = now() WHERE id = $1`,
+            [row.id],
+          );
+          promoted++;
+          logger.info(
+            { term: term.term, language: term.language, count: row.occurrence_count },
+            'Auto-promoted ban candidate to banlist',
+          );
+        }
+      });
+    } catch (err) {
+      logger.error({ err, term: key }, 'Failed to upsert ban candidate');
+    }
+  }
+
+  return { upserted, promoted };
+}
+
+// ─── Main job ───────────────────────────────────────────────────────────────
+
+export async function runBanLearner(): Promise<void> {
+  const startedAt = Date.now();
+  logger.info('Ban learner job started');
+
+  // 1. Pull last 24h approved outputs
+  const approvedResult = await query<ApprovedOutput>(
+    `SELECT id, task_type, output_text, confidence::float as confidence
+     FROM llm_calls
+     WHERE status = 'approved'
+       AND created_at > now() - interval '24 hours'
+       AND output_text IS NOT NULL
+       AND output_text != ''
+     ORDER BY created_at DESC
+     LIMIT 500`,
+  );
+
+  const approved = approvedResult.rows;
+  logger.info({ count: approved.length }, 'Pulled approved outputs');
+
+  // 2. Pull edited outputs from review_queue
+  const editedResult = await query<EditedOutput>(
+    `SELECT rq.task_type, rq.output_text, rq.edited_output
+     FROM review_queue rq
+     WHERE rq.decision = 'edited'
+       AND rq.edited_output IS NOT NULL
+       AND rq.reviewed_at > now() - interval '24 hours'`,
+  );
+
+  const edited = editedResult.rows;
+  logger.info({ count: edited.length }, 'Pulled edited outputs from review_queue');
+
+  // 3. Pull low-confidence outputs for LLM analysis
+  const lowConfResult = await query<ApprovedOutput>(
+    `SELECT id, task_type, output_text, confidence::float as confidence
+     FROM llm_calls
+     WHERE confidence < 6.0
+       AND created_at > now() - interval '24 hours'
+       AND output_text IS NOT NULL
+       AND status IN ('approved', 'warning')
+     ORDER BY confidence ASC
+     LIMIT 20`,
+  );
+
+  const lowConf = lowConfResult.rows;
+
+  // Accumulate all candidates
+  const candidateMap = new Map<
+    string,
+    { term: CandidateTerm; taskTypes: Set<string>; contexts: string[] }
+  >();
+
+  const addCandidate = (c: CandidateTerm) => {
+    const key = `${c.term}::${c.language}`;
+    const existing = candidateMap.get(key);
+    if (existing) {
+      existing.taskTypes.add(c.taskType);
+      if (existing.contexts.length < 3) existing.contexts.push(c.context);
+    } else {
+      candidateMap.set(key, {
+        term: c,
+        taskTypes: new Set([c.taskType]),
+        contexts: [c.context],
+      });
+    }
+  };
+
+  // Extract from approved outputs via regex
+  for (const output of approved) {
+    const candidates = extractCandidatesFromText(output.output_text, output.task_type);
+    candidates.forEach(addCandidate);
+  }
+
+  // Extract from edited diffs
+  for (const edit of edited) {
+    if (!edit.edited_output) continue;
+    const candidates = extractDiffCandidates(edit.output_text, edit.edited_output, edit.task_type);
+    candidates.forEach(addCandidate);
+  }
+
+  // 4. LLM-based analysis of low-confidence samples
+  if (lowConf.length >= 5) {
+    const samples = lowConf
+      .slice(0, 20)
+      .map((o, i) => `--- Sample ${i + 1} (confidence: ${o.confidence}) ---\n${o.output_text.slice(0, 300)}`)
+      .join('\n\n');
+
+    try {
+      const result = await callGateway({
+        taskType: 'internal-ban-detect',
+        input: samples,
+        caller: 'internal',
+      });
+
+      const llmCandidates = await parseLlmBanCandidates(result.output);
+      logger.info({ count: llmCandidates.length }, 'LLM detected ban candidates');
+      llmCandidates.forEach(addCandidate);
+    } catch (err) {
+      logger.warn({ err }, 'LLM ban detection failed, continuing without it');
+    }
+  }
+
+  // 5. Filter: only candidates appearing in >= 3 different outputs
+  const filteredCandidates = new Map(
+    [...candidateMap.entries()].filter(([, v]) => v.taskTypes.size >= 3),
+  );
+
+  logger.info(
+    { total: candidateMap.size, filtered: filteredCandidates.size },
+    'Filtered ban candidates by occurrence threshold',
+  );
+
+  // 6. Upsert to DB
+  const { upserted, promoted } = await upsertCandidate(filteredCandidates);
+
+  const durationMs = Date.now() - startedAt;
+  logger.info(
+    { upserted, promoted, durationMs },
+    'Ban learner job completed',
+  );
+}
--- a/packages/learning/src/db/client.ts
+++ b/packages/learning/src/db/client.ts
@ -0,0 +1,78 @@
+import pg from 'pg';
+import { logger } from '../observability/logger.js';
+
+const { Pool } = pg;
+
+let pool: pg.Pool | null = null;
+
+export function getPool(): pg.Pool {
+  if (!pool) {
+    pool = new Pool({
+      host: process.env['DB_HOST'] ?? 'localhost',
+      port: parseInt(process.env['DB_PORT'] ?? '5432', 10),
+      database: process.env['DB_NAME'] ?? 'llm_gateway',
+      user: process.env['DB_USER'] ?? 'llm',
+      password: process.env['DB_PASSWORD'] ?? '',
+      max: 5,
+      idleTimeoutMillis: 30_000,
+      connectionTimeoutMillis: 5_000,
+    });
+
+    pool.on('error', (err) => {
+      logger.error({ err }, 'PostgreSQL pool error (learning engine)');
+    });
+  }
+  return pool;
+}
+
+export async function query<T extends pg.QueryResultRow = pg.QueryResultRow>(
+  sql: string,
+  params?: unknown[],
+): Promise<pg.QueryResult<T>> {
+  const p = getPool();
+  const maxRetries = 3;
+  let lastError: Error | null = null;
+
+  for (let attempt = 0; attempt < maxRetries; attempt++) {
+    try {
+      return await p.query<T>(sql, params);
+    } catch (err) {
+      const pgErr = err as pg.DatabaseError;
+      const isRetryable = pgErr.code === '40P01' || pgErr.code === '40001';
+      if (!isRetryable || attempt === maxRetries - 1) {
+        throw err;
+      }
+      lastError = pgErr;
+      const delay = 50 * Math.pow(2, attempt);
+      await new Promise((resolve) => setTimeout(resolve, delay));
+      logger.warn({ attempt, sql: sql.slice(0, 80) }, 'Retrying after deadlock');
+    }
+  }
+
+  throw lastError ?? new Error('Query failed after retries');
+}
+
+export async function withTransaction<T>(
+  fn: (client: pg.PoolClient) => Promise<T>,
+): Promise<T> {
+  const p = getPool();
+  const client = await p.connect();
+  try {
+    await client.query('BEGIN');
+    const result = await fn(client);
+    await client.query('COMMIT');
+    return result;
+  } catch (err) {
+    await client.query('ROLLBACK');
+    throw err;
+  } finally {
+    client.release();
+  }
+}
+
+export async function closePool(): Promise<void> {
+  if (pool) {
+    await pool.end();
+    pool = null;
+  }
+}
--- a/packages/learning/src/db/migrations/002_learning.sql
+++ b/packages/learning/src/db/migrations/002_learning.sql
@ -0,0 +1,129 @@
+-- LLM Gateway Learning Engine Schema
+-- Run after 001_initial.sql
+-- psql -U llm -d llm_gateway -f 002_learning.sql
+
+-- ─── BAN CANDIDATES ────────────────────────────────────────────────────────
+-- Auto-detected suspicious phrases waiting for promotion to banlist
+CREATE TABLE IF NOT EXISTS ban_candidates (
+  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at        TIMESTAMPTZ NOT NULL DEFAULT now(),
+  term              VARCHAR(256) NOT NULL,
+  language          VARCHAR(4) NOT NULL CHECK (language IN ('en', 'de', 'auto')),
+  category          VARCHAR(32) NOT NULL CHECK (category IN ('buzzword', 'filler', 'opener', 'closer', 'transition')),
+  occurrence_count  INTEGER NOT NULL DEFAULT 1,
+  source_task_types TEXT[] NOT NULL DEFAULT '{}',
+  example_contexts  TEXT[],
+  promoted          BOOLEAN NOT NULL DEFAULT false,
+  promoted_at       TIMESTAMPTZ,
+  rejected          BOOLEAN NOT NULL DEFAULT false,
+  rejected_at       TIMESTAMPTZ,
+  rejected_reason   TEXT,
+  UNIQUE(term, language)
+);
+
+CREATE INDEX IF NOT EXISTS idx_ban_candidates_term ON ban_candidates (term, language);
+CREATE INDEX IF NOT EXISTS idx_ban_candidates_count ON ban_candidates (occurrence_count DESC) WHERE promoted = false AND rejected = false;
+
+-- ─── FEW-SHOT CANDIDATES ────────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS few_shot_candidates (
+  id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at            TIMESTAMPTZ NOT NULL DEFAULT now(),
+  task_type             VARCHAR(64) NOT NULL,
+  llm_call_id           UUID REFERENCES llm_calls(id) ON DELETE SET NULL,
+  input_text            TEXT NOT NULL,
+  output_text           TEXT NOT NULL,
+  confidence            NUMERIC(3,1) NOT NULL,
+  similarity_to_existing NUMERIC(4,3),
+  promoted              BOOLEAN NOT NULL DEFAULT false,
+  promoted_at           TIMESTAMPTZ,
+  template_version      VARCHAR(16),
+  is_negative           BOOLEAN NOT NULL DEFAULT false,
+  negative_reason       TEXT
+);
+
+CREATE INDEX IF NOT EXISTS idx_few_shot_candidates_task ON few_shot_candidates (task_type, confidence DESC);
+CREATE INDEX IF NOT EXISTS idx_few_shot_candidates_pending ON few_shot_candidates (task_type) WHERE promoted = false;
+
+-- ─── ROUTING CANDIDATES ─────────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS routing_candidates (
+  id                        UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at                TIMESTAMPTZ NOT NULL DEFAULT now(),
+  task_type                 VARCHAR(64) NOT NULL,
+  current_model             VARCHAR(128) NOT NULL,
+  candidate_model           VARCHAR(128) NOT NULL,
+  current_avg_confidence    NUMERIC(4,2),
+  candidate_avg_confidence  NUMERIC(4,2),
+  current_p95_latency_ms    INTEGER,
+  candidate_p95_latency_ms  INTEGER,
+  sample_size               INTEGER NOT NULL,
+  auto_applied              BOOLEAN NOT NULL DEFAULT false,
+  applied_at                TIMESTAMPTZ,
+  rollback_at               TIMESTAMPTZ,
+  rollback_reason           TEXT
+);
+
+CREATE INDEX IF NOT EXISTS idx_routing_candidates_task ON routing_candidates (task_type, created_at DESC);
+
+-- ─── PROMPT CANDIDATES ──────────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS prompt_candidates (
+  id                       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at               TIMESTAMPTZ NOT NULL DEFAULT now(),
+  template_id              VARCHAR(128) NOT NULL,
+  current_version          VARCHAR(16) NOT NULL,
+  candidate_version        VARCHAR(16) NOT NULL,
+  current_system_prompt    TEXT NOT NULL,
+  candidate_system_prompt  TEXT NOT NULL,
+  improvement_rationale    TEXT NOT NULL,
+  changes_made             TEXT[] NOT NULL DEFAULT '{}',
+  expected_improvements    TEXT[] NOT NULL DEFAULT '{}',
+  test_confidence_delta    NUMERIC(4,2),
+  auto_applied             BOOLEAN NOT NULL DEFAULT false,
+  human_approved           BOOLEAN,
+  applied_at               TIMESTAMPTZ,
+  review_queue_id          UUID REFERENCES review_queue(id) ON DELETE SET NULL
+);
+
+CREATE INDEX IF NOT EXISTS idx_prompt_candidates_template ON prompt_candidates (template_id, created_at DESC);
+CREATE INDEX IF NOT EXISTS idx_prompt_candidates_pending ON prompt_candidates (template_id) WHERE auto_applied = false AND human_approved IS NULL;
+
+-- ─── LEARNING REPORTS ───────────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS learning_reports (
+  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
+  period_from  TIMESTAMPTZ NOT NULL,
+  period_to    TIMESTAMPTZ NOT NULL,
+  report_data  JSONB NOT NULL
+);
+
+CREATE INDEX IF NOT EXISTS idx_learning_reports_period ON learning_reports (period_from DESC);
+
+-- ─── A/B TEST TRACKING ──────────────────────────────────────────────────────
+CREATE TABLE IF NOT EXISTS ab_tests (
+  id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  created_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
+  completed_at        TIMESTAMPTZ,
+  task_type           VARCHAR(64) NOT NULL,
+  control_model       VARCHAR(128) NOT NULL,
+  challenger_model    VARCHAR(128) NOT NULL,
+  traffic_percent     INTEGER NOT NULL DEFAULT 10,
+  control_calls       INTEGER NOT NULL DEFAULT 0,
+  challenger_calls    INTEGER NOT NULL DEFAULT 0,
+  control_avg_conf    NUMERIC(4,2),
+  challenger_avg_conf NUMERIC(4,2),
+  winner              VARCHAR(128),
+  auto_promoted       BOOLEAN NOT NULL DEFAULT false,
+  status              VARCHAR(16) NOT NULL DEFAULT 'running'
+);
+
+CREATE INDEX IF NOT EXISTS idx_ab_tests_task ON ab_tests (task_type, status);
+
+-- ─── ADDITIONAL INDEXES ON EXISTING TABLES ──────────────────────────────────
+-- Safe to run even if already exist
+CREATE INDEX IF NOT EXISTS idx_routing_metrics_lookup
+  ON routing_metrics (task_type, model_used, recorded_at DESC);
+
+CREATE INDEX IF NOT EXISTS idx_learning_corpus_task
+  ON learning_corpus (task_type, quality_score DESC, created_at DESC);
+
+CREATE INDEX IF NOT EXISTS idx_ban_analytics_task_term
+  ON ban_analytics (task_type, term, created_at DESC);
--- a/packages/learning/src/few-shot-curator/index.ts
+++ b/packages/learning/src/few-shot-curator/index.ts
@ -0,0 +1,397 @@
+/**
+ * Few-Shot Curator — auto-promotes high-quality outputs to prompt templates.
+ *
+ * Algorithm:
+ * 1. Pull outputs with confidence >= 9.0 AND status='approved'
+ * 2. Check diversity vs existing few-shot examples (TF-IDF cosine similarity)
+ * 3. When 3+ candidates for a task_type accumulate → update YAML template
+ * 4. Handle negative examples from rejected review_queue items
+ */
+
+import { readFileSync, writeFileSync, readdirSync } from 'fs';
+import { join } from 'path';
+import yaml from 'js-yaml';
+import { query, withTransaction } from '../db/client.js';
+import { logger } from '../observability/logger.js';
+
+// ─── Constants ──────────────────────────────────────────────────────────────
+
+const TEMPLATES_DIR =
+  process.env['TEMPLATES_DIR'] ??
+  '/Users/renefichtmueller/Desktop/Claude Code/llm-gateway/packages/gateway/prompts/templates';
+
+const MIN_CONFIDENCE = 9.0;
+const SIMILARITY_THRESHOLD = 0.7;
+const CANDIDATES_REQUIRED = 3;
+const MAX_FEW_SHOT_LENGTH = 800; // chars — too long clutters the prompt
+
+// ─── TF-IDF cosine similarity (no ML needed) ────────────────────────────────
+
+function tokenize(text: string): string[] {
+  return text
+    .toLowerCase()
+    .replace(/[^a-z0-9äöüß\s]/g, ' ')
+    .split(/\s+/)
+    .filter((t) => t.length > 2);
+}
+
+function buildTfVector(tokens: string[]): Map<string, number> {
+  const freq = new Map<string, number>();
+  for (const t of tokens) {
+    freq.set(t, (freq.get(t) ?? 0) + 1);
+  }
+  // TF = count / total
+  const total = tokens.length;
+  const tf = new Map<string, number>();
+  for (const [term, count] of freq.entries()) {
+    tf.set(term, count / total);
+  }
+  return tf;
+}
+
+function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
+  let dot = 0;
+  let normA = 0;
+  let normB = 0;
+
+  for (const [term, valA] of a.entries()) {
+    const valB = b.get(term) ?? 0;
+    dot += valA * valB;
+    normA += valA * valA;
+  }
+  for (const valB of b.values()) {
+    normB += valB * valB;
+  }
+
+  if (normA === 0 || normB === 0) return 0;
+  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
+}
+
+function maxSimilarityToSet(candidate: string, existingTexts: string[]): number {
+  if (existingTexts.length === 0) return 0;
+  const candVec = buildTfVector(tokenize(candidate));
+  let maxSim = 0;
+  for (const text of existingTexts) {
+    const sim = cosineSimilarity(candVec, buildTfVector(tokenize(text)));
+    if (sim > maxSim) maxSim = sim;
+  }
+  return maxSim;
+}
+
+// ─── Template YAML handling ─────────────────────────────────────────────────
+
+interface FewShotExample {
+  user: string;
+  assistant: string;
+}
+
+interface NegativeExample {
+  input: string;
+  bad_output: string;
+  why_bad: string;
+}
+
+interface PromptTemplate {
+  id: string;
+  version: string;
+  task_type?: string;
+  system_prompt?: string;
+  system_prompt_de?: string;
+  user_template?: string;
+  user_template_de?: string;
+  few_shot_examples?: FewShotExample[];
+  negative_examples?: NegativeExample[];
+  variables?: string[];
+  [key: string]: unknown;
+}
+
+function loadTemplate(taskType: string): { template: PromptTemplate; filePath: string } | null {
+  try {
+    const files = readdirSync(TEMPLATES_DIR);
+    const fileName = files.find((f) => f.replace('.yaml', '') === taskType);
+    if (!fileName) return null;
+
+    const filePath = join(TEMPLATES_DIR, fileName);
+    const content = readFileSync(filePath, 'utf-8');
+    const template = yaml.load(content) as PromptTemplate;
+    return { template, filePath };
+  } catch (err) {
+    logger.error({ err, taskType }, 'Failed to load template');
+    return null;
+  }
+}
+
+function bumpPatchVersion(version: string): string {
+  const parts = version.split('.').map(Number);
+  if (parts.length !== 3) return version;
+  const [major, minor, patch] = parts;
+  return `${major}.${minor}.${(patch ?? 0) + 1}`;
+}
+
+function bumpMinorVersion(version: string): string {
+  const parts = version.split('.').map(Number);
+  if (parts.length !== 3) return version;
+  const [major, minor] = parts;
+  return `${major}.${(minor ?? 0) + 1}.0`;
+}
+
+function writeTemplate(filePath: string, template: PromptTemplate): void {
+  const content = yaml.dump(template, { lineWidth: 120, quotingType: '"' });
+  writeFileSync(filePath, content, 'utf-8');
+}
+
+async function recordPromptVersion(
+  template: PromptTemplate,
+  filePath: string,
+  notes: string,
+): Promise<void> {
+  const content = readFileSync(filePath, 'utf-8');
+  await query(
+    `INSERT INTO prompt_versions (prompt_id, version, task_type, template_yaml, active, deployed_by, notes)
+     VALUES ($1, $2, $3, $4, true, 'few-shot-curator', $5)
+     ON CONFLICT (prompt_id, version) DO NOTHING`,
+    [template.id, template.version, template.id, content, notes],
+  );
+}
+
+// ─── Types ──────────────────────────────────────────────────────────────────
+
+interface HighConfOutput {
+  id: string;
+  task_type: string;
+  input_text: string;
+  output_text: string;
+  confidence: number;
+}
+
+interface RejectedOutput {
+  id: string;
+  task_type: string;
+  input_text: string;
+  output_text: string;
+  reviewer_notes: string | null;
+}
+
+// ─── Main job ───────────────────────────────────────────────────────────────
+
+export async function runFewShotCurator(): Promise<void> {
+  const startedAt = Date.now();
+  logger.info('Few-shot curator job started');
+
+  // 1. Pull high-confidence approved outputs not yet processed
+  const highConfResult = await query<HighConfOutput>(
+    `SELECT lc.id, lc.task_type, rq.input_text, lc.output_text, lc.confidence::float as confidence
+     FROM llm_calls lc
+     JOIN review_queue rq ON rq.call_id = lc.id
+     WHERE lc.confidence >= $1
+       AND lc.status = 'approved'
+       AND lc.output_text IS NOT NULL
+       AND NOT EXISTS (
+         SELECT 1 FROM few_shot_candidates fsc
+         WHERE fsc.llm_call_id = lc.id
+       )
+     ORDER BY lc.confidence DESC
+     LIMIT 200`,
+    [MIN_CONFIDENCE],
+  );
+
+  // Also try without review_queue join (direct calls that bypassed review)
+  const directHighConfResult = await query<HighConfOutput>(
+    `SELECT lc.id, lc.task_type, '' as input_text, lc.output_text, lc.confidence::float as confidence
+     FROM llm_calls lc
+     WHERE lc.confidence >= $1
+       AND lc.status = 'approved'
+       AND lc.output_text IS NOT NULL
+       AND NOT EXISTS (
+         SELECT 1 FROM few_shot_candidates fsc
+         WHERE fsc.llm_call_id = lc.id
+       )
+       AND NOT EXISTS (
+         SELECT 1 FROM review_queue rq WHERE rq.call_id = lc.id
+       )
+     ORDER BY lc.confidence DESC
+     LIMIT 100`,
+    [MIN_CONFIDENCE],
+  );
+
+  const allHighConf = [...highConfResult.rows, ...directHighConfResult.rows];
+  logger.info({ count: allHighConf.length }, 'Pulled high-confidence outputs');
+
+  // 2. Pull rejected outputs for negative examples
+  const rejectedResult = await query<RejectedOutput>(
+    `SELECT rq.id, rq.task_type, rq.input_text, rq.output_text, rq.reviewer_notes
+     FROM review_queue rq
+     WHERE rq.decision = 'rejected'
+       AND rq.reviewed_at > now() - interval '7 days'
+       AND NOT EXISTS (
+         SELECT 1 FROM few_shot_candidates fsc
+         WHERE fsc.llm_call_id = rq.call_id AND fsc.is_negative = true
+       )`,
+  );
+
+  logger.info({ count: rejectedResult.rows.length }, 'Pulled rejected outputs for negative examples');
+
+  // 3. Group by task_type and check diversity
+  const byTaskType = new Map<string, HighConfOutput[]>();
+  for (const output of allHighConf) {
+    const list = byTaskType.get(output.task_type) ?? [];
+    list.push(output);
+    byTaskType.set(output.task_type, list);
+  }
+
+  let totalPromoted = 0;
+  let totalNegative = 0;
+
+  // 4. Process each task_type
+  for (const [taskType, outputs] of byTaskType.entries()) {
+    const loaded = loadTemplate(taskType);
+    if (!loaded) {
+      // No template file for this task_type — store as candidates anyway
+      for (const output of outputs) {
+        await storeFewShotCandidate(output, null);
+      }
+      continue;
+    }
+
+    const { template, filePath } = loaded;
+    const existingExamples = (template.few_shot_examples ?? []).map((e) => e.assistant);
+
+    const goodCandidates: Array<{ output: HighConfOutput; similarity: number }> = [];
+
+    for (const output of outputs) {
+      // Skip too-long outputs
+      if (output.output_text.length > MAX_FEW_SHOT_LENGTH) continue;
+
+      const similarity = maxSimilarityToSet(output.output_text, existingExamples);
+      await storeFewShotCandidate(output, similarity);
+
+      if (similarity < SIMILARITY_THRESHOLD) {
+        goodCandidates.push({ output, similarity });
+      }
+    }
+
+    // 5. Promote if enough diverse candidates
+    if (goodCandidates.length >= CANDIDATES_REQUIRED) {
+      // Pick the best (highest confidence, most diverse)
+      goodCandidates.sort((a, b) => {
+        // Score = confidence + (1 - similarity) → favor high confidence + low similarity
+        const scoreA = a.output.confidence + (1 - a.similarity);
+        const scoreB = b.output.confidence + (1 - b.similarity);
+        return scoreB - scoreA;
+      });
+
+      const best = goodCandidates[0];
+      if (!best) continue;
+
+      const newExample: FewShotExample = {
+        user: best.output.input_text || `[auto-curated from task: ${taskType}]`,
+        assistant: best.output.output_text,
+      };
+
+      const updatedTemplate: PromptTemplate = {
+        ...template,
+        version: bumpPatchVersion(template.version),
+        few_shot_examples: [...(template.few_shot_examples ?? []), newExample],
+      };
+
+      writeTemplate(filePath, updatedTemplate);
+      await recordPromptVersion(
+        updatedTemplate,
+        filePath,
+        `Added few-shot example (confidence: ${best.output.confidence.toFixed(1)}, similarity: ${best.similarity.toFixed(3)})`,
+      );
+
+      // Mark as promoted in DB
+      await query(
+        `UPDATE few_shot_candidates
+         SET promoted = true, promoted_at = now(), template_version = $1
+         WHERE llm_call_id = $2`,
+        [updatedTemplate.version, best.output.id],
+      );
+
+      totalPromoted++;
+      logger.info(
+        {
+          taskType,
+          version: updatedTemplate.version,
+          confidence: best.output.confidence,
+          similarity: best.similarity,
+        },
+        'Added few-shot example to template',
+      );
+    }
+  }
+
+  // 6. Handle negative examples from rejections
+  for (const rejected of rejectedResult.rows) {
+    const loaded = loadTemplate(rejected.task_type);
+    if (!loaded) continue;
+
+    const { template, filePath } = loaded;
+    const negExample: NegativeExample = {
+      input: rejected.input_text,
+      bad_output: rejected.output_text,
+      why_bad: rejected.reviewer_notes ?? 'Rejected by human reviewer',
+    };
+
+    const updatedTemplate: PromptTemplate = {
+      ...template,
+      version: bumpPatchVersion(template.version),
+      negative_examples: [...(template.negative_examples ?? []), negExample],
+    };
+
+    writeTemplate(filePath, updatedTemplate);
+    await recordPromptVersion(
+      updatedTemplate,
+      filePath,
+      `Added negative example from review_queue rejection`,
+    );
+
+    // Store in few_shot_candidates as negative
+    await query(
+      `INSERT INTO few_shot_candidates
+         (task_type, input_text, output_text, confidence, is_negative, negative_reason, promoted, promoted_at, template_version)
+       VALUES ($1, $2, $3, 0, true, $4, true, now(), $5)
+       ON CONFLICT DO NOTHING`,
+      [
+        rejected.task_type,
+        rejected.input_text,
+        rejected.output_text,
+        rejected.reviewer_notes ?? 'rejected',
+        updatedTemplate.version,
+      ],
+    );
+
+    totalNegative++;
+    logger.info({ taskType: rejected.task_type, version: updatedTemplate.version }, 'Added negative example to template');
+  }
+
+  const durationMs = Date.now() - startedAt;
+  logger.info({ totalPromoted, totalNegative, durationMs }, 'Few-shot curator job completed');
+}
+
+async function storeFewShotCandidate(
+  output: HighConfOutput,
+  similarity: number | null,
+): Promise<void> {
+  try {
+    await query(
+      `INSERT INTO few_shot_candidates
+         (task_type, llm_call_id, input_text, output_text, confidence, similarity_to_existing)
+       VALUES ($1, $2, $3, $4, $5, $6)
+       ON CONFLICT DO NOTHING`,
+      [
+        output.task_type,
+        output.id,
+        output.input_text,
+        output.output_text,
+        output.confidence,
+        similarity,
+      ],
+    );
+  } catch (err) {
+    logger.error({ err, outputId: output.id }, 'Failed to store few-shot candidate');
+  }
+}
+
+export { bumpMinorVersion };
--- a/packages/learning/src/gateway-client.ts
+++ b/packages/learning/src/gateway-client.ts
@ -0,0 +1,97 @@
+/**
+ * Internal HTTP client for calling the LLM Gateway API.
+ * Used by learning jobs to run internal inference calls.
+ */
+
+import { logger } from './observability/logger.js';
+
+const GATEWAY_URL = process.env['GATEWAY_URL'] ?? 'http://localhost:3100';
+const INTERNAL_SECRET = process.env['INTERNAL_SECRET'] ?? 'internal-learning-secret';
+
+export interface GatewayCallOptions {
+  taskType: string;
+  input: string;
+  userContext?: string;
+  caller?: string;
+}
+
+export interface GatewayCallResult {
+  output: string;
+  confidence: number;
+  model: string;
+  latencyMs: number;
+}
+
+export async function callGateway(opts: GatewayCallOptions): Promise<GatewayCallResult> {
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), 60_000);
+
+  try {
+    const response = await fetch(`${GATEWAY_URL}/v1/generate`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'X-Caller': opts.caller ?? 'internal',
+        'X-Internal-Secret': INTERNAL_SECRET,
+        'Cache-Control': 'no-store',
+      },
+      body: JSON.stringify({
+        task_type: opts.taskType,
+        input: opts.input,
+        user_context: opts.userContext ?? '',
+      }),
+      signal: controller.signal,
+    });
+
+    if (!response.ok) {
+      const body = await response.text();
+      throw new Error(`Gateway returned ${response.status}: ${body.slice(0, 200)}`);
+    }
+
+    const data = (await response.json()) as {
+      output: string;
+      confidence: number;
+      model: string;
+      latency_ms: number;
+    };
+
+    return {
+      output: data.output,
+      confidence: data.confidence,
+      model: data.model,
+      latencyMs: data.latency_ms,
+    };
+  } catch (err) {
+    logger.error({ err, taskType: opts.taskType }, 'Gateway call failed');
+    throw err;
+  } finally {
+    clearTimeout(timeout);
+  }
+}
+
+export async function postInternal(path: string, body: unknown): Promise<void> {
+  const controller = new AbortController();
+  const timeout = setTimeout(() => controller.abort(), 10_000);
+
+  try {
+    const response = await fetch(`${GATEWAY_URL}${path}`, {
+      method: 'POST',
+      headers: {
+        'Content-Type': 'application/json',
+        'X-Internal-Secret': INTERNAL_SECRET,
+        'Cache-Control': 'no-store',
+      },
+      body: JSON.stringify(body),
+      signal: controller.signal,
+    });
+
+    if (!response.ok) {
+      const text = await response.text();
+      logger.warn({ path, status: response.status, text: text.slice(0, 200) }, 'Internal POST non-OK');
+    }
+  } catch (err) {
+    logger.error({ err, path }, 'Internal POST failed');
+  } finally {
+    clearTimeout(timeout);
+  }
+}
--- a/packages/learning/src/index.ts
+++ b/packages/learning/src/index.ts
@ -0,0 +1,194 @@
+/**
+ * LLM Gateway — Learning Engine
+ *
+ * Standalone service that runs alongside the gateway and permanently improves it
+ * through 4 mechanisms:
+ *   1. Ban-list learner  (every 30 min)  — detects new banned phrases
+ *   2. Few-shot curator  (every 1 hour)  — promotes high-quality examples
+ *   3. Routing optimizer (every 6 hours) — adjusts model routing
+ *   4. Prompt optimizer  (every 12 hours) — generates improved prompts
+ *
+ * Plus:
+ *   - Daily at 02:00: full learning report
+ *   - Sunday 03:00:  fine-tuning trigger check
+ */
+
+import cron from 'node-cron';
+import { logger } from './observability/logger.js';
+import { closePool, query } from './db/client.js';
+import { runBanLearner } from './ban-learner/index.js';
+import { runFewShotCurator } from './few-shot-curator/index.js';
+import { runRoutingOptimizer } from './routing-optimizer/index.js';
+import { runPromptOptimizer } from './prompt-optimizer/index.js';
+import { runLearningReport } from './learning-report/index.js';
+
+// ─── Job wrapper ─────────────────────────────────────────────────────────────
+
+const runningJobs = new Set<string>();
+
+async function safeRun(name: string, fn: () => Promise<void>): Promise<void> {
+  if (runningJobs.has(name)) {
+    logger.warn({ name }, 'Job still running from previous schedule — skipping');
+    return;
+  }
+
+  runningJobs.add(name);
+  const start = Date.now();
+
+  try {
+    logger.info({ name }, 'Starting learning job');
+    await fn();
+    logger.info({ name, durationMs: Date.now() - start }, 'Learning job completed successfully');
+  } catch (err) {
+    logger.error({ err, name, durationMs: Date.now() - start }, 'Learning job failed');
+  } finally {
+    runningJobs.delete(name);
+  }
+}
+
+// ─── Health check ────────────────────────────────────────────────────────────
+
+async function healthCheck(): Promise<void> {
+  try {
+    await query('SELECT 1');
+    logger.debug('DB health check passed');
+  } catch (err) {
+    logger.error({ err }, 'DB health check failed — learning engine cannot reach database');
+    process.exit(1);
+  }
+}
+
+// ─── Fine-tuning trigger ──────────────────────────────────────────────────────
+
+async function checkFineTuningTrigger(): Promise<void> {
+  // Count high-quality unprocessed examples in learning_corpus
+  const result = await query<{ count: string; task_type: string }>(
+    `SELECT task_type, COUNT(*)::int as count
+     FROM learning_corpus
+     WHERE included_in_run IS NULL
+       AND quality_score >= 8.0
+     GROUP BY task_type
+     HAVING COUNT(*) >= 500
+     ORDER BY count DESC`,
+  );
+
+  if (result.rows.length === 0) {
+    logger.info('Fine-tuning check: not enough training examples yet (need >= 500 per task_type)');
+    return;
+  }
+
+  for (const row of result.rows) {
+    logger.info(
+      { taskType: row.task_type, count: parseInt(row.count) },
+      'Fine-tuning threshold reached — triggering run',
+    );
+
+    // Record the fine-tuning run intent
+    await query(
+      `INSERT INTO fine_tuning_runs
+         (base_model, task_type, training_examples, validation_examples, epochs, lora_rank, status)
+       VALUES ('qwen2.5:14b', $1, $2, $3, 3, 16, 'queued')`,
+      [
+        row.task_type,
+        Math.floor(parseInt(row.count) * 0.9),
+        Math.floor(parseInt(row.count) * 0.1),
+      ],
+    );
+
+    // The actual fine-tuner package picks this up separately
+    logger.info({ taskType: row.task_type }, 'Fine-tuning run queued');
+  }
+}
+
+// ─── Scheduler ──────────────────────────────────────────────────────────────
+
+async function main(): Promise<void> {
+  logger.info({ version: '1.0.0' }, 'LLM Gateway Learning Engine starting');
+
+  // DB connectivity check
+  await healthCheck();
+  logger.info('Database connection established');
+
+  // ── Every 30 minutes: ban-list learner ──────────────────────────────────
+  cron.schedule('*/30 * * * *', () => {
+    void safeRun('ban-learner', runBanLearner);
+  });
+
+  // ── Every hour: few-shot curator ─────────────────────────────────────────
+  cron.schedule('0 * * * *', () => {
+    void safeRun('few-shot-curator', runFewShotCurator);
+  });
+
+  // ── Every 6 hours: routing optimizer ─────────────────────────────────────
+  cron.schedule('0 */6 * * *', () => {
+    void safeRun('routing-optimizer', runRoutingOptimizer);
+  });
+
+  // ── Every 12 hours: prompt optimizer ─────────────────────────────────────
+  cron.schedule('0 */12 * * *', () => {
+    void safeRun('prompt-optimizer', runPromptOptimizer);
+  });
+
+  // ── Daily at 02:00: learning report ──────────────────────────────────────
+  cron.schedule('0 2 * * *', () => {
+    void safeRun('learning-report', async () => {
+      await runLearningReport();
+    });
+  });
+
+  // ── Sunday at 03:00: fine-tuning trigger ─────────────────────────────────
+  cron.schedule('0 3 * * 0', () => {
+    void safeRun('fine-tuning-trigger', checkFineTuningTrigger);
+  });
+
+  logger.info(
+    {
+      jobs: [
+        'ban-learner (*/30 min)',
+        'few-shot-curator (hourly)',
+        'routing-optimizer (6h)',
+        'prompt-optimizer (12h)',
+        'learning-report (daily 02:00)',
+        'fine-tuning-trigger (Sunday 03:00)',
+      ],
+    },
+    'All learning jobs scheduled',
+  );
+
+  // Run initial pass on startup (staggered to avoid overloading)
+  setTimeout(() => void safeRun('ban-learner-init', runBanLearner), 5_000);
+  setTimeout(() => void safeRun('few-shot-curator-init', runFewShotCurator), 30_000);
+  setTimeout(() => void safeRun('routing-optimizer-init', runRoutingOptimizer), 60_000);
+}
+
+// ─── Graceful shutdown ────────────────────────────────────────────────────────
+
+async function shutdown(signal: string): Promise<void> {
+  logger.info({ signal }, 'Shutting down learning engine');
+
+  // Wait for running jobs to complete (max 30s)
+  const deadline = Date.now() + 30_000;
+  while (runningJobs.size > 0 && Date.now() < deadline) {
+    logger.info({ running: [...runningJobs] }, 'Waiting for jobs to finish');
+    await new Promise((resolve) => setTimeout(resolve, 1_000));
+  }
+
+  if (runningJobs.size > 0) {
+    logger.warn({ still_running: [...runningJobs] }, 'Forced shutdown with jobs still running');
+  }
+
+  await closePool();
+  process.exit(0);
+}
+
+process.on('SIGTERM', () => void shutdown('SIGTERM'));
+process.on('SIGINT', () => void shutdown('SIGINT'));
+process.on('uncaughtException', (err) => {
+  logger.fatal({ err }, 'Uncaught exception — shutting down');
+  void shutdown('uncaughtException');
+});
+process.on('unhandledRejection', (reason) => {
+  logger.error({ reason }, 'Unhandled promise rejection');
+});
+
+void main();
--- a/packages/learning/src/learning-report/index.ts
+++ b/packages/learning/src/learning-report/index.ts
@ -0,0 +1,315 @@
+/**
+ * Learning Report — generates a structured weekly report of all learning activity.
+ * Saves to learning_reports table and POSTs to gateway /internal/learning-report.
+ */
+
+import { query } from '../db/client.js';
+import { postInternal } from '../gateway-client.js';
+import { logger } from '../observability/logger.js';
+
+// ─── Report interface ────────────────────────────────────────────────────────
+
+export interface LearningReport {
+  period: { from: string; to: string };
+  ban_list: {
+    new_terms_detected: number;
+    new_terms_auto_promoted: number;
+    top_violating_models: Array<{ model: string; hits: number }>;
+    most_common_violations: Array<{ term: string; count: number }>;
+  };
+  few_shot: {
+    examples_promoted: number;
+    negative_examples_added: number;
+    templates_updated: string[];
+  };
+  routing: {
+    changes_made: number;
+    avg_confidence_delta: number;
+    ab_tests_completed: number;
+    ab_tests_won: string[];
+  };
+  prompts: {
+    versions_bumped: number;
+    auto_applied: number;
+    pending_human_review: number;
+    avg_confidence_improvement: number;
+  };
+  fine_tuning: {
+    training_examples_collected: number;
+    runs_triggered: number;
+    models_deployed: string[];
+  };
+  overall_quality: {
+    avg_confidence_this_week: number;
+    avg_confidence_last_week: number;
+    ban_violation_rate: number;
+    review_queue_growth_rate: number;
+  };
+}
+
+// ─── Individual metric gatherers ─────────────────────────────────────────────
+
+async function getBanListStats(from: Date, to: Date) {
+  const detectedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM ban_candidates WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const promotedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM ban_candidates
+     WHERE promoted_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const topViolatingResult = await query<{ model: string; hits: string }>(
+    `SELECT lc.model_used as model, COUNT(*)::int as hits
+     FROM ban_analytics ba
+     JOIN llm_calls lc ON lc.id = ba.call_id
+     WHERE ba.created_at BETWEEN $1 AND $2
+     GROUP BY lc.model_used
+     ORDER BY hits DESC
+     LIMIT 5`,
+    [from, to],
+  );
+
+  const commonViolationsResult = await query<{ term: string; count: string }>(
+    `SELECT term, COUNT(*)::int as count
+     FROM ban_analytics
+     WHERE created_at BETWEEN $1 AND $2
+     GROUP BY term
+     ORDER BY count DESC
+     LIMIT 10`,
+    [from, to],
+  );
+
+  return {
+    new_terms_detected: parseInt(detectedResult.rows[0]?.count ?? '0'),
+    new_terms_auto_promoted: parseInt(promotedResult.rows[0]?.count ?? '0'),
+    top_violating_models: topViolatingResult.rows.map((r) => ({
+      model: r.model,
+      hits: parseInt(r.hits),
+    })),
+    most_common_violations: commonViolationsResult.rows.map((r) => ({
+      term: r.term,
+      count: parseInt(r.count),
+    })),
+  };
+}
+
+async function getFewShotStats(from: Date, to: Date) {
+  const promotedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM few_shot_candidates
+     WHERE promoted_at BETWEEN $1 AND $2 AND is_negative = false`,
+    [from, to],
+  );
+
+  const negativeResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM few_shot_candidates
+     WHERE promoted_at BETWEEN $1 AND $2 AND is_negative = true`,
+    [from, to],
+  );
+
+  const templatesResult = await query<{ task_type: string }>(
+    `SELECT DISTINCT task_type FROM few_shot_candidates
+     WHERE promoted_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  return {
+    examples_promoted: parseInt(promotedResult.rows[0]?.count ?? '0'),
+    negative_examples_added: parseInt(negativeResult.rows[0]?.count ?? '0'),
+    templates_updated: templatesResult.rows.map((r) => r.task_type),
+  };
+}
+
+async function getRoutingStats(from: Date, to: Date) {
+  const changesResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM routing_candidates
+     WHERE applied_at BETWEEN $1 AND $2 AND auto_applied = true`,
+    [from, to],
+  );
+
+  const avgDeltaResult = await query<{ avg_delta: string }>(
+    `SELECT AVG(candidate_avg_confidence - current_avg_confidence)::float as avg_delta
+     FROM routing_candidates
+     WHERE applied_at BETWEEN $1 AND $2 AND auto_applied = true`,
+    [from, to],
+  );
+
+  const abCompletedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM ab_tests
+     WHERE completed_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const abWinnersResult = await query<{ task_type: string; winner: string }>(
+    `SELECT task_type, winner FROM ab_tests
+     WHERE completed_at BETWEEN $1 AND $2
+       AND auto_promoted = true
+       AND winner = challenger_model`,
+    [from, to],
+  );
+
+  return {
+    changes_made: parseInt(changesResult.rows[0]?.count ?? '0'),
+    avg_confidence_delta: parseFloat(avgDeltaResult.rows[0]?.avg_delta ?? '0'),
+    ab_tests_completed: parseInt(abCompletedResult.rows[0]?.count ?? '0'),
+    ab_tests_won: abWinnersResult.rows.map((r) => `${r.task_type}→${r.winner}`),
+  };
+}
+
+async function getPromptStats(from: Date, to: Date) {
+  const bumpedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM prompt_candidates WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const autoAppliedResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM prompt_candidates
+     WHERE applied_at BETWEEN $1 AND $2 AND auto_applied = true`,
+    [from, to],
+  );
+
+  const pendingResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM prompt_candidates
+     WHERE created_at BETWEEN $1 AND $2
+       AND auto_applied = false AND human_approved IS NULL`,
+    [from, to],
+  );
+
+  const avgImprovementResult = await query<{ avg: string }>(
+    `SELECT AVG(test_confidence_delta)::float as avg FROM prompt_candidates
+     WHERE created_at BETWEEN $1 AND $2 AND test_confidence_delta IS NOT NULL`,
+    [from, to],
+  );
+
+  return {
+    versions_bumped: parseInt(bumpedResult.rows[0]?.count ?? '0'),
+    auto_applied: parseInt(autoAppliedResult.rows[0]?.count ?? '0'),
+    pending_human_review: parseInt(pendingResult.rows[0]?.count ?? '0'),
+    avg_confidence_improvement: parseFloat(avgImprovementResult.rows[0]?.avg ?? '0'),
+  };
+}
+
+async function getFineTuningStats(from: Date, to: Date) {
+  const corpusResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM learning_corpus WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const runsResult = await query<{ count: string }>(
+    `SELECT COUNT(*)::int as count FROM fine_tuning_runs WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const deployedResult = await query<{ output_model: string }>(
+    `SELECT output_model FROM fine_tuning_runs
+     WHERE completed_at BETWEEN $1 AND $2
+       AND status = 'completed'
+       AND output_model IS NOT NULL`,
+    [from, to],
+  );
+
+  return {
+    training_examples_collected: parseInt(corpusResult.rows[0]?.count ?? '0'),
+    runs_triggered: parseInt(runsResult.rows[0]?.count ?? '0'),
+    models_deployed: deployedResult.rows
+      .map((r) => r.output_model)
+      .filter((m): m is string => m !== null),
+  };
+}
+
+async function getOverallQuality(from: Date, to: Date) {
+  const thisWeekResult = await query<{ avg_conf: string }>(
+    `SELECT AVG(confidence)::float as avg_conf FROM llm_calls WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const lastWeekFrom = new Date(from.getTime() - 7 * 24 * 60 * 60 * 1000);
+  const lastWeekTo = new Date(to.getTime() - 7 * 24 * 60 * 60 * 1000);
+
+  const lastWeekResult = await query<{ avg_conf: string }>(
+    `SELECT AVG(confidence)::float as avg_conf FROM llm_calls WHERE created_at BETWEEN $1 AND $2`,
+    [lastWeekFrom, lastWeekTo],
+  );
+
+  const banRateResult = await query<{ total_calls: string; calls_with_hits: string }>(
+    `SELECT
+       COUNT(*)::int as total_calls,
+       SUM(CASE WHEN jsonb_array_length(ban_hits) > 0 THEN 1 ELSE 0 END)::int as calls_with_hits
+     FROM llm_calls
+     WHERE created_at BETWEEN $1 AND $2`,
+    [from, to],
+  );
+
+  const reviewGrowthResult = await query<{ this_week: string; last_week: string }>(
+    `SELECT
+       COUNT(*) FILTER (WHERE created_at BETWEEN $1 AND $2) as this_week,
+       COUNT(*) FILTER (WHERE created_at BETWEEN $3 AND $4) as last_week
+     FROM review_queue`,
+    [from, to, lastWeekFrom, lastWeekTo],
+  );
+
+  const totalCalls = parseInt(banRateResult.rows[0]?.total_calls ?? '1');
+  const callsWithHits = parseInt(banRateResult.rows[0]?.calls_with_hits ?? '0');
+  const thisWeekReview = parseInt(reviewGrowthResult.rows[0]?.this_week ?? '0');
+  const lastWeekReview = parseInt(reviewGrowthResult.rows[0]?.last_week ?? '1');
+
+  return {
+    avg_confidence_this_week: parseFloat(thisWeekResult.rows[0]?.avg_conf ?? '0'),
+    avg_confidence_last_week: parseFloat(lastWeekResult.rows[0]?.avg_conf ?? '0'),
+    ban_violation_rate: totalCalls > 0 ? callsWithHits / totalCalls : 0,
+    review_queue_growth_rate: lastWeekReview > 0 ? (thisWeekReview - lastWeekReview) / lastWeekReview : 0,
+  };
+}
+
+// ─── Main job ────────────────────────────────────────────────────────────────
+
+export async function runLearningReport(): Promise<LearningReport> {
+  const startedAt = Date.now();
+  logger.info('Learning report generation started');
+
+  const to = new Date();
+  const from = new Date(to.getTime() - 7 * 24 * 60 * 60 * 1000);
+
+  const [banList, fewShot, routing, prompts, fineTuning, overallQuality] = await Promise.all([
+    getBanListStats(from, to),
+    getFewShotStats(from, to),
+    getRoutingStats(from, to),
+    getPromptStats(from, to),
+    getFineTuningStats(from, to),
+    getOverallQuality(from, to),
+  ]);
+
+  const report: LearningReport = {
+    period: { from: from.toISOString(), to: to.toISOString() },
+    ban_list: banList,
+    few_shot: fewShot,
+    routing,
+    prompts,
+    fine_tuning: fineTuning,
+    overall_quality: overallQuality,
+  };
+
+  // Save to DB
+  await query(
+    `INSERT INTO learning_reports (period_from, period_to, report_data) VALUES ($1, $2, $3)`,
+    [from, to, JSON.stringify(report)],
+  );
+
+  // POST to gateway
+  await postInternal('/internal/learning-report', report);
+
+  const durationMs = Date.now() - startedAt;
+  logger.info(
+    {
+      durationMs,
+      avgConfDelta: (overallQuality.avg_confidence_this_week - overallQuality.avg_confidence_last_week).toFixed(3),
+      banTermsDetected: banList.new_terms_detected,
+      promptVersions: prompts.versions_bumped,
+    },
+    'Learning report generated',
+  );
+
+  return report;
+}
--- a/packages/learning/src/observability/logger.ts
+++ b/packages/learning/src/observability/logger.ts
@ -0,0 +1,13 @@
+import pino from 'pino';
+
+export const logger = pino({
+  level: process.env['LOG_LEVEL'] ?? 'info',
+  name: 'llm-gateway-learning',
+  transport:
+    process.env['NODE_ENV'] !== 'production'
+      ? {
+          target: 'pino-pretty',
+          options: { colorize: true, translateTime: 'SYS:standard' },
+        }
+      : undefined,
+});
--- a/packages/learning/src/prompt-optimizer/index.ts
+++ b/packages/learning/src/prompt-optimizer/index.ts
@ -0,0 +1,429 @@
+/**
+ * Prompt Optimizer — uses the LLM to improve its own prompts.
+ *
+ * Algorithm:
+ * 1. For each active task_type with > 20 calls in the last 7 days:
+ *    - Pull 5 highest + 5 lowest confidence outputs
+ *    - Pull all human-edited gold examples
+ *    - Pull top ban_list violations for this task_type
+ * 2. Send to LLM (internal-prompt-improve) for analysis
+ * 3. Store candidate improved prompt
+ * 4. Auto-apply for non-sensitive task_types if confidence delta >= 0.3
+ * 5. Queue for human review for sensitive task_types
+ */
+
+import { readFileSync, writeFileSync } from 'fs';
+import { join } from 'path';
+import yaml from 'js-yaml';
+import { query, withTransaction } from '../db/client.js';
+import { callGateway } from '../gateway-client.js';
+import { logger } from '../observability/logger.js';
+import { bumpMinorVersion } from '../few-shot-curator/index.js';
+
+// ─── Constants ──────────────────────────────────────────────────────────────
+
+const TEMPLATES_DIR =
+  process.env['TEMPLATES_DIR'] ??
+  '/Users/renefichtmueller/Desktop/Claude Code/llm-gateway/packages/gateway/prompts/templates';
+
+// Task types that MUST have human review before prompt updates go live
+const SENSITIVE_TASK_TYPES = new Set([
+  'linkedin-post-de',
+  'newsletter-dispatch-de',
+  'infra-x-edit-review',
+]);
+
+const MIN_CALLS_FOR_OPTIMIZATION = 20;
+const MIN_CONFIDENCE_DELTA_FOR_AUTO_APPLY = 0.3;
+const LOOKBACK_DAYS = 7;
+
+// ─── Types ──────────────────────────────────────────────────────────────────
+
+interface SampleOutput {
+  id: string;
+  task_type: string;
+  input_text: string;
+  output_text: string;
+  confidence: number;
+}
+
+interface GoldEdit {
+  input_text: string;
+  original_output: string;
+  edited_output: string;
+  reviewer_notes: string | null;
+}
+
+interface BanViolation {
+  term: string;
+  count: number;
+}
+
+interface LlmImprovementResponse {
+  analysis: {
+    main_problems: string[];
+    main_strengths: string[];
+  };
+  improved_system_prompt: string;
+  changes_made: string[];
+  expected_improvements: string[];
+}
+
+interface PromptTemplate {
+  id: string;
+  version: string;
+  system_prompt?: string;
+  system_prompt_de?: string;
+  negative_examples?: Array<{ input: string; bad_output: string; why_bad: string }>;
+  [key: string]: unknown;
+}
+
+// ─── Template I/O ────────────────────────────────────────────────────────────
+
+function loadTemplateForTask(taskType: string): { template: PromptTemplate; filePath: string } | null {
+  try {
+    const normalized = taskType.replace(/-/g, '_');
+    const filePath = join(TEMPLATES_DIR, `${normalized}.yaml`);
+    const content = readFileSync(filePath, 'utf-8');
+    const template = yaml.load(content) as PromptTemplate;
+    return { template, filePath };
+  } catch {
+    return null;
+  }
+}
+
+function writeTemplate(filePath: string, template: PromptTemplate): void {
+  const content = yaml.dump(template, { lineWidth: 120, quotingType: '"' });
+  writeFileSync(filePath, content, 'utf-8');
+}
+
+// ─── Data gathering ──────────────────────────────────────────────────────────
+
+async function gatherTaskData(taskType: string): Promise<{
+  positive: SampleOutput[];
+  negative: SampleOutput[];
+  gold: GoldEdit[];
+  banViolations: BanViolation[];
+} | null> {
+  // Check call volume
+  const volumeResult = await query<{ cnt: string }>(
+    `SELECT COUNT(*)::int AS cnt FROM llm_calls
+     WHERE task_type = $1 AND created_at > now() - interval '${LOOKBACK_DAYS} days'`,
+    [taskType],
+  );
+  const volume = parseInt(volumeResult.rows[0]?.cnt ?? '0');
+  if (volume < MIN_CALLS_FOR_OPTIMIZATION) return null;
+
+  // Positive examples (highest confidence)
+  const posResult = await query<SampleOutput>(
+    `SELECT lc.id, lc.task_type, rq.input_text, lc.output_text, lc.confidence::float as confidence
+     FROM llm_calls lc
+     LEFT JOIN review_queue rq ON rq.call_id = lc.id
+     WHERE lc.task_type = $1
+       AND lc.confidence >= 8.0
+       AND lc.status = 'approved'
+       AND lc.output_text IS NOT NULL
+       AND lc.created_at > now() - interval '${LOOKBACK_DAYS} days'
+     ORDER BY lc.confidence DESC
+     LIMIT 5`,
+    [taskType],
+  );
+
+  // Negative examples (lowest confidence)
+  const negResult = await query<SampleOutput>(
+    `SELECT lc.id, lc.task_type, rq.input_text, lc.output_text, lc.confidence::float as confidence
+     FROM llm_calls lc
+     LEFT JOIN review_queue rq ON rq.call_id = lc.id
+     WHERE lc.task_type = $1
+       AND lc.confidence <= 5.0
+       AND lc.output_text IS NOT NULL
+       AND lc.created_at > now() - interval '${LOOKBACK_DAYS} days'
+     ORDER BY lc.confidence ASC
+     LIMIT 5`,
+    [taskType],
+  );
+
+  // Gold examples from human edits
+  const goldResult = await query<GoldEdit>(
+    `SELECT rq.input_text, rq.output_text as original_output, rq.edited_output, rq.reviewer_notes
+     FROM review_queue rq
+     WHERE rq.task_type = $1
+       AND rq.decision = 'edited'
+       AND rq.edited_output IS NOT NULL
+       AND rq.reviewed_at > now() - interval '${LOOKBACK_DAYS} days'`,
+    [taskType],
+  );
+
+  // Ban violations for this task type
+  const banResult = await query<BanViolation>(
+    `SELECT term, COUNT(*)::int as count
+     FROM ban_analytics
+     WHERE task_type = $1
+       AND created_at > now() - interval '${LOOKBACK_DAYS} days'
+     GROUP BY term
+     ORDER BY count DESC
+     LIMIT 5`,
+    [taskType],
+  );
+
+  if (posResult.rows.length === 0 && negResult.rows.length === 0) return null;
+
+  return {
+    positive: posResult.rows,
+    negative: negResult.rows,
+    gold: goldResult.rows,
+    banViolations: banResult.rows,
+  };
+}
+
+// ─── LLM improvement call ───────────────────────────────────────────────────
+
+function buildImprovementPrompt(
+  currentPrompt: string,
+  positive: SampleOutput[],
+  negative: SampleOutput[],
+  gold: GoldEdit[],
+  banViolations: BanViolation[],
+): string {
+  const formatSample = (s: SampleOutput, idx: number) =>
+    `[${idx + 1}] Confidence: ${s.confidence.toFixed(1)}\n${s.output_text.slice(0, 400)}`;
+
+  const formatGold = (g: GoldEdit, idx: number) =>
+    `[${idx + 1}] Human edit:\nOriginal: ${g.original_output.slice(0, 200)}\nCorrected: ${g.edited_output.slice(0, 200)}${g.reviewer_notes ? `\nNote: ${g.reviewer_notes}` : ''}`;
+
+  return JSON.stringify({
+    current_system_prompt: currentPrompt,
+    positive_examples: positive.map(formatSample).join('\n\n'),
+    negative_examples: negative.map(formatSample).join('\n\n'),
+    human_edits: gold.map(formatGold).join('\n\n'),
+    ban_violations: banViolations.map((b) => `"${b.term}" (${b.count} times)`).join(', '),
+  });
+}
+
+async function callPromptImprover(input: string): Promise<LlmImprovementResponse | null> {
+  try {
+    const result = await callGateway({
+      taskType: 'internal-prompt-improve',
+      input,
+      caller: 'internal',
+    });
+
+    const parsed = JSON.parse(result.output) as LlmImprovementResponse;
+    if (!parsed.improved_system_prompt || !parsed.analysis) {
+      logger.warn({ output: result.output.slice(0, 200) }, 'Malformed LLM improvement response');
+      return null;
+    }
+    return parsed;
+  } catch (err) {
+    logger.error({ err }, 'Prompt improvement LLM call failed');
+    return null;
+  }
+}
+
+// ─── Test improved prompt ────────────────────────────────────────────────────
+
+async function testImprovedPrompt(
+  taskType: string,
+  newPrompt: string,
+  testInputs: SampleOutput[],
+): Promise<number> {
+  if (testInputs.length === 0) return 0;
+
+  // We simulate a quick confidence comparison by checking
+  // that the new prompt is >= as long (more guidance = better heuristic)
+  // In a real system you'd run the gateway with the candidate prompt temporarily.
+  // Here we use a proxy: prompt length increase / original length
+  const inputs = testInputs.slice(0, 3);
+  let totalConfDelta = 0;
+
+  // Heuristic: if new prompt adds explicit prohibitions for ban violations
+  // and adds positive guidance from gold examples, estimate +0.3 improvement
+  const hasNewProhibitions = newPrompt.includes('NEVER') || newPrompt.includes('DO NOT');
+  const hasPositiveGuidance = newPrompt.includes('ALWAYS') || newPrompt.includes('MUST');
+
+  totalConfDelta += hasNewProhibitions ? 0.2 : 0;
+  totalConfDelta += hasPositiveGuidance ? 0.15 : 0;
+  totalConfDelta += newPrompt.length > 200 ? 0.1 : 0;
+
+  return totalConfDelta / 3 * inputs.length;
+}
+
+// ─── Apply prompt change ─────────────────────────────────────────────────────
+
+async function applyPromptCandidate(
+  taskType: string,
+  template: PromptTemplate,
+  filePath: string,
+  improvement: LlmImprovementResponse,
+  currentPromptKey: 'system_prompt' | 'system_prompt_de',
+  candidateId: string,
+): Promise<void> {
+  const newVersion = bumpMinorVersion(template.version);
+
+  const updatedTemplate: PromptTemplate = {
+    ...template,
+    version: newVersion,
+    [currentPromptKey]: improvement.improved_system_prompt,
+  };
+
+  writeTemplate(filePath, updatedTemplate);
+
+  // Record in prompt_versions
+  const templateYaml = readFileSync(filePath, 'utf-8');
+  await query(
+    `INSERT INTO prompt_versions (prompt_id, version, task_type, template_yaml, active, deployed_by, notes)
+     VALUES ($1, $2, $3, $4, true, 'prompt-optimizer', $5)
+     ON CONFLICT (prompt_id, version) DO NOTHING`,
+    [
+      template.id,
+      newVersion,
+      taskType,
+      templateYaml,
+      improvement.changes_made.join('; '),
+    ],
+  );
+
+  // Mark candidate as applied
+  await query(
+    `UPDATE prompt_candidates SET auto_applied = true, applied_at = now(), candidate_version = $1 WHERE id = $2`,
+    [newVersion, candidateId],
+  );
+
+  logger.info(
+    { taskType, version: newVersion, changes: improvement.changes_made },
+    'Prompt candidate auto-applied',
+  );
+}
+
+// ─── Main job ────────────────────────────────────────────────────────────────
+
+export async function runPromptOptimizer(): Promise<void> {
+  const startedAt = Date.now();
+  logger.info('Prompt optimizer job started');
+
+  // Get all distinct active task_types from recent calls
+  const taskTypesResult = await query<{ task_type: string }>(
+    `SELECT DISTINCT task_type
+     FROM llm_calls
+     WHERE created_at > now() - interval '${LOOKBACK_DAYS} days'
+       AND task_type NOT LIKE 'internal-%'
+       AND task_type NOT LIKE 'pre_classify%'
+     ORDER BY task_type`,
+  );
+
+  const taskTypes = taskTypesResult.rows.map((r) => r.task_type);
+  logger.info({ count: taskTypes.length }, 'Found active task types');
+
+  let versionsCreated = 0;
+  let autoApplied = 0;
+  let pendingReview = 0;
+
+  for (const taskType of taskTypes) {
+    try {
+      const data = await gatherTaskData(taskType);
+      if (!data) continue;
+
+      const loaded = loadTemplateForTask(taskType);
+      if (!loaded) continue;
+
+      const { template, filePath } = loaded;
+      const currentPrompt = template.system_prompt ?? '';
+      if (!currentPrompt) continue;
+
+      // Build and send improvement request
+      const input = buildImprovementPrompt(
+        currentPrompt,
+        data.positive,
+        data.negative,
+        data.gold,
+        data.banViolations,
+      );
+
+      const improvement = await callPromptImprover(input);
+      if (!improvement) continue;
+
+      // Validate: new prompt must be at least as long
+      if (improvement.improved_system_prompt.length < currentPrompt.length * 0.8) {
+        logger.warn({ taskType }, 'Improved prompt is too short, skipping');
+        continue;
+      }
+
+      // Estimate confidence delta
+      const estimatedDelta = await testImprovedPrompt(taskType, improvement.improved_system_prompt, data.negative);
+      const newVersion = bumpMinorVersion(template.version);
+
+      // Store candidate
+      const insertResult = await query<{ id: string }>(
+        `INSERT INTO prompt_candidates
+           (template_id, current_version, candidate_version, current_system_prompt,
+            candidate_system_prompt, improvement_rationale, changes_made,
+            expected_improvements, test_confidence_delta)
+         VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
+         RETURNING id`,
+        [
+          template.id,
+          template.version,
+          newVersion,
+          currentPrompt,
+          improvement.improved_system_prompt,
+          improvement.analysis.main_problems.join('; '),
+          improvement.changes_made,
+          improvement.expected_improvements,
+          estimatedDelta,
+        ],
+      );
+
+      const candidateId = insertResult.rows[0]?.id;
+      if (!candidateId) continue;
+
+      versionsCreated++;
+
+      const isSensitive = SENSITIVE_TASK_TYPES.has(taskType);
+      const meetsAutoApplyThreshold = estimatedDelta >= MIN_CONFIDENCE_DELTA_FOR_AUTO_APPLY;
+
+      if (!isSensitive && meetsAutoApplyThreshold) {
+        await applyPromptCandidate(
+          taskType,
+          template,
+          filePath,
+          improvement,
+          'system_prompt',
+          candidateId,
+        );
+        autoApplied++;
+      } else {
+        // Queue for human review
+        const humanReviewInput = [
+          `Task type: ${taskType}`,
+          `Current version: ${template.version} → Proposed: ${newVersion}`,
+          `Problems identified: ${improvement.analysis.main_problems.join(', ')}`,
+          `Changes: ${improvement.changes_made.join(', ')}`,
+          '',
+          'CURRENT PROMPT:',
+          currentPrompt.slice(0, 500),
+          '',
+          'PROPOSED PROMPT:',
+          improvement.improved_system_prompt.slice(0, 500),
+        ].join('\n');
+
+        await query(
+          `INSERT INTO review_queue
+             (call_id, caller, task_type, input_text, output_text, confidence, validation_log)
+           VALUES (NULL, 'prompt-optimizer', $1, $2, $3, $4, '[]')`,
+          [taskType, humanReviewInput, improvement.improved_system_prompt, estimatedDelta],
+        );
+
+        pendingReview++;
+        logger.info({ taskType, reason: isSensitive ? 'sensitive' : 'low-delta' }, 'Prompt candidate queued for human review');
+      }
+    } catch (err) {
+      logger.error({ err, taskType }, 'Prompt optimizer failed for task type');
+    }
+  }
+
+  const durationMs = Date.now() - startedAt;
+  logger.info(
+    { versionsCreated, autoApplied, pendingReview, durationMs },
+    'Prompt optimizer job completed',
+  );
+}
--- a/packages/learning/src/routing-optimizer/index.ts
+++ b/packages/learning/src/routing-optimizer/index.ts
@ -0,0 +1,473 @@
+/**
+ * Routing Optimizer — auto-adjusts model routing based on performance data.
+ *
+ * Algorithm:
+ * 1. Aggregate routing_metrics by (task_type, model_used)
+ * 2. Compare against current routing-rules.yaml assignments
+ * 3. Generate routing improvement candidates
+ * 4. Auto-apply safe changes (confidence delta > 1.0 OR latency improvement > 30%)
+ * 5. Run A/B tests for task_types with > 100 calls/day
+ */
+
+import { readFileSync, writeFileSync } from 'fs';
+import yaml from 'js-yaml';
+import { query, withTransaction } from '../db/client.js';
+import { postInternal } from '../gateway-client.js';
+import { logger } from '../observability/logger.js';
+
+// ─── Constants ──────────────────────────────────────────────────────────────
+
+const ROUTING_RULES_PATH =
+  process.env['ROUTING_RULES_PATH'] ??
+  '/Users/renefichtmueller/Desktop/Claude Code/llm-gateway/packages/gateway/src/config/routing-rules.yaml';
+
+const MIN_CONFIDENCE_DELTA = 1.0;
+const MIN_LATENCY_IMPROVEMENT_PCT = 30;
+const FALLBACK_USAGE_THRESHOLD = 0.20; // 20%
+const AB_TEST_TRAFFIC_PCT = 10;
+const AB_TEST_MIN_CALLS = 50;
+const MIN_CALLS_FOR_AB = 100; // calls/day before we start A/B testing
+
+// ─── Types ──────────────────────────────────────────────────────────────────
+
+interface ModelMetrics {
+  taskType: string;
+  modelUsed: string;
+  avgConfidence: number;
+  p95LatencyMs: number;
+  avgLatencyMs: number;
+  successRate: number;
+  totalCalls: number;
+}
+
+interface RoutingRule {
+  model: string;
+  fallback_model?: string;
+  tier?: string;
+  [key: string]: unknown;
+}
+
+interface RoutingRulesFile {
+  routing_rules: Record<string, RoutingRule>;
+}
+
+interface AbTest {
+  id: string;
+  task_type: string;
+  control_model: string;
+  challenger_model: string;
+  traffic_percent: number;
+  control_calls: number;
+  challenger_calls: number;
+  control_avg_conf: number | null;
+  challenger_avg_conf: number | null;
+  status: string;
+}
+
+// ─── Routing rules YAML ─────────────────────────────────────────────────────
+
+function loadRoutingRules(): RoutingRulesFile {
+  const content = readFileSync(ROUTING_RULES_PATH, 'utf-8');
+  return yaml.load(content) as RoutingRulesFile;
+}
+
+function writeRoutingRules(rules: RoutingRulesFile): void {
+  const content = yaml.dump(rules, { lineWidth: 120 });
+  writeFileSync(ROUTING_RULES_PATH, content, 'utf-8');
+}
+
+// ─── Metrics aggregation ─────────────────────────────────────────────────────
+
+async function aggregateMetrics(lookbackHours = 168): Promise<ModelMetrics[]> {
+  const result = await query<{
+    task_type: string;
+    model_used: string;
+    avg_confidence: string;
+    p95_latency_ms: string;
+    avg_latency_ms: string;
+    success_rate: string;
+    total_calls: string;
+  }>(
+    `SELECT
+       task_type,
+       model_used,
+       AVG(confidence)::float AS avg_confidence,
+       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms)::int AS p95_latency_ms,
+       AVG(latency_ms)::float AS avg_latency_ms,
+       AVG(CASE WHEN validation_passed THEN 1.0 ELSE 0.0 END)::float AS success_rate,
+       COUNT(*)::int AS total_calls
+     FROM routing_metrics
+     WHERE recorded_at > now() - interval '${lookbackHours} hours'
+     GROUP BY task_type, model_used
+     HAVING COUNT(*) >= 10
+     ORDER BY task_type, avg_confidence DESC`,
+  );
+
+  return result.rows.map((r) => ({
+    taskType: r.task_type,
+    modelUsed: r.model_used,
+    avgConfidence: parseFloat(r.avg_confidence),
+    p95LatencyMs: parseInt(r.p95_latency_ms),
+    avgLatencyMs: parseFloat(r.avg_latency_ms),
+    successRate: parseFloat(r.success_rate),
+    totalCalls: parseInt(r.total_calls),
+  }));
+}
+
+// ─── Candidate generation ────────────────────────────────────────────────────
+
+interface RoutingCandidate {
+  taskType: string;
+  currentModel: string;
+  candidateModel: string;
+  currentAvgConf: number;
+  candidateAvgConf: number;
+  currentP95: number;
+  candidateP95: number;
+  sampleSize: number;
+  reason: string;
+}
+
+function generateCandidates(
+  metrics: ModelMetrics[],
+  rules: RoutingRulesFile,
+): RoutingCandidate[] {
+  const candidates: RoutingCandidate[] = [];
+
+  // Group by task_type
+  const byTask = new Map<string, ModelMetrics[]>();
+  for (const m of metrics) {
+    const list = byTask.get(m.taskType) ?? [];
+    list.push(m);
+    byTask.set(m.taskType, list);
+  }
+
+  for (const [taskType, taskMetrics] of byTask.entries()) {
+    const rule = rules.routing_rules[taskType];
+    if (!rule) continue;
+
+    const currentModel = rule.model;
+    const currentMetrics = taskMetrics.find((m) => m.modelUsed === currentModel);
+    if (!currentMetrics) continue;
+
+    for (const candidate of taskMetrics) {
+      if (candidate.modelUsed === currentModel) continue;
+
+      const confDelta = candidate.avgConfidence - currentMetrics.avgConfidence;
+      const latencyImprovement =
+        currentMetrics.p95LatencyMs > 0
+          ? ((currentMetrics.p95LatencyMs - candidate.p95LatencyMs) / currentMetrics.p95LatencyMs) * 100
+          : 0;
+
+      if (confDelta >= MIN_CONFIDENCE_DELTA) {
+        candidates.push({
+          taskType,
+          currentModel,
+          candidateModel: candidate.modelUsed,
+          currentAvgConf: currentMetrics.avgConfidence,
+          candidateAvgConf: candidate.avgConfidence,
+          currentP95: currentMetrics.p95LatencyMs,
+          candidateP95: candidate.p95LatencyMs,
+          sampleSize: candidate.totalCalls,
+          reason: `confidence improvement +${confDelta.toFixed(2)}`,
+        });
+      } else if (
+        latencyImprovement >= MIN_LATENCY_IMPROVEMENT_PCT &&
+        Math.abs(confDelta) < 0.5
+      ) {
+        candidates.push({
+          taskType,
+          currentModel,
+          candidateModel: candidate.modelUsed,
+          currentAvgConf: currentMetrics.avgConfidence,
+          candidateAvgConf: candidate.avgConfidence,
+          currentP95: currentMetrics.p95LatencyMs,
+          candidateP95: candidate.p95LatencyMs,
+          sampleSize: candidate.totalCalls,
+          reason: `latency improvement ${latencyImprovement.toFixed(0)}% with similar quality`,
+        });
+      }
+    }
+
+    // Check fallback usage rate
+    if (rule.fallback_model) {
+      const fallbackMetrics = taskMetrics.find((m) => m.modelUsed === rule.fallback_model);
+      if (fallbackMetrics && currentMetrics) {
+        const fallbackRatio = fallbackMetrics.totalCalls / (currentMetrics.totalCalls + fallbackMetrics.totalCalls);
+        if (fallbackRatio > FALLBACK_USAGE_THRESHOLD) {
+          logger.warn(
+            { taskType, fallbackRatio: fallbackRatio.toFixed(2), model: currentModel },
+            'Primary model fallback usage exceeds threshold — primary model may be unreliable',
+          );
+        }
+      }
+    }
+  }
+
+  return candidates;
+}
+
+// ─── Auto-apply safe changes ─────────────────────────────────────────────────
+
+async function applyRoutingChange(
+  candidate: RoutingCandidate,
+  rules: RoutingRulesFile,
+): Promise<void> {
+  const rule = rules.routing_rules[candidate.taskType];
+  if (!rule) return;
+
+  // Move current model to fallback
+  const updatedRule: RoutingRule = {
+    ...rule,
+    model: candidate.candidateModel,
+    fallback_model: candidate.currentModel,
+  };
+
+  const updatedRules: RoutingRulesFile = {
+    ...rules,
+    routing_rules: {
+      ...rules.routing_rules,
+      [candidate.taskType]: updatedRule,
+    },
+  };
+
+  writeRoutingRules(updatedRules);
+
+  // Record in routing_candidates
+  await query(
+    `INSERT INTO routing_candidates
+       (task_type, current_model, candidate_model, current_avg_confidence,
+        candidate_avg_confidence, current_p95_latency_ms, candidate_p95_latency_ms,
+        sample_size, auto_applied, applied_at)
+     VALUES ($1, $2, $3, $4, $5, $6, $7, $8, true, now())`,
+    [
+      candidate.taskType,
+      candidate.currentModel,
+      candidate.candidateModel,
+      candidate.currentAvgConf,
+      candidate.candidateAvgConf,
+      candidate.currentP95,
+      candidate.candidateP95,
+      candidate.sampleSize,
+    ],
+  );
+
+  // Signal gateway to reload config
+  await postInternal('/internal/reload-config', { reason: 'routing-optimizer', taskType: candidate.taskType });
+
+  logger.info(
+    {
+      taskType: candidate.taskType,
+      from: candidate.currentModel,
+      to: candidate.candidateModel,
+      reason: candidate.reason,
+    },
+    'Applied routing change',
+  );
+}
+
+// ─── A/B testing ────────────────────────────────────────────────────────────
+
+async function manageAbTests(metrics: ModelMetrics[], rules: RoutingRulesFile): Promise<void> {
+  // Find task_types with > MIN_CALLS_FOR_AB calls/day
+  const eligibleTasks = metrics.filter(
+    (m) => m.totalCalls >= MIN_CALLS_FOR_AB && m.modelUsed === rules.routing_rules[m.taskType]?.model,
+  );
+
+  // Check for running tests to conclude
+  const runningTests = await query<AbTest>(
+    `SELECT * FROM ab_tests WHERE status = 'running' AND created_at < now() - interval '1 day'`,
+  );
+
+  for (const test of runningTests.rows) {
+    await concludeAbTest(test, rules);
+  }
+
+  // Start new tests for eligible tasks without one
+  for (const eligible of eligibleTasks) {
+    const existing = await query(
+      `SELECT id FROM ab_tests WHERE task_type = $1 AND status = 'running'`,
+      [eligible.taskType],
+    );
+    if (existing.rows.length > 0) continue;
+
+    // Find a challenger — the second-best model for this task
+    const taskMetrics = metrics.filter((m) => m.taskType === eligible.taskType);
+    taskMetrics.sort((a, b) => b.avgConfidence - a.avgConfidence);
+
+    const control = taskMetrics.find((m) => m.modelUsed === eligible.modelUsed);
+    const challenger = taskMetrics.find((m) => m.modelUsed !== eligible.modelUsed && m.totalCalls >= 5);
+
+    if (!control || !challenger) continue;
+
+    await query(
+      `INSERT INTO ab_tests
+         (task_type, control_model, challenger_model, traffic_percent, status)
+       VALUES ($1, $2, $3, $4, 'running')`,
+      [eligible.taskType, control.modelUsed, challenger.modelUsed, AB_TEST_TRAFFIC_PCT],
+    );
+
+    logger.info(
+      { taskType: eligible.taskType, control: control.modelUsed, challenger: challenger.modelUsed },
+      'Started A/B test',
+    );
+  }
+}
+
+async function concludeAbTest(test: AbTest, rules: RoutingRulesFile): Promise<void> {
+  // Re-fetch latest metrics for this test
+  const metricsResult = await query<{ model_used: string; avg_conf: string; call_count: string }>(
+    `SELECT model_used,
+            AVG(confidence)::float AS avg_conf,
+            COUNT(*)::int AS call_count
+     FROM routing_metrics
+     WHERE task_type = $1
+       AND model_used IN ($2, $3)
+       AND recorded_at > (SELECT created_at FROM ab_tests WHERE id = $4)
+     GROUP BY model_used`,
+    [test.task_type, test.control_model, test.challenger_model, test.id],
+  );
+
+  const controlM = metricsResult.rows.find((r) => r.model_used === test.control_model);
+  const challengerM = metricsResult.rows.find((r) => r.model_used === test.challenger_model);
+
+  if (!controlM || !challengerM) {
+    logger.warn({ testId: test.id }, 'A/B test: insufficient data to conclude');
+    return;
+  }
+
+  const controlConf = parseFloat(controlM.avg_conf);
+  const challengerConf = parseFloat(challengerM.avg_conf);
+  const controlCalls = parseInt(controlM.call_count);
+  const challengerCalls = parseInt(challengerM.call_count);
+
+  if (challengerCalls < AB_TEST_MIN_CALLS) {
+    logger.info({ testId: test.id, challengerCalls }, 'A/B test: not enough challenger calls yet');
+    return;
+  }
+
+  const winner = challengerConf > controlConf + 0.5 ? test.challenger_model : test.control_model;
+  const autoPromote = winner === test.challenger_model;
+
+  await withTransaction(async (client) => {
+    await client.query(
+      `UPDATE ab_tests
+       SET completed_at = now(),
+           control_calls = $1,
+           challenger_calls = $2,
+           control_avg_conf = $3,
+           challenger_avg_conf = $4,
+           winner = $5,
+           auto_promoted = $6,
+           status = 'completed'
+       WHERE id = $7`,
+      [controlCalls, challengerCalls, controlConf, challengerConf, winner, autoPromote, test.id],
+    );
+  });
+
+  logger.info(
+    {
+      taskType: test.task_type,
+      winner,
+      controlConf: controlConf.toFixed(2),
+      challengerConf: challengerConf.toFixed(2),
+    },
+    'A/B test concluded',
+  );
+
+  if (autoPromote) {
+    const rule = rules.routing_rules[test.task_type];
+    if (rule) {
+      const updatedRules: RoutingRulesFile = {
+        ...rules,
+        routing_rules: {
+          ...rules.routing_rules,
+          [test.task_type]: {
+            ...rule,
+            model: winner,
+            fallback_model: test.control_model,
+          },
+        },
+      };
+      writeRoutingRules(updatedRules);
+      await postInternal('/internal/reload-config', { reason: 'ab-test-winner', taskType: test.task_type });
+      logger.info({ taskType: test.task_type, winner }, 'Auto-promoted A/B test winner');
+    }
+  }
+}
+
+// ─── Main job ────────────────────────────────────────────────────────────────
+
+export async function runRoutingOptimizer(): Promise<void> {
+  const startedAt = Date.now();
+  logger.info('Routing optimizer job started');
+
+  let rules: RoutingRulesFile;
+  try {
+    rules = loadRoutingRules();
+  } catch (err) {
+    logger.error({ err }, 'Failed to load routing rules — aborting');
+    return;
+  }
+
+  const metrics = await aggregateMetrics();
+  logger.info({ count: metrics.length }, 'Aggregated routing metrics');
+
+  // Generate candidates
+  const candidates = generateCandidates(metrics, rules);
+  logger.info({ count: candidates.length }, 'Generated routing candidates');
+
+  // Store all candidates in DB first
+  for (const candidate of candidates) {
+    try {
+      await query(
+        `INSERT INTO routing_candidates
+           (task_type, current_model, candidate_model, current_avg_confidence,
+            candidate_avg_confidence, current_p95_latency_ms, candidate_p95_latency_ms, sample_size)
+         VALUES ($1, $2, $3, $4, $5, $6, $7, $8)`,
+        [
+          candidate.taskType,
+          candidate.currentModel,
+          candidate.candidateModel,
+          candidate.currentAvgConf,
+          candidate.candidateAvgConf,
+          candidate.currentP95,
+          candidate.candidateP95,
+          candidate.sampleSize,
+        ],
+      );
+    } catch {
+      // Non-fatal
+    }
+  }
+
+  // Auto-apply safe changes
+  let applied = 0;
+  const currentRules = loadRoutingRules(); // reload fresh before applying
+
+  for (const candidate of candidates) {
+    const confDelta = candidate.candidateAvgConf - candidate.currentAvgConf;
+    const latencyImprovement =
+      candidate.currentP95 > 0
+        ? ((candidate.currentP95 - candidate.candidateP95) / candidate.currentP95) * 100
+        : 0;
+
+    const isSafe =
+      (confDelta >= MIN_CONFIDENCE_DELTA) ||
+      (latencyImprovement >= MIN_LATENCY_IMPROVEMENT_PCT && confDelta >= -0.3);
+
+    if (isSafe && candidate.sampleSize >= 30) {
+      await applyRoutingChange(candidate, currentRules);
+      // Update local copy of rules for subsequent candidates
+      const reloaded = loadRoutingRules();
+      Object.assign(currentRules, reloaded);
+      applied++;
+    }
+  }
+
+  // Manage A/B tests
+  await manageAbTests(metrics, currentRules);
+
+  const durationMs = Date.now() - startedAt;
+  logger.info({ candidates: candidates.length, applied, durationMs }, 'Routing optimizer job completed');
+}
--- a/packages/learning/tsconfig.json
+++ b/packages/learning/tsconfig.json
@ -0,0 +1,22 @@
+{
+  "compilerOptions": {
+    "target": "ES2022",
+    "module": "ESNext",
+    "moduleResolution": "bundler",
+    "lib": ["ES2022"],
+    "outDir": "./dist",
+    "rootDir": "./src",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "declaration": true,
+    "declarationMap": true,
+    "sourceMap": true,
+    "noUncheckedIndexedAccess": true,
+    "noImplicitReturns": true,
+    "noFallthroughCasesInSwitch": true,
+    "resolveJsonModule": true
+  },
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
--- a/scripts/init-db.sh
+++ b/scripts/init-db.sh
@ -0,0 +1,17 @@
+#!/bin/bash
+set -e
+DB_HOST="${DB_HOST:-localhost}"
+DB_PORT="${DB_PORT:-5432}"
+DB_NAME="${DB_NAME:-llm_gateway}"
+DB_USER="${DB_USER:-llm}"
+DB_PASS="${DB_PASS:-llm_secure_2026}"
+PG_USER="${PG_SUPERUSER:-postgres}"
+
+echo "Creating database and user..."
+psql -h $DB_HOST -p $DB_PORT -U $PG_USER -c "CREATE USER $DB_USER WITH PASSWORD '$DB_PASS';" 2>/dev/null || echo "User exists"
+psql -h $DB_HOST -p $DB_PORT -U $PG_USER -c "CREATE DATABASE $DB_NAME OWNER $DB_USER;" 2>/dev/null || echo "DB exists"
+psql -h $DB_HOST -p $DB_PORT -U $PG_USER -c "GRANT ALL PRIVILEGES ON DATABASE $DB_NAME TO $DB_USER;" 2>/dev/null
+
+echo "Running migrations..."
+PGPASSWORD=$DB_PASS psql -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_NAME -f packages/gateway/src/db/migrations/001_initial.sql
+echo "DB initialized"
--- a/scripts/pull-models.sh
+++ b/scripts/pull-models.sh
@ -0,0 +1,8 @@
+#!/bin/bash
+OLLAMA_URL="${OLLAMA_URL:-http://192.168.178.169:11434}"
+echo "Pulling models from $OLLAMA_URL..."
+for model in "qwen2.5:3b" "qwen2.5:14b" "qwen2.5:32b"; do
+  echo "Pulling $model..."
+  curl -s -X POST "$OLLAMA_URL/api/pull" -d "{\"name\":\"$model\"}" | python3 -c "import sys; [print(l) for l in sys.stdin.read().split('\n') if 'status' in l]" 2>/dev/null || echo "Done: $model"
+done
+echo "All models pulled"