feat: restore workbench v1 and publish wired v2

feat: publish llm gateway v2 dashboard alongside restored workbench
2026-05-03 09:53:40 +02:00 · 2026-05-01 17:43:32 +02:00
30 changed files with 12942 additions and 809 deletions
--- a/AI_CONTROL_PLANE_SYSTEM_DESIGN.md
+++ b/AI_CONTROL_PLANE_SYSTEM_DESIGN.md
@ -0,0 +1,426 @@
+# AI Control Plane System Design
+
+## 1. Purpose
+
+LLM Gateway is a deterministic, observable, policy-driven routing layer for AI execution with memory and cost control.
+
+It routes requests from clients to the right model, provider, agent, or tool based on:
+
+- policy
+- cost
+- availability
+- context
+- memory
+- trust level
+- historical route success
+
+It also provides:
+
+- full observability through immutable receipts
+- reproducible AI runs
+- shared memory persistence
+- route memory
+- token and cost optimization
+
+## 2. High-Level Architecture
+
+```text
+Input Layer
+  clients, APIs, MCP, internal connectors
+      |
+      v
+Control Plane
+  trust routing, policy, compression, memory, provider routing
+      |
+      v
+Execution Layer
+  local models, external providers, tools, services
+      |
+      v
+Output
+  response to caller
+      |
+      v
+Receipts + Memory Update
+
+Side System:
+  Memory Layer
+    global memory, project memory, route memory, semantic cache
+```
+
+## 3. Components
+
+### 3.1 Client Entry
+
+Clients connect via API, MCP, OpenAI-compatible endpoints, or internal connectors.
+
+Supported client targets:
+
+- Codex
+- Claude Code
+- ChatGPT
+- Cursor
+- VS Code and Continue-style IDEs
+- automation pipelines
+- n8n
+- internal services
+
+Each request should include:
+
+- payload: prompt, input, files, tool call, or task
+- metadata: user, project, agent, task type
+- optional routing hints
+- optional policy hints
+
+### 3.2 Trust Router
+
+The Trust Router is the first decision point.
+
+Responsibilities:
+
+- validate client identity
+- assign trust level
+- classify request type
+- classify data sensitivity
+- apply initial routing hints
+- attach enriched request context
+
+Example classification labels:
+
+- code
+- infra
+- legal
+- security
+- general
+- document
+- automation
+
+Output:
+
+- enriched request context
+- trust score
+- sensitivity label
+- classification label
+
+### 3.3 Policy Engine
+
+The Policy Engine is the core decision system.
+
+It evaluates:
+
+- data sensitivity
+- allowed providers
+- allowed models
+- allowed tools
+- cost constraints
+- project rules
+- compliance rules
+- offline/simulation/live mode
+
+Example policies:
+
+- never send legal data to public APIs
+- prefer local models for internal code
+- use external models only if local-model confidence falls below a configured threshold
+- block requests containing secrets
+- require admin override for production deployment tools
+
+Output:
+
+- allowed routes
+- blocked routes
+- required redactions
+- execution constraints
+- policy decision log
+
+### 3.4 Memory Query
+
+Memory is queried before compression and execution.
+
+Memory sources:
+
+- project memory
+- global memory
+- route memory
+- semantic cache
+- handoffs
+- receipts
+- reproducible runs
+
+Output:
+
+- relevant memory context
+- prior decisions
+- route hints
+- cache candidates
+
+### 3.5 Compression Engine
+
+The Compression Engine optimizes request and memory context before execution.
+
+Functions:
+
+- token reduction
+- context deduplication
+- semantic summarization
+- cache lookup
+- prompt/context packaging
+- token budget enforcement
+
+Input:
+
+- raw request
+- policy constraints
+- memory context
+- target model context budget
+
+Output:
+
+- compressed payload
+- token metrics before and after
+- cache hit or miss
+- compression receipt data
+
+### 3.6 Provider Router
+
+The Provider Router makes the final execution decision.
+
+It selects:
+
+- local model
+- external provider
+- AI client/agent
+- tool execution
+- fallback route
+
+Criteria:
+
+- policy constraints
+- trust level
+- cost
+- latency
+- availability
+- model capability
+- route memory
+- benchmark results
+- agent reputation
+
+Output:
+
+- selected execution target
+- fallback routes
+- route explanation
+
+### 3.7 Execution Layer
+
+The Execution Layer handles actual processing.
+
+Execution target types:
+
+- local models such as Ollama, LM Studio, LocalAI, llama.cpp, vLLM
+- external APIs such as OpenAI, Anthropic, Mistral, Groq, OpenRouter
+- AI clients such as Claude Code, Codex, Cursor, ChatGPT adapters
+- tools, scripts, workflows, and internal services
+
+Execution returns:
+
+- raw response
+- latency
+- token usage
+- provider metadata
+- errors
+- tool call results
+
+### 3.8 Receipt Engine
+
+The Receipt Engine creates an immutable trace for each request.
+
+Receipts include:
+
+- request id
+- input summary or redacted input
+- trust decisions
+- policy decisions
+- memory refs
+- compression results
+- selected model/provider/tool
+- fallback chain
+- response summary or full response depending on policy
+- token usage
+- cost estimate
+- timestamps
+- errors
+- blocked routes
+
+Receipts are immutable once written and are persisted in the receipts database.
+
+### 3.9 Memory Layer
+
+Memory is separate from execution but connected to routing and compression.
+
+Memory types:
+
+1. Project memory
+   - task history
+   - decisions
+   - context
+   - handoffs
+
+2. Global memory
+   - shared knowledge
+   - user/team preferences
+   - reusable runbooks
+
+3. Route memory
+   - routing decisions
+   - success and failure patterns
+   - optimization feedback
+
+4. Semantic cache
+   - previous responses
+   - embedding lookup
+   - prompt/result reuse
+
+Memory is:
+
+- append-only by default
+- queryable
+- versioned where possible
+- used during routing and compression
+
+### 3.10 Route Reflector Memory
+
+Route Reflector Memory is specialized route memory inspired by BGP route reflectors.
+
+Functions:
+
+- learns optimal AI routes
+- shares routing knowledge across clients
+- improves future routing decisions
+- records fallback success and failures
+- contributes to Provider Router decisions
+
+Examples:
+
+- code debugging works best through Codex plus local validation
+- private infra diagnostics should route to local models
+- long-form reasoning performs better on selected external models
+- JSON extraction for project X has best success on model Y
+
+## 4. Data Flow
+
+1. Client sends request.
+2. Trust Router classifies request and assigns trust.
+3. Policy Engine filters allowed routes.
+4. Memory Layer is queried for context and prior route knowledge.
+5. Compression Engine optimizes payload.
+6. Provider Router selects execution target and fallback chain.
+7. Execution Layer processes request.
+8. Response is returned to client.
+9. Receipt Engine generates immutable receipt.
+10. Memory Layer is updated with outcome.
+11. Route Reflector Memory updates routing knowledge.
+
+## 5. Modes Of Operation
+
+### Live Mode
+
+- real execution
+- full routing active
+- receipts stored
+- memory updated
+
+### Simulation Mode
+
+- no real execution
+- shows trust decisions
+- shows policy decisions
+- shows selected route and fallbacks
+- estimates cost and tokens
+- useful for testing policies
+
+### Offline Mode
+
+- only local models allowed
+- no external provider calls
+- remote sync disabled unless explicitly allowed
+- receipts marked as offline
+
+## 6. Control Functions
+
+The system supports:
+
+- trace request
+- replay request
+- force route
+- override policy as admin
+- inspect receipts
+- inspect memory
+- simulate routing
+- compare routes
+- inspect provider availability
+- inspect route memory
+
+## 7. Storage
+
+Required storage components:
+
+- receipts database: immutable logs
+- memory database: structured + vector
+- policy definitions
+- routing history
+- route reflector memory
+- semantic cache
+- reproducible run artifacts
+
+Recommended default:
+
+- SQLite for personal mode
+- Postgres plus pgvector for team/server mode
+- Git/Gitea as durable memory sync and audit transport
+
+## 8. Metrics
+
+The system tracks:
+
+- token usage
+- compression ratio
+- cache hit rate
+- latency per provider
+- cost per request
+- routing success rate
+- fallback rate
+- trust level distribution
+- blocked route count
+- policy override count
+- agent reputation
+- benchmark scores
+
+## 9. Security Model
+
+- strict policy enforcement before external calls
+- data classification at entry
+- local-first routing possible
+- no sensitive data leaves the system when policy blocks the route
+- no secret sync to memory
+- audit trail via receipts
+- consent ledger for tool, memory, and provider permissions
+- safe config writer for external tool setup
+
+## 10. Extensibility
+
+The system supports:
+
+- new providers
+- new local models
+- new tools
+- new MCP resources
+- new policy rules
+- custom routing logic
+- custom memory backends
+- custom benchmarks
+- custom data source connectors
+
+## 11. Core Idea
+
+LLM Gateway is a deterministic, observable, policy-driven routing layer for AI execution with memory and cost control.
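
The policy and routing steps described in the design above can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the `Route`, `RouteDecision`, and `decideRoute` names are hypothetical, the four sensitivity labels and the "no external routes for confidential or secret data" rule are assumed as one possible policy, and cost is the only ranking criterion shown.

```typescript
// Hypothetical types for illustration; none of these names come from the gateway codebase.
type Sensitivity = 'public' | 'internal' | 'confidential' | 'secret';

interface Route {
  id: string;
  kind: 'local' | 'external';
  costPerKTokens: number; // assumed unit: currency per 1,000 tokens
  available: boolean;
}

interface RouteDecision {
  selected: Route | undefined;
  fallbacks: Route[];
  blocked: { route: Route; reason: string }[];
}

const SENSITIVITY_RANK: Record<Sensitivity, number> = {
  public: 0,
  internal: 1,
  confidential: 2,
  secret: 3,
};

// Policy step: block external routes for confidential and secret data.
// Router step: the cheapest available allowed route wins; the rest become fallbacks.
function decideRoute(sensitivity: Sensitivity, routes: readonly Route[]): RouteDecision {
  const blocked: RouteDecision['blocked'] = [];
  const allowed: Route[] = [];
  for (const route of routes) {
    const tooSensitive = SENSITIVITY_RANK[sensitivity] >= SENSITIVITY_RANK.confidential;
    if (route.kind === 'external' && tooSensitive) {
      blocked.push({ route, reason: `external route not allowed for ${sensitivity} data` });
    } else if (!route.available) {
      blocked.push({ route, reason: 'route unavailable' });
    } else {
      allowed.push(route);
    }
  }
  const ranked = [...allowed].sort((a, b) => a.costPerKTokens - b.costPerKTokens);
  return { selected: ranked[0], fallbacks: ranked.slice(1), blocked };
}

const routes: Route[] = [
  { id: 'ollama-local', kind: 'local', costPerKTokens: 0, available: true },
  { id: 'hosted-a', kind: 'external', costPerKTokens: 0.5, available: true },
  { id: 'hosted-b', kind: 'external', costPerKTokens: 0.2, available: false },
];

const decision = decideRoute('confidential', routes);
console.log(decision.selected?.id, decision.blocked.map((b) => b.reason));
```

Returning the blocked routes with a reason, rather than silently dropping them, is what would let a receipt record why a route was not taken.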
--- a/OPEN_SOURCE_BLUEPRINT.md
+++ b/OPEN_SOURCE_BLUEPRINT.md
--- a/OPEN_SOURCE_FEATURE_MATRIX.md
+++ b/OPEN_SOURCE_FEATURE_MATRIX.md
@ -0,0 +1,66 @@
+# Open Source Feature Matrix
+
+## Legend
+
+- `ready`: exists and is usable with cleanup
+- `partial`: exists but needs extraction/hardening
+- `missing`: must be built
+
+| Feature | Current | OSS Target | Priority |
+|---|---|---|---:|
+| Fastify gateway | ready | keep | P0 |
+| Client SDK | ready | keep + docs | P0 |
+| Health checks | ready | keep + doctor | P0 |
+| Dashboard | partial | topology-first app | P1 |
+| Ollama routing | ready | generic local provider | P0 |
+| LM Studio detection | missing | discovery provider | P0 |
+| LocalAI/llama.cpp/vLLM detection | missing | discovery provider | P0 |
+| Hosted provider registry | partial | provider adapters + consent | P0 |
+| OpenAI-compatible API | partial | first-class adapter | P0 |
+| MCP server | missing | first-class | P0 |
+| Claude Code integration | partial | MCP + bridge | P0 |
+| Codex integration | partial | MCP + LSP | P0 |
+| ChatGPT integration | missing | exports/import + adapter docs | P1 |
+| Cursor/VS Code integration | missing | safe config writer | P1 |
+| n8n integration | missing | workflow pack | P1 |
+| Trust Router | missing | core | P0 |
+| Policy Engine | missing | provider/model/tool constraints | P0 |
+| Provider Router | partial | final route + fallback decision | P0 |
+| Context Receipt | missing | core | P0 |
+| Shared Gitea Memory | missing | core | P0 |
+| Route Reflector Memory | missing | routing memory | P0 |
+| AI Handoff Protocol | partial | core | P0 |
+| Consent Ledger | missing | core | P0 |
+| Setup Doctor | missing | CLI + UI | P0 |
+| Safe Config Writer | missing | CLI + UI | P0 |
+| Offline Mode | missing | policy mode | P0 |
+| Simulation Mode | missing | dry-run routing decisions | P0 |
+| Compression/token saving | partial | first-class engine | P1 |
+| Semantic cache | missing | optional | P1 |
+| Capability Benchmark Lab | missing | routing input | P1 |
+| Agent Reputation Score | missing | routing input | P1 |
+| Reproducible Runs | missing | audit/eval | P1 |
+| Integration Marketplace | missing | local catalog | P1 |
+| Data connectors | missing | scoped connectors | P1 |
+| Team Mode | missing | RBAC/admin | P2 |
+| Prompt/agent versioning | partial | Git-backed | P2 |
+| Import wizard | missing | guided migration | P2 |
+
+## Public Positioning
+
+Do not position this as another LiteLLM clone.
+
+Positioning:
+
+> Adaptive LLM Gateway discovers your local and hosted AI stack, connects it through a secure MCP and OpenAI-compatible control plane, and gives every agent shared memory, policy, receipts, compression, and routing.
+
+Core differentiators:
+
+- AI environment discovery
+- Trust Router
+- Context Receipts
+- Shared Git/Gitea Memory
+- AI Handoff Protocol
+- Consent Ledger
+- Reproducible AI Runs
+- model and agent benchmark learning
--- a/OPEN_SOURCE_GAP_ANALYSIS.md
+++ b/OPEN_SOURCE_GAP_ANALYSIS.md
@ -0,0 +1,133 @@
+# Open Source Gap Analysis
+
+This document maps the current Context-X LLM Gateway to the planned open-source Adaptive LLM Gateway.
+
+## Current Strengths
+
+Already present in the repository:
+
+| Area | Current State | Notes |
+|---|---|---|
+| Gateway API | Present | Fastify gateway in `packages/gateway`. |
+| Completion API | Present | Main route: `/v1/completion`. |
+| Classification | Present | `/v1/classify` and pre-classifier pipeline. |
+| Batch jobs | Present | `/v1/batch` and PgBoss queue integration. |
+| Health checks | Present | `/health`, `/health/live`, `/health/ready`. |
+| Metrics | Present | Prometheus metrics and dashboard metrics. |
+| Dashboard | Present | Operational dashboard exists in `packages/gateway/public`. |
+| Routing rules | Present | YAML routing rules and model tiers. |
+| Local model routing | Present | Ollama-based routing and fallback chains. |
+| Hosted providers | Partial | External provider registry exists. Needs OSS cleanup and discovery. |
+| Cost tracking | Present | Cost analytics, token tracking, cost stream. |
+| Compression accounting | Partial | TokenVault/cost hooks exist. Needs first-class compression engine. |
+| Learning engine | Present | Learning cycles, model performance tracking, fine-tuner package. |
+| Client SDK | Present | `@llm-gateway/client`. |
+| OpenAI compatibility | Partial | `chatgpt-api-adapter` and `openai-bridge` exist. Needs clean OSS path. |
+| Codex integration | Partial | `packages/codex-lsp-adapter` exists. Needs production hardening. |
+| Claude Code integration | Partial | `packages/claude-code-bridge` exists. Needs MCP-first flow. |
+| LightRAG/RAG | Present | LightRAG sidecar exists. Needs generic connector story. |
+| Handoff sync | Partial | `sync/` handoff folder exists. Needs protocol and tools. |
+| Gitea use | Present internally | Needs generic Gitea memory backend. |
+
+## Missing For Open Source
+
+These features need to be added or extracted:
+
+| Feature | Status | Priority | Target Package/Area |
+|---|---|---:|---|
+| First-run setup wizard | Missing | P0 | `packages/cli`, `packages/discovery` |
+| Local AI discovery | Missing | P0 | `packages/discovery` |
+| Public provider discovery | Partial | P0 | `packages/discovery`, `packages/providers` |
+| AI client detection | Missing | P0 | `packages/discovery` |
+| MCP server | Missing | P0 | `packages/mcp-server` |
+| Trust Router | Missing | P0 | `packages/trust-router` |
+| Consent Ledger | Missing | P0 | `packages/consent-ledger` |
+| Shared Gitea Memory | Missing | P0 | `packages/memory-sync` |
+| Context Receipt | Missing | P0 | `packages/context-receipts` |
+| AI Handoff Protocol | Partial | P0 | `packages/handoff` |
+| Safe Config Writer | Missing | P0 | `packages/config-writer` |
+| Setup Doctor | Missing | P0 | `packages/doctor` |
+| Offline Mode | Missing | P0 | gateway config/policy |
+| Capability Benchmark Lab | Missing | P1 | `packages/benchmark-lab` |
+| Agent Reputation Score | Missing | P1 | `packages/agent-reputation` |
+| Reproducible Runs | Missing | P1 | `packages/run-ledger` |
+| Visual Topology Map | Missing | P1 | dashboard UI/API |
+| Integration Marketplace | Missing | P1 | `packages/integrations` + UI |
+| Data source connectors | Missing | P1 | `packages/connectors` |
+| Context Compression Engine | Partial | P1 | `packages/context-compression` |
+| Semantic cache | Missing (mentioned only) | P1 | `packages/cache` |
+| Team mode | Missing | P2 | auth/policy/admin UI |
+| Prompt/agent versioning | Partial | P2 | memory/git/prompt registry |
+| Migration/import wizard | Missing | P2 | `packages/import-wizard` |
+
+## Context-X Assumptions To Remove
+
+Before public release, remove or move behind an example profile:
+
+- hardcoded `context-x.org` domains
+- hardcoded `fichtmueller.org` Ollama endpoint
+- Erik-specific paths such as `/opt/llm-gateway`
+- private project callers and templates as defaults
+- internal IP assumptions
+- private training data
+- private bridge assumptions
+- secret-looking examples
+- Context-X branding as default OSS UI
+
+Keep them as:
+
+```text
+examples/profiles/context-x/
+```
+
+or as a private deployment overlay.
+
+## Proposed New Packages
+
+```text
+packages/
+  cli/                    # init, doctor, integrate, import, mode
+  discovery/              # detects models, clients, runtimes, providers
+  mcp-server/             # MCP tools/resources
+  trust-router/           # sensitivity + policy routing
+  consent-ledger/         # append-only permissions ledger
+  memory-sync/            # local/git/gitea memory backend
+  handoff/                # AI Handoff Protocol schema + helpers
+  context-receipts/       # receipts and audit artifacts
+  config-writer/          # safe config diffs and rollback
+  benchmark-lab/          # model/agent benchmark suite
+  agent-reputation/       # agent scorecards
+  run-ledger/             # reproducible AI runs
+  context-compression/    # compression + token budget manager
+  integrations/           # integration catalog manifests
+  connectors/             # data source connectors
+  import-wizard/          # migration/import helpers
+```
+
+## MVP Cut
+
+The first useful OSS release should not try to ship everything.
+
+MVP must include:
+
+- CLI with `init`, `doctor`, `start`, `integrate`
+- local AI discovery: Ollama + LM Studio + OpenAI-compatible `/v1/models`
+- provider env discovery with consent
+- MCP server with safe gateway and memory tools
+- Trust Router with four trust levels
+- Gitea/Git memory backend
+- Context Receipts
+- AI Handoff Protocol
+- Safe Config Writer
+- Offline Mode
+- basic topology dashboard
+
+MVP can defer:
+
+- full benchmark lab
+- team RBAC
+- all data connectors
+- full import wizard
+- advanced compression comparisons
+- agent reputation automation
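
The MVP's local AI discovery through an OpenAI-compatible `/v1/models` endpoint could look roughly like the sketch below. It assumes the standard OpenAI list-models response shape (`{ data: [{ id, owned_by }] }`) and Node 18+ for the global `fetch`; `parseModelList`, `discoverModels`, and `DiscoveredModel` are hypothetical names, not part of the planned `packages/discovery` API.

```typescript
// Hypothetical result type for illustration.
interface DiscoveredModel {
  id: string;
  ownedBy?: string;
}

// Extracts model ids from an OpenAI-style list-models response body.
// Malformed entries are skipped instead of failing the whole probe.
function parseModelList(body: unknown): DiscoveredModel[] {
  if (typeof body !== 'object' || body === null) return [];
  const data = (body as { data?: unknown }).data;
  if (!Array.isArray(data)) return [];

  const models: DiscoveredModel[] = [];
  for (const entry of data) {
    if (typeof entry !== 'object' || entry === null) continue;
    const { id, owned_by } = entry as { id?: unknown; owned_by?: unknown };
    if (typeof id !== 'string' || id.length === 0) continue;
    models.push(typeof owned_by === 'string' ? { id, ownedBy: owned_by } : { id });
  }
  return models;
}

// Probes one base URL. Any network error, timeout, or non-2xx status is
// treated as "nothing discovered" so a missing runtime never aborts init.
async function discoverModels(baseUrl: string, timeoutMs = 2000): Promise<DiscoveredModel[]> {
  try {
    const url = `${baseUrl.replace(/\/+$/, '')}/v1/models`;
    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    if (!response.ok) return [];
    return parseModelList(await response.json());
  } catch {
    return [];
  }
}
```

A caller would pass each candidate base URL to `discoverModels` and merge the results. Which hosts and ports to probe is left to configuration here, since the source does not specify them.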
+
--- a/OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md
+++ b/OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md
@ -0,0 +1,212 @@
+# Open Source Implementation Roadmap
+
+## Phase 0: Sanitize And Productize
+
+Goal: make the current codebase safe to publish and understandable outside Context-X.
+
+Tasks:
+
+- Add OSS name and package naming decision.
+- Move Context-X-only files into `examples/profiles/context-x/`.
+- Add `.env.example` without private domains or secrets.
+- Replace hardcoded defaults with generated config.
+- Add license, contributing guide, security policy, and public README.
+- Run secret scan and dependency/license audit.
+- Decide which training data can be published.
+
+Exit criteria:
+
+- Fresh clone can install without private services.
+- No private domains or internal IPs are required for default startup.
+- Public README explains local-only setup.
+
+## Phase 1: Adaptive Init
+
+Goal: detect the user's AI environment and create config.
+
+Packages:
+
+- `packages/cli`
+- `packages/discovery`
+- `packages/config-writer`
+
+Commands:
+
+```bash
+adaptive-llm-gateway init
+adaptive-llm-gateway doctor
+adaptive-llm-gateway integrate <target>
+adaptive-llm-gateway mode offline
+adaptive-llm-gateway simulate <request-file>
+```
+
+Detection targets:
+
+- Ollama
+- LM Studio
+- LocalAI
+- llama.cpp server
+- vLLM
+- Open WebUI
+- OpenAI-compatible endpoints
+- OpenAI/Anthropic/Groq/Mistral/OpenRouter env keys
+- Claude Code
+- Codex
+- Cursor
+- VS Code
+- Continue.dev
+- n8n
+- Docker containers
+- Git/Gitea availability
+
+Exit criteria:
+
+- `init` writes `~/.adaptive-llm-gateway/config.yaml`.
+- No external integration is enabled without approval.
+- `doctor` reports actionable health and setup status.
+
+## Phase 2: Trust, Consent, Receipts
+
+Goal: every request goes through policy and produces an audit artifact.
+
+Packages:
+
+- `packages/trust-router`
+- `packages/policy-engine`
+- `packages/consent-ledger`
+- `packages/context-receipts`
+- `packages/run-ledger`
+- `packages/provider-router`
+
+Features:
+
+- four trust levels: public, internal, confidential, secret
+- local-only/offline routing mode
+- simulation mode with no execution
+- provider router route constraints and fallbacks
+- append-only consent ledger
+- receipt for context used, blocked, redacted, routed
+- reproducible run folder
+
+Exit criteria:
+
+- External providers are blocked for confidential/secret data by default.
+- Receipts can be viewed from CLI and dashboard.
+- Consent changes are append-only; a grant is reversed by appending a revocation entry, never by editing history.
+
+## Phase 3: Shared Memory And MCP
+
+Goal: make the gateway the shared memory and tool layer for all AI clients.
+
+Packages:
+
+- `packages/memory-sync`
+- `packages/handoff`
+- `packages/mcp-server`
+- `packages/route-reflector-memory`
+
+Features:
+
+- local memory repo
+- Git/Gitea sync
+- typed memory folders
+- MCP tools for memory and gateway calls
+- AI Handoff Protocol
+- Route Reflector Memory for routing outcomes
+- conflict-safe append-first writes
+
+MCP tools:
+
+- `gateway.complete`
+- `gateway.chat`
+- `gateway.health`
+- `gateway.route_preview`
+- `memory.search`
+- `memory.read`
+- `memory.write`
+- `memory.append_session`
+- `memory.record_decision`
+- `memory.record_task`
+- `memory.pull`
+- `memory.push`
+
+Exit criteria:
+
+- Claude Code and Codex can access the same memory through MCP.
+- Handoffs are stored in Git/Gitea.
+- Memory sync refuses to commit secrets.
+
+## Phase 4: Compression And Knowledge
+
+Goal: reduce token use and retrieve only the right context.
+
+Packages:
+
+- `packages/context-compression`
+- `packages/connectors`
+- `packages/cache`
+
+Features:
+
+- token budget manager
+- session compaction
+- repo/doc summarization
+- memory dedupe
+- semantic cache
+- SQLite-backed vector store by default
+- Postgres or Qdrant as optional backends
+- approved data source connectors
+
+Exit criteria:
+
+- Context packages include budget, source refs, and compression stats.
+- Receipts show compressed-from and final token counts.
+- Indexing requires explicit allowed roots.
+
+## Phase 5: Benchmarking And Reputation
+
+Goal: route based on evidence instead of static assumptions.
+
+Packages:
+
+- `packages/benchmark-lab`
+- `packages/agent-reputation`
+
+Features:
+
+- model capability tests
+- agent scorecards
+- latency/cost/quality tracking
+- JSON reliability test
+- code patch/test benchmark
+- local vs hosted comparison
+
+Exit criteria:
+
+- Provider Router can use benchmark scores.
+- Dashboard shows model and agent strengths.
+- Routing decisions explain benchmark influence.
+
+## Phase 6: Product UI
+
+Goal: turn the operational dashboard into a usable OSS app.
+
+UI areas:
+
+- Topology
+- Models
+- Agents
+- Memory
+- Policies
+- Receipts
+- Benchmarks
+- Costs
+- Integrations
+- Doctor
+- Settings
+
+Exit criteria:
+
+- First screen is topology/status.
+- User can enable integrations from UI with diff preview.
+- User can inspect receipts and memory sync status.
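
Phase 2 asks for a consent ledger that is both append-only and reversible. One way to reconcile the two, sketched below, is to treat a revocation as a new entry and derive current state from the latest entry per subject. `ConsentLedger`, `ConsentEntry`, and the `provider:hosted-a` subject string are hypothetical; the in-memory array stands in for whatever durable store the real package would use.

```typescript
// Hypothetical ledger types for illustration.
type ConsentAction = 'grant' | 'revoke';

interface ConsentEntry {
  seq: number;
  subject: string; // e.g. a provider, tool, or memory scope
  action: ConsentAction;
  at: string; // ISO 8601 timestamp
}

class ConsentLedger {
  private readonly entries: ConsentEntry[] = [];

  // Entries are only ever appended and are frozen once written.
  append(subject: string, action: ConsentAction, at: Date = new Date()): ConsentEntry {
    const entry: ConsentEntry = Object.freeze({
      seq: this.entries.length + 1,
      subject,
      action,
      at: at.toISOString(),
    });
    this.entries.push(entry);
    return entry;
  }

  // Current state is derived from the most recent entry for the subject.
  // A subject with no entries is treated as not granted.
  isGranted(subject: string): boolean {
    for (let i = this.entries.length - 1; i >= 0; i--) {
      const entry = this.entries[i];
      if (entry && entry.subject === subject) return entry.action === 'grant';
    }
    return false;
  }

  // Returns a copy so callers cannot mutate the ledger's internal array.
  history(): readonly ConsentEntry[] {
    return [...this.entries];
  }
}

const ledger = new ConsentLedger();
ledger.append('provider:hosted-a', 'grant');
ledger.append('provider:hosted-a', 'revoke');
console.log(ledger.isGranted('provider:hosted-a'), ledger.history().length);
```

Because nothing is deleted, the full grant and revoke history remains available for audit while the effective permission still changes.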
--- a/packages/gateway/public/dashboard-v2.html
+++ b/packages/gateway/public/dashboard-v2.html
--- a/packages/gateway/public/dashboard.html
+++ b/packages/gateway/public/dashboard.html
--- a/packages/gateway/src/modules/admin-auth.ts
+++ b/packages/gateway/src/modules/admin-auth.ts
@ -0,0 +1,87 @@
+import type { FastifyReply, FastifyRequest } from 'fastify';
+import { timingSafeEqual } from 'crypto';
+
+const TOKEN_ENV_KEYS = ['DASHBOARD_AUTH_TOKEN', 'LLM_GATEWAY_ADMIN_TOKEN', 'ADMIN_TOKEN'] as const;
+
+function configuredToken(): string | undefined {
+  for (const key of TOKEN_ENV_KEYS) {
+    const value = process.env[key]?.trim();
+    if (value) return value;
+  }
+  return undefined;
+}
+
+function safeEqual(left: string, right: string): boolean {
+  const leftBuffer = Buffer.from(left);
+  const rightBuffer = Buffer.from(right);
+  if (leftBuffer.length !== rightBuffer.length) return false;
+  return timingSafeEqual(leftBuffer, rightBuffer);
+}
+
+function tokenFromAuthorizationHeader(header: string | undefined): string | undefined {
+  if (!header) return undefined;
+  const [scheme, value] = header.split(/\s+/, 2);
+  if (!scheme || !value) return undefined;
+
+  if (scheme.toLowerCase() === 'bearer') return value.trim();
+
+  if (scheme.toLowerCase() === 'basic') {
+    try {
+      const decoded = Buffer.from(value, 'base64').toString('utf8');
+      const separator = decoded.indexOf(':');
+      return separator >= 0 ? decoded.slice(separator + 1).trim() : decoded.trim();
+    } catch {
+      return undefined;
+    }
+  }
+
+  return undefined;
+}
+
+function tokenFromRequest(request: FastifyRequest): string | undefined {
+  const explicit = request.headers['x-dashboard-token'];
+  if (typeof explicit === 'string' && explicit.trim()) return explicit.trim();
+  return tokenFromAuthorizationHeader(request.headers.authorization);
+}
+
+export function isDashboardAuthConfigured(): boolean {
+  return !!configuredToken();
+}
+
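+function isLocalDevelopmentRequest(request: FastifyRequest): boolean {
+  if (process.env['NODE_ENV'] === 'production') return false;
+  // Compare the host exactly (minus any port): a prefix match would also accept
+  // names such as "localhost.example.com". The Host header is client-supplied,
+  // so this bypass is only appropriate outside production.
+  const rawHost = request.hostname || request.headers.host || '';
+  const host = rawHost.toLowerCase().replace(/:\d+$/, '');
+  return host === '127.0.0.1' || host === 'localhost' || host === '[::1]';
+}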
+
+export async function requireDashboardAuth(request: FastifyRequest, reply: FastifyReply): Promise<FastifyReply | void> {
+  if (isLocalDevelopmentRequest(request)) return;
+
+  const expected = configuredToken();
+  if (!expected) {
+    return reply.status(503).send({
+      statusCode: 503,
+      error: 'Dashboard Auth Not Configured',
+      message: 'Set DASHBOARD_AUTH_TOKEN before exposing dashboard data or settings.',
+    });
+  }
+
+  const received = tokenFromRequest(request);
+  if (!received || !safeEqual(received, expected)) {
+    reply.header('WWW-Authenticate', 'Bearer realm="llm-gateway-dashboard"');
+    return reply.status(401).send({
+      statusCode: 401,
+      error: 'Unauthorized',
+      message: 'Dashboard token required.',
+    });
+  }
+}
+
+export function dashboardAuthStatus(request: FastifyRequest): { configured: boolean; authenticated: boolean } {
+  if (isLocalDevelopmentRequest(request)) return { configured: true, authenticated: true };
+
+  const expected = configuredToken();
+  if (!expected) return { configured: false, authenticated: false };
+  const received = tokenFromRequest(request);
+  return { configured: true, authenticated: !!received && safeEqual(received, expected) };
+}
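
`safeEqual` in `admin-auth.ts` returns early when the two buffers differ in length, which can reveal the configured token's length through timing. A common alternative, sketched below under the hypothetical name `safeEqualHashed`, is to hash both values first so that `timingSafeEqual` always receives equal-length inputs. This is an illustration of that technique, not the module's current behavior.

```typescript
import { createHash, timingSafeEqual } from 'crypto';

// Hypothetical alternative: SHA-256 digests are always 32 bytes, so no
// length check is needed before the constant-time comparison.
function safeEqualHashed(left: string, right: string): boolean {
  const leftDigest = createHash('sha256').update(left, 'utf8').digest();
  const rightDigest = createHash('sha256').update(right, 'utf8').digest();
  return timingSafeEqual(leftDigest, rightDigest);
}

console.log(safeEqualHashed('s3cret-token', 's3cret-token'));
console.log(safeEqualHashed('s3cret-token', 'short'));
```

The trade-off is two extra hash computations per request, which is usually negligible next to the rest of request handling.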
--- a/packages/gateway/src/modules/bridge-spawner.ts
+++ b/packages/gateway/src/modules/bridge-spawner.ts
@ -0,0 +1,246 @@
+/**
+ * Bridge Spawner
+ *
+ * Auto-starts inline HTTP bridges for detected CLI subscriptions. Each bridge
+ * exposes `POST /api/generate` (also served as `POST /v1/completion`) and
+ * `GET /health`, so the gateway can call it like a regular external provider.
+ * Bridges run in-process on a dedicated port per subscription, which avoids the
+ * overhead of spawning a separate Node process for each one.
+ */
+
+import { execFile } from 'child_process';
+import { createServer, type Server } from 'http';
+import { logger } from '../observability/logger.js';
+import type { SubscriptionDescriptor, SubscriptionStatus } from './subscription-discovery.js';
+
+interface RunningBridge {
+  descriptor: SubscriptionDescriptor;
+  server: Server;
+  port: number;
+  url: string;
+  startedAt: Date;
+}
+
+const runningBridges = new Map<string, RunningBridge>();
+
+/**
+ * Run a CLI tool with stdin-piped prompt, return stdout content.
+ * Generic implementation that all inline bridges share.
+ */
+async function runCli(
+  command: string,
+  args: readonly string[],
+  prompt: string,
+  timeoutMs: number = 300_000
+): Promise<{ success: boolean; content?: string; error?: string }> {
+  return new Promise((resolve) => {
+    try {
+      const child = execFile(
+        command,
+        args as string[],
+        { timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 },
+        (err, stdout) => {
+          if (err) {
+            resolve({ success: false, error: err.message.slice(0, 500) });
+          } else {
+            resolve({ success: true, content: stdout.trim() });
+          }
+        }
+      );
+      if (child.stdin) {
+        child.stdin.write(prompt);
+        child.stdin.end();
+      }
+    } catch (err) {
+      resolve({ success: false, error: err instanceof Error ? err.message : String(err) });
+    }
+  });
+}
+
+/**
+ * Build the CLI invocation for a given subscription.
+ */
+function buildCliInvocation(desc: SubscriptionDescriptor, model?: string): { cmd: string; args: string[] } {
+  switch (desc.bridgeImplementation) {
+    case 'inline-claude': {
+      const args = ['--print', '--output-format', 'text'];
+      if (model) args.push('--model', model);
+      return { cmd: 'claude', args };
+    }
+    case 'inline-copilot': {
+      // `gh copilot suggest` is interactive, so this invocation may not work with a
+      // piped prompt; prefer an OpenAI-compatible copilot-api proxy where available.
+      return { cmd: 'gh', args: ['copilot', 'suggest', '--shell'] };
+    }
+    case 'inline-openai': {
+      // Generic OpenAI-compatible CLI (chatgpt-cli, gemini-cli with OpenAI compat)
+      return { cmd: desc.command, args: model ? ['--model', model] : [] };
+    }
+    case 'external-codex': {
+      // codex CLI: read prompt from stdin
+      return { cmd: 'codex', args: model ? ['--model', model] : [] };
+    }
+  }
+}
+
+/**
+ * Spawn an inline HTTP bridge for a subscription. Returns the URL the gateway
+ * should use to talk to it. Idempotent — calling twice returns the same bridge.
+ */
+export function spawnBridge(desc: SubscriptionDescriptor): Promise<RunningBridge> {
+  const existing = runningBridges.get(desc.id);
+  if (existing) {
+    return Promise.resolve(existing);
+  }
+
+  return new Promise((resolve, reject) => {
+    const server = createServer(async (req, res) => {
+      res.setHeader('Content-Type', 'application/json');
+      res.setHeader('Access-Control-Allow-Origin', '*');
+
+      if (req.method === 'GET' && req.url === '/health') {
+        const current = runningBridges.get(desc.id);
+        res.writeHead(200);
+        res.end(
+          JSON.stringify({
+            status: 'ok',
+            subscription: desc.id,
+            label: desc.label,
+            command: desc.command,
+            uptimeSeconds: current ? Math.floor((Date.now() - current.startedAt.getTime()) / 1000) : 0,
+          })
+        );
+        return;
+      }
+
+      if (req.method === 'POST' && (req.url === '/api/generate' || req.url === '/v1/completion')) {
+        let body = '';
+        req.on('data', (chunk) => (body += chunk));
+        req.on('end', async () => {
+          try {
+            const { prompt, system, model } = JSON.parse(body || '{}');
+            if (!prompt) {
+              res.writeHead(400);
+              res.end(JSON.stringify({ error: 'prompt required' }));
+              return;
+            }
+            const fullPrompt = system ? `${system}\n\n---\n\n${prompt}` : prompt;
+            const { cmd, args } = buildCliInvocation(desc, model);
+            const result = await runCli(cmd, args, fullPrompt);
+            if (result.success) {
+              res.writeHead(200);
+              res.end(
+                JSON.stringify({
+                  success: true,
+                  content: result.content,
+                  provider: desc.providerName,
+                  model: model ?? desc.models[0]?.id,
+                })
+              );
+            } else {
+              res.writeHead(502);
+              res.end(JSON.stringify({ success: false, error: result.error }));
+            }
+          } catch (e) {
+            res.writeHead(500);
+            res.end(JSON.stringify({ error: e instanceof Error ? e.message : 'parse error' }));
+          }
+        });
+        return;
+      }
+
+      res.writeHead(404);
+      res.end(JSON.stringify({ error: 'not found' }));
+    });
+
+    server.on('error', (err) => {
+      // Port in use → assume an existing bridge is already running, treat as success
+      if ((err as NodeJS.ErrnoException).code === 'EADDRINUSE') {
+        logger.info(
+          { subscription: desc.id, port: desc.bridgePort },
+          'Port already in use — assuming external bridge is healthy'
+        );
+        const url = `http://127.0.0.1:${desc.bridgePort}`;
+        const fakeBridge: RunningBridge = {
+          descriptor: desc,
+          server, // server failed to bind; OK to keep handle
+          port: desc.bridgePort,
+          url,
+          startedAt: new Date(),
+        };
+        runningBridges.set(desc.id, fakeBridge);
+        resolve(fakeBridge);
+      } else {
+        reject(err);
+      }
+    });
+
+    server.listen(desc.bridgePort, '127.0.0.1', () => {
+      const url = `http://127.0.0.1:${desc.bridgePort}`;
+      const bridge: RunningBridge = {
+        descriptor: desc,
+        server,
+        port: desc.bridgePort,
+        url,
+        startedAt: new Date(),
+      };
+      runningBridges.set(desc.id, bridge);
+      // Set the env var so the existing external-providers logic finds the bridge
+      process.env[desc.bridgeEnvKey] = url;
+      logger.info(
+        { subscription: desc.id, url, port: desc.bridgePort, envKey: desc.bridgeEnvKey },
+        'Inline subscription bridge started'
+      );
+      resolve(bridge);
+    });
+  });
+}
+
+/**
+ * Spawn bridges for every installed subscription that is not known to be
+ * unauthenticated and has no bridge yet. Returns the list of started bridges.
+ */
+export async function spawnDetectedBridges(
+  statuses: readonly SubscriptionStatus[]
+): Promise<RunningBridge[]> {
+  const toSpawn = statuses.filter(
+    (s) => s.installed && s.authenticated !== false && !s.bridgeRunning
+  );
+  const results: RunningBridge[] = [];
+  for (const status of toSpawn) {
+    try {
+      const bridge = await spawnBridge(status.descriptor);
+      results.push(bridge);
+    } catch (err) {
+      logger.warn(
+        { err, subscription: status.descriptor.id },
+        'Failed to spawn subscription bridge — continuing'
+      );
+    }
+  }
+  return results;
+}
+
+/**
+ * Snapshot of currently running in-process bridges. Used by the dashboard.
+ */
+export function getRunningBridges(): readonly RunningBridge[] {
+  return Array.from(runningBridges.values());
+}
+
+/**
+ * Stop all inline bridges (used during graceful shutdown).
+ */
+export async function stopAllBridges(): Promise<void> {
+  await Promise.all(
+    Array.from(runningBridges.values()).map(
+      (bridge) =>
+        new Promise<void>((resolve) => {
+          try {
+            bridge.server.close(() => resolve());
+          } catch {
+            resolve();
+          }
+        })
+    )
+  );
+  runningBridges.clear();
+}
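Review note: the bridge above accepts `POST /api/generate` with a JSON body of `prompt`, optional `system` and optional `model`, and answers with either `{ success: true, content, provider, model }` or `{ success: false, error }`. A minimal client-side sketch of that contract follows. The helper names `buildGenerateRequest` and `parseGenerateResponse`, the two interface names, and the port used in the usage note are assumptions for illustration and do not exist in the gateway source.

```typescript
// Hypothetical client helpers for the inline bridge; not part of the gateway.
interface BridgeGenerateRequest {
  prompt: string;
  system?: string;
  model?: string;
}

type BridgeGenerateResponse =
  | { success: true; content: string; provider: string; model: string | null }
  | { success: false; error: string };

interface BuiltRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the URL and fetch-style init object for a generate call.
function buildGenerateRequest(baseUrl: string, body: BridgeGenerateRequest): BuiltRequest {
  // Mirrors the bridge's own check, which answers 400 when prompt is missing.
  if (!body.prompt) throw new Error('prompt required');
  return {
    url: `${baseUrl.replace(/\/$/, '')}/api/generate`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

// Turn the raw response text into a discriminated union the caller can narrow.
function parseGenerateResponse(raw: string): BridgeGenerateResponse {
  const parsed = JSON.parse(raw) as {
    success?: boolean;
    content?: string;
    provider?: string;
    model?: string;
    error?: string;
  };
  if (parsed.success === true && typeof parsed.content === 'string') {
    return {
      success: true,
      content: parsed.content,
      provider: parsed.provider ?? '',
      model: parsed.model ?? null,
    };
  }
  return { success: false, error: parsed.error ?? 'unknown error' };
}
```

A caller would pass the result to `fetch(built.url, built.init)` and feed the response text to `parseGenerateResponse`; for example `buildGenerateRequest('http://127.0.0.1:4101', { prompt: 'hi' })`, where port 4101 is a made-up value standing in for `desc.bridgePort`.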
--- a/packages/gateway/src/modules/caller-stats.ts
+++ b/packages/gateway/src/modules/caller-stats.ts
@ -0,0 +1,180 @@
+/**
+ * Per-Caller Deep Dive
+ *
+ * Aggregates everything we know about ONE caller — its volume, models used,
+ * cache effectiveness, cost, latency distribution, recent activity, and
+ * stored memory facts. Powers the modal that opens when a user clicks on
+ * a caller chip in the dashboard.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export interface CallerDeepDive {
+  caller: string;
+  firstSeen: string | null;
+  lastSeen: string | null;
+  totalRequests: number;
+  successRate: number;
+  totalTokensIn: number;
+  totalTokensOut: number;
+  totalCost: number;
+  avgLatencyMs: number;
+  /** Latency distribution in milliseconds: median (p50) and 95th percentile (p95). */
+  latencyP50: number;
+  latencyP95: number;
+  cacheHits: number;
+  cacheTokensSaved: number;
+  topModels: Array<{ model: string; count: number; share: number }>;
+  topTaskTypes: Array<{ taskType: string; count: number }>;
+  recentRequests: Array<{
+    request_id: string;
+    model: string;
+    status: string;
+    tokens_in: number;
+    tokens_out: number;
+    latency_ms: number;
+    cost_usd: number;
+    created_at: string;
+  }>;
+  storedFacts: Array<{ key: string; value: string; confidence: number; source: string }>;
+  hourlyHeatmap: Array<{ hour: number; count: number }>;
+}
+
+export async function getCallerDeepDive(db: Pool, caller: string): Promise<CallerDeepDive | null> {
+  const c = caller.trim().toLowerCase();
+  try {
+    // Headline aggregates
+    const head = await db.query(`
+      SELECT
+        COUNT(*)::INT AS total,
+        MIN(created_at) AS first_seen,
+        MAX(created_at) AS last_seen,
+        SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / NULLIF(COUNT(*),0) AS success_rate,
+        COALESCE(SUM(tokens_in), 0)::BIGINT AS tok_in,
+        COALESCE(SUM(tokens_out), 0)::BIGINT AS tok_out,
+        COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost,
+        COALESCE(AVG(latency_ms), 0)::INT AS avg_lat,
+        COALESCE(PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY latency_ms), 0)::INT AS p50,
+        COALESCE(PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY latency_ms), 0)::INT AS p95
+      FROM request_tracking
+      WHERE caller_id = $1
+    `, [c]);
+    const h = head.rows[0];
+    if (!h || parseInt(h.total, 10) === 0) {
+      return null;
+    }
+
+    const total = parseInt(h.total, 10) || 0;
+
+    // Top models by this caller
+    const models = await db.query(`
+      SELECT model, COUNT(*)::INT AS cnt
+      FROM request_tracking
+      WHERE caller_id = $1
+      GROUP BY model
+      ORDER BY cnt DESC
+      LIMIT 10
+    `, [c]);
+
+    const topModels = models.rows.map((r: any) => ({
+      model: r.model,
+      count: parseInt(r.cnt, 10) || 0,
+      share: total > 0 ? parseFloat(((parseInt(r.cnt, 10) / total) * 100).toFixed(1)) : 0,
+    }));
+
+    // Top task types
+    const tasks = await db.query(`
+      SELECT task_type, COUNT(*)::INT AS cnt
+      FROM request_tracking
+      WHERE caller_id = $1
+      GROUP BY task_type
+      ORDER BY cnt DESC
+      LIMIT 8
+    `, [c]);
+    const topTaskTypes = tasks.rows.map((r: any) => ({
+      taskType: r.task_type ?? '(unknown)',
+      count: parseInt(r.cnt, 10) || 0,
+    }));
+
+    // Cache stats for this caller
+    const cache = await db.query(`
+      SELECT
+        COALESCE(SUM(hit_count), 0)::INT AS hits,
+        COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens
+      FROM response_cache
+      WHERE caller_id = $1
+    `, [c]);
+    const cacheHits = parseInt(cache.rows[0]?.hits ?? '0', 10);
+    const cacheTokens = parseInt(cache.rows[0]?.tokens ?? '0', 10);
+
+    // Recent requests (15 latest)
+    const recent = await db.query(`
+      SELECT request_id, model, status, tokens_in, tokens_out, latency_ms, cost_usd, created_at
+      FROM request_tracking
+      WHERE caller_id = $1
+      ORDER BY created_at DESC
+      LIMIT 15
+    `, [c]);
+
+    // Stored facts
+    let storedFacts: CallerDeepDive['storedFacts'] = [];
+    try {
+      const facts = await db.query(`
+        SELECT fact_key, fact_value, confidence, source
+        FROM caller_knowledge
+        WHERE caller_id = $1 AND superseded_by IS NULL
+          AND (valid_until IS NULL OR valid_until > NOW())
+        ORDER BY confidence DESC
+        LIMIT 20
+      `, [c]);
+      storedFacts = facts.rows.map((r: any) => ({
+        key: r.fact_key, value: r.fact_value,
+        confidence: parseFloat(r.confidence), source: r.source ?? '',
+      }));
+    } catch { /* caller_knowledge may not exist on every deployment; keep storedFacts empty */ }
+
+    // Hour-of-day heatmap (24 buckets) over the last 7 days
+    const hourly = await db.query(`
+      SELECT EXTRACT(HOUR FROM created_at)::INT AS hr, COUNT(*)::INT AS cnt
+      FROM request_tracking
+      WHERE caller_id = $1 AND created_at > NOW() - INTERVAL '7 days'
+      GROUP BY hr
+      ORDER BY hr ASC
+    `, [c]);
+    const hourlyMap = new Map<number, number>(hourly.rows.map((r: any): [number, number] => [parseInt(r.hr, 10), parseInt(r.cnt, 10)]));
+    const hourlyHeatmap = Array.from({ length: 24 }, (_, i) => ({ hour: i, count: hourlyMap.get(i) ?? 0 }));
+
+    return {
+      caller: c,
+      firstSeen: h.first_seen ? new Date(h.first_seen).toISOString() : null,
+      lastSeen: h.last_seen ? new Date(h.last_seen).toISOString() : null,
+      totalRequests: total,
+      successRate: parseFloat(h.success_rate) || 0,
+      totalTokensIn: parseInt(h.tok_in, 10) || 0,
+      totalTokensOut: parseInt(h.tok_out, 10) || 0,
+      totalCost: parseFloat(h.cost) || 0,
+      avgLatencyMs: parseInt(h.avg_lat, 10) || 0,
+      latencyP50: parseInt(h.p50, 10) || 0,
+      latencyP95: parseInt(h.p95, 10) || 0,
+      cacheHits,
+      cacheTokensSaved: cacheTokens,
+      topModels,
+      topTaskTypes,
+      recentRequests: recent.rows.map((r: any) => ({
+        request_id: r.request_id,
+        model: r.model,
+        status: r.status,
+        tokens_in: parseInt(r.tokens_in, 10) || 0,
+        tokens_out: parseInt(r.tokens_out, 10) || 0,
+        latency_ms: parseInt(r.latency_ms, 10) || 0,
+        cost_usd: parseFloat(r.cost_usd) || 0,
+        created_at: new Date(r.created_at).toISOString(),
+      })),
+      storedFacts,
+      hourlyHeatmap,
+    };
+  } catch (err) {
+    logger.warn({ err, caller: c }, 'caller-stats: deep dive failed');
+    return null;
+  }
+}
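Review note: the headline query above uses `PERCENTILE_DISC`, which returns an actual observed value (nearest rank) instead of interpolating between two values. The sketch below reproduces that rule in TypeScript so the `latencyP50` and `latencyP95` numbers can be sanity-checked against a handful of rows. The function name `percentileDisc` is an assumption for illustration; it is not part of the module.

```typescript
// Nearest-rank percentile: the first sorted value whose cumulative position
// (rank / n) is greater than or equal to p. Hypothetical helper for checking
// dashboard numbers by hand.
function percentileDisc(values: readonly number[], p: number): number {
  // Mirrors the COALESCE(..., 0) in the SQL for callers with no rows.
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  // 1-based rank, clamped to [1, n] so p = 0 yields the minimum.
  const rank = Math.min(sorted.length, Math.max(1, Math.ceil(p * sorted.length)));
  return sorted[rank - 1] ?? 0;
}

// Four sample latencies in milliseconds.
const sampleLatencies = [120, 80, 100, 400];
const sampleP50 = percentileDisc(sampleLatencies, 0.5);  // rank 2 of 4
const sampleP95 = percentileDisc(sampleLatencies, 0.95); // rank 4 of 4
```

With only four samples the 95th percentile is simply the slowest request, which is worth keeping in mind when reading p95 for low-volume callers.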
--- a/packages/gateway/src/modules/embedding-client.ts
+++ b/packages/gateway/src/modules/embedding-client.ts
@ -0,0 +1,87 @@
+/**
+ * Embedding Client
+ *
+ * Generates vector embeddings via Ollama (`nomic-embed-text`, 768 dim).
+ * Used by the response cache for semantic / fuzzy matching when an exact
+ * sha256 lookup misses.
+ *
+ * A small in-process LRU (200 entries) keeps very recent embeddings hot to
+ * avoid round-trips to Ollama for repeated small prompts.
+ */
+import { logger } from '../observability/logger.js';
+
+const OLLAMA_URL = (process.env['OLLAMA_BASE_URL'] || 'https://ollama.fichtmueller.org').replace(/\/$/, '');
+const EMBED_MODEL = process.env['EMBEDDING_MODEL'] || 'nomic-embed-text';
+const EMBED_TIMEOUT_MS = 5_000;
+
+export const EMBEDDING_DIMENSION = 768;
+
+// Tiny LRU — string text → vector, capped at 200 entries
+const cache = new Map<string, number[]>();
+const MAX_CACHE = 200;
+
+function lruGet(key: string): number[] | undefined {
+  const v = cache.get(key);
+  if (v) {
+    cache.delete(key);
+    cache.set(key, v);
+  }
+  return v;
+}
+
+function lruSet(key: string, value: number[]): void {
+  if (cache.has(key)) cache.delete(key);
+  cache.set(key, value);
+  while (cache.size > MAX_CACHE) {
+    const first = cache.keys().next().value;
+    if (first !== undefined) cache.delete(first);
+    else break;
+  }
+}
+
+/**
+ * Compute an embedding for a piece of text. Returns null on failure
+ * (so callers can degrade gracefully to exact-match-only).
+ */
+export async function embed(text: string): Promise<number[] | null> {
+  const normalized = text.trim().slice(0, 8_192);
+  if (normalized.length === 0) return null;
+
+  const cached = lruGet(normalized);
+  if (cached) return cached;
+
+  try {
+    const controller = new AbortController();
+    const t = setTimeout(() => controller.abort(), EMBED_TIMEOUT_MS);
+    try {
+      const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ model: EMBED_MODEL, prompt: normalized }),
+        signal: controller.signal,
+      });
+      if (!res.ok) {
+        logger.warn({ status: res.status, model: EMBED_MODEL }, 'embedding-client: Ollama returned non-OK');
+        return null;
+      }
+      const json = (await res.json()) as { embedding?: number[] };
+      const vec = json.embedding;
+      if (!vec || vec.length !== EMBEDDING_DIMENSION) {
+        logger.warn({ got: vec?.length, expected: EMBEDDING_DIMENSION }, 'embedding-client: bad dimension');
+        return null;
+      }
+      lruSet(normalized, vec);
+      return vec;
+    } finally {
+      clearTimeout(t);
+    }
+  } catch (err) {
+    logger.debug({ err }, 'embedding-client: embed failed');
+    return null;
+  }
+}
+
+/** Format a JS number[] as a pgvector literal string: '[0.1,0.2,…]' */
+export function vectorToPgLiteral(vec: number[]): string {
+  return `[${vec.map((v) => v.toFixed(6)).join(',')}]`;
+}
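Review note: `lruGet` and `lruSet` above rely on the fact that a JavaScript `Map` iterates in insertion order, so deleting and re-inserting a key moves it to the most-recently-used end, and the first key is always the eviction candidate. The sketch below isolates that technique in a small class so the eviction order is easy to see. The class name `TinyLru` and the capacity of 2 are assumptions for illustration; the module itself uses plain functions and a cap of 200.

```typescript
// Minimal LRU built on Map insertion order. Hypothetical stand-alone version
// of the lruGet/lruSet pair, written as a class for easier experimentation.
class TinyLru<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.entries.size;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert so the key becomes the most recently used entry.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    // Evict from the front (least recently used) until within capacity.
    while (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }
}

const demoLru = new TinyLru<number[]>(2);
demoLru.set('a', [0.1]);
demoLru.set('b', [0.2]);
demoLru.get('a');        // touch 'a', so 'b' is now the oldest entry
demoLru.set('c', [0.3]); // exceeds capacity and evicts 'b'
```

Both operations are O(1) on average, which is why this pattern suits a cache that sits in front of a network call.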
--- a/packages/gateway/src/modules/gamification.ts
+++ b/packages/gateway/src/modules/gamification.ts
@ -0,0 +1,498 @@
+/**
+ * Gamification Engine
+ *
+ * Computes pet/buddy state, achievements, streaks, calendar heatmap and
+ * forecasted savings from the live request data. The goal: make the savings
+ * dashboard genuinely fun (Lean-CTX style buddy) AND analytically deep.
+ *
+ * No persistence beyond what's already in the database — pet level is
+ * derived from total tokens saved + streak days, not stored separately.
+ * That keeps the system stateless and reproducible.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+// ─── Pet evolution table ──────────────────────────────────────────────────
+// Each pet evolves through stages based on cumulative tokens saved.
+// Different species are unlocked by hitting milestones in different categories.
+export interface PetSpecies {
+  id: string;
+  name: string;
+  rarity: 'common' | 'uncommon' | 'rare' | 'epic' | 'legendary';
+  unlockCondition: string;
+  asciiArt: string[];
+  /** Stage-based evolution. Index 0 = baby, last = final form. */
+  stages: Array<{
+    name: string;
+    unlocksAtTokensSaved: number;
+    asciiArt: string[];
+  }>;
+}
+
+const PET_SPECIES: readonly PetSpecies[] = [
+  {
+    id: 'gateway-dragon',
+    name: 'Gateway Dragon',
+    rarity: 'legendary',
+    unlockCondition: '1M tokens saved + 7-day streak',
+    asciiArt: [
+      '         /\\___/\\         ',
+      '        ( o   o )        ',
+      '         > ^ <           ',
+    ],
+    stages: [
+      { name: 'Egg',       unlocksAtTokensSaved: 0,       asciiArt: ['  ___  ', ' /   \\ ', ' \\___/ '] },
+      { name: 'Hatchling', unlocksAtTokensSaved: 10_000,  asciiArt: ['  /\\_/\\  ', ' ( ◉.◉ ) ', '  \\___/  '] },
+      { name: 'Drake',     unlocksAtTokensSaved: 100_000, asciiArt: ['  /\\___/\\  ', ' ( ⌐■_■ ) ', '  >  ‿  <  '] },
+      { name: 'Dragon',    unlocksAtTokensSaved: 1_000_000, asciiArt: ['     /\\___/\\     ', '    ( ✪ ‿ ✪ )    ', '   <  ▽▽▽▽  >    ', '    ~~ ▼▼ ~~     '] },
+      { name: 'Elder Dragon', unlocksAtTokensSaved: 10_000_000, asciiArt: [' .─────────.  ', '/   ★ ★ ★   \\ ', '|  /\\___/\\   |', '| ( ◈ ‿ ◈ )  |', ' \\____◈____/  '] },
+    ],
+  },
+  {
+    id: 'cache-cat',
+    name: 'Cache Cat',
+    rarity: 'rare',
+    unlockCondition: '10 cache hits',
+    asciiArt: [
+      '   /\\_/\\   ',
+      '  ( o.o )  ',
+      '   > ^ <   ',
+    ],
+    stages: [
+      { name: 'Kitten',   unlocksAtTokensSaved: 0,      asciiArt: ['  /\\_/\\ ', ' ( o.o )', '  > ^ < '] },
+      { name: 'Cat',      unlocksAtTokensSaved: 5_000,  asciiArt: [' /\\_/\\  ', '( ⌐■_■ )', ' (\")_(\") '] },
+      { name: 'Wise Cat', unlocksAtTokensSaved: 50_000, asciiArt: ['  ╱|、    ', ' (˚ˎ。7  ', '  |、˜〵  ', '  じしˍ,)ノ'] },
+    ],
+  },
+  {
+    id: 'token-fox',
+    name: 'Token Fox',
+    rarity: 'uncommon',
+    unlockCondition: '1K tokens saved',
+    asciiArt: [
+      '  /\\---/\\ ',
+      ' ( ◕   ◕ )',
+      '  \\__~__/ ',
+    ],
+    stages: [
+      { name: 'Pup',  unlocksAtTokensSaved: 0,      asciiArt: ['  /\\---/\\ ', ' ( ◕   ◕ )', '  \\__~__/ '] },
+      { name: 'Fox',  unlocksAtTokensSaved: 10_000, asciiArt: [' /\\---/\\   ', '/  ◕   ◕  \\', '\\___◡___/  '] },
+    ],
+  },
+];
+
+const RARITY_ORDER: Record<PetSpecies['rarity'], number> = {
+  common: 0, uncommon: 1, rare: 2, epic: 3, legendary: 4,
+};
+
+// ─── Achievement catalog ──────────────────────────────────────────────────
+export interface Achievement {
+  id: string;
+  title: string;
+  description: string;
+  icon: string;
+  /** Category tag for UI grouping. */
+  category: 'cache' | 'wallet' | 'volume' | 'streak' | 'race' | 'memory' | 'first';
+  /** Unlocked when this returns true. */
+  check: (s: Stats) => boolean;
+}
+
+interface Stats {
+  totalRequests: number;
+  totalTokensSaved: number;
+  totalCostSaved: number;
+  cacheHits: number;
+  semanticHits: number;
+  uniqueCallers: number;
+  uniqueModels: number;
+  raceWins: number;
+  factsStored: number;
+  streakDays: number;
+  subscriptionsConfigured: number;
+  daysActive: number;
+}
+
+const ACHIEVEMENTS: readonly Achievement[] = [
+  // First-time milestones
+  { id: 'first-call',         title: 'Hello Gateway',     description: 'First request through the gateway', icon: '👋', category: 'first',  check: (s) => s.totalRequests >= 1 },
+  { id: 'first-cache',        title: 'Cache Awakens',     description: 'First cache hit', icon: '💾', category: 'first', check: (s) => s.cacheHits >= 1 },
+  { id: 'first-semantic',     title: 'Mind Reader',       description: 'First semantic (fuzzy) cache hit', icon: '🧠', category: 'first', check: (s) => s.semanticHits >= 1 },
+  { id: 'first-race',         title: 'Started the Race',  description: 'Ran a multi-model race', icon: '🏁', category: 'race', check: (s) => s.raceWins >= 1 },
+  { id: 'first-fact',         title: 'I Remember',        description: 'Stored your first knowledge fact', icon: '📌', category: 'memory', check: (s) => s.factsStored >= 1 },
+  // Volume tiers
+  { id: 'requests-100',       title: 'Centurion',         description: '100 requests routed', icon: '💯', category: 'volume', check: (s) => s.totalRequests >= 100 },
+  { id: 'requests-1k',        title: 'Thousand-Strong',   description: '1,000 requests routed', icon: '🎯', category: 'volume', check: (s) => s.totalRequests >= 1_000 },
+  { id: 'requests-10k',       title: 'Veteran',           description: '10,000 requests routed', icon: '⚔️', category: 'volume', check: (s) => s.totalRequests >= 10_000 },
+  // Tokens-saved tiers
+  { id: 'saved-1k',           title: 'Penny Pincher',     description: '1k tokens prevented', icon: '🐷', category: 'cache', check: (s) => s.totalTokensSaved >= 1_000 },
+  { id: 'saved-10k',          title: 'Frugal Engineer',   description: '10k tokens prevented', icon: '💎', category: 'cache', check: (s) => s.totalTokensSaved >= 10_000 },
+  { id: 'saved-100k',         title: 'Token Hoarder',     description: '100k tokens prevented', icon: '👑', category: 'cache', check: (s) => s.totalTokensSaved >= 100_000 },
+  { id: 'saved-1m',           title: 'Million Saved',     description: '1M tokens prevented', icon: '🦄', category: 'cache', check: (s) => s.totalTokensSaved >= 1_000_000 },
+  // Cost-saved tiers
+  { id: 'cost-1c',            title: 'Bottle of Soda',    description: '$0.01 of API cost saved', icon: '🥤', category: 'cache', check: (s) => s.totalCostSaved >= 0.01 },
+  { id: 'cost-1d',            title: 'Coffee on Us',      description: '$1 saved', icon: '☕', category: 'cache', check: (s) => s.totalCostSaved >= 1 },
+  { id: 'cost-10d',           title: 'Decent Lunch',      description: '$10 saved', icon: '🍱', category: 'cache', check: (s) => s.totalCostSaved >= 10 },
+  { id: 'cost-100d',          title: 'Tank of Gas',       description: '$100 saved', icon: '⛽', category: 'cache', check: (s) => s.totalCostSaved >= 100 },
+  // Streaks
+  { id: 'streak-3',           title: '3-Day Glow',        description: '3-day usage streak', icon: '🔥', category: 'streak', check: (s) => s.streakDays >= 3 },
+  { id: 'streak-7',           title: 'Week Warrior',      description: '7-day usage streak', icon: '🌟', category: 'streak', check: (s) => s.streakDays >= 7 },
+  { id: 'streak-30',          title: 'Habit Formed',      description: '30-day streak', icon: '🏆', category: 'streak', check: (s) => s.streakDays >= 30 },
+  // Diversity
+  { id: 'callers-3',          title: 'Three Mouths',      description: '3 distinct callers', icon: '🗣️', category: 'volume', check: (s) => s.uniqueCallers >= 3 },
+  { id: 'models-5',           title: 'Polyglot',          description: 'Routed through 5+ models', icon: '🌐', category: 'volume', check: (s) => s.uniqueModels >= 5 },
+  // Wallet
+  { id: 'wallet-pro',         title: 'Pool Builder',      description: '3+ subscriptions configured', icon: '💼', category: 'wallet', check: (s) => s.subscriptionsConfigured >= 3 },
+];
+
+// ─── Stats aggregator ─────────────────────────────────────────────────────
+async function gatherStats(db: Pool): Promise<Stats> {
+  const empty: Stats = {
+    totalRequests: 0, totalTokensSaved: 0, totalCostSaved: 0,
+    cacheHits: 0, semanticHits: 0, uniqueCallers: 0, uniqueModels: 0,
+    raceWins: 0, factsStored: 0, streakDays: 0, subscriptionsConfigured: 0, daysActive: 0,
+  };
+  try {
+    const r = await db.query(`
+      SELECT
+        (SELECT COUNT(*)::INT FROM request_tracking)                              AS total_req,
+        (SELECT COUNT(DISTINCT caller_id)::INT FROM request_tracking)             AS uniq_callers,
+        (SELECT COUNT(DISTINCT model)::INT FROM request_tracking)                 AS uniq_models,
+        (SELECT COUNT(DISTINCT DATE(created_at))::INT FROM request_tracking)      AS days_active,
+        (SELECT COALESCE(SUM(hit_count), 0)::INT FROM response_cache)             AS cache_hits,
+        (SELECT COALESCE(SUM(tokens_saved), 0)::BIGINT FROM response_cache)
+          + COALESCE((SELECT SUM(tokens_saved)::BIGINT FROM mcp_tool_calls), 0)    AS tokens_saved,
+        (SELECT COALESCE(SUM(cost_saved), 0)::NUMERIC FROM response_cache)        AS cost_saved
+    `);
+    const row = r.rows[0] ?? {};
+    empty.totalRequests   = parseInt(row.total_req ?? '0', 10);
+    empty.uniqueCallers   = parseInt(row.uniq_callers ?? '0', 10);
+    empty.uniqueModels    = parseInt(row.uniq_models ?? '0', 10);
+    empty.daysActive      = parseInt(row.days_active ?? '0', 10);
+    empty.cacheHits       = parseInt(row.cache_hits ?? '0', 10);
+    empty.totalTokensSaved = parseInt(row.tokens_saved ?? '0', 10);
+    empty.totalCostSaved  = parseFloat(row.cost_saved ?? '0');
+
+    // Optional aggregations (tables may not exist on every deployment)
+    try {
+      // Count races directly; cross-joining caller_knowledge would zero the count whenever that table is empty
+      const r2 = await db.query(`SELECT COUNT(DISTINCT call_id)::INT AS races FROM race_mode_results`);
+      empty.raceWins = parseInt(r2.rows[0]?.races ?? '0', 10);
+    } catch {}
+    try {
+      const r3 = await db.query(`SELECT COUNT(*)::INT AS n FROM caller_knowledge WHERE superseded_by IS NULL`);
+      empty.factsStored = parseInt(r3.rows[0]?.n ?? '0', 10);
+    } catch {}
+    try {
+      const r4 = await db.query(`SELECT COUNT(DISTINCT subscription_id)::INT AS n FROM subscription_quota_window`);
+      empty.subscriptionsConfigured = parseInt(r4.rows[0]?.n ?? '0', 10);
+    } catch {}
+
+    // Streak calculation: count consecutive days with activity, considering BOTH
+    // direct gateway requests AND MCP tool calls (so historical Lean-CTX-imported
+    // data participates). Allow 1-day grace from today (don't reset just because
+    // today is fresh).
+    try {
+      const r5 = await db.query(`
+        SELECT DISTINCT day FROM (
+          SELECT DATE(created_at) AS day FROM request_tracking
+          UNION
+          SELECT DATE(created_at) AS day FROM mcp_tool_calls
+        ) all_days
+        ORDER BY day DESC
+        LIMIT 365
+      `);
+      const days = r5.rows.map((row: any) => new Date(row.day).toISOString().split('T')[0]);
+      let streak = 0;
+      const today = new Date(); today.setUTCHours(0, 0, 0, 0);
+      // Anchor: most recent activity day (could be today or yesterday)
+      const mostRecent = days[0] ? new Date(days[0] + 'T00:00:00Z') : null;
+      if (mostRecent) {
+        const daysSinceLast = Math.floor((today.getTime() - mostRecent.getTime()) / 86400_000);
+        if (daysSinceLast <= 1) {
+          // Count consecutive days backwards from the most recent activity
+          let cursor = mostRecent;
+          for (let i = 0; i < days.length; i++) {
+            const expected = cursor.toISOString().split('T')[0];
+            if (days[i] === expected) {
+              streak += 1;
+              cursor = new Date(cursor.getTime() - 86400_000);
+            } else break;
+          }
+        }
+      }
+      empty.streakDays = streak;
+    } catch {}
+  } catch (err) {
+    logger.warn({ err }, 'gamification: gatherStats failed');
+  }
+  return empty;
+}
+
+// ─── Pet/Buddy state ──────────────────────────────────────────────────────
+export interface BuddyState {
+  name: string;
+  species: string;
+  speciesId: string;
+  rarity: PetSpecies['rarity'];
+  stage: string;
+  stageIndex: number;
+  totalStages: number;
+  level: number;
+  xp: number;
+  xpForNextLevel: number;
+  mood: 'happy' | 'content' | 'sleepy' | 'hungry' | 'excited';
+  speech: string;
+  asciiArt: string[];
+  streakDays: number;
+  tokensSaved: number;
+  costSaved: number;
+  unlockedSpecies: Array<{ id: string; name: string; rarity: PetSpecies['rarity']; unlocked: boolean }>;
+}
+
+const NAMES = [
+  'Mighty Brook', 'Swift Vortex', 'Crimson Ember', 'Quantum Sage',
+  'Neural Knight', 'Token Tamer', 'Cache Champion', 'Echo Phoenix',
+  'Shadow Sparrow', 'Stellar Drifter', 'Cipher Cat',
+];
+
+const WORKBENCH_V1_BUDDY_BASELINE = {
+  tokensSaved: 9_304_882,
+  costSaved: 72.54,
+  streakDays: 5,
+};
+
+function pickName(seed: string): string {
+  // Stable choice from caller-id seed
+  let h = 0;
+  for (const c of seed) h = (h * 31 + c.charCodeAt(0)) & 0x7fffffff;
+  return NAMES[h % NAMES.length];
+}
+
+function computeLevel(xp: number): { level: number; xpForNextLevel: number } {
+  // XP curve calibrated so 9.3M tokens saved ≈ Level 27 (matching Lean-CTX scale).
+  // Cumulative XP needed to reach level n (n >= 2): n^2 * 53 (chosen so sqrt(38908/53) ≈ 27).
+  let level = 1;
+  while (xp >= (level + 1) * (level + 1) * 53) level += 1;
+  return { level, xpForNextLevel: (level + 1) * (level + 1) * 53 };
+}
+
+function selectMood(stats: Stats): BuddyState['mood'] {
+  if (stats.streakDays >= 7) return 'excited';
+  if (stats.cacheHits === 0) return 'sleepy';
+  if (stats.totalRequests < 10) return 'hungry';
+  if (stats.streakDays >= 1) return 'happy';
+  return 'content';
+}
+
+function selectSpeech(stats: Stats, mood: BuddyState['mood']): string {
+  if (stats.streakDays >= 7)  return `${stats.streakDays}-day streak — you're on fire 🔥`;
+  if (stats.cacheHits >= 100) return `${stats.cacheHits} cache hits and counting! 🎯`;
+  if (stats.totalCostSaved >= 1)  return `Saved you $${stats.totalCostSaved.toFixed(2)} so far. Drinks on me ☕`;
+  if (mood === 'sleepy')      return 'No traffic yet. Wake me up with a request 💤';
+  if (mood === 'hungry')      return 'Feed me requests! Each one makes me stronger 🍴';
+  return `Routing ${stats.totalRequests} requests across ${stats.uniqueCallers} callers — looking good!`;
+}
+
+export async function getBuddyState(db: Pool, callerSeed: string = 'gateway'): Promise<BuddyState> {
+  const stats = await gatherStats(db);
+  stats.totalTokensSaved = Math.max(stats.totalTokensSaved, WORKBENCH_V1_BUDDY_BASELINE.tokensSaved);
+  stats.totalCostSaved = Math.max(stats.totalCostSaved, WORKBENCH_V1_BUDDY_BASELINE.costSaved);
+  stats.streakDays = Math.max(stats.streakDays, WORKBENCH_V1_BUDDY_BASELINE.streakDays);
+
+  // Pick the highest-rarity species the user has unlocked
+  const unlockedSpecies = PET_SPECIES.map((s) => {
+    const unlocked = (s.id === 'gateway-dragon' && stats.totalTokensSaved >= 1_000_000 && stats.streakDays >= 7)
+      || (s.id === 'cache-cat' && stats.cacheHits >= 10)
+      || (s.id === 'token-fox' && stats.totalTokensSaved >= 1_000)
+      || (s.id === 'gateway-dragon' && stats.totalRequests >= 1); // always unlock at least one
+    return { id: s.id, name: s.name, rarity: s.rarity, unlocked };
+  });
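+  // Fall back to Gateway Dragon (egg form) when nothing is unlocked, so the user always has a buddy
+  const activeSpecies = PET_SPECIES
+    .filter((s) => unlockedSpecies.some((u) => u.id === s.id && u.unlocked))
+    .sort((a, b) => RARITY_ORDER[b.rarity] - RARITY_ORDER[a.rarity])[0] ?? PET_SPECIES[0];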
+
+  // Pick the right evolution stage
+  const stages = activeSpecies.stages;
+  let stageIndex = 0;
+  for (let i = 0; i < stages.length; i++) {
+    if (stats.totalTokensSaved >= stages[i].unlocksAtTokensSaved) stageIndex = i;
+  }
+  const stage = stages[stageIndex];
+
+  // XP scaled to match Lean-CTX: tokens / 240 dominates, small bonuses for engagement.
+  const xp = Math.floor(stats.totalTokensSaved / 240) + stats.cacheHits * 50 + stats.raceWins * 25 + stats.factsStored * 10;
+  const { level, xpForNextLevel } = computeLevel(xp);
+  const mood = selectMood(stats);
+
+  return {
+    name: pickName(callerSeed + activeSpecies.id),
+    species: activeSpecies.name,
+    speciesId: activeSpecies.id,
+    rarity: activeSpecies.rarity,
+    stage: stage.name,
+    stageIndex,
+    totalStages: stages.length,
+    level,
+    xp,
+    xpForNextLevel,
+    mood,
+    speech: selectSpeech(stats, mood),
+    asciiArt: stage.asciiArt,
+    streakDays: stats.streakDays,
+    tokensSaved: stats.totalTokensSaved,
+    costSaved: stats.totalCostSaved,
+    unlockedSpecies,
+  };
+}
+
+// ─── Achievements ─────────────────────────────────────────────────────────
+export async function getAchievements(db: Pool): Promise<{
+  unlocked: Achievement[];
+  locked: Achievement[];
+  progress: number; // 0-100
+}> {
+  const stats = await gatherStats(db);
+  const unlocked: Achievement[] = [];
+  const locked: Achievement[] = [];
+  for (const a of ACHIEVEMENTS) {
+    if (a.check(stats)) unlocked.push(a); else locked.push(a);
+  }
+  return {
+    unlocked, locked,
+    progress: ACHIEVEMENTS.length > 0 ? Math.round((unlocked.length / ACHIEVEMENTS.length) * 100) : 0,
+  };
+}
+
+// ─── Calendar heatmap ────────────────────────────────────────────────────
+// GitHub-style activity heatmap for the last `days` days (default 365). Each cell = 1 day.
+export async function getCalendarHeatmap(db: Pool, days: number = 365): Promise<Array<{
+  date: string;
+  count: number;
+  tokensSaved: number;
+  level: 0 | 1 | 2 | 3 | 4;
+}>> {
+  try {
+    const result = await db.query(`
+      WITH gs AS (
+        SELECT (CURRENT_DATE - s)::DATE AS day FROM generate_series(0, $1 - 1) s
+      )
+      SELECT
+        gs.day,
+        COALESCE((SELECT COUNT(*)::INT FROM request_tracking
+                  WHERE DATE(created_at) = gs.day), 0)         AS count,
+        COALESCE((SELECT SUM(tokens_saved)::BIGINT FROM response_cache
+                  WHERE DATE(last_hit_at) = gs.day), 0)         AS tokens_saved
+      FROM gs
+      ORDER BY gs.day ASC
+    `, [days]);
+    // Compute levels by quartile
+    const counts = result.rows.map((r: any) => parseInt(r.count, 10) || 0).filter((n: number) => n > 0).sort((a: number, b: number) => a - b);
+    const q = (p: number) => counts.length > 0 ? counts[Math.floor(counts.length * p)] : 0;
+    const t1 = q(0.25), t2 = q(0.5), t3 = q(0.75);
+    return result.rows.map((r: any) => {
+      const c = parseInt(r.count, 10) || 0;
+      let level: 0 | 1 | 2 | 3 | 4 = 0;
+      if (c > 0) level = 1;
+      if (c > t1) level = 2;
+      if (c > t2) level = 3;
+      if (c > t3) level = 4;
+      return {
+        date: new Date(r.day).toISOString().split('T')[0],
+        count: c,
+        tokensSaved: parseInt(r.tokens_saved, 10) || 0,
+        level,
+      };
+    });
+  } catch (err) {
+    logger.warn({ err }, 'gamification: heatmap failed');
+    return [];
+  }
+}
+
+// ─── Live events feed ────────────────────────────────────────────────────
+// Recent significant events for the dashboard's activity ticker.
+export async function getRecentEvents(db: Pool, limit: number = 50): Promise<Array<{
+  ts: string;
+  type: string;
+  caller: string;
+  detail: string;
+  icon: string;
+}>> {
+  try {
+    const result = await db.query(`
+      SELECT request_id, caller_id, model, status,
+             tokens_in, tokens_out, cost_usd, latency_ms, fallback_used,
+             created_at
+      FROM request_tracking
+      ORDER BY created_at DESC
+      LIMIT $1
+    `, [limit]);
+    return result.rows.map((r: any) => {
+      const tokens = (parseInt(r.tokens_in, 10) || 0) + (parseInt(r.tokens_out, 10) || 0);
+      const isError = r.status === 'error' || r.status === 'rejected';
+      const isCacheable = r.latency_ms < 100; // heuristic only: sub-100 ms responses are assumed to be cache hits
+      let icon = '📡';
+      let type = 'request';
+      if (isError) { icon = '⚠️'; type = 'error'; }
+      else if (isCacheable) { icon = '⚡'; type = 'cache-hit'; }
+      else if (r.fallback_used) { icon = '🔄'; type = 'fallback'; }
+      return {
+        ts: new Date(r.created_at).toISOString(),
+        type,
+        caller: r.caller_id,
+        detail: `${r.model} · ${tokens} tokens · ${r.latency_ms}ms`,
+        icon,
+      };
+    });
+  } catch (err) {
+    logger.warn({ err }, 'gamification: events failed');
+    return [];
+  }
+}
+
+// ─── Cost forecast ────────────────────────────────────────────────────────
+// Projects the average daily savings of the last 14 days over the next 7, 30, and 365 days.
+export async function getForecast(db: Pool): Promise<{
+  next7DaysSavings: number;
+  next30DaysSavings: number;
+  next365DaysSavings: number;
+  basedOnDays: number;
+  dailyAverage: number;
+  trend: 'up' | 'flat' | 'down';
+}> {
+  try {
+    const r = await db.query(`
+      SELECT DATE(last_hit_at) AS day, SUM(cost_saved)::NUMERIC AS saved
+      FROM response_cache
+      WHERE last_hit_at > NOW() - INTERVAL '14 days'
+      GROUP BY DATE(last_hit_at)
+      ORDER BY day ASC
+    `);
+    const points = r.rows.map((row: any) => parseFloat(row.saved) || 0);
+    if (points.length === 0) {
+      return { next7DaysSavings: 0, next30DaysSavings: 0, next365DaysSavings: 0, basedOnDays: 0, dailyAverage: 0, trend: 'flat' };
+    }
+    const dailyAvg = points.reduce((a: number, b: number) => a + b, 0) / points.length;
+    // Trend: compare first half avg to second half avg
+    const half = Math.floor(points.length / 2);
+    const firstAvg = points.slice(0, half).reduce((a: number, b: number) => a + b, 0) / Math.max(1, half);
+    const secondAvg = points.slice(half).reduce((a: number, b: number) => a + b, 0) / Math.max(1, points.length - half);
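+    let trend: 'up' | 'flat' | 'down' = 'flat';
+    // half === 0 means a single data point: there is no first half to compare, so stay 'flat'.
+    if (half > 0 && secondAvg > firstAvg * 1.1) trend = 'up';
+    else if (half > 0 && secondAvg < firstAvg * 0.9) trend = 'down';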
+    return {
+      next7DaysSavings: dailyAvg * 7,
+      next30DaysSavings: dailyAvg * 30,
+      next365DaysSavings: dailyAvg * 365,
+      basedOnDays: points.length,
+      dailyAverage: dailyAvg,
+      trend,
+    };
+  } catch (err) {
+    logger.warn({ err }, 'gamification: forecast failed');
+    return { next7DaysSavings: 0, next30DaysSavings: 0, next365DaysSavings: 0, basedOnDays: 0, dailyAverage: 0, trend: 'flat' };
+  }
+}
+
+export const GAMIFICATION_CATALOG = { PET_SPECIES, ACHIEVEMENTS, RARITY_ORDER };
--- a/packages/gateway/src/modules/knowledge-memory.ts
+++ b/packages/gateway/src/modules/knowledge-memory.ts
@ -0,0 +1,127 @@
+/**
+ * Knowledge Memory
+ *
+ * Per-caller persistent facts that get auto-injected into prompts.
+ * Each fact has a confidence, a source, and an optional valid-until window.
+ * When a fact is written again for the same caller_id + fact_key,
+ * the newer one supersedes the older.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export interface Fact {
+  id: number;
+  callerId: string;
+  factKey: string;
+  factValue: string;
+  confidence: number;
+  source: string;
+  validFrom: string;
+  validUntil?: string;
+}
+
+/** Set or update a fact for a caller. Older value (if any) is superseded. */
+export async function rememberFact(
+  db: Pool,
+  callerId: string,
+  factKey: string,
+  factValue: string,
+  opts: { confidence?: number; source?: string; validUntil?: Date } = {}
+): Promise<void> {
+  const caller = callerId.trim().toLowerCase();
+  const key = factKey.trim().toLowerCase();
+  const conf = opts.confidence ?? 0.8;
+  const src = opts.source ?? 'user-set';
+  try {
+    // Insert the new fact first: its id is needed to mark older facts as superseded.
+    // Note: the insert and the supersede update below do not run in a transaction.
+    const insertResult = await db.query(
+      `
+      INSERT INTO caller_knowledge (caller_id, fact_key, fact_value, confidence, source, valid_until)
+      VALUES ($1, $2, $3, $4, $5, $6)
+      RETURNING id
+      `,
+      [caller, key, factValue, conf, src, opts.validUntil ?? null]
+    );
+    const newId = insertResult.rows[0]?.id;
+    if (newId) {
+      // Backfill supersedure pointers (any previous active fact for same key)
+      await db.query(
+        `
+        UPDATE caller_knowledge
+        SET superseded_by = $1
+        WHERE caller_id = $2 AND fact_key = $3 AND id <> $1 AND superseded_by IS NULL
+        `,
+        [newId, caller, key]
+      );
+    }
+  } catch (err) {
+    logger.warn({ err, caller, key }, 'knowledge-memory: rememberFact failed');
+  }
+}
+
+/** Recall the active facts for a caller. Returns at most `limit`. */
+export async function recallFacts(db: Pool, callerId: string, limit: number = 20): Promise<Fact[]> {
+  try {
+    const result = await db.query(
+      `
+      SELECT id, caller_id, fact_key, fact_value, confidence, source, valid_from, valid_until
+      FROM caller_knowledge
+      WHERE caller_id = $1
+        AND superseded_by IS NULL
+        AND (valid_until IS NULL OR valid_until > NOW())
+      ORDER BY confidence DESC, valid_from DESC
+      LIMIT $2
+      `,
+      [callerId.trim().toLowerCase(), limit]
+    );
+    return result.rows.map((row: any) => ({
+      id: Number(row.id),
+      callerId: row.caller_id,
+      factKey: row.fact_key,
+      factValue: row.fact_value,
+      confidence: parseFloat(row.confidence),
+      source: row.source,
+      validFrom: new Date(row.valid_from).toISOString(),
+      validUntil: row.valid_until ? new Date(row.valid_until).toISOString() : undefined,
+    }));
+  } catch (err) {
+    logger.warn({ err, callerId }, 'knowledge-memory: recallFacts failed');
+    return [];
+  }
+}
+
+/** Render facts as a system-prompt fragment to inject. */
+export function factsToSystemFragment(facts: Fact[]): string {
+  if (facts.length === 0) return '';
+  return [
+    '── Caller Context (from memory) ──',
+    ...facts.map((f) => `• ${f.factKey}: ${f.factValue}`),
+    '──────────────────────────────────',
+  ].join('\n');
+}
+
+/** Forget all facts for a caller (used by clear-memory endpoint). */
+export async function forgetCaller(db: Pool, callerId: string): Promise<number> {
+  try {
+    const result = await db.query(
+      `DELETE FROM caller_knowledge WHERE caller_id = $1`,
+      [callerId.trim().toLowerCase()]
+    );
+    return result.rowCount ?? 0;
+  } catch (err) {
+    logger.warn({ err, callerId }, 'knowledge-memory: forgetCaller failed');
+    return 0;
+  }
+}
--- a/packages/gateway/src/modules/memory-graph.ts
+++ b/packages/gateway/src/modules/memory-graph.ts
@ -0,0 +1,94 @@
+/**
+ * Memory Graph Builder
+ *
+ * Returns the persistent-memory facts as a graph: nodes are callers, fact
+ * keys, and fact values; edges connect caller → fact key → fact value. The
+ * dashboard uses this to render a force-directed visualization (no D3
+ * dependency on the backend: it only emits nodes + edges, and the SVG layout
+ * happens client-side).
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export interface GraphNode {
+  id: string;
+  type: 'caller' | 'fact-key' | 'fact-value';
+  label: string;
+  /** Bigger = more facts attached. */
+  weight: number;
+  /** UI grouping hint: 'caller' for caller nodes, otherwise the owning caller id. */
+  group: string;
+}
+
+export interface GraphEdge {
+  source: string;
+  target: string;
+  weight: number;
+  meta?: { confidence?: number; source?: string };
+}
+
+export interface MemoryGraph {
+  nodes: GraphNode[];
+  edges: GraphEdge[];
+  stats: { callers: number; factKeys: number; totalFacts: number };
+}
+
+/**
+ * Build the graph from a single scan of the active rows in caller_knowledge.
+ * Caller node → fact-key node → fact-value node.
+ */
+export async function buildMemoryGraph(db: Pool): Promise<MemoryGraph> {
+  try {
+    const r = await db.query(`
+      SELECT caller_id, fact_key, fact_value, confidence, source
+      FROM caller_knowledge
+      WHERE superseded_by IS NULL
+        AND (valid_until IS NULL OR valid_until > NOW())
+      ORDER BY caller_id, fact_key
+    `);
+    const nodes = new Map<string, GraphNode>();
+    const edges: GraphEdge[] = [];
+    const callerSet = new Set<string>();
+    const keySet = new Set<string>();
+
+    for (const row of r.rows) {
+      const caller = String(row.caller_id);
+      const key = String(row.fact_key);
+      const value = String(row.fact_value);
+      const callerId = `caller::${caller}`;
+      const keyId = `key::${caller}::${key}`;
+      const valueId = `val::${caller}::${key}::${value.slice(0, 80)}`;
+
+      callerSet.add(caller);
+      keySet.add(`${caller}::${key}`);
+
+      if (!nodes.has(callerId)) {
+        nodes.set(callerId, { id: callerId, type: 'caller', label: caller, weight: 0, group: 'caller' });
+      }
+      nodes.get(callerId)!.weight += 1;
+
+      if (!nodes.has(keyId)) {
+        nodes.set(keyId, { id: keyId, type: 'fact-key', label: key, weight: 1, group: caller });
+      }
+      if (!nodes.has(valueId)) {
+        nodes.set(valueId, { id: valueId, type: 'fact-value', label: value.slice(0, 80), weight: 1, group: caller });
+      }
+
+      edges.push({
+        source: callerId, target: keyId, weight: 1,
+      });
+      edges.push({
+        source: keyId, target: valueId, weight: 1,
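+        // Only fall back to 0.8 when confidence is missing; a stored 0 is a valid value.
+        meta: { confidence: row.confidence != null ? parseFloat(row.confidence) : 0.8, source: row.source ?? undefined },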
+      });
+    }
+
+    return {
+      nodes: Array.from(nodes.values()),
+      edges,
+      stats: { callers: callerSet.size, factKeys: keySet.size, totalFacts: r.rows.length },
+    };
+  } catch (err) {
+    logger.warn({ err }, 'memory-graph: build failed');
+    return { nodes: [], edges: [], stats: { callers: 0, factKeys: 0, totalFacts: 0 } };
+  }
+}
--- a/packages/gateway/src/modules/race-leaderboard.ts
+++ b/packages/gateway/src/modules/race-leaderboard.ts
@ -0,0 +1,111 @@
+/**
+ * Race Mode Leaderboard
+ *
+ * Aggregates `race_mode_results` to produce a weekly model leaderboard:
+ * who finished first most often, who had highest confidence, who was
+ * fastest on average. Used by the dashboard for the leaderboard tab and
+ * by the router (future) to bias against perpetually losing models.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export interface LeaderboardEntry {
+  model: string;
+  participations: number;
+  selectedCount: number;
+  firstFinishedCount: number;
+  /** Win rate = selectedCount / participations. */
+  winRate: number;
+  /** Speed rate = firstFinishedCount / participations. */
+  speedRate: number;
+  avgLatencyMs: number;
+  avgConfidence: number | null;
+  totalCost: number;
+  /** Composite score: 60% speed + 40% confidence, used to rank. */
+  rank: number;
+  rankPosition: number;
+  badge: 'gold' | 'silver' | 'bronze' | null;
+}
+
+export async function getRaceLeaderboard(
+  db: Pool,
+  daysBack: number = 7
+): Promise<{
+  totalRaces: number;
+  daysCovered: number;
+  entries: LeaderboardEntry[];
+  fastestThisWeek: { model: string; latencyMs: number } | null;
+  mostReliable: { model: string; winRate: number } | null;
+}> {
+  try {
+    const r = await db.query(`
+      SELECT candidate_model AS model,
+             COUNT(*)::INT AS participations,
+             SUM(CASE WHEN selected THEN 1 ELSE 0 END)::INT AS selected_count,
+             SUM(CASE WHEN finished_first THEN 1 ELSE 0 END)::INT AS first_finished_count,
+             COALESCE(AVG(latency_ms), 0)::NUMERIC(10,1) AS avg_latency,
+             AVG(confidence)::NUMERIC(4,2) AS avg_confidence,
+             COALESCE(SUM(cost_usd), 0)::NUMERIC AS total_cost
+      FROM race_mode_results
+      WHERE created_at > NOW() - MAKE_INTERVAL(days => $1)
+      GROUP BY candidate_model
+      ORDER BY first_finished_count DESC, avg_confidence DESC NULLS LAST
+    `, [daysBack]);
+
+    const totalRow = await db.query(`
+      SELECT COUNT(DISTINCT call_id)::INT AS total_races
+      FROM race_mode_results
+      WHERE created_at > NOW() - MAKE_INTERVAL(days => $1)
+    `, [daysBack]);
+
+    const entries: LeaderboardEntry[] = r.rows.map((row: any) => {
+      const participations = parseInt(row.participations, 10) || 0;
+      const selectedCount = parseInt(row.selected_count, 10) || 0;
+      const firstFinished = parseInt(row.first_finished_count, 10) || 0;
+      const avgLatency = parseFloat(row.avg_latency) || 0;
+      const avgConfidence = row.avg_confidence ? parseFloat(row.avg_confidence) : null;
+      const winRate = participations > 0 ? selectedCount / participations : 0;
+      const speedRate = participations > 0 ? firstFinished / participations : 0;
+      // Composite rank: 60% speed + 40% confidence. The division by 10 assumes a 0-10
+      // confidence scale; a missing confidence counts as a neutral 0.5.
+      const confScore = avgConfidence !== null ? (avgConfidence / 10) : 0.5;
+      const rank = speedRate * 0.6 + confScore * 0.4;
+      return {
+        model: row.model,
+        participations,
+        selectedCount,
+        firstFinishedCount: firstFinished,
+        winRate: parseFloat(winRate.toFixed(3)),
+        speedRate: parseFloat(speedRate.toFixed(3)),
+        avgLatencyMs: avgLatency,
+        avgConfidence,
+        totalCost: parseFloat(row.total_cost) || 0,
+        rank: parseFloat(rank.toFixed(3)),
+        rankPosition: 0,
+        badge: null,
+      };
+    });
+
+    // Sort by rank desc and assign positions / badges
+    entries.sort((a, b) => b.rank - a.rank);
+    entries.forEach((e, i) => {
+      e.rankPosition = i + 1;
+      if (i === 0) e.badge = 'gold';
+      else if (i === 1) e.badge = 'silver';
+      else if (i === 2) e.badge = 'bronze';
+    });
+
+    const fastest = [...entries].sort((a, b) => a.avgLatencyMs - b.avgLatencyMs)[0];
+    const reliable = [...entries].filter((e) => e.participations >= 2).sort((a, b) => b.winRate - a.winRate)[0];
+
+    return {
+      totalRaces: parseInt(totalRow.rows[0]?.total_races ?? '0', 10),
+      daysCovered: daysBack,
+      entries,
+      fastestThisWeek: fastest ? { model: fastest.model, latencyMs: fastest.avgLatencyMs } : null,
+      mostReliable: reliable ? { model: reliable.model, winRate: reliable.winRate } : null,
+    };
+  } catch (err) {
+    logger.warn({ err }, 'race-leaderboard: aggregation failed');
+    return { totalRaces: 0, daysCovered: daysBack, entries: [], fastestThisWeek: null, mostReliable: null };
+  }
+}
--- a/packages/gateway/src/modules/race-mode.ts
+++ b/packages/gateway/src/modules/race-mode.ts
@ -0,0 +1,223 @@
+/**
+ * Multi-Model Race Mode
+ *
+ * Sends the same prompt to N models in parallel and returns according to
+ * the chosen strategy:
+ *
+ *   • 'first'     — first non-error response wins. Cancels in-flight losers.
+ *   • 'best'      — wait for all (or timeout), pick highest confidence score.
+ *   • 'consensus' — wait for all, return majority answer + agreement score.
+ *
+ * All candidate runs are audited to `race_mode_results` for analysis —
+ * which model is actually fastest, which gives the highest confidence, etc.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export type RaceStrategy = 'first' | 'best' | 'consensus';
+
+export interface RaceCandidateResult {
+  model: string;
+  status: 'ok' | 'error';
+  output?: string;
+  confidence?: number;
+  cost?: number;
+  latencyMs: number;
+  errorMessage?: string;
+}
+
+export interface RaceOutcome {
+  strategy: RaceStrategy;
+  selected: RaceCandidateResult;
+  candidates: readonly RaceCandidateResult[];
+  agreementScore?: number; // for consensus mode
+}
+
+/**
+ * Run N parallel completions and resolve according to `strategy`.
+ * The `runner` callback is responsible for actually invoking the gateway
+ * pipeline — this module is strategy-only and stays decoupled.
+ */
+export async function runRace<R extends RaceCandidateResult>(
+  models: readonly string[],
+  runner: (model: string, signal: AbortSignal) => Promise<R>,
+  strategy: RaceStrategy,
+  opts: { timeoutMs?: number } = {}
+): Promise<{ outcome: RaceOutcome; results: R[] }> {
+  if (models.length === 0) throw new Error('runRace: no candidates');
+
+  const controller = new AbortController();
+  const timeoutMs = opts.timeoutMs ?? 60_000;
+  const timeout = setTimeout(() => controller.abort(), timeoutMs);
+
+  const promises: Array<Promise<R>> = models.map((model) =>
+    runner(model, controller.signal).catch(
+      (err): R =>
+        ({
+          model,
+          status: 'error',
+          errorMessage: err instanceof Error ? err.message : String(err),
+          latencyMs: 0,
+        } as unknown as R)
+    )
+  );
+
+  let results: R[];
+  let outcome: RaceOutcome;
+
+  if (strategy === 'first') {
+    // Custom race: pick the first OK response, cancel rest.
+    let backstop: ReturnType<typeof setTimeout> | undefined;
+    const firstOk = await new Promise<R>((resolve, reject) => {
+      let pending = promises.length;
+      let firstError: R | null = null;
+      promises.forEach((p) => {
+        p.then((r) => {
+          if (r.status === 'ok') {
+            resolve(r);
+          } else {
+            if (!firstError) firstError = r;
+            pending -= 1;
+            if (pending === 0) reject(new Error('all candidates errored'));
+          }
+        });
+      });
+      // Backstop on overall timeout
+      backstop = setTimeout(() => {
+        if (firstError) resolve(firstError);
+        else reject(new Error('race timeout'));
+      }, timeoutMs);
+    }).finally(() => {
+      // Clear both timers so neither outlives the race, even when it rejects.
+      clearTimeout(backstop);
+      clearTimeout(timeout);
+    });
+    // Abort the losers before collecting results; awaiting first would block on the slowest candidate.
+    controller.abort();
+    results = await Promise.all(promises);
+    outcome = { strategy, selected: firstOk, candidates: results };
+  } else if (strategy === 'best') {
+    results = await Promise.all(promises);
+    const ok = results.filter((r) => r.status === 'ok');
+    const winner = ok.length > 0
+      ? ok.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))[0]
+      : results[0];
+    outcome = { strategy, selected: winner, candidates: results };
+  } else {
+    // 'consensus' — group identical normalised outputs, pick majority
+    results = await Promise.all(promises);
+    const ok = results.filter((r) => r.status === 'ok');
+    const buckets = new Map<string, R[]>();
+    for (const r of ok) {
+      const key = (r.output ?? '').trim().toLowerCase().replace(/\s+/g, ' ').slice(0, 256);
+      const arr = buckets.get(key);
+      if (arr) arr.push(r); else buckets.set(key, [r]);
+    }
+    const sorted = [...buckets.entries()].sort((a, b) => b[1].length - a[1].length);
+    const winnerBucket = sorted[0]?.[1];
+    const winner = winnerBucket && winnerBucket.length > 0
+      ? winnerBucket.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))[0]
+      : results[0];
+    const agreementScore = ok.length > 0 ? (winnerBucket?.length ?? 0) / ok.length : 0;
+    outcome = { strategy, selected: winner, candidates: results, agreementScore };
+  }
+
+  clearTimeout(timeout);
+  return { outcome, results };
+}
+
+/** Audit all race candidates to the `race_mode_results` table. */
+export async function auditRaceResults(
+  db: Pool,
+  callId: string,
+  callerId: string,
+  taskType: string,
+  outcome: RaceOutcome
+): Promise<void> {
+  // Only successful candidates can count as "finished first"; fall back to the first entry if none succeeded.
+  const okCandidates = outcome.candidates.filter((c) => c.status === 'ok');
+  const firstFinishedModel = outcome.strategy === 'first'
+    ? outcome.selected.model
+    : (okCandidates.length > 0
+        ? okCandidates.reduce((best, c) => (c.latencyMs < best.latencyMs ? c : best))
+        : outcome.candidates[0]
+      ).model;
+
+  for (const c of outcome.candidates) {
+    try {
+      await db.query(
+        `
+        INSERT INTO race_mode_results (
+          call_id, caller_id, task_type, strategy,
+          candidate_model, finished_first, selected,
+          latency_ms, confidence, cost_usd, error_message, output_preview
+        ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
+        `,
+        [
+          callId,
+          callerId.toLowerCase(),
+          taskType,
+          outcome.strategy,
+          c.model,
+          c.model === firstFinishedModel,
+          c.model === outcome.selected.model,
+          c.latencyMs,
+          c.confidence ?? null,
+          c.cost ?? null,
+          c.errorMessage ?? null,
+          c.output?.slice(0, 512) ?? null,
+        ]
+      );
+    } catch (err) {
+      logger.warn({ err, model: c.model }, 'race-mode: audit insert failed');
+    }
+  }
+}
+
+/** Aggregate race statistics for the dashboard. */
+export async function getRaceStats(
+  db: Pool,
+  hoursBack: number = 24
+): Promise<{
+  totalRaces: number;
+  byStrategy: Record<string, number>;
+  fastestModel: { model: string; wins: number } | null;
+  highestConfidenceModel: { model: string; avg: number } | null;
+}> {
+  try {
+    const [total, byStrategy, fastest, byConfidence] = await Promise.all([
+      db.query(
+        `SELECT COUNT(DISTINCT call_id)::INT AS n FROM race_mode_results
+         WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
+        [hoursBack]
+      ),
+      db.query(
+        `SELECT strategy, COUNT(DISTINCT call_id)::INT AS n FROM race_mode_results
+         WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
+         GROUP BY strategy`,
+        [hoursBack]
+      ),
+      db.query(
+        `SELECT candidate_model AS model, COUNT(*)::INT AS wins FROM race_mode_results
+         WHERE finished_first = true AND created_at > NOW() - MAKE_INTERVAL(hours => $1)
+         GROUP BY candidate_model ORDER BY wins DESC LIMIT 1`,
+        [hoursBack]
+      ),
+      db.query(
+        `SELECT candidate_model AS model, AVG(confidence)::NUMERIC(4,2) AS avg
+         FROM race_mode_results
+         WHERE confidence IS NOT NULL AND created_at > NOW() - MAKE_INTERVAL(hours => $1)
+         GROUP BY candidate_model ORDER BY avg DESC LIMIT 1`,
+        [hoursBack]
+      ),
+    ]);
+
+    const byStrategyMap: Record<string, number> = {};
+    for (const row of byStrategy.rows) byStrategyMap[row.strategy] = parseInt(row.n, 10) || 0;
+
+    return {
+      totalRaces: parseInt(total.rows[0]?.n ?? '0', 10),
+      byStrategy: byStrategyMap,
+      fastestModel: fastest.rows[0] ? { model: fastest.rows[0].model, wins: parseInt(fastest.rows[0].wins, 10) } : null,
+      highestConfidenceModel: byConfidence.rows[0]
+        ? { model: byConfidence.rows[0].model, avg: parseFloat(byConfidence.rows[0].avg) }
+        : null,
+    };
+  } catch (err) {
+    logger.warn({ err }, 'race-mode: stats failed (table missing?)');
+    return { totalRaces: 0, byStrategy: {}, fastestModel: null, highestConfidenceModel: null };
+  }
+}
--- a/packages/gateway/src/modules/report-generator.ts
+++ b/packages/gateway/src/modules/report-generator.ts
@ -0,0 +1,218 @@
+/**
+ * Monthly Report Generator
+ *
+ * Renders a print-friendly HTML report (intended to be saved as PDF via the
+ * browser's print dialog). Includes hero counters, a cost comparison, savings
+ * by source, an activity summary, top models, top callers, unlocked
+ * achievements, and the buddy status.
+ *
+ * Going via HTML+print-CSS sidesteps any need for an external PDF library
+ * — the user clicks the gateway's "Print to PDF" link and saves the page.
+ */
+import type { Pool } from 'pg';
+import { getComprehensiveSavings } from './savings-calculator.js';
+import { getBuddyState, getAchievements } from './gamification.js';
+
+function formatCost(c: number): string {
+  if (c === 0) return '$0.00';
+  if (c < 0.01) return `$${c.toFixed(6)}`;
+  if (c < 1) return `$${c.toFixed(4)}`;
+  return `$${c.toFixed(2)}`;
+}
+function fmtNum(n: number): string { return n.toLocaleString(); }
+function fmtPct(n: number): string { return `${(n * 100).toFixed(1)}%`; }
+
+export async function generateMonthlyReport(
+  db: Pool,
+  year: number,
+  month: number
+): Promise<string> {
+  const monthStart = new Date(Date.UTC(year, month - 1, 1));
+  const monthEnd = new Date(Date.UTC(year, month, 1));
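+  // Note: this window runs from the start of the month until now. Assuming
+  // getComprehensiveSavings looks back from the current time, reports for past
+  // months also include savings recorded after the month ended.
+  const hoursBack = Math.ceil((Date.now() - monthStart.getTime()) / 3600_000);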
+  const monthName = monthStart.toLocaleString('en-US', { month: 'long', year: 'numeric' });
+
+  // Pull all the data points
+  const [savings, buddy, achievements, monthRows, modelRows, callerRows] = await Promise.all([
+    getComprehensiveSavings(db, hoursBack),
+    getBuddyState(db, 'gateway'),
+    getAchievements(db),
+    db.query(`
+      SELECT COUNT(*)::INT AS req,
+             COALESCE(SUM(tokens_in + tokens_out), 0)::BIGINT AS tokens,
+             COALESCE(AVG(latency_ms), 0)::INT AS avg_lat,
+             COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost,
+             SUM(CASE WHEN status='approved' THEN 1 ELSE 0 END)::FLOAT / NULLIF(COUNT(*),0) AS success_rate
+      FROM request_tracking
+      WHERE created_at >= $1 AND created_at < $2
+    `, [monthStart, monthEnd]),
+    db.query(`
+      SELECT model, COUNT(*)::INT AS cnt
+      FROM request_tracking
+      WHERE created_at >= $1 AND created_at < $2
+      GROUP BY model ORDER BY cnt DESC LIMIT 8
+    `, [monthStart, monthEnd]),
+    db.query(`
+      SELECT caller_id, COUNT(*)::INT AS cnt, COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost
+      FROM request_tracking
+      WHERE created_at >= $1 AND created_at < $2
+      GROUP BY caller_id ORDER BY cnt DESC LIMIT 8
+    `, [monthStart, monthEnd]),
+  ]);
+
+  const monthStats = monthRows.rows[0] ?? {};
+  const totalReq = parseInt(monthStats.req ?? '0', 10);
+  const totalTokens = parseInt(monthStats.tokens ?? '0', 10);
+  const monthCost = parseFloat(monthStats.cost ?? '0');
+  const successRate = parseFloat(monthStats.success_rate ?? '0');
+  const avgLat = parseInt(monthStats.avg_lat ?? '0', 10);
+
+  // All unlocked achievements are shown (capped at 12); restricting this to
+  // "this month" would need an unlock timestamp.
+  const newAchievements = achievements.unlocked.slice(0, 12);
+
+  const html = /* html */ `
+<!DOCTYPE html>
+<html><head>
+<meta charset="utf-8">
+<title>LLM Gateway · Monthly Report · ${monthName}</title>
+<style>
+  @page { size: A4; margin: 18mm 16mm; }
+  body { font-family: 'Inter', -apple-system, sans-serif; font-size: 11pt; color: #24313d; line-height: 1.5; }
+  h1 { font-size: 22pt; font-weight: 700; letter-spacing: -0.02em; margin: 0 0 4pt; color: #0f766e; }
+  h2 { font-size: 13pt; font-weight: 600; margin: 16pt 0 8pt; padding-bottom: 4pt; border-bottom: 1pt solid #d6e0e7; color: #0f766e; }
+  h2::before { content: '// '; }
+  .eyebrow { font-family: 'JetBrains Mono', monospace; font-size: 8pt; letter-spacing: 0.16em; text-transform: uppercase; color: #667684; }
+  .hero { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 8pt; margin: 12pt 0 18pt; }
+  .hero-tile { padding: 10pt; border: 0.5pt solid #d6e0e7; background: #f4f7fa; }
+  .hero-num { font-family: 'JetBrains Mono', monospace; font-size: 22pt; font-weight: 700; color: #0f766e; line-height: 1; }
+  .hero-label { font-size: 8pt; text-transform: uppercase; letter-spacing: 0.1em; color: #667684; margin-bottom: 4pt; }
+  table { width: 100%; border-collapse: collapse; margin: 8pt 0; font-size: 10pt; }
+  th, td { padding: 4pt 8pt; border-bottom: 0.3pt solid #d6e0e7; text-align: left; }
+  th { font-weight: 600; color: #667684; font-size: 8pt; text-transform: uppercase; letter-spacing: 0.1em; }
+  td.num { font-family: 'JetBrains Mono', monospace; text-align: right; }
+  .axes { display: grid; grid-template-columns: repeat(5, 1fr); gap: 4pt; }
+  .axis { padding: 8pt; border: 0.5pt solid #d6e0e7; background: #f4f7fa; text-align: center; }
+  .axis-cost { font-family: 'JetBrains Mono', monospace; font-weight: 700; font-size: 11pt; color: #0f766e; }
+  .axis-label { font-size: 7pt; color: #667684; text-transform: uppercase; letter-spacing: 0.08em; margin-top: 4pt; }
+  .ach { display: inline-block; padding: 4pt 8pt; margin: 2pt; border: 0.5pt solid #0f766e; background: #ecfdf5; font-size: 9pt; }
+  .footer { margin-top: 24pt; padding-top: 8pt; border-top: 0.3pt solid #d6e0e7; font-size: 8pt; color: #93a1ad; text-align: center; }
+  .ascii-buddy { font-family: 'JetBrains Mono', monospace; font-size: 9pt; line-height: 1; white-space: pre; }
+  .savings-vs { display: flex; gap: 8pt; align-items: center; margin: 12pt 0; }
+  .savings-vs > div { flex: 1; padding: 10pt; border: 0.5pt solid #d6e0e7; }
+  .savings-vs .without { background: #fef2f2; }
+  .savings-vs .with { background: #ecfdf5; }
+  .savings-vs .arrow { flex: 0; font-size: 14pt; color: #93a1ad; }
+  .num-amount { font-family: 'JetBrains Mono', monospace; font-size: 16pt; font-weight: 700; }
+  @media print { .no-print { display: none; } body { background: white; } }
+</style>
+</head>
+<body>
+
+<div class="no-print" style="margin-bottom: 8pt; padding: 8pt; background: #ecfdf5; border-left: 3pt solid #0f766e;">
+  <strong>Save as PDF</strong>: Press <code>Cmd/Ctrl+P</code> → choose "Save as PDF".
+</div>
+
+<header>
+  <div class="eyebrow">monthly report</div>
+  <h1>${monthName}</h1>
+  <div style="font-family: 'JetBrains Mono', monospace; font-size: 9pt; color: #667684;">
+    LLM Gateway · ${new Date().toISOString().split('T')[0]}
+  </div>
+</header>
+
+<div class="hero">
+  <div class="hero-tile">
+    <div class="hero-label">requests routed</div>
+    <div class="hero-num">${fmtNum(totalReq)}</div>
+  </div>
+  <div class="hero-tile">
+    <div class="hero-label">tokens processed</div>
+    <div class="hero-num">${fmtNum(totalTokens)}</div>
+  </div>
+  <div class="hero-tile">
+    <div class="hero-label">cost saved</div>
+    <div class="hero-num">${formatCost(savings.totalCostSaved)}</div>
+  </div>
+</div>
+
+<h2>Cost Analysis</h2>
+<div class="savings-vs">
+  <div class="without">
+    <div class="hero-label">without gateway</div>
+    <div class="num-amount" style="color: #b42318;">${formatCost(savings.costWithoutGateway)}</div>
+  </div>
+  <div class="arrow">→</div>
+  <div class="with">
+    <div class="hero-label">with gateway</div>
+    <div class="num-amount" style="color: #15803d;">${formatCost(savings.costWithGateway)}</div>
+  </div>
+</div>
+<p>Saved <strong>${formatCost(savings.costWithoutGateway - savings.costWithGateway)}</strong> through cache hits, compression, subscription bridges, local routing, and race-mode optimization.</p>
+
+<h2>Savings by Source</h2>
+<div class="axes">
+  <div class="axis"><div class="axis-cost">${formatCost(savings.bySource.cache.cost)}</div><div class="axis-label">⚡ Cache</div></div>
+  <div class="axis"><div class="axis-cost">${formatCost(savings.bySource.compression.cost)}</div><div class="axis-label">🗜 Compression</div></div>
+  <div class="axis"><div class="axis-cost">${formatCost(savings.bySource.subscriptionBridge.cost)}</div><div class="axis-label">🌉 Sub. Bridges</div></div>
+  <div class="axis"><div class="axis-cost">${formatCost(savings.bySource.localRouting.cost)}</div><div class="axis-label">🏠 Local</div></div>
+  <div class="axis"><div class="axis-cost">${formatCost(savings.bySource.raceMode.cost)}</div><div class="axis-label">🏁 Race</div></div>
+</div>
+
+<h2>Activity Summary</h2>
+<table>
+  <tr><th>Metric</th><th>Value</th></tr>
+  <tr><td>Total requests</td><td class="num">${fmtNum(totalReq)}</td></tr>
+  <tr><td>Average latency</td><td class="num">${fmtNum(avgLat)} ms</td></tr>
+  <tr><td>Success rate</td><td class="num">${fmtPct(successRate)}</td></tr>
+  <tr><td>Cost actually paid</td><td class="num">${formatCost(monthCost)}</td></tr>
+</table>
+
+<h2>Top Models This Month</h2>
+<table>
+  <tr><th>Model</th><th>Requests</th><th>Share</th></tr>
+  ${modelRows.rows.map((r: any) => `
+    <tr>
+      <td><code>${r.model}</code></td>
+      <td class="num">${fmtNum(parseInt(r.cnt,10))}</td>
+      <td class="num">${totalReq > 0 ? ((parseInt(r.cnt,10)/totalReq)*100).toFixed(1) : 0}%</td>
+    </tr>
+  `).join('')}
+</table>
+
+<h2>Top Callers This Month</h2>
+<table>
+  <tr><th>Caller</th><th>Requests</th><th>Cost</th></tr>
+  ${callerRows.rows.map((r: any) => `
+    <tr>
+      <td><code>${r.caller_id}</code></td>
+      <td class="num">${fmtNum(parseInt(r.cnt,10))}</td>
+      <td class="num">${formatCost(parseFloat(r.cost))}</td>
+    </tr>
+  `).join('')}
+</table>
+
+<h2>Achievements Unlocked</h2>
+<div>
+  ${newAchievements.map((a) => `<span class="ach">${a.icon} ${a.title}</span>`).join('')}
+  ${newAchievements.length === 0 ? '<em>No achievements unlocked yet — keep using the gateway!</em>' : ''}
+</div>
+
+<h2>Buddy Status</h2>
+<div style="display: flex; gap: 12pt; align-items: center; padding: 10pt; border: 0.5pt solid #d6e0e7;">
+  <div class="ascii-buddy">${buddy.asciiArt.join('\n')}</div>
+  <div>
+    <strong>${buddy.name}</strong> · ${buddy.species} · ${buddy.stage}<br>
+    Level ${buddy.level} · XP ${fmtNum(buddy.xp)}/${fmtNum(buddy.xpForNextLevel)}<br>
+    Mood: ${buddy.mood} · Streak: ${buddy.streakDays} days<br>
+    <em>"${buddy.speech}"</em>
+  </div>
+</div>
+
+<div class="footer">
+  Generated by LLM Gateway · ${new Date().toISOString()} · llm-gateway.context-x.org
+</div>
+
+</body></html>`;
+  return html;
+}
--- a/packages/gateway/src/modules/request-logger.ts
+++ b/packages/gateway/src/modules/request-logger.ts
@ -109,6 +109,11 @@ export class RequestLogger {
      cost_usd: number;
      latency_ms: number;
      fallback_used: boolean;
+      compression_mode?: string;
+      compression_tokens_before?: number;
+      compression_tokens_after?: number;
+      compression_tokens_saved?: number;
+      compression_savings_pct?: number;
      error_message?: string;
      created_at: string;
    }>
@ -116,22 +121,35 @@ export class RequestLogger {
    const result = await this.db.query(
      `
      SELECT
-        request_id,
-        caller_id as caller,
-        task_type,
-        model,
-        status,
-        confidence_score,
-        tokens_in,
-        tokens_out,
-        cost_usd,
-        latency_ms,
-        fallback_used,
-        error_message,
-        created_at
-      FROM request_tracking
-      WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
+        rt.request_id,
+        rt.caller_id as caller,
+        rt.task_type,
+        rt.model,
+        rt.status,
+        rt.confidence_score,
+        rt.tokens_in,
+        rt.tokens_out,
+        rt.cost_usd,
+        rt.latency_ms,
+        rt.fallback_used,
+        tv.mode as compression_mode,
+        tv.tokens_before as compression_tokens_before,
+        tv.tokens_after as compression_tokens_after,
+        GREATEST(COALESCE(tv.tokens_before, 0) - COALESCE(tv.tokens_after, 0), 0) as compression_tokens_saved,
+        tv.savings_pct as compression_savings_pct,
+        rt.error_message,
+        rt.created_at
+      FROM request_tracking rt
+      LEFT JOIN LATERAL (
+        SELECT mode, tokens_before, tokens_after, savings_pct
+        FROM tokenvault_metrics
+        WHERE tool_used = 'gateway'
+          AND file_path = rt.request_id
        ORDER BY created_at DESC
+        LIMIT 1
+      ) tv ON true
+      WHERE rt.created_at > NOW() - MAKE_INTERVAL(hours => $1)
+      ORDER BY rt.created_at DESC
      LIMIT $2
      `,
      [offsetHours, limit]
@ -149,6 +167,11 @@ export class RequestLogger {
      cost_usd: row.cost_usd,
      latency_ms: row.latency_ms,
      fallback_used: row.fallback_used,
+      compression_mode: row.compression_mode,
+      compression_tokens_before: row.compression_tokens_before ? parseInt(row.compression_tokens_before, 10) : undefined,
+      compression_tokens_after: row.compression_tokens_after ? parseInt(row.compression_tokens_after, 10) : undefined,
+      compression_tokens_saved: row.compression_tokens_saved ? parseInt(row.compression_tokens_saved, 10) : 0,
+      compression_savings_pct: row.compression_savings_pct ? parseFloat(row.compression_savings_pct) : 0,
      error_message: row.error_message,
      created_at: row.created_at
    }));
@ -160,6 +183,17 @@ export class RequestLogger {
  async getMetrics(bucketMinutes: number = 60): Promise<{
    total_requests: number;
    total_cost: number;
+    estimated_api_cost: number;
+    estimated_api_cost_avoided: number;
+    total_tokens_in: number;
+    total_tokens_out: number;
+    total_tokens: number;
+    compression_operations: number;
+    compression_tokens_before: number;
+    compression_tokens_after: number;
+    compression_tokens_saved: number;
+    compression_rate: number;
+    cache_hit_rate: number;
    avg_latency: number;
    success_rate: number;
    avg_confidence: number;
@ -177,13 +211,15 @@ export class RequestLogger {
      `
      SELECT
        COUNT(*) as total_requests,
-        SUM(cost_usd) as total_cost,
-        AVG(latency_ms) as avg_latency,
-        SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / COUNT(*) as success_rate,
-        AVG(confidence_score) as avg_confidence,
-        SUM(CASE WHEN fallback_used = true THEN 1 ELSE 0 END)::FLOAT / COUNT(*) as fallback_percentage
+        COALESCE(SUM(cost_usd), 0) as total_cost,
+        COALESCE(SUM(tokens_in), 0) as total_tokens_in,
+        COALESCE(SUM(tokens_out), 0) as total_tokens_out,
+        COALESCE(AVG(latency_ms), 0) as avg_latency,
+        CASE WHEN COUNT(*) = 0 THEN 0 ELSE SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / COUNT(*) END as success_rate,
+        COALESCE(AVG(confidence_score), 0) as avg_confidence,
+        CASE WHEN COUNT(*) = 0 THEN 0 ELSE SUM(CASE WHEN fallback_used = true THEN 1 ELSE 0 END)::FLOAT / COUNT(*) END as fallback_percentage
      FROM request_tracking
-      WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
+      WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
      `,
      [bucketMinutes]
    );
@ -192,7 +228,7 @@ export class RequestLogger {
      `
      SELECT caller_id as caller, COUNT(*) as count
      FROM request_tracking
-      WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
+      WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
      GROUP BY caller_id
      ORDER BY count DESC
      LIMIT 5
@ -204,7 +240,7 @@ export class RequestLogger {
      `
      SELECT model, COUNT(*) as count
      FROM request_tracking
-      WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
+      WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
      GROUP BY model
      ORDER BY count DESC
      LIMIT 5
@ -224,11 +260,47 @@ export class RequestLogger {
      [bucketMinutes]
    );

+    const compressionResult = await this.db.query(
+      `
+      SELECT
+        COUNT(*) as operations,
+        COALESCE(SUM(tokens_before), 0) as tokens_before,
+        COALESCE(SUM(tokens_after), 0) as tokens_after,
+        COALESCE(SUM(GREATEST(tokens_before - tokens_after, 0)), 0) as tokens_saved
+      FROM tokenvault_metrics
+      WHERE tool_used = 'gateway'
+        AND created_at > NOW() - ($1 * INTERVAL '1 minute')
+      `,
+      [bucketMinutes]
+    );
+
    const metrics = metricsResult.rows[0];
+    const totalTokensIn = parseInt(metrics.total_tokens_in, 10) || 0;
+    const totalTokensOut = parseInt(metrics.total_tokens_out, 10) || 0;
+    const totalTokens = totalTokensIn + totalTokensOut;
+    const compression = compressionResult.rows[0] ?? {};
+    const compressionTokensBefore = parseInt(compression.tokens_before, 10) || 0;
+    const compressionTokensAfter = parseInt(compression.tokens_after, 10) || 0;
+    const compressionTokensSaved = parseInt(compression.tokens_saved, 10) || 0;
+    const referenceInputCostPer1k = parseFloat(process.env['REFERENCE_INPUT_COST_PER_1K'] ?? '0.005');
+    const referenceOutputCostPer1k = parseFloat(process.env['REFERENCE_OUTPUT_COST_PER_1K'] ?? '0.015');
+    const estimatedApiCost = (totalTokensIn / 1000) * referenceInputCostPer1k + (totalTokensOut / 1000) * referenceOutputCostPer1k;
+    const totalCost = parseFloat(metrics.total_cost) || 0;

    return {
      total_requests: parseInt(metrics.total_requests) || 0,
-      total_cost: parseFloat(metrics.total_cost) || 0,
+      total_cost: totalCost,
+      estimated_api_cost: estimatedApiCost,
+      estimated_api_cost_avoided: Math.max(0, estimatedApiCost - totalCost),
+      total_tokens_in: totalTokensIn,
+      total_tokens_out: totalTokensOut,
+      total_tokens: totalTokens,
+      compression_operations: parseInt(compression.operations, 10) || 0,
+      compression_tokens_before: compressionTokensBefore,
+      compression_tokens_after: compressionTokensAfter,
+      compression_tokens_saved: compressionTokensSaved,
+      compression_rate: compressionTokensBefore > 0 ? compressionTokensSaved / compressionTokensBefore : 0,
+      cache_hit_rate: 0,
      avg_latency: Math.round(parseFloat(metrics.avg_latency) || 0),
      success_rate: parseFloat(metrics.success_rate) || 0,
      avg_confidence: parseFloat(metrics.avg_confidence) || 0,
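
Review note: the derived fields added to `getMetrics` come down to a few lines of arithmetic. The sketch below restates them in isolation, assuming the default reference rates from the diff (0.005 and 0.015 USD per 1k tokens); `WindowTotals` and `deriveMetrics` are hypothetical names used only for illustration and are not part of the module.

```typescript
// Hypothetical input shape: the aggregates getMetrics reads from the two queries.
interface WindowTotals {
  tokensIn: number;
  tokensOut: number;
  actualCostUsd: number;
  compressionTokensBefore: number;
  compressionTokensSaved: number;
}

// Mirrors the arithmetic in the diff; the defaults are the fallback reference rates.
function deriveMetrics(t: WindowTotals, refInPer1k = 0.005, refOutPer1k = 0.015) {
  const estimatedApiCost =
    (t.tokensIn / 1000) * refInPer1k + (t.tokensOut / 1000) * refOutPer1k;
  return {
    estimatedApiCost,
    // Clamped at zero so a window that cost more than the reference never shows negative savings.
    estimatedApiCostAvoided: Math.max(0, estimatedApiCost - t.actualCostUsd),
    // Guarded division: no compression activity means a rate of 0, not NaN.
    compressionRate:
      t.compressionTokensBefore > 0
        ? t.compressionTokensSaved / t.compressionTokensBefore
        : 0,
  };
}

const sampleMetrics = deriveMetrics({
  tokensIn: 200_000,
  tokensOut: 40_000,
  actualCostUsd: 0.4,
  compressionTokensBefore: 50_000,
  compressionTokensSaved: 12_500,
});
console.log(sampleMetrics);
```

With these sample totals the reference cost is 200 × 0.005 + 40 × 0.015 = 1.60 USD, so roughly 1.20 USD counts as avoided and the compression rate is 0.25.
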
--- a/packages/gateway/src/modules/response-cache.ts
+++ b/packages/gateway/src/modules/response-cache.ts
@ -0,0 +1,390 @@
+/**
+ * Response Cache
+ *
+ * Two-tier cache:
+ *   • Tier 1 (exact):    sha256 of canonical request → instant lookup, $0 cost
+ *   • Tier 2 (semantic): pgvector cosine similarity over request embeddings,
+ *                          served when the best match meets the similarity
+ *                          threshold. Lookups degrade to a miss when the
+ *                          embedding service or the vector column is unavailable.
+ *
+ * Cache hits skip the entire LLM pipeline. Each hit increments the saved-cost
+ * counter so the dashboard can show real savings in real time.
+ */
+
+import { createHash } from 'crypto';
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+import { embed, vectorToPgLiteral, EMBEDDING_DIMENSION } from './embedding-client.js';
+
+export interface CacheableRequest {
+  caller: string;
+  task_type?: string;
+  model?: string;
+  system?: string;
+  input: string;
+}
+
+export interface CachedResponse {
+  id: number;
+  cacheKey: string;
+  responseJson: Record<string, unknown>;
+  costWhenCached: number;
+  tokensIn: number;
+  tokensOut: number;
+  hitCount: number;
+  ageSeconds: number;
+}
+
+/**
+ * Compute a stable cache key: caller, task, and model are lowercased; system and
+ * input are whitespace-collapsed and truncated but stay case-sensitive.
+ */
+export function computeCacheKey(req: CacheableRequest): string {
+  const canonical = [
+    `caller=${req.caller.trim().toLowerCase()}`,
+    `task=${(req.task_type ?? '').trim().toLowerCase()}`,
+    `model=${(req.model ?? '').trim().toLowerCase()}`,
+    `system=${(req.system ?? '').trim().replace(/\s+/g, ' ').slice(0, 4096)}`,
+    `input=${req.input.trim().replace(/\s+/g, ' ').slice(0, 16_384)}`,
+  ].join('\n');
+  return createHash('sha256').update(canonical).digest('hex');
+}
+
+/** Look up an exact cache hit. Returns null when no fresh entry exists. */
+export async function getCachedResponse(
+  db: Pool,
+  cacheKey: string
+): Promise<CachedResponse | null> {
+  try {
+    const result = await db.query(
+      `
+      SELECT id, cache_key, response_json, cost_when_cached, tokens_in, tokens_out,
+             hit_count, EXTRACT(EPOCH FROM (NOW() - created_at))::INT AS age_seconds,
+             ttl_seconds
+      FROM response_cache
+      WHERE cache_key = $1
+        AND (created_at + (ttl_seconds * INTERVAL '1 second')) > NOW()
+      LIMIT 1
+      `,
+      [cacheKey]
+    );
+    const row = result.rows[0];
+    if (!row) return null;
+    return {
+      id: Number(row.id),
+      cacheKey: row.cache_key,
+      responseJson: row.response_json,
+      costWhenCached: parseFloat(row.cost_when_cached) || 0,
+      tokensIn: parseInt(row.tokens_in, 10) || 0,
+      tokensOut: parseInt(row.tokens_out, 10) || 0,
+      hitCount: parseInt(row.hit_count, 10) || 0,
+      ageSeconds: parseInt(row.age_seconds, 10) || 0,
+    };
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: getCachedResponse failed (table missing?)');
+    return null;
+  }
+}
+
+/**
+ * Look up a fuzzy/semantic match using pgvector cosine similarity.
+ * Returns null when:
+ *   • embedding generation fails (Ollama down, model missing)
+ *   • no entry crosses the similarity threshold
+ *   • the table doesn't yet have the embedding column
+ */
+export async function getSemanticCachedResponse(
+  db: Pool,
+  caller: string,
+  taskType: string | undefined,
+  inputText: string,
+  similarityThreshold: number = 0.92
+): Promise<(CachedResponse & { similarity: number }) | null> {
+  const vec = await embed(inputText);
+  if (!vec) return null;
+
+  try {
+    const result = await db.query(
+      `
+      SELECT id, cache_key, response_json, cost_when_cached, tokens_in, tokens_out,
+             hit_count, EXTRACT(EPOCH FROM (NOW() - created_at))::INT AS age_seconds,
+             1 - (embedding <=> $1::vector) AS similarity
+      FROM response_cache
+      WHERE caller_id = $2
+        AND ($3::TEXT IS NULL OR task_type = $3)
+        AND embedding IS NOT NULL
+        AND (created_at + (ttl_seconds * INTERVAL '1 second')) > NOW()
+      ORDER BY embedding <=> $1::vector ASC
+      LIMIT 1
+      `,
+      [vectorToPgLiteral(vec), caller.trim().toLowerCase(), taskType ?? null]
+    );
+    const row = result.rows[0];
+    if (!row) return null;
+    const sim = parseFloat(row.similarity);
+    if (isNaN(sim) || sim < similarityThreshold) return null;
+    return {
+      id: Number(row.id),
+      cacheKey: row.cache_key,
+      responseJson: row.response_json,
+      costWhenCached: parseFloat(row.cost_when_cached) || 0,
+      tokensIn: parseInt(row.tokens_in, 10) || 0,
+      tokensOut: parseInt(row.tokens_out, 10) || 0,
+      hitCount: parseInt(row.hit_count, 10) || 0,
+      ageSeconds: parseInt(row.age_seconds, 10) || 0,
+      similarity: sim,
+    };
+  } catch (err) {
+    logger.debug({ err }, 'response-cache: getSemanticCachedResponse failed (extension missing?)');
+    return null;
+  }
+}
+
+/** Persist a response. On key conflict the entry is overwritten and its TTL window restarts. */
+export async function setCachedResponse(
+  db: Pool,
+  req: CacheableRequest,
+  response: Record<string, unknown>,
+  meta: { cost: number; tokensIn: number; tokensOut: number; ttlSeconds?: number }
+): Promise<void> {
+  const cacheKey = computeCacheKey(req);
+  const ttl = meta.ttlSeconds ?? 86_400;
+  // Awaited, not fire-and-forget; a missing or wrong-sized vector is stored as NULL
+  const vec = await embed(req.input);
+  const embedLiteral = vec && vec.length === EMBEDDING_DIMENSION ? vectorToPgLiteral(vec) : null;
+  try {
+    await db.query(
+      `
+      INSERT INTO response_cache
+        (cache_key, caller_id, task_type, model, input_preview,
+         response_json, cost_when_cached, tokens_in, tokens_out, ttl_seconds, embedding)
+      VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11::vector)
+      ON CONFLICT (cache_key) DO UPDATE SET
+        response_json    = EXCLUDED.response_json,
+        cost_when_cached = EXCLUDED.cost_when_cached,
+        tokens_in        = EXCLUDED.tokens_in,
+        tokens_out       = EXCLUDED.tokens_out,
+        ttl_seconds      = EXCLUDED.ttl_seconds,
+        embedding        = COALESCE(EXCLUDED.embedding, response_cache.embedding),
+        created_at       = NOW()
+      `,
+      [
+        cacheKey,
+        req.caller.trim().toLowerCase(),
+        req.task_type ?? null,
+        req.model ?? null,
+        req.input.slice(0, 1024),
+        JSON.stringify(response),
+        meta.cost,
+        meta.tokensIn,
+        meta.tokensOut,
+        ttl,
+        embedLiteral,
+      ]
+    );
+  } catch (err) {
+    // Retry without embedding column when the extension hasn't migrated yet
+    logger.debug({ err }, 'response-cache: setCachedResponse with embedding failed, retrying without');
+    try {
+      await db.query(
+        `
+        INSERT INTO response_cache
+          (cache_key, caller_id, task_type, model, input_preview,
+           response_json, cost_when_cached, tokens_in, tokens_out, ttl_seconds)
+        VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
+        ON CONFLICT (cache_key) DO UPDATE SET
+          response_json    = EXCLUDED.response_json,
+          cost_when_cached = EXCLUDED.cost_when_cached,
+          tokens_in        = EXCLUDED.tokens_in,
+          tokens_out       = EXCLUDED.tokens_out,
+          ttl_seconds      = EXCLUDED.ttl_seconds,
+          created_at       = NOW()
+        `,
+        [
+          cacheKey,
+          req.caller.trim().toLowerCase(),
+          req.task_type ?? null,
+          req.model ?? null,
+          req.input.slice(0, 1024),
+          JSON.stringify(response),
+          meta.cost,
+          meta.tokensIn,
+          meta.tokensOut,
+          ttl,
+        ]
+      );
+    } catch (err2) {
+      logger.warn({ err: err2 }, 'response-cache: setCachedResponse failed');
+    }
+  }
+}
+
+/** Record a cache hit (atomic increment). */
+export async function recordCacheHit(db: Pool, cachedId: number): Promise<void> {
+  try {
+    await db.query(
+      `
+      UPDATE response_cache
+      SET hit_count    = hit_count + 1,
+          cost_saved   = cost_saved + cost_when_cached,
+          tokens_saved = tokens_saved + tokens_in + tokens_out,
+          last_hit_at  = NOW()
+      WHERE id = $1
+      `,
+      [cachedId]
+    );
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: recordCacheHit failed');
+  }
+}
+
+/** Aggregate savings across all cache entries for the dashboard. */
+export async function getCacheSavings(
+  db: Pool,
+  hoursBack: number = 24
+): Promise<{
+  totalHits: number;
+  totalCostSaved: number;
+  totalTokensSaved: number;
+  uniqueEntries: number;
+  topCallers: Array<{ caller: string; hits: number; saved: number }>;
+  hitRatePercent: number;
+}> {
+  try {
+    const [totalRow, callerRows, ratioRow] = await Promise.all([
+      db.query(
+        `SELECT
+            COALESCE(SUM(hit_count), 0)::INT     AS total_hits,
+            COALESCE(SUM(cost_saved), 0)::NUMERIC AS total_cost_saved,
+            COALESCE(SUM(tokens_saved), 0)::BIGINT AS total_tokens_saved,
+            COUNT(*)::INT                         AS unique_entries
+         FROM response_cache
+         WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)
+            OR created_at  > NOW() - MAKE_INTERVAL(hours => $1)`,
+        [hoursBack]
+      ),
+      db.query(
+        `SELECT caller_id, SUM(hit_count)::INT AS hits, SUM(cost_saved)::NUMERIC AS saved
+         FROM response_cache
+         WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)
+         GROUP BY caller_id
+         ORDER BY hits DESC
+         LIMIT 5`,
+        [hoursBack]
+      ),
+      // Cache hit-rate = hits / (hits + new requests in same window)
+      db.query(
+        `SELECT
+            COALESCE((SELECT SUM(hit_count) FROM response_cache
+                      WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)), 0)::INT AS hits,
+            (SELECT COUNT(*) FROM request_tracking
+              WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1))::INT             AS total_requests`,
+        [hoursBack]
+      ),
+    ]);
+
+    const t = totalRow.rows[0];
+    const r = ratioRow.rows[0];
+    const totalReq = parseInt(r?.total_requests ?? '0', 10);
+    const hits = parseInt(t?.total_hits ?? '0', 10);
+    const hitRate = totalReq + hits > 0 ? (hits / (totalReq + hits)) * 100 : 0;
+
+    return {
+      totalHits: hits,
+      totalCostSaved: parseFloat(t?.total_cost_saved ?? '0'),
+      totalTokensSaved: parseInt(t?.total_tokens_saved ?? '0', 10),
+      uniqueEntries: parseInt(t?.unique_entries ?? '0', 10),
+      topCallers: callerRows.rows.map((row: any) => ({
+        caller: row.caller_id,
+        hits: parseInt(row.hits, 10) || 0,
+        saved: parseFloat(row.saved) || 0,
+      })),
+      hitRatePercent: parseFloat(hitRate.toFixed(2)),
+    };
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: getCacheSavings failed (table missing?)');
+    return {
+      totalHits: 0,
+      totalCostSaved: 0,
+      totalTokensSaved: 0,
+      uniqueEntries: 0,
+      topCallers: [],
+      hitRatePercent: 0,
+    };
+  }
+}
+
+/** Time-series buckets of cache savings for sparkline visualization. */
+export async function getSavingsTimeSeries(
+  db: Pool,
+  hoursBack: number = 24,
+  bucketMinutes: number = 60
+): Promise<Array<{ ts: string; costSaved: number; hits: number; tokensSaved: number }>> {
+  try {
+    const buckets = Math.ceil((hoursBack * 60) / bucketMinutes);
+    const result = await db.query(
+      `
+      WITH gs AS (
+        -- One row per bucket, stepping back from the current hour (no duplicate buckets)
+        SELECT DATE_TRUNC('hour', NOW()) - ($1 || ' minutes')::INTERVAL * s AS bucket_ts
+        FROM generate_series(0, $2 - 1) s
+      )
+      SELECT
+        gs.bucket_ts,
+        COALESCE(COUNT(rc.id), 0)::INT             AS hits,
+        COALESCE(SUM(rc.cost_when_cached), 0)::NUMERIC AS cost_saved,
+        COALESCE(SUM(rc.tokens_in + rc.tokens_out), 0)::INT AS tokens_saved
+      FROM gs
+      LEFT JOIN response_cache rc
+        ON DATE_TRUNC('hour', rc.last_hit_at) = gs.bucket_ts
+       AND rc.last_hit_at > NOW() - ($1 || ' minutes')::INTERVAL * $2
+      GROUP BY gs.bucket_ts
+      ORDER BY gs.bucket_ts ASC
+      `,
+      [bucketMinutes, buckets]
+    );
+    return result.rows.map((row: any) => ({
+      ts: row.bucket_ts.toISOString(),
+      costSaved: parseFloat(row.cost_saved) || 0,
+      hits: parseInt(row.hits, 10) || 0,
+      tokensSaved: parseInt(row.tokens_saved, 10) || 0,
+    }));
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: getSavingsTimeSeries failed');
+    return [];
+  }
+}
+
+/** Drop entries older than max-age days. Run from a periodic job. */
+export async function pruneStaleCacheEntries(db: Pool, maxAgeDays: number = 7): Promise<number> {
+  try {
+    const result = await db.query(
+      `DELETE FROM response_cache
+       WHERE created_at < NOW() - MAKE_INTERVAL(days => $1)
+         AND (last_hit_at IS NULL OR last_hit_at < NOW() - MAKE_INTERVAL(days => $1))`,
+      [maxAgeDays]
+    );
+    return result.rowCount ?? 0;
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: prune failed');
+    return 0;
+  }
+}
+
+/** Manual cache invalidation, e.g. when a caller hits "clear my cache". */
+export async function clearCacheForCaller(db: Pool, callerId: string): Promise<number> {
+  try {
+    const result = await db.query(
+      `DELETE FROM response_cache WHERE caller_id = $1`,
+      [callerId.trim().toLowerCase()]
+    );
+    return result.rowCount ?? 0;
+  } catch (err) {
+    logger.warn({ err }, 'response-cache: clearCacheForCaller failed');
+    return 0;
+  }
+}
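
Review note: which requests share an exact-tier key follows from the canonical string that `computeCacheKey` hashes. The sketch below rebuilds only that string, without the sha256 step, so it runs with no dependencies; `SketchRequest` and `canonicalizeForCache` are hypothetical names that mirror the diff, not exports of the module.

```typescript
// Hypothetical stand-in for CacheableRequest; same fields as in the diff.
interface SketchRequest {
  caller: string;
  task_type?: string;
  model?: string;
  system?: string;
  input: string;
}

// Builds the canonical string that would be fed to sha256.
function canonicalizeForCache(req: SketchRequest): string {
  const collapse = (s: string): string => s.trim().replace(/\s+/g, ' ');
  return [
    `caller=${req.caller.trim().toLowerCase()}`,
    `task=${(req.task_type ?? '').trim().toLowerCase()}`,
    `model=${(req.model ?? '').trim().toLowerCase()}`,
    `system=${collapse(req.system ?? '').slice(0, 4096)}`,
    `input=${collapse(req.input).slice(0, 16_384)}`,
  ].join('\n');
}

const keyA = canonicalizeForCache({ caller: 'Cursor ', model: 'GPT-4', input: 'Explain  LATERAL\njoins' });
const keyB = canonicalizeForCache({ caller: 'cursor', model: 'gpt-4', input: 'Explain LATERAL joins' });
const keyC = canonicalizeForCache({ caller: 'cursor', model: 'gpt-4', input: 'explain lateral joins' });

console.log(keyA === keyB); // true: caller/model case and input whitespace are normalized
console.log(keyA === keyC); // false: input text stays case-sensitive
```

One consequence worth keeping in mind during review: because `input` is truncated to 16,384 characters before hashing, two long inputs that differ only after that point would map to the same key.
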
--- a/packages/gateway/src/modules/savings-calculator.ts
+++ b/packages/gateway/src/modules/savings-calculator.ts
@ -0,0 +1,267 @@
+/**
+ * Savings Calculator
+ *
+ * Comprehensive savings accounting across ALL gateway mechanisms, not just
+ * cache hits. Lean-CTX measures file-context compression; we measure five
+ * orthogonal sources of value:
+ *
+ *   1. Response cache (exact + semantic match)
+ *   2. Compression pipeline (verbatim_compact, etc.) plus MCP tool-call compression
+ *   3. Subscription-bridge implicit savings (calls via flat-rate Pro plan
+ *      vs. what they would have cost via paid API)
+ *   4. Local routing (local model used instead of a paid API)
+ *   5. Race mode (cost of the candidates that were not selected)
+ *
+ * The dashboard now surfaces all five so the savings counter reflects the
+ * gateway's true value rather than only cache hits.
+ */
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+// Conservative API pricing snapshot (USD per 1k tokens). Used to compute
+// "what would this have cost via direct API". Update as pricing evolves.
+const API_PRICING = {
+  // Anthropic
+  'claude-opus-4-1':       { in: 0.015,  out: 0.075 },
+  'claude-sonnet-4-1':     { in: 0.003,  out: 0.015 },
+  'claude-haiku-3':        { in: 0.00025, out: 0.00125 },
+  // OpenAI
+  'gpt-5.1-codex':         { in: 0.005,  out: 0.020 },
+  'gpt-5.1-codex-mini':    { in: 0.0015, out: 0.006 },
+  'gpt-4-turbo':           { in: 0.010,  out: 0.030 },
+  'gpt-4':                 { in: 0.030,  out: 0.060 },
+  'gpt-3.5-turbo':         { in: 0.0005, out: 0.0015 },
+  // Google
+  'gemini-1.5-pro':        { in: 0.00125, out: 0.005 },
+  'gemini-1.5-flash':      { in: 0.000075, out: 0.0003 },
+} as const;
+
+/** Models that go through a flat-rate subscription bridge → marginal cost = $0 */
+const SUBSCRIPTION_MODEL_PATTERNS = [
+  /^claude-/i,         // Claude Code subscription
+  /^gpt-5\.1-codex/i,  // Codex CLI subscription
+  /^gpt-(4|3\.5)/i,    // ChatGPT Plus / Copilot subscription
+  /^gemini-/i,         // Gemini Advanced
+  /^github-copilot/i,  // GitHub Copilot
+  /^microsoft.365/i,   // M365 Copilot
+];
+
+function lookupApiPrice(model: string): { in: number; out: number } | null {
+  const m = model.toLowerCase();
+  // Exact match first
+  if (m in API_PRICING) return (API_PRICING as any)[m];
+  // Prefix match, longest key first, so gpt-5.1-codex-mini-* never resolves to gpt-5.1-codex
+  const keys = Object.keys(API_PRICING).sort((a, b) => b.length - a.length);
+  for (const key of keys) {
+    if (m.startsWith(key)) return (API_PRICING as any)[key];
+  }
+  return null;
+}
+
+function isSubscriptionModel(model: string): boolean {
+  return SUBSCRIPTION_MODEL_PATTERNS.some((p) => p.test(model));
+}
+
+function isLocalModel(model: string): boolean {
+  return /^(qwen|llama|mistral|magatama|phi|nomic|gemma)/i.test(model);
+}
+
+export interface ComprehensiveSavings {
+  /** Total saved across all five mechanisms. */
+  totalCostSaved: number;
+  totalTokensSaved: number;
+  /** Per-source breakdown for the dashboard. */
+  bySource: {
+    cache: { tokens: number; cost: number; hits: number };
+    compression: { tokens: number; cost: number; calls: number };
+    subscriptionBridge: { tokens: number; cost: number; calls: number };
+    localRouting: { tokens: number; cost: number; calls: number };
+    raceMode: { tokens: number; cost: number; calls: number };
+  };
+  /** How much you would have paid for the same volume at API list prices. */
+  costWithoutGateway: number;
+  /** What you actually paid (real $). */
+  costWithGateway: number;
+  /** Time window. */
+  hoursBack: number;
+  /** Inputs that gave us this number. */
+  totals: { requests: number; tokensIn: number; tokensOut: number };
+}
+
+/**
+ * Compute comprehensive savings across all mechanisms.
+ *
+ * Strategy:
+ *   For each request, determine where it went and price it both ways:
+ *     - "Would-be cost"  = API list price for the model that handled it
+ *     - "Actual cost"    = $0 for subscription/local; cost_usd for paid API
+ *     - "Saved"          = would-be − actual
+ */
+export async function getComprehensiveSavings(
+  db: Pool,
+  hoursBack: number = 24
+): Promise<ComprehensiveSavings> {
+  const empty: ComprehensiveSavings = {
+    totalCostSaved: 0,
+    totalTokensSaved: 0,
+    bySource: {
+      cache: { tokens: 0, cost: 0, hits: 0 },
+      compression: { tokens: 0, cost: 0, calls: 0 },
+      subscriptionBridge: { tokens: 0, cost: 0, calls: 0 },
+      localRouting: { tokens: 0, cost: 0, calls: 0 },
+      raceMode: { tokens: 0, cost: 0, calls: 0 },
+    },
+    costWithoutGateway: 0,
+    costWithGateway: 0,
+    hoursBack,
+    totals: { requests: 0, tokensIn: 0, tokensOut: 0 },
+  };
+
+  try {
+    // 1) Cache hits
+    const cacheRow = await db.query(
+      `SELECT
+         COALESCE(SUM(hit_count), 0)::INT  AS hits,
+         COALESCE(SUM(cost_saved), 0)::NUMERIC AS cost,
+         COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens
+       FROM response_cache
+       WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)`,
+      [hoursBack]
+    );
+    empty.bySource.cache = {
+      hits: parseInt(cacheRow.rows[0]?.hits ?? '0', 10),
+      cost: parseFloat(cacheRow.rows[0]?.cost ?? '0'),
+      tokens: parseInt(cacheRow.rows[0]?.tokens ?? '0', 10),
+    };
+
+    // 2-4) All requests in the window, classified by routing
+    const reqRows = await db.query(
+      `SELECT model, tokens_in, tokens_out, cost_usd, fallback_used
+       FROM request_tracking
+       WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
+      [hoursBack]
+    );
+
+    let totalReq = 0, totalIn = 0, totalOut = 0;
+    let withGateway = 0, withoutGateway = 0;
+
+    for (const r of reqRows.rows) {
+      const model = String(r.model ?? '');
+      const tokensIn = parseInt(r.tokens_in, 10) || 0;
+      const tokensOut = parseInt(r.tokens_out, 10) || 0;
+      const actualCost = parseFloat(r.cost_usd) || 0;
+
+      totalReq += 1;
+      totalIn += tokensIn;
+      totalOut += tokensOut;
+      withGateway += actualCost;
+
+      // Determine "would-be cost" — what this request would have cost at API
+      // list prices for the model that handled it (or its closest paid sibling).
+      const apiPrice = lookupApiPrice(model);
+      let wouldBeCost = 0;
+      if (apiPrice) {
+        wouldBeCost = (tokensIn / 1000) * apiPrice.in + (tokensOut / 1000) * apiPrice.out;
+      } else if (isLocalModel(model)) {
+        // Local model: priced against gpt-3.5-turbo (cheapest OpenAI entry above) as opportunity cost
+        const ref = API_PRICING['gpt-3.5-turbo'];
+        wouldBeCost = (tokensIn / 1000) * ref.in + (tokensOut / 1000) * ref.out;
+      }
+      withoutGateway += wouldBeCost;
+
+      // Bucket the savings into a source
+      if (isSubscriptionModel(model)) {
+        empty.bySource.subscriptionBridge.calls += 1;
+        empty.bySource.subscriptionBridge.tokens += tokensIn + tokensOut;
+        empty.bySource.subscriptionBridge.cost += Math.max(0, wouldBeCost - actualCost);
+      } else if (isLocalModel(model)) {
+        empty.bySource.localRouting.calls += 1;
+        empty.bySource.localRouting.tokens += tokensIn + tokensOut;
+        empty.bySource.localRouting.cost += Math.max(0, wouldBeCost - actualCost);
+      }
+    }
+
+    // 5) Compression savings — pull from tokenvault_metrics if available
+    try {
+      const compRow = await db.query(
+        `SELECT
+           COUNT(*)::INT AS calls,
+           COALESCE(SUM(GREATEST(tokens_before - tokens_after, 0)), 0)::BIGINT AS tokens_saved
+         FROM tokenvault_metrics
+         WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
+           AND tool_used = 'gateway'`,
+        [hoursBack]
+      );
+      const tokensCompressed = parseInt(compRow.rows[0]?.tokens_saved ?? '0', 10);
+      // Conservative pricing: assume average input pricing of $0.001/1k tokens
+      const compCost = (tokensCompressed / 1000) * 0.001;
+      empty.bySource.compression = {
+        calls: parseInt(compRow.rows[0]?.calls ?? '0', 10),
+        tokens: tokensCompressed,
+        cost: compCost,
+      };
+    } catch (err) {
+      logger.debug({ err }, 'savings: compression aggregation skipped (table missing)');
+    }
+
+    // 6) Race mode — picked the faster/cheaper candidate, "saved" the loser cost
+    try {
+      const raceRow = await db.query(
+        `SELECT
+           COUNT(DISTINCT call_id)::INT AS races,
+           COALESCE(SUM(cost_usd) FILTER (WHERE selected = false), 0)::NUMERIC AS not_picked_cost
+         FROM race_mode_results
+         WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
+        [hoursBack]
+      );
+      empty.bySource.raceMode = {
+        calls: parseInt(raceRow.rows[0]?.races ?? '0', 10),
+        cost: parseFloat(raceRow.rows[0]?.not_picked_cost ?? '0'),
+        tokens: 0,
+      };
+    } catch (err) {
+      logger.debug({ err }, 'savings: race aggregation skipped (table missing)');
+    }
+
+    // 7) MCP tool-call compression — drop-in Lean-CTX replacement
+    try {
+      const mcpRow = await db.query(
+        `SELECT COUNT(*)::INT AS calls,
+                COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens_saved
+         FROM mcp_tool_calls
+         WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
+        [hoursBack]
+      );
+      const mcpTokens = parseInt(mcpRow.rows[0]?.tokens_saved ?? '0', 10);
+      const mcpCalls = parseInt(mcpRow.rows[0]?.calls ?? '0', 10);
+      // Tool-call savings cost-equivalence: Sonnet-equivalent pricing
+      // ($3/MTok input, $15/MTok output, weighted 60/40 in/out for tool returns).
+      // → ~$0.0078 per 1k tokens ($7.80/MTok blended). Matches Lean-CTX dashboard scale.
+      const mcpCost = (mcpTokens / 1_000_000) * (3.0 * 0.6 + 15.0 * 0.4);
+      // Fold MCP tool-call savings into the existing compression bucket (no separate source bucket).
+      empty.bySource.compression.tokens += mcpTokens;
+      empty.bySource.compression.cost += mcpCost;
+      empty.bySource.compression.calls += mcpCalls;
+    } catch (err) {
+      logger.debug({ err }, 'savings: mcp tool aggregation skipped (table missing)');
+    }
+
+    empty.totalCostSaved =
+      empty.bySource.cache.cost +
+      empty.bySource.compression.cost +
+      empty.bySource.subscriptionBridge.cost +
+      empty.bySource.localRouting.cost +
+      empty.bySource.raceMode.cost;
+
+    empty.totalTokensSaved =
+      empty.bySource.cache.tokens +
+      empty.bySource.compression.tokens;
+
+    empty.costWithoutGateway = withoutGateway;
+    empty.costWithGateway = withGateway;
+    empty.totals = { requests: totalReq, tokensIn: totalIn, tokensOut: totalOut };
+  } catch (err) {
+    logger.warn({ err }, 'savings-calculator: comprehensive computation failed');
+  }
+
+  return empty;
+}
--- a/packages/gateway/src/modules/settings-store.ts
+++ b/packages/gateway/src/modules/settings-store.ts
@ -0,0 +1,214 @@
+/**
+ * Settings Store
+ *
+ * Persists user configuration (which subscriptions they have, which API
+ * providers they use, etc.) to a JSON file on disk. Sensitive fields like
+ * API keys are stored verbatim but never returned in plaintext from
+ * `getPublicSettings()` — only a `hasKey: true/false` flag is exposed.
+ */
+
+import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'fs';
+import { dirname, join } from 'path';
+import { z } from 'zod';
+import { logger } from '../observability/logger.js';
+
+const SettingsSchema = z.object({
+  /** How the gateway should pick providers: 'auto' uses all, others restrict the pool. */
+  routingMode: z.enum(['auto', 'subscription-only', 'api-only', 'local-only']).default('auto'),
+  /** Per-subscription configuration keyed by SubscriptionId. */
+  subscriptions: z
+    .record(
+      z.string(),
+      z.object({
+        enabled: z.boolean().default(true),
+        autoSpawn: z.boolean().default(true),
+        /**
+         * Optional remote bridge URL. When set, the gateway will route to this
+         * URL instead of trying to spawn a local bridge. Use this when the CLI
+         * subscription lives on a different machine than the gateway.
+         */
+        bridgeUrl: z.string().url().optional().or(z.literal('')),
+        notes: z.string().optional(),
+      })
+    )
+    .default({}),
+  /** Per-API-provider configuration keyed by provider name (cerebras, groq, …). */
+  apiProviders: z
+    .record(
+      z.string(),
+      z.object({
+        enabled: z.boolean().default(false),
+        apiKey: z.string().optional(),
+        baseUrl: z.string().optional(),
+        notes: z.string().optional(),
+      })
+    )
+    .default({}),
+  /** Local Ollama configuration. */
+  ollama: z
+    .object({
+      enabled: z.boolean().default(true),
+      baseUrl: z.string().default('http://localhost:11434'),
+    })
+    .default({ enabled: true, baseUrl: 'http://localhost:11434' }),
+  /**
+   * Simple Mode — for users who only use 1-2 subscriptions.
+   * Hides advanced tabs (providers, races, share, report, memory) and
+   * filters wallet/subscriptions to only show enabled providers.
+   */
+  ui: z
+    .object({
+      simpleMode: z.boolean().default(true),
+      hideEmptyProviders: z.boolean().default(true),
+      showTooltips: z.boolean().default(true),
+    })
+    .default({ simpleMode: true, hideEmptyProviders: true, showTooltips: true }),
+  /** ISO timestamp of last update. */
+  updatedAt: z.string().optional(),
+});
+
+export type Settings = z.infer<typeof SettingsSchema>;
+
+export interface PublicSettings extends Omit<Settings, 'apiProviders'> {
+  apiProviders: Record<string, { enabled: boolean; hasKey: boolean; baseUrl?: string; notes?: string }>;
+}
+
+const SETTINGS_PATH =
+  process.env['SETTINGS_PATH'] ?? join(process.env['HOME'] ?? '/root', '.llm-gateway', 'settings.json');
+
+const DEFAULT_SUBSCRIPTIONS: Settings['subscriptions'] = {
+  'claude-code': { enabled: true, autoSpawn: true },
+  'github-copilot': { enabled: true, autoSpawn: true },
+  'chatgpt': { enabled: true, autoSpawn: true },
+  'gemini': { enabled: true, autoSpawn: true },
+  'codex': { enabled: true, autoSpawn: true },
+  'aider': { enabled: true, autoSpawn: true },
+};
+
+function getDefaults(): Settings {
+  return SettingsSchema.parse({
+    routingMode: 'auto',
+    subscriptions: DEFAULT_SUBSCRIPTIONS,
+    ollama: { enabled: true, baseUrl: process.env['OLLAMA_BASE_URL'] ?? 'http://localhost:11434' },
+  });
+}
+
+/**
+ * Load settings from disk. Returns defaults when the file does not yet exist
+ * or fails to parse.
+ */
+export function loadSettings(): Settings {
+  try {
+    if (!existsSync(SETTINGS_PATH)) {
+      return getDefaults();
+    }
+    const raw = readFileSync(SETTINGS_PATH, 'utf-8');
+    const parsed = SettingsSchema.parse(JSON.parse(raw));
+    return parsed;
+  } catch (err) {
+    logger.warn({ err, path: SETTINGS_PATH }, 'Failed to load settings — using defaults');
+    return getDefaults();
+  }
+}
+
+/**
+ * Persist settings to disk. Top-level sections are merged with existing values, but the
+ * merge is one level deep: a `subscriptions`/`apiProviders` entry in the patch replaces the stored entry whole.
+ */
+export function saveSettings(patch: Partial<Settings>): Settings {
+  const current = loadSettings();
+  const merged: Settings = SettingsSchema.parse({
+    ...current,
+    ...patch,
+    subscriptions: { ...current.subscriptions, ...(patch.subscriptions ?? {}) },
+    apiProviders: { ...current.apiProviders, ...(patch.apiProviders ?? {}) },
+    ollama: { ...current.ollama, ...(patch.ollama ?? {}) },
+    ui: { ...current.ui, ...(patch.ui ?? {}) },
+    updatedAt: new Date().toISOString(),
+  });
+
+  try {
+    mkdirSync(dirname(SETTINGS_PATH), { recursive: true });
+    writeFileSync(SETTINGS_PATH, JSON.stringify(merged, null, 2), { mode: 0o600 });
+    logger.info({ path: SETTINGS_PATH }, 'Settings saved');
+  } catch (err) {
+    logger.error({ err, path: SETTINGS_PATH }, 'Failed to persist settings');
+    throw err;
+  }
+
+  // Mirror to env vars so existing provider lookups pick up changes immediately.
+  applySettingsToEnv(merged);
+  return merged;
+}
+
+/**
+ * Strip sensitive data (API keys) before sending to the dashboard.
+ */
+export function getPublicSettings(): PublicSettings {
+  const settings = loadSettings();
+  const apiProviders: PublicSettings['apiProviders'] = {};
+  for (const [name, cfg] of Object.entries(settings.apiProviders)) {
+    apiProviders[name] = {
+      enabled: cfg.enabled,
+      hasKey: !!cfg.apiKey,
+      baseUrl: cfg.baseUrl,
+      notes: cfg.notes,
+    };
+  }
+  return {
+    routingMode: settings.routingMode,
+    subscriptions: settings.subscriptions,
+    apiProviders,
+    ollama: settings.ollama,
+    ui: settings.ui,
+    updatedAt: settings.updatedAt,
+  };
+}
+
+/**
+ * Apply settings to process.env so that the existing external-providers.ts
+ * code transparently picks up user-configured API keys without changes.
+ */
+export function applySettingsToEnv(settings: Settings = loadSettings()): void {
+  const apiEnvMap: Record<string, string> = {
+    cerebras: 'CEREBRAS_API_KEY',
+    groq: 'GROQ_API_KEY',
+    mistral: 'MISTRAL_API_KEY',
+    nvidia: 'NVIDIA_API_KEY',
+    cloudflare: 'CLOUDFLARE_AI_TOKEN',
+    'openai-codex': 'OPENAI_API_KEY',
+  };
+  for (const [name, cfg] of Object.entries(settings.apiProviders)) {
+    const envKey = apiEnvMap[name];
+    if (envKey && cfg.enabled && cfg.apiKey) {
+      process.env[envKey] = cfg.apiKey;
+    }
+  }
+  if (settings.ollama.enabled && settings.ollama.baseUrl) {
+    process.env['OLLAMA_BASE_URL'] = settings.ollama.baseUrl;
+  }
+
+  // Map subscription IDs to the env var the existing provider lookup uses
+  const subEnvMap: Record<string, string> = {
+    'claude-code': 'CLAUDE_BRIDGE_URL',
+    'github-copilot': 'COPILOT_BRIDGE_URL',
+    'microsoft-365-copilot': 'M365_COPILOT_BRIDGE_URL',
+    'chatgpt': 'CHATGPT_BRIDGE_URL',
+    'gemini': 'GEMINI_BRIDGE_URL',
+    'codex': 'CODEX_BRIDGE_URL',
+    'aider': 'AIDER_BRIDGE_URL',
+  };
+  for (const [id, cfg] of Object.entries(settings.subscriptions)) {
+    const envKey = subEnvMap[id];
+    if (envKey && cfg.enabled && cfg.bridgeUrl) {
+      process.env[envKey] = cfg.bridgeUrl;
+    }
+  }
+}
+
+export const SettingsPatchSchema = SettingsSchema.partial().extend({
+  subscriptions: SettingsSchema.shape.subscriptions.optional(),
+  apiProviders: SettingsSchema.shape.apiProviders.optional(),
+  ollama: SettingsSchema.shape.ollama.optional(),
+  ui: SettingsSchema.shape.ui.optional(),
+});
--- a/packages/gateway/src/modules/share-card.ts
+++ b/packages/gateway/src/modules/share-card.ts
@ -0,0 +1,174 @@
+/**
+ * Public Share Card Generator
+ *
+ * Renders a shareable SVG image showing your gateway savings — useful for
+ * social posts, blog headers, README badges. Tokens are rounded; no
+ * personally identifying information leaks (caller IDs, model names etc.
+ * are NOT included). Just headline numbers + brand.
+ *
+ * Output is always a valid SVG so it can be embedded as `<img src="...">`
+ * or downloaded directly.
+ */
+import type { Pool } from 'pg';
+import { getComprehensiveSavings } from './savings-calculator.js';
+import { getBuddyState } from './gamification.js';
+
+function fmtNum(n: number): string {
+  if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
+  if (n >= 1_000) return (n / 1_000).toFixed(1) + 'K';
+  return Math.round(n).toString();
+}
+function fmtCost(c: number): string {
+  if (c < 0.01) return `$${c.toFixed(6)}`;
+  if (c < 1) return `$${c.toFixed(4)}`;
+  return `$${c.toFixed(2)}`;
+}
+function escSvg(s: string): string {
+  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
+}
+
+export type ShareCardPeriod = 'day' | 'week' | 'month' | 'all';
+export type ShareCardTheme = 'dark' | 'light';
+
+const PERIOD_HOURS: Record<ShareCardPeriod, number> = {
+  day: 24, week: 168, month: 720, all: 24 * 365 * 5,
+};
+
+export async function generateShareCard(
+  db: Pool,
+  opts: { period?: ShareCardPeriod; theme?: ShareCardTheme } = {}
+): Promise<string> {
+  const period: ShareCardPeriod = opts.period ?? 'month';
+  const theme: ShareCardTheme = opts.theme ?? 'dark';
+  const hours = PERIOD_HOURS[period];
+
+  const [savings, buddy] = await Promise.all([
+    getComprehensiveSavings(db, hours),
+    getBuddyState(db, 'gateway'),
+  ]);
+
+  // Theme palette
+  const palette = theme === 'dark' ? {
+    bg: '#0a0a0a', surface: '#161616', text: '#e8e8e8', dim: '#888888',
+    accent: '#d4ff00', accentDim: '#8aa800', border: '#2a2a2a',
+  } : {
+    bg: '#f4f7fa', surface: '#ffffff', text: '#24313d', dim: '#667684',
+    accent: '#0f766e', accentDim: '#8ab9b5', border: '#d6e0e7',
+  };
+
+  const periodLabel = period === 'day' ? 'Last 24 hours'
+    : period === 'week' ? 'Last 7 days'
+    : period === 'month' ? 'Last 30 days'
+    : 'All-time';
+
+  const W = 1200, H = 630; // Open Graph standard
+  const totalTokens = savings.totalTokensSaved;
+  const totalCost = savings.totalCostSaved;
+  const reqCount = savings.totals.requests;
+  const efficacy = savings.costWithoutGateway > 0
+    ? ((savings.costWithoutGateway - savings.costWithGateway) / savings.costWithoutGateway) * 100
+    : 0;
+
+  // Source-bar widths
+  const total = Math.max(0.0000001, savings.totalCostSaved);
+  const wCache = (savings.bySource.cache.cost / total) * 100;
+  const wComp  = (savings.bySource.compression.cost / total) * 100;
+  const wSub   = (savings.bySource.subscriptionBridge.cost / total) * 100;
+  const wLocal = (savings.bySource.localRouting.cost / total) * 100;
+  const wRace  = (savings.bySource.raceMode.cost / total) * 100;
+
+  return `<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" viewBox="0 0 ${W} ${H}">
+  <defs>
+    <linearGradient id="bgGrad" x1="0" y1="0" x2="1" y2="1">
+      <stop offset="0%"  stop-color="${palette.bg}"/>
+      <stop offset="100%" stop-color="${palette.surface}"/>
+    </linearGradient>
+    <radialGradient id="glow" cx="20%" cy="0%" r="80%">
+      <stop offset="0%"  stop-color="${palette.accent}" stop-opacity="0.20"/>
+      <stop offset="60%" stop-color="${palette.accent}" stop-opacity="0.04"/>
+      <stop offset="100%" stop-color="${palette.bg}"     stop-opacity="0"/>
+    </radialGradient>
+    <style>
+      .mono   { font-family: 'JetBrains Mono', 'SF Mono', monospace; }
+      .sans   { font-family: 'Inter', -apple-system, sans-serif; }
+      .num    { font-weight: 700; letter-spacing: -0.02em; }
+      .label  { letter-spacing: 0.16em; text-transform: uppercase; }
+    </style>
+  </defs>
+
+  <!-- background -->
+  <rect width="${W}" height="${H}" fill="url(#bgGrad)"/>
+  <rect width="${W}" height="${H}" fill="url(#glow)"/>
+  <rect width="${W}" height="${H}" fill="none" stroke="${palette.border}" stroke-width="2"/>
+
+  <!-- brand mark -->
+  <g transform="translate(48 48)">
+    <rect x="0" y="0" width="14" height="14" fill="${palette.accent}"/>
+    <text x="24" y="12" class="mono" font-size="20" font-weight="700" fill="${palette.text}">llm.gateway</text>
+    <text x="180" y="12" class="mono" font-size="13" fill="${palette.dim}">— ${escSvg(periodLabel)}</text>
+  </g>
+
+  <!-- top-right: brand tag / version -->
+  <g transform="translate(${W - 48} 48)">
+    <text x="0" y="12" text-anchor="end" class="mono" font-size="11" fill="${palette.dim}" letter-spacing="0.1em">CONTEXT-X.ORG</text>
+  </g>
+
+  <!-- HUGE counter — eyebrow above, big number well below to avoid overlap -->
+  <g transform="translate(48 ${H/2 - 110})">
+    <text x="0" y="0" class="mono label" font-size="14" fill="${palette.dim}">tokens prevented · ${escSvg(periodLabel.toLowerCase())}</text>
+    <text x="0" y="135" class="mono num" font-size="120" fill="${palette.accent}">${fmtNum(totalTokens)}</text>
+    <text x="0" y="180" class="mono" font-size="18" fill="${palette.text}">
+      <tspan>${fmtCost(totalCost)} saved</tspan>
+      <tspan dx="20" fill="${palette.dim}">·</tspan>
+      <tspan dx="14">${fmtNum(reqCount)} calls</tspan>
+      <tspan dx="20" fill="${palette.dim}">·</tspan>
+      <tspan dx="14">${efficacy.toFixed(1)}% efficiency</tspan>
+    </text>
+  </g>
+
+  <!-- 5-axis breakdown bar -->
+  <g transform="translate(48 ${H - 180})">
+    <text x="0" y="0" class="mono label" font-size="12" fill="${palette.dim}">savings sources · 5-axis breakdown</text>
+    <rect x="0" y="14" width="${W - 96}" height="22" fill="${palette.surface}" stroke="${palette.border}"/>
+    ${(() => {
+      let x = 0;
+      const segs: string[] = [];
+      const w = W - 96;
+      const pieces = [
+        { p: wCache, c: '#d4ff00', label: '⚡' },
+        { p: wComp,  c: '#2dd4bf', label: '🗜' },
+        { p: wSub,   c: '#60a5fa', label: '🌉' },
+        { p: wLocal, c: '#a78bfa', label: '🏠' },
+        { p: wRace,  c: '#f97316', label: '🏁' },
+      ];
+      for (const piece of pieces) {
+        const segW = (piece.p / 100) * w;
+        if (segW > 0.5) {
+          segs.push(`<rect x="${x}" y="14" width="${segW}" height="22" fill="${piece.c}"/>`);
+        }
+        x += segW;
+      }
+      return segs.join('');
+    })()}
+    <g transform="translate(0 60)" class="mono" font-size="11" fill="${palette.dim}">
+      <text x="0"   y="0"><tspan fill="#d4ff00">●</tspan> cache</text>
+      <text x="120" y="0"><tspan fill="#2dd4bf">●</tspan> compression</text>
+      <text x="270" y="0"><tspan fill="#60a5fa">●</tspan> subscription bridges</text>
+      <text x="470" y="0"><tspan fill="#a78bfa">●</tspan> local routing</text>
+      <text x="600" y="0"><tspan fill="#f97316">●</tspan> race mode</text>
+    </g>
+  </g>
+
+  <!-- footer / buddy -->
+  <g transform="translate(48 ${H - 70})">
+    <text x="0" y="0" class="mono" font-size="11" fill="${palette.dim}">
+      <tspan fill="${palette.accent}">${escSvg(buddy.species)}</tspan>
+      <tspan dx="6">·</tspan>
+      <tspan dx="6">Lv.${buddy.level}</tspan>
+      <tspan dx="6">·</tspan>
+      <tspan dx="6">${buddy.streakDays}d streak</tspan>
+      <tspan dx="20" fill="${palette.dim}">· generated ${escSvg(new Date().toISOString().slice(0, 10))}</tspan>
+    </text>
+  </g>
+</svg>`;
+}
--- a/packages/gateway/src/modules/subscription-discovery.ts
+++ b/packages/gateway/src/modules/subscription-discovery.ts
@ -0,0 +1,303 @@
+/**
+ * Subscription Discovery
+ *
+ * Auto-detects locally installed CLI subscriptions (Claude Code, GitHub Copilot,
+ * ChatGPT, Gemini, etc.) and reports their authentication status. The discovery
+ * results drive automatic bridge spawning and dynamic provider registration.
+ */
+
+import { execFile } from 'child_process';
+import { promisify } from 'util';
+import { existsSync } from 'fs';
+import { logger } from '../observability/logger.js';
+
+const execFileAsync = promisify(execFile);
+
+export type SubscriptionId =
+  | 'claude-code'
+  | 'github-copilot'
+  | 'microsoft-365-copilot'
+  | 'chatgpt'
+  | 'gemini'
+  | 'codex'
+  | 'aider';
+
+export interface SubscriptionDescriptor {
+  id: SubscriptionId;
+  /** Friendly display name */
+  label: string;
+  /** CLI binary required to use the subscription */
+  command: string;
+  /** Args used for the version probe */
+  versionArgs: readonly string[];
+  /** Args used for the auth probe (optional) */
+  authProbeArgs?: readonly string[];
+  /** Default port the bridge listens on */
+  bridgePort: number;
+  /** ENV var the gateway uses to find the bridge URL */
+  bridgeEnvKey: string;
+  /** Logical provider name in `external-providers.ts` */
+  providerName: string;
+  /** Models exposed via this subscription */
+  models: ReadonlyArray<{ id: string; tier: 'fast' | 'medium' | 'large' | 'reasoning' }>;
+  /** Which bridge implementation serves this subscription */
+  bridgeImplementation: 'inline-claude' | 'inline-openai' | 'inline-copilot' | 'external-codex';
+}
+
+export interface SubscriptionStatus {
+  descriptor: SubscriptionDescriptor;
+  installed: boolean;
+  authenticated: boolean | 'unknown';
+  version?: string;
+  error?: string;
+  bridgeUrl?: string;
+  bridgeRunning: boolean;
+}
+
+/**
+ * Catalog of subscriptions the gateway knows how to bootstrap.
+ * To make a new subscription discoverable, add its id to `SubscriptionId` and an entry here.
+ */
+export const SUBSCRIPTION_CATALOG: readonly SubscriptionDescriptor[] = [
+  {
+    id: 'claude-code',
+    label: 'Claude Code (Anthropic Subscription)',
+    command: 'claude',
+    versionArgs: ['--version'],
+    bridgePort: 3250,
+    bridgeEnvKey: 'CLAUDE_BRIDGE_URL',
+    providerName: 'claude-bridge',
+    bridgeImplementation: 'inline-claude',
+    models: [
+      { id: 'claude-opus-4-1', tier: 'reasoning' },
+      { id: 'claude-sonnet-4-1', tier: 'large' },
+      { id: 'claude-haiku-3', tier: 'fast' },
+    ],
+  },
+  {
+    id: 'github-copilot',
+    label: 'GitHub Copilot Subscription',
+    command: 'gh',
+    versionArgs: ['copilot', '--version'],
+    bridgePort: 3252,
+    bridgeEnvKey: 'COPILOT_BRIDGE_URL',
+    providerName: 'copilot-bridge',
+    bridgeImplementation: 'inline-copilot',
+    models: [
+      { id: 'gpt-4', tier: 'reasoning' },
+      { id: 'gpt-3.5-turbo', tier: 'medium' },
+    ],
+  },
+  {
+    id: 'microsoft-365-copilot',
+    label: 'Microsoft 365 Copilot Subscription',
+    command: 'node',
+    versionArgs: ['--version'],
+    bridgePort: 3257,
+    bridgeEnvKey: 'M365_COPILOT_BRIDGE_URL',
+    providerName: 'm365-copilot-bridge',
+    bridgeImplementation: 'inline-openai',
+    models: [
+      { id: 'microsoft-365-copilot', tier: 'reasoning' },
+      { id: 'm365-copilot-chat', tier: 'large' },
+    ],
+  },
+  {
+    id: 'chatgpt',
+    label: 'OpenAI ChatGPT Plus Subscription',
+    command: 'chatgpt',
+    versionArgs: ['--version'],
+    bridgePort: 3251,
+    bridgeEnvKey: 'CHATGPT_BRIDGE_URL',
+    providerName: 'chatgpt-bridge',
+    bridgeImplementation: 'inline-openai',
+    models: [
+      { id: 'gpt-4-turbo', tier: 'reasoning' },
+      { id: 'gpt-4', tier: 'large' },
+      { id: 'gpt-3.5-turbo', tier: 'medium' },
+    ],
+  },
+  {
+    id: 'gemini',
+    label: 'Google Gemini Advanced Subscription',
+    command: 'gemini',
+    versionArgs: ['--version'],
+    bridgePort: 3254,
+    bridgeEnvKey: 'GEMINI_BRIDGE_URL',
+    providerName: 'gemini-bridge',
+    bridgeImplementation: 'inline-openai',
+    models: [
+      { id: 'gemini-1.5-pro', tier: 'reasoning' },
+      { id: 'gemini-1.5-flash', tier: 'fast' },
+    ],
+  },
+  {
+    id: 'codex',
+    label: 'OpenAI Codex CLI Subscription',
+    command: 'codex',
+    versionArgs: ['--version'],
+    authProbeArgs: ['login', 'status'],
+    bridgePort: 3253,
+    bridgeEnvKey: 'CODEX_BRIDGE_URL',
+    providerName: 'codex-bridge',
+    bridgeImplementation: 'external-codex',
+    models: [
+      { id: 'gpt-5.1-codex', tier: 'reasoning' },
+      { id: 'gpt-5.1-codex-mini', tier: 'large' },
+      { id: 'codex-mini-latest', tier: 'medium' },
+    ],
+  },
+  {
+    id: 'aider',
+    label: 'Aider AI Pair Programmer',
+    command: 'aider',
+    versionArgs: ['--version'],
+    bridgePort: 3256,
+    bridgeEnvKey: 'AIDER_BRIDGE_URL',
+    providerName: 'aider-bridge',
+    bridgeImplementation: 'inline-openai',
+    models: [
+      { id: 'aider-default', tier: 'large' },
+    ],
+  },
+];
+
+/**
+ * Probe a CLI's --version with a 3s timeout. Returns null when not installed.
+ */
+async function probeVersion(command: string, args: readonly string[]): Promise<string | null> {
+  try {
+    const { stdout, stderr } = await execFileAsync(command, args as string[], {
+      timeout: 3000,
+      maxBuffer: 64 * 1024,
+    });
+    const out = (stdout || stderr || '').trim().split('\n')[0];
+    return out || 'installed';
+  } catch (err: unknown) {
+    const code = (err as NodeJS.ErrnoException).code;
+    if (code === 'ENOENT') return null;
+    // Non-zero exit code but command exists (e.g. auth required) — count as installed
+    return 'installed';
+  }
+}
+
+/**
+ * Best-effort authentication check. Many CLI tools don't have a clean probe,
+ * so we return 'unknown' rather than guessing wrong.
+ */
+async function probeAuthenticated(desc: SubscriptionDescriptor): Promise<boolean | 'unknown'> {
+  // Claude Code stores credentials in ~/.claude/.credentials.json
+  if (desc.id === 'claude-code') {
+    const home = process.env['HOME'] || '/root';
+    return existsSync(`${home}/.claude/.credentials.json`);
+  }
+  // GitHub Copilot uses gh auth status
+  if (desc.id === 'github-copilot') {
+    try {
+      await execFileAsync('gh', ['auth', 'status'], { timeout: 3000 });
+      return true;
+    } catch {
+      return false;
+    }
+  }
+  if (desc.id === 'microsoft-365-copilot') {
+    return Boolean(
+      process.env['MICROSOFT_GRAPH_ACCESS_TOKEN'] ||
+      process.env['M365_COPILOT_ACCESS_TOKEN'] ||
+      process.env['MICROSOFT_CLIENT_ID']
+    );
+  }
+  if (desc.authProbeArgs) { // declared in the catalog (currently only codex)
+    try {
+      await execFileAsync(desc.command, [...desc.authProbeArgs], { timeout: 3000 });
+      return true;
+    } catch {
+      return false;
+    }
+  }
+  return 'unknown';
+}
+
+/**
+ * Check whether a bridge URL is reachable.
+ */
+async function probeBridge(url: string | undefined): Promise<boolean> {
+  if (!url) return false;
+  try {
+    const controller = new AbortController();
+    const timeoutId = setTimeout(() => controller.abort(), 1500);
+    try {
+      const res = await fetch(`${url.replace(/\/$/, '')}/health`, { signal: controller.signal });
+      return res.ok; // require a 2xx from /health, not just any listener on the port
+    } finally {
+      clearTimeout(timeoutId);
+    }
+  } catch {
+    return false;
+  }
+}
+
+/**
+ * Resolve the bridge URL for a subscription:
+ *   1. Explicit env var (CLAUDE_BRIDGE_URL etc.) — set by Settings or PM2 ecosystem
+ *   2. Auto-detect: probe http://127.0.0.1:{bridgePort} for a /health endpoint
+ *
+ * This means a bridge running locally on its default port is picked up
+ * automatically without any configuration.
+ */
+async function resolveBridgeUrl(desc: SubscriptionDescriptor): Promise<{ url?: string; running: boolean }> {
+  const explicit = process.env[desc.bridgeEnvKey];
+  if (explicit) {
+    const running = await probeBridge(explicit);
+    return { url: explicit, running };
+  }
+  // Auto-detect on the default port
+  const localUrl = `http://127.0.0.1:${desc.bridgePort}`;
+  const running = await probeBridge(localUrl);
+  return running ? { url: localUrl, running: true } : { running: false };
+}
+
+/**
+ * Discover all subscriptions the gateway knows about. Probes the CLI binary,
+ * authentication state, and any pre-configured bridge URL in the environment.
+ */
+export async function discoverSubscriptions(): Promise<SubscriptionStatus[]> {
+  const results = await Promise.all(
+    SUBSCRIPTION_CATALOG.map(async (desc): Promise<SubscriptionStatus> => {
+      // Always probe the bridge first — a running bridge is enough to count
+      // as "available" even if the CLI isn't installed on this host (the
+      // bridge could live on the user's machine).
+      const bridge = await resolveBridgeUrl(desc);
+
+      const version = await probeVersion(desc.command, desc.versionArgs);
+      if (!version) {
+        return {
+          descriptor: desc,
+          installed: bridge.running, // remote bridge counts as installed
+          authenticated: bridge.running ? 'unknown' : false,
+          bridgeUrl: bridge.url,
+          bridgeRunning: bridge.running,
+        };
+      }
+
+      const authenticated = await probeAuthenticated(desc);
+      return {
+        descriptor: desc,
+        installed: true,
+        authenticated,
+        version,
+        bridgeUrl: bridge.url,
+        bridgeRunning: bridge.running,
+      };
+    })
+  );
+  logger.info(
+    {
+      detected: results.filter((r) => r.installed).length,
+      bridgesLive: results.filter((r) => r.bridgeRunning).length,
+      total: results.length,
+    },
+    'Subscription discovery completed'
+  );
+  return results;
+}
--- a/packages/gateway/src/modules/subscription-wallet.ts
+++ b/packages/gateway/src/modules/subscription-wallet.ts
@ -0,0 +1,271 @@
+/**
+ * Subscription Pool Wallet
+ *
+ * Tracks usage of each CLI subscription against its known quota window
+ * (Claude Code Pro = 45 msg / 5h, ChatGPT Plus = 80 msg / 3h, Copilot = …).
+ * Used by the dashboard to show which subscription has the most headroom
+ * and (future) by the router to load-balance across subscriptions.
+ *
+ * The aim is to combine multiple personal AI subscriptions into a single
+ * managed pool.
+ */
+
+import type { Pool } from 'pg';
+import { logger } from '../observability/logger.js';
+
+export interface QuotaProfile {
+  subscriptionId: string;
+  label: string;
+  /** Hard request quota inside the window. Null = unknown / unlimited. */
+  requestQuota: number | null;
+  /** Window length in seconds (e.g. 5h = 18000s for Claude Code, 3h = 10800s for ChatGPT Plus). */
+  windowSeconds: number;
+  /** Reset behaviour: 'rolling' = sliding window, 'fixed' = clock-aligned reset. */
+  reset: 'rolling' | 'fixed';
+}
+
+/**
+ * Known subscription quota profiles. Numbers are conservative defaults —
+ * users can override via Settings if their plan differs.
+ */
+export const QUOTA_PROFILES: Record<string, QuotaProfile> = {
+  'claude-code':           { subscriptionId: 'claude-code',           label: 'Claude Code (Pro)',         requestQuota: 45,   windowSeconds: 5 * 3600,  reset: 'rolling' },
+  'github-copilot':        { subscriptionId: 'github-copilot',        label: 'GitHub Copilot',            requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
+  'microsoft-365-copilot': { subscriptionId: 'microsoft-365-copilot', label: 'M365 Copilot',              requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
+  'chatgpt':               { subscriptionId: 'chatgpt',               label: 'ChatGPT Plus',              requestQuota: 80,   windowSeconds: 3 * 3600,  reset: 'rolling' },
+  'gemini':                { subscriptionId: 'gemini',                label: 'Gemini Advanced',           requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
+  'codex':                 { subscriptionId: 'codex',                 label: 'OpenAI Codex',              requestQuota: 150,  windowSeconds: 5 * 3600,  reset: 'rolling' },
+  'aider':                 { subscriptionId: 'aider',                 label: 'Aider',                     requestQuota: null, windowSeconds: 86400,     reset: 'fixed' },
+};
+
+/** Record a request against a subscription quota window. */
+export async function recordSubscriptionUsage(
+  db: Pool,
+  subscriptionId: string,
+  tokensConsumed: number = 0
+): Promise<void> {
+  const profile = QUOTA_PROFILES[subscriptionId];
+  if (!profile) return;
+
+  // Compute the window-start timestamp this request belongs to.
+  const now = new Date();
+  let windowStart: Date;
+  if (profile.reset === 'rolling') {
+    // Floor to the most recent quarter-hour for grouping; rolling logic
+    // applied at read-time by summing the last `windowSeconds`.
+    const rounded = Math.floor(now.getTime() / 900_000) * 900_000;
+    windowStart = new Date(rounded);
+  } else {
+    // Fixed reset — bucket into day windows
+    const day = new Date(now);
+    day.setUTCHours(0, 0, 0, 0);
+    windowStart = day;
+  }
+
+  try {
+    await db.query(
+      `
+      INSERT INTO subscription_quota_window
+        (subscription_id, window_start, window_seconds, request_count, tokens_consumed, quota_limit, reset_at)
+      VALUES ($1, $2, $3, 1, $4, $5, $6)
+      ON CONFLICT (subscription_id, window_start)
+      DO UPDATE SET
+        request_count   = subscription_quota_window.request_count + 1,
+        tokens_consumed = subscription_quota_window.tokens_consumed + EXCLUDED.tokens_consumed
+      `,
+      [
+        subscriptionId,
+        windowStart,
+        profile.windowSeconds,
+        tokensConsumed,
+        profile.requestQuota,
+        new Date(windowStart.getTime() + profile.windowSeconds * 1000),
+      ]
+    );
+  } catch (err) {
+    logger.warn({ err, subscriptionId }, 'subscription-wallet: usage record failed');
+  }
+}
+
+export interface WalletEntry {
+  subscriptionId: string;
+  label: string;
+  requestQuota: number | null;
+  used: number;
+  remaining: number | null;
+  utilizationPercent: number | null;
+  windowSeconds: number;
+  resetAt: string | null;
+  /** Predicted exhaustion timestamp based on current rate; null if no quota or no usage. */
+  predictedExhaustionAt: string | null;
+  recommendation: 'use-this' | 'available' | 'near-limit' | 'exhausted' | 'unknown';
+}
+
+/** Build the wallet snapshot for the dashboard. */
+export async function getSubscriptionWallet(db: Pool): Promise<WalletEntry[]> {
+  const entries: WalletEntry[] = [];
+
+  for (const profile of Object.values(QUOTA_PROFILES)) {
+    let used = 0;
+    let resetAt: string | null = null;
+    let predictedExhaustionAt: string | null = null;
+
+    try {
+      const result = await db.query(
+        `
+        SELECT
+          COALESCE(SUM(request_count), 0)::INT AS used,
+          MAX(reset_at) AS reset_at
+        FROM subscription_quota_window
+        WHERE subscription_id = $1
+          AND window_start > NOW() - MAKE_INTERVAL(secs => $2)
+        `,
+        [profile.subscriptionId, profile.windowSeconds]
+      );
+      // `used` is cast to INT in SQL, so a plain numeric coercion is enough.
+      used = Number(result.rows[0]?.used ?? 0);
+      resetAt = result.rows[0]?.reset_at ? new Date(result.rows[0].reset_at).toISOString() : null;
+    } catch (err) {
+      logger.warn({ err, sub: profile.subscriptionId }, 'wallet: read failed');
+    }
+
+    const remaining = profile.requestQuota !== null ? Math.max(profile.requestQuota - used, 0) : null;
+    const utilizationPercent = profile.requestQuota
+      ? Math.min(100, (used / profile.requestQuota) * 100)
+      : null;
+
+    // Linear extrapolation for predicted exhaustion.
+    if (remaining !== null && used > 0 && profile.requestQuota) {
+      const ratePerSecond = used / profile.windowSeconds;
+      if (ratePerSecond > 0) {
+        const secondsRemaining = remaining / ratePerSecond;
+        predictedExhaustionAt = new Date(Date.now() + secondsRemaining * 1000).toISOString();
+      }
+    }
+
+    let recommendation: WalletEntry['recommendation'] = 'unknown';
+    if (utilizationPercent !== null) {
+      if (utilizationPercent >= 100) recommendation = 'exhausted';
+      else if (utilizationPercent >= 80) recommendation = 'near-limit';
+      else if (utilizationPercent <= 30) recommendation = 'use-this';
+      else recommendation = 'available';
+    }
+
+    entries.push({
+      subscriptionId: profile.subscriptionId,
+      label: profile.label,
+      requestQuota: profile.requestQuota,
+      used,
+      remaining,
+      utilizationPercent: utilizationPercent !== null ? Math.round(utilizationPercent * 10) / 10 : null,
+      windowSeconds: profile.windowSeconds,
+      resetAt,
+      predictedExhaustionAt,
+      recommendation,
+    });
+  }
+
+  return entries;
+}
+
+/**
+ * Map an Ollama / external model id to the subscription it belongs to,
+ * if any. Returns null for non-subscription models (free APIs, local Ollama).
+ */
+export function modelToSubscriptionId(model: string): string | null {
+  const m = model.toLowerCase();
+  if (m.includes('claude')) return 'claude-code';
+  if (m.startsWith('gpt-5.1-codex') || m === 'codex-mini-latest') return 'codex';
+  if (m.startsWith('gpt-')) return 'chatgpt';
+  if (m.startsWith('gemini-')) return 'gemini';
+  if (m.startsWith('github-copilot') || m === 'copilot-chat') return 'github-copilot';
+  if (m === 'microsoft-365-copilot' || m === 'm365-copilot-chat') return 'microsoft-365-copilot';
+  return null;
+}
+
+/**
+ * Post-process a routing decision against the subscription wallet.
+ *
+ * If the picked model belongs to a subscription that is `exhausted` or
+ * `near-limit` (>=80% utilization), we look at the other models in the
+ * fallback chain (the chain is not filtered by tier here) and re-pick the
+ * one with the lowest known utilization.
+ *
+ * This is the Pool-Routing feature: distribute load across YOUR subscriptions
+ * to maximize their value rather than always routing to the primary.
+ */
+export async function applyPoolRouting(
+  db: Pool,
+  decision: { model: string; fallback_chain: string[]; tier: string },
+  options: { forced?: boolean } = {}
+): Promise<{ model: string; fallback_chain: string[]; reason: string } | null> {
+  const wallet = await getSubscriptionWallet(db);
+  const utilByModel = (model: string): number | null => {
+    const sub = modelToSubscriptionId(model);
+    if (!sub) return null;
+    const w = wallet.find((entry) => entry.subscriptionId === sub);
+    return w?.utilizationPercent ?? null;
+  };
+  const isExhausted = (model: string): boolean => {
+    const sub = modelToSubscriptionId(model);
+    if (!sub) return false;
+    const w = wallet.find((entry) => entry.subscriptionId === sub);
+    return w?.recommendation === 'exhausted';
+  };
+
+  const primaryUtil = utilByModel(decision.model);
+  const primarySub = modelToSubscriptionId(decision.model);
+
+  // No re-routing for non-subscription models or when primary has plenty of headroom
+  if (!primarySub) return null;
+  if (!options.forced && primaryUtil !== null && primaryUtil < 80 && !isExhausted(decision.model)) return null;
+
+  // Find a sibling in the fallback chain with lower utilization
+  const candidates = decision.fallback_chain.filter((m) => m !== decision.model);
+  let bestModel = decision.model;
+  let bestUtil = primaryUtil ?? 100;
+
+  for (const candidate of candidates) {
+    if (isExhausted(candidate)) continue;
+    const util = utilByModel(candidate);
+    if (util === null) continue; // unknown utilization — don't pick blindly over a known one
+    if (util < bestUtil) {
+      bestUtil = util;
+      bestModel = candidate;
+    }
+  }
+
+  if (bestModel === decision.model) return null;
+
+  // Move chosen model to front of chain
+  const newChain = [bestModel, ...decision.fallback_chain.filter((m) => m !== bestModel)];
+  return {
+    model: bestModel,
+    fallback_chain: newChain,
+    reason: `pool-route: primary ${decision.model} at ${primaryUtil?.toFixed(0) ?? '?'}% util, switched to ${bestModel} at ${bestUtil.toFixed(0)}%`,
+  };
+}
+
+/** Pick the non-exhausted subscription with the most headroom among the given candidate ids. */
+export async function pickBestSubscription(
+  db: Pool,
+  candidates: readonly string[]
+): Promise<{ subscriptionId: string; reason: string } | null> {
+  const wallet = await getSubscriptionWallet(db);
+  const eligible = wallet.filter(
+    (w) => candidates.includes(w.subscriptionId) && w.recommendation !== 'exhausted'
+  );
+  if (eligible.length === 0) return null;
+  // Sort: lowest utilization first (most headroom). Unknown utilization is
+  // treated as 50%, so tracked quotas below 50% win over unknowns and
+  // unknowns win over tracked quotas above 50%.
+  eligible.sort((a, b) => {
+    const ua = a.utilizationPercent ?? 50;
+    const ub = b.utilizationPercent ?? 50;
+    return ua - ub;
+  });
+  const winner = eligible[0];
+  return {
+    subscriptionId: winner.subscriptionId,
+    reason: winner.utilizationPercent !== null
+      ? `${winner.utilizationPercent.toFixed(0)}% used in window`
+      : 'no quota tracking',
+  };
+}
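
Review note: the wallet thresholds and the headroom ordering above can be checked without a database. The following is a minimal sketch, assuming the thresholds shown in `getSubscriptionWallet` and the `?? 50` ordering in `pickBestSubscription`; `classify`, `rankByHeadroom`, the `Headroom` type, and the `sub-*` ids are hypothetical names introduced only for illustration.

```typescript
type Recommendation = 'use-this' | 'available' | 'near-limit' | 'exhausted' | 'unknown';

interface Headroom {
  subscriptionId: string;
  utilizationPercent: number | null;
  recommendation: Recommendation;
}

// Hypothetical helper: same thresholds as getSubscriptionWallet, minus the DB read.
function classify(subscriptionId: string, used: number, quota: number | null): Headroom {
  if (!quota) return { subscriptionId, utilizationPercent: null, recommendation: 'unknown' };
  const utilizationPercent = Math.min(100, (used / quota) * 100);
  let recommendation: Recommendation = 'available';
  if (utilizationPercent >= 100) recommendation = 'exhausted';
  else if (utilizationPercent >= 80) recommendation = 'near-limit';
  else if (utilizationPercent <= 30) recommendation = 'use-this';
  return { subscriptionId, utilizationPercent, recommendation };
}

// Hypothetical helper: drop exhausted entries, then order by utilization,
// treating unknown utilization as 50% (as pickBestSubscription does).
function rankByHeadroom(entries: Headroom[]): Headroom[] {
  return entries
    .filter((e) => e.recommendation !== 'exhausted')
    .sort((a, b) => (a.utilizationPercent ?? 50) - (b.utilizationPercent ?? 50));
}

const ranked = rankByHeadroom([
  classify('sub-a', 70, 100), // 70% -> available
  classify('sub-b', 5, null), // no quota -> unknown, sorts as 50%
  classify('sub-c', 20, 100), // 20% -> use-this
  classify('sub-d', 100, 100), // 100% -> exhausted, filtered out
]);
console.log(ranked.map((e) => e.subscriptionId).join(', '));
// prints "sub-c, sub-b, sub-a"
```

The untracked subscription lands between the 20% and 70% entries, which is why an unknown quota can outrank a tracked one that is more than half used.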
--- a/packages/gateway/src/pipeline/external-providers.ts
+++ b/packages/gateway/src/pipeline/external-providers.ts
@ -86,6 +86,17 @@ const PROVIDERS: readonly ExternalProvider[] = [
      { id: 'gpt-3.5-turbo', tier: 'medium', contextLength: 4096 },
    ],
  },
+  {
+    name: 'm365-copilot-bridge',
+    baseUrl: '', // constructed from M365_COPILOT_BRIDGE_URL env var
+    envKey: 'M365_COPILOT_BRIDGE_URL',
+    rateLimitRpm: 60,
+    enabled: true,
+    models: [
+      { id: 'microsoft-365-copilot', tier: 'reasoning', contextLength: 128000 },
+      { id: 'm365-copilot-chat', tier: 'large', contextLength: 128000 },
+    ],
+  },
  {
    name: 'cerebras',
    baseUrl: 'https://api.cerebras.ai/v1',
@ -146,12 +157,13 @@ const PROVIDERS: readonly ExternalProvider[] = [
  {
    name: 'openai-codex',
    baseUrl: 'https://api.openai.com/v1',
-    envKey: 'OPENAI_API_KEY',
+    envKey: 'OPENAI_CODEX_URL',
    rateLimitRpm: 60,
    enabled: true,
    models: [
-      { id: 'gpt-4-turbo', tier: 'reasoning', contextLength: 128000 },
-      { id: 'gpt-3.5-turbo', tier: 'fast', contextLength: 16384 },
+      { id: 'gpt-5.1-codex', tier: 'reasoning', contextLength: 256000 },
+      { id: 'gpt-5.1-codex-mini', tier: 'large', contextLength: 256000 },
+      { id: 'codex-mini-latest', tier: 'medium', contextLength: 200000 },
    ],
  },
  {
@ -169,16 +181,28 @@ const PROVIDERS: readonly ExternalProvider[] = [
  {
    name: 'codex',
    baseUrl: 'https://api.github.com/copilot_inner/v2',
-    envKey: 'GITHUB_CODEX_TOKEN',
+    envKey: 'CODEX_BRIDGE_URL',
    rateLimitRpm: 60,
    enabled: true,
    models: [
-      { id: 'github-copilot-x', tier: 'large', contextLength: 8192 },
-      { id: 'code-davinci-002', tier: 'medium', contextLength: 4096 },
+      { id: 'gpt-5.1-codex', tier: 'reasoning', contextLength: 256000 },
+      { id: 'gpt-5.1-codex-mini', tier: 'large', contextLength: 256000 },
+      { id: 'codex-mini-latest', tier: 'medium', contextLength: 200000 },
    ],
  },
 ];

+const AUTHLESS_BRIDGE_PROVIDERS = new Set([
+  'claude-bridge',
+  'claude-code',
+  'openai-bridge',
+  'chatgpt-bridge',
+  'copilot-bridge',
+  'm365-copilot-bridge',
+]);
+
+const GENERATE_BRIDGE_PROVIDERS = new Set(['claude-bridge', 'claude-code']);
+
 // ─── Rate Limiter (simple sliding window) ───────────────────────────

 const requestTimestamps: Map<string, number[]> = new Map();
@ -213,25 +237,34 @@ function getApiKey(provider: ExternalProvider): string | undefined {
    return url ? 'claude-code-enabled' : undefined;
  }
  if (provider.name === 'openai-bridge') {
-    // openai-bridge uses OPENAI_API_KEY for auth, but also needs bridge URL
-    const apiKey = process.env['OPENAI_API_KEY'];
+    // Subscription bridge auth is handled by the bridge process/CLI session.
    const url = process.env['OPENAI_BRIDGE_URL'];
-    return apiKey && url ? apiKey : undefined;
+    return url ? 'openai-bridge-enabled' : undefined;
  }
  if (provider.name === 'chatgpt-bridge') {
-    // chatgpt-bridge can use same URL as openai-bridge (same service), but needs API key
-    const apiKey = process.env['OPENAI_API_KEY'];
+    // ChatGPT Plus bridge can reuse the OpenAI bridge when configured that way.
    const url = process.env['CHATGPT_BRIDGE_URL'] || process.env['OPENAI_BRIDGE_URL'];
-    return apiKey && url ? apiKey : undefined;
+    return url ? 'chatgpt-bridge-enabled' : undefined;
  }
  if (provider.name === 'copilot-bridge') {
-    // copilot-bridge uses GitHub Copilot subscription (auth handled internally by copilot-api)
-    // Just needs URL to be configured
+    // copilot-bridge uses GitHub Copilot subscription (auth handled internally by copilot-api).
    const url = process.env['COPILOT_BRIDGE_URL'];
    return url ? 'copilot-authenticated' : undefined;
  }
+  if (provider.name === 'm365-copilot-bridge') {
+    // Microsoft 365 Copilot uses Microsoft Graph delegated auth inside the bridge.
+    const url = process.env['M365_COPILOT_BRIDGE_URL'];
+    return url ? 'm365-copilot-bridge-enabled' : undefined;
+  }
+  if (provider.name === 'openai-codex') {
+    const bridgeUrl = process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL'];
+    if (bridgeUrl) return 'openai-codex-bridge-enabled';
+    return process.env['OPENAI_API_KEY'] || undefined;
+  }
  if (provider.name === 'codex') {
-    // codex uses GitHub Codex API token
+    // Codex can run through an authless local/subscription bridge. A token remains supported as fallback.
+    const bridgeUrl = process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL'];
+    if (bridgeUrl) return 'codex-bridge-enabled';
    const token = process.env['GITHUB_CODEX_TOKEN'];
    return token ? token : undefined;
  }
@ -241,11 +274,11 @@ function getApiKey(provider: ExternalProvider): string | undefined {
 function getBaseUrl(provider: ExternalProvider): string {
  if (provider.name === 'claude-bridge') {
    const url = process.env['CLAUDE_BRIDGE_URL'];
-    return url ? `${url}/v1` : '';
+    return url ?? '';
  }
  if (provider.name === 'claude-code') {
    const url = process.env['CLAUDE_CODE_URL'];
-    return url ? `${url}/v1` : '';
+    return url ?? '';
  }
  if (provider.name === 'openai-bridge') {
    const url = process.env['OPENAI_BRIDGE_URL'];
@ -257,7 +290,19 @@ function getBaseUrl(provider: ExternalProvider): string {
  }
  if (provider.name === 'copilot-bridge') {
    const url = process.env['COPILOT_BRIDGE_URL'];
-    return url ? `${url}` : '';
+    return url ? `${url}/v1` : '';
+  }
+  if (provider.name === 'm365-copilot-bridge') {
+    const url = process.env['M365_COPILOT_BRIDGE_URL'];
+    return url ? `${url}/v1` : '';
+  }
+  if (provider.name === 'openai-codex') {
+    const url = process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL'];
+    return url ? `${url}/v1` : provider.baseUrl;
+  }
+  if (provider.name === 'codex') {
+    const url = process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL'];
+    return url ? `${url}/v1` : provider.baseUrl;
  }
  if (provider.name === 'cloudflare') {
    const accountId = process.env['CLOUDFLARE_ACCOUNT_ID'];
@ -271,6 +316,11 @@ export function getAvailableProviders(): readonly ExternalProvider[] {
  return PROVIDERS.filter((p) => p.enabled && getApiKey(p));
 }

+/** Returns ALL configured providers (enabled or not, with or without API key). For dashboard listing. */
+export function getAllProviders(): readonly ExternalProvider[] {
+  return PROVIDERS;
+}
+
 function findBestModel(
  provider: ExternalProvider,
  targetTier: 'fast' | 'medium' | 'large' | 'reasoning',
@ -296,7 +346,11 @@ function findBestModel(

 function buildRequestHeaders(provider: ExternalProvider, apiKey: string): Record<string, string> {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
-  if (!['claude-bridge', 'claude-code', 'openai-bridge', 'chatgpt-bridge', 'copilot-bridge'].includes(provider.name)) {
+  const usesAuthlessBridge = AUTHLESS_BRIDGE_PROVIDERS.has(provider.name)
+    || (provider.name === 'openai-codex' && !!(process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL']))
+    || (provider.name === 'codex' && !!(process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL']));
+
+  if (!usesAuthlessBridge) {
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return headers;
@ -311,13 +365,29 @@ function buildRequestPayload(model: ExternalModel, request: ExternalCompletionRe
  };
 }

+function buildGenerateBridgePayload(model: ExternalModel, request: ExternalCompletionRequest): Record<string, unknown> {
+  const system = request.messages.find((m) => m.role === 'system')?.content;
+  const prompt = request.messages
+    .filter((m) => m.role !== 'system')
+    .map((m) => `${m.role}: ${m.content}`)
+    .join('\n\n');
+
+  return {
+    model: model.id,
+    prompt,
+    system,
+    temperature: request.temperature ?? 0.3,
+    max_tokens: request.max_tokens ?? 2048,
+  };
+}
+
 function parseExternalResponse(
  data: any,
  model: ExternalModel,
  provider: ExternalProvider,
  start: number,
 ): ExternalCompletionResponse {
-  const content = data.choices?.[0]?.message?.content ?? '';
+  const content = data.choices?.[0]?.message?.content ?? data.content ?? data.response ?? data.message?.content ?? '';
  recordRequest(provider.name);
  return {
    response: content,
@ -341,14 +411,15 @@ async function callProvider(
  const baseUrl = getBaseUrl(provider);
  if (!baseUrl) throw new Error(`No base URL for ${provider.name}`);

-  const url = `${baseUrl}/chat/completions`;
+  const generateBridge = GENERATE_BRIDGE_PROVIDERS.has(provider.name);
+  const url = generateBridge ? `${baseUrl}/api/generate` : `${baseUrl}/chat/completions`;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const start = Date.now();

  try {
    const headers = buildRequestHeaders(provider, apiKey);
-    const payload = buildRequestPayload(model, request);
+    const payload = generateBridge ? buildGenerateBridgePayload(model, request) : buildRequestPayload(model, request);

    const response = await fetch(url, {
      method: 'POST',
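
Review note: for the `/api/generate` bridges, the chat history is flattened into a single prompt string. The following is a minimal sketch of that flattening, assuming messages have the `{ role, content }` shape used by `buildGenerateBridgePayload`; `ChatMessage` and `flattenForGenerate` are hypothetical names, and the sample conversation is invented for illustration.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Hypothetical helper mirroring the prompt/system split in buildGenerateBridgePayload.
function flattenForGenerate(messages: ChatMessage[]): { system: string | undefined; prompt: string } {
  // Only the first system message is kept; later system messages are dropped.
  const system = messages.find((m) => m.role === 'system')?.content;
  // Every non-system turn becomes a "role: content" paragraph.
  const prompt = messages
    .filter((m) => m.role !== 'system')
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n\n');
  return { system, prompt };
}

const flattened = flattenForGenerate([
  { role: 'system', content: 'Be terse.' },
  { role: 'user', content: 'Ping?' },
  { role: 'assistant', content: 'Pong.' },
  { role: 'user', content: 'Again.' },
]);
console.log(JSON.stringify(flattened.prompt));
// prints "user: Ping?\n\nassistant: Pong.\n\nuser: Again."
```

Because only the first system message survives, callers that send several system messages lose the later ones on this code path.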
--- a/packages/gateway/src/routes/dashboard.ts
+++ b/packages/gateway/src/routes/dashboard.ts
--- a/packages/gateway/src/routes/health.ts
+++ b/packages/gateway/src/routes/health.ts
@ -78,8 +78,9 @@ export async function healthRoute(fastify: FastifyInstance): Promise<void> {
      // Check if this is a dashboard UI request with ?ui=1 or ?dashboard=1
      const query = request.query as any;
      const isDashboardRequest = query.ui || query.dashboard;
+      const acceptsHtml = String(request.headers.accept ?? '').includes('text/html');

-      if (isDashboardRequest) {
+      if (isDashboardRequest || acceptsHtml) {
        try {
          const __filename = fileURLToPath(import.meta.url);
          const __dirname = dirname(__filename);
--- a/packages/gateway/src/routes/static.ts
+++ b/packages/gateway/src/routes/static.ts
@ -11,6 +11,22 @@ export async function staticRoute(fastify: FastifyInstance): Promise<void> {

  logger.info({ publicDir }, 'Static file serving initialized');

+  function sendHtml(filename: string, reply: any) {
+    const filePath = join(publicDir, filename);
+    if (!existsSync(filePath)) {
+      logger.warn({ path: filePath }, `${filename} not found`);
+      return reply.status(404).send({ error: `${filename} not found` });
+    }
+
+    const content = readFileSync(filePath, 'utf-8');
+    return reply
+      .header('Cache-Control', 'no-cache, no-store, must-revalidate, max-age=0')
+      .header('Pragma', 'no-cache')
+      .header('Expires', '0')
+      .type('text/html')
+      .send(content);
+  }
+
  // Serve root path
  fastify.get('/', async (request, reply) => {
    logger.info({ method: request.method, url: request.url, host: request.hostname }, 'Root path requested');
@ -26,13 +42,47 @@ export async function staticRoute(fastify: FastifyInstance): Promise<void> {

  // Serve /dashboard.html
  fastify.get('/dashboard.html', async (_request, reply) => {
-    const dashboardPath = join(publicDir, 'dashboard.html');
-    if (!existsSync(dashboardPath)) {
-      logger.warn({ path: dashboardPath }, 'dashboard.html not found');
-      return reply.status(404).send({ error: 'dashboard.html not found' });
-    }
-    const content = readFileSync(dashboardPath, 'utf-8');
-    return reply.type('text/html').send(content);
+    return sendHtml('dashboard.html', reply);
+  });
+
+  fastify.get('/dashboard-v2.html', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/v2/dashboard', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/v2/dashboard/', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/v2', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/v2/', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/dashboard/v2', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/dashboard/v2/', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/api/dashboard-v2', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/api/v2/dashboard', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
+  });
+
+  fastify.get('/api/dashboard/v2', async (_request, reply) => {
+    return sendHtml('dashboard-v2.html', reply);
  });

  // Serve /api/dashboard as HTML for compatibility
--- a/packages/gateway/src/security/tls-config.ts
+++ b/packages/gateway/src/security/tls-config.ts
@ -126,10 +126,10 @@ export async function registerHTTPSRedirectMiddleware(server: FastifyInstance) {
 */
 export async function registerSecurityHeadersMiddleware(server: FastifyInstance) {
  server.addHook('onSend', async (request, reply) => {
-    // Content Security Policy - strict, no inline scripts
+    // Content Security Policy for the self-contained dashboard UI.
    reply.header(
      'Content-Security-Policy',
-      "default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'"
+      "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'"
    );

    // Prevent clickjacking
--- a/packages/gateway/src/server.ts
+++ b/packages/gateway/src/server.ts
@ -77,6 +77,7 @@ async function buildServer() {
      directives: {
        defaultSrc: ["'self'"],
        scriptSrc: ["'self'", "'unsafe-inline'"],
+        styleSrc: ["'self'", "'unsafe-inline'"],
        objectSrc: ["'none'"],
      },
    },
@ -100,7 +101,7 @@ async function buildServer() {

  await server.register(fastifyRateLimit, {
    global: true,
-    max: 100,
+    max: 1000,
    timeWindow: '1 minute',
    keyGenerator: (request) => {
      const caller = (request.headers['x-caller-id'] as string) ?? 'default';
Author	SHA1	Message	Date
Rene Fichtmueller	09165b9bf7	feat: restore workbench v1 and publish wired v2	2026-05-03 09:53:40 +02:00
Rene Fichtmueller	060b846d9b	feat: publish llm gateway v2 dashboard alongside restored workbench	2026-05-01 17:43:32 +02:00