Compare commits: e272105bcf...09165b9bf7 (2 commits)

Commits: 09165b9bf7, 060b846d9b

AI_CONTROL_PLANE_SYSTEM_DESIGN.md (normal file, 426 changed lines)
@@ -0,0 +1,426 @@
# AI Control Plane System Design

## 1. Purpose

LLM Gateway is a deterministic, observable, policy-driven routing layer for AI execution with memory and cost control.

It routes requests from clients to the right model, provider, agent, or tool based on:

- policy
- cost
- availability
- context
- memory
- trust level
- historical route success

It also provides:

- full observability through immutable receipts
- reproducible AI runs
- shared memory persistence
- route memory
- token and cost optimization

## 2. High-Level Architecture

```text
Input Layer
  clients, APIs, MCP, internal connectors
        |
        v
Control Plane
  trust routing, policy, compression, memory, provider routing
        |
        v
Execution Layer
  local models, external providers, tools, services
        |
        v
Output
  response to caller
        |
        v
Receipts + Memory Update

Side System:
Memory Layer
  global memory, project memory, route memory, semantic cache
```

## 3. Components

### 3.1 Client Entry

Clients connect via API, MCP, OpenAI-compatible endpoints, or internal connectors.

Supported client targets:

- Codex
- Claude Code
- ChatGPT
- Cursor
- VS Code and Continue-style IDEs
- automation pipelines
- n8n
- internal services

Each request should include:

- payload: prompt, input, files, tool call, or task
- metadata: user, project, agent, task type
- optional routing hints
- optional policy hints

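
The request contents listed above could be captured in a typed envelope. The following is only a minimal sketch: `GatewayRequest`, `PayloadKind`, `validateRequest`, and all field names are illustrative assumptions, not the gateway's actual schema.

```typescript
// Hypothetical request envelope; every name here is an illustrative assumption.
type PayloadKind = 'prompt' | 'input' | 'files' | 'tool_call' | 'task';

interface GatewayRequest {
  payload: { kind: PayloadKind; body: string };
  metadata: { user: string; project: string; agent: string; taskType: string };
  routingHints?: string[]; // optional, e.g. a preferred route
  policyHints?: string[]; // optional, e.g. a requested mode
}

const REQUIRED_METADATA = ['user', 'project', 'agent', 'taskType'] as const;

// Returns a list of problems; an empty list means the envelope is usable.
function validateRequest(req: GatewayRequest): string[] {
  const problems: string[] = [];
  if (req.payload.body.trim() === '') problems.push('payload.body is empty');
  for (const key of REQUIRED_METADATA) {
    if (req.metadata[key].trim() === '') problems.push(`metadata.${key} is empty`);
  }
  return problems;
}

const exampleRequest: GatewayRequest = {
  payload: { kind: 'prompt', body: 'Summarize the failing test output.' },
  metadata: { user: 'alice', project: 'demo', agent: 'codex', taskType: 'code' },
  routingHints: ['prefer-local'],
};
```

Keeping hints optional mirrors the list above: a request without routing or policy hints is still valid, and the Trust Router can attach its own context later.
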
### 3.2 Trust Router

The Trust Router is the first decision point.

Responsibilities:

- validate client identity
- assign trust level
- classify request type
- classify data sensitivity
- apply initial routing hints
- attach enriched request context

Example classification labels:

- code
- infra
- legal
- security
- general
- document
- automation

Output:

- enriched request context
- trust score
- sensitivity label
- classification label

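
To make the Trust Router output concrete, here is a rough sketch of a keyword-based classifier. It is an assumption-heavy illustration: the keyword rules, the numeric trust scores, and the names `classify` and `decideTrust` are invented for this example, and the four sensitivity names are borrowed from the roadmap's trust levels. A real classifier would likely use a model or a richer rule set.

```typescript
type Classification = 'code' | 'infra' | 'legal' | 'security' | 'general' | 'document' | 'automation';
type Sensitivity = 'public' | 'internal' | 'confidential' | 'secret';

interface TrustDecision {
  trustScore: number; // 0..1; the scale is an assumption
  sensitivity: Sensitivity;
  classification: Classification;
}

// Illustrative keyword rules only; the first matching rule wins.
const RULES: Array<{ label: Classification; pattern: RegExp }> = [
  { label: 'security', pattern: /\b(cve|exploit|vulnerability)\b/i },
  { label: 'legal', pattern: /\b(contract|nda|liability)\b/i },
  { label: 'infra', pattern: /\b(kubernetes|terraform|dns)\b/i },
  { label: 'code', pattern: /\b(function|stack trace|refactor)\b/i },
];

function classify(text: string): Classification {
  for (const rule of RULES) {
    if (rule.pattern.test(text)) return rule.label;
  }
  return 'general';
}

function decideTrust(text: string, knownClient: boolean): TrustDecision {
  const classification = classify(text);
  const looksSecret = /(api[_-]?key|password|private key)/i.test(text);
  let sensitivity: Sensitivity = 'public';
  if (classification === 'code' || classification === 'infra') sensitivity = 'internal';
  if (classification === 'legal' || classification === 'security') sensitivity = 'confidential';
  if (looksSecret) sensitivity = 'secret';
  return { trustScore: knownClient ? 0.8 : 0.2, sensitivity, classification };
}
```

The decision object corresponds to the trust score, sensitivity label, and classification label listed under Output; the enriched request context would wrap it together with the original request.
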
### 3.3 Policy Engine

The Policy Engine is the core decision system.

It evaluates:

- data sensitivity
- allowed providers
- allowed models
- allowed tools
- cost constraints
- project rules
- compliance rules
- offline/simulation/live mode

Example policies:

- never send legal data to public APIs
- prefer local models for internal code
- use external models only if confidence is below a threshold
- block requests containing secrets
- require admin override for production deployment tools

Output:

- allowed routes
- blocked routes
- required redactions
- execution constraints
- policy decision log

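
A few of the example policies above can be sketched as a pure function that splits candidate routes into allowed and blocked sets. This is a simplified illustration, not the engine's real rule format: `Route`, `PolicyContext`, `blockReason`, and `evaluatePolicy` are hypothetical names, and the rule blocking external providers for confidential or secret data is an assumed default.

```typescript
type Sensitivity = 'public' | 'internal' | 'confidential' | 'secret';
type Mode = 'live' | 'simulation' | 'offline';

interface Route { id: string; kind: 'local' | 'external' }
interface PolicyContext {
  classification: string;
  sensitivity: Sensitivity;
  containsSecrets: boolean;
  mode: Mode;
}
interface PolicyResult {
  allowed: Route[];
  blocked: Array<{ route: Route; reason: string }>;
  log: string[];
}

// Returns a reason string when the route must be blocked, otherwise null.
function blockReason(ctx: PolicyContext, route: Route): string | null {
  if (ctx.containsSecrets) return 'request contains secrets';
  if (route.kind === 'external') {
    if (ctx.mode === 'offline') return 'offline mode forbids external providers';
    if (ctx.classification === 'legal') return 'legal data must not reach public APIs';
    if (ctx.sensitivity === 'confidential' || ctx.sensitivity === 'secret') {
      return 'sensitivity too high for external providers';
    }
  }
  return null;
}

function evaluatePolicy(ctx: PolicyContext, routes: Route[]): PolicyResult {
  const result: PolicyResult = { allowed: [], blocked: [], log: [] };
  for (const route of routes) {
    const reason = blockReason(ctx, route);
    if (reason === null) {
      result.allowed.push(route);
      result.log.push(`allow ${route.id}`);
    } else {
      result.blocked.push({ route, reason });
      result.log.push(`block ${route.id}: ${reason}`);
    }
  }
  return result;
}
```

Recording a reason for every blocked route keeps the policy decision log explainable, which is what a receipt would later need.
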
### 3.4 Memory Query

Memory is queried before compression and execution.

Memory sources:

- project memory
- global memory
- route memory
- semantic cache
- handoffs
- receipts
- reproducible runs

Output:

- relevant memory context
- prior decisions
- route hints
- cache candidates

### 3.5 Compression Engine

The Compression Engine optimizes request and memory context before execution.

Functions:

- token reduction
- context deduplication
- semantic summarization
- cache lookup
- prompt/context packaging
- token budget enforcement

Input:

- raw request
- policy constraints
- memory context
- target model context budget

Output:

- compressed payload
- token metrics before and after
- cache hit or miss
- compression receipt data

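
Two of the functions above, context deduplication and token budget enforcement, can be illustrated with a small packing routine. This is a sketch under stated assumptions: tokens are estimated as roughly one per four characters (a crude heuristic, not a real tokenizer), and `ContextChunk`, `packContext`, and the `priority` field are hypothetical.

```typescript
interface ContextChunk { id: string; text: string; priority: number }
interface PackedContext { chunks: ContextChunk[]; tokensBefore: number; tokensAfter: number }

// Crude estimate (about 4 characters per token); a real engine would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packContext(chunks: ContextChunk[], budget: number): PackedContext {
  let tokensBefore = 0;
  for (const chunk of chunks) tokensBefore += estimateTokens(chunk.text);

  // Deduplicate on normalized text, keeping the first occurrence.
  const seen: Record<string, boolean> = {};
  const unique = chunks.filter((chunk) => {
    const key = chunk.text.trim().toLowerCase();
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });

  // Greedily keep the highest-priority chunks that still fit the budget.
  const ordered = unique.slice().sort((a, b) => b.priority - a.priority);
  const kept: ContextChunk[] = [];
  let tokensAfter = 0;
  for (const chunk of ordered) {
    const cost = estimateTokens(chunk.text);
    if (tokensAfter + cost > budget) continue;
    kept.push(chunk);
    tokensAfter += cost;
  }
  return { chunks: kept, tokensBefore, tokensAfter };
}
```

Returning both `tokensBefore` and `tokensAfter` matches the "token metrics before and after" output, so a receipt can report the compression ratio without recomputing it.
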
### 3.6 Provider Router

The Provider Router makes the final execution decision.

It selects:

- local model
- external provider
- AI client/agent
- tool execution
- fallback route

Criteria:

- policy constraints
- trust level
- cost
- latency
- availability
- model capability
- route memory
- benchmark results
- agent reputation

Output:

- selected execution target
- fallback routes
- route explanation

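
One way to combine several of the criteria above is a weighted score over the candidates that policy has already allowed. The sketch below is illustrative only: the weights in `score`, the `Candidate` fields, and the `selectRoute` name are assumptions, and a real router would also weigh trust level, capability, benchmarks, and reputation.

```typescript
interface Candidate {
  id: string;
  available: boolean;
  costPer1kTokens: number; // in USD; the unit is an assumption
  latencyMs: number;
  successRate: number; // 0..1, e.g. taken from route memory
}

interface RouteDecision { selected: string | null; fallbacks: string[]; explanation: string }

// Higher is better. The weights are arbitrary and chosen only for illustration.
function score(c: Candidate): number {
  return c.successRate * 100 - c.costPer1kTokens * 10 - c.latencyMs / 100;
}

function selectRoute(candidates: Candidate[], allowedIds: string[]): RouteDecision {
  const eligible = candidates
    .filter((c) => c.available && allowedIds.indexOf(c.id) !== -1)
    .sort((a, b) => score(b) - score(a));
  const best = eligible[0];
  if (best === undefined) {
    return { selected: null, fallbacks: [], explanation: 'no available route is allowed by policy' };
  }
  return {
    selected: best.id,
    fallbacks: eligible.slice(1).map((c) => c.id),
    explanation: `${best.id} scored ${score(best).toFixed(1)} among ${eligible.length} eligible routes`,
  };
}
```

The remaining eligible candidates, in score order, become the fallback chain, and the explanation string is the kind of text a route explanation could carry.
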
### 3.7 Execution Layer

The Execution Layer handles actual processing.

Execution target types:

- local models such as Ollama, LM Studio, LocalAI, llama.cpp, vLLM
- external APIs such as OpenAI, Anthropic, Mistral, Groq, OpenRouter
- AI clients such as Claude Code, Codex, Cursor, ChatGPT adapters
- tools, scripts, workflows, and internal services

Execution returns:

- raw response
- latency
- token usage
- provider metadata
- errors
- tool call results

### 3.8 Receipt Engine

The Receipt Engine creates an immutable trace for each request.

Receipts include:

- request id
- input summary or redacted input
- trust decisions
- policy decisions
- memory refs
- compression results
- selected model/provider/tool
- fallback chain
- response summary or full response depending on policy
- token usage
- cost estimate
- timestamps
- errors
- blocked routes

Receipts are immutable once written and are persisted in the receipts database.

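
The immutability requirement can be illustrated in memory with frozen objects and an append-only store. This is only a sketch: `Receipt` carries a subset of the fields listed above, `ReceiptStore` and `deepFreeze` are hypothetical names, and real durability would have to come from the storage layer rather than from `Object.freeze`.

```typescript
interface Receipt {
  requestId: string;
  selectedRoute: string;
  fallbackChain: string[];
  tokenUsage: { input: number; output: number };
  costEstimateUsd: number;
  createdAt: string; // ISO timestamp
  errors: string[];
}

// Recursively freezes an object graph so later writes are rejected or ignored.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === 'object') {
    const obj = value as unknown as Record<string, unknown>;
    for (const key of Object.keys(obj)) deepFreeze(obj[key]);
    Object.freeze(value);
  }
  return value;
}

class ReceiptStore {
  private readonly receipts: Receipt[] = [];

  // Appends a frozen copy; an existing request id can never be overwritten.
  append(receipt: Receipt): Receipt {
    if (this.receipts.some((r) => r.requestId === receipt.requestId)) {
      throw new Error(`receipt ${receipt.requestId} already exists`);
    }
    const frozen = deepFreeze(JSON.parse(JSON.stringify(receipt)) as Receipt);
    this.receipts.push(frozen);
    return frozen;
  }

  get(requestId: string): Receipt | undefined {
    return this.receipts.filter((r) => r.requestId === requestId)[0];
  }
}
```

Copying before freezing means the caller's object stays mutable while the stored receipt does not, and the store exposes no update or delete method at all.
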
### 3.9 Memory Layer

Memory is separate from execution but connected to routing and compression.

Memory types:

1. Project memory
   - task history
   - decisions
   - context
   - handoffs

2. Global memory
   - shared knowledge
   - user/team preferences
   - reusable runbooks

3. Route memory
   - routing decisions
   - success and failure patterns
   - optimization feedback

4. Semantic cache
   - previous responses
   - embedding lookup
   - prompt/result reuse

Memory is:

- append-only by default
- queryable
- versioned where possible
- used during routing and compression

### 3.10 Route Reflector Memory

Route Reflector Memory is specialized route memory inspired by BGP route reflectors.

Functions:

- learns optimal AI routes
- shares routing knowledge across clients
- improves future routing decisions
- records fallback success and failures
- contributes to Provider Router decisions

Examples:

- code debugging works best through Codex plus local validation
- private infra diagnostics should route to local models
- long-form reasoning performs better on selected external models
- JSON extraction for project X has best success on model Y

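
The learning behaviour described above could start as simple outcome counting per task type and route. The sketch below assumes that success rate with a minimum sample count is a reasonable first signal; `RouteReflectorMemory`, `record`, `bestRoute`, and `minSamples` are hypothetical names rather than an existing API.

```typescript
interface RouteStats { successes: number; failures: number }

class RouteReflectorMemory {
  // Outcome counters keyed by task type, then by route id.
  private readonly stats: Record<string, Record<string, RouteStats>> = {};

  record(taskType: string, routeId: string, success: boolean): void {
    const byRoute = this.stats[taskType] ?? (this.stats[taskType] = {});
    const entry = byRoute[routeId] ?? (byRoute[routeId] = { successes: 0, failures: 0 });
    if (success) entry.successes += 1;
    else entry.failures += 1;
  }

  // Returns the route with the best observed success rate, ignoring thin evidence.
  bestRoute(taskType: string, minSamples: number = 3): string | undefined {
    const byRoute = this.stats[taskType];
    if (!byRoute) return undefined;
    let best: string | undefined;
    let bestRate = -1;
    for (const routeId of Object.keys(byRoute)) {
      const s = byRoute[routeId];
      if (!s) continue;
      const total = s.successes + s.failures;
      if (total < minSamples) continue;
      const rate = s.successes / total;
      if (rate > bestRate) {
        best = routeId;
        bestRate = rate;
      }
    }
    return best;
  }
}
```

Because the counters live in one shared instance, an outcome recorded for one client immediately influences the hint returned to every other client, which is the route-reflector idea in miniature.
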
## 4. Data Flow

1. Client sends request.
2. Trust Router classifies request and assigns trust.
3. Policy Engine filters allowed routes.
4. Memory Layer is queried for context and prior route knowledge.
5. Compression Engine optimizes payload.
6. Provider Router selects execution target and fallback chain.
7. Execution Layer processes request.
8. Response is returned to client.
9. Receipt Engine generates immutable receipt.
10. Memory Layer is updated with outcome.
11. Route Reflector Memory updates routing knowledge.

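
The ordering of the steps above can be sketched as a list of stages applied in sequence. Every stage body below is a stub and every name (`FlowState`, `Stage`, `STAGES`, `runFlow`) is hypothetical; the point is only the fixed order and the trace it leaves behind, not real routing logic.

```typescript
interface FlowState {
  request: string;
  allowedRoutes: string[];
  selectedRoute: string | null;
  response: string | null;
  trace: string[];
}

interface Stage { name: string; run: (state: FlowState) => Partial<FlowState> }

// Stub stages in the order given by the data flow; real stages would do actual work.
const STAGES: Stage[] = [
  { name: 'trust-router', run: () => ({}) },
  { name: 'policy-engine', run: () => ({ allowedRoutes: ['local-model'] }) },
  { name: 'memory-query', run: () => ({}) },
  { name: 'compression', run: (s) => ({ request: s.request.trim() }) },
  { name: 'provider-router', run: (s) => ({ selectedRoute: s.allowedRoutes[0] ?? null }) },
  { name: 'execution', run: (s) => ({ response: s.selectedRoute ? `handled by ${s.selectedRoute}` : null }) },
  { name: 'receipt', run: () => ({}) },
  { name: 'memory-update', run: () => ({}) },
  { name: 'route-reflector-update', run: () => ({}) },
];

function runFlow(request: string): FlowState {
  let state: FlowState = { request, allowedRoutes: [], selectedRoute: null, response: null, trace: [] };
  for (const stage of STAGES) {
    // Each stage returns only the fields it changes; the trace records the order.
    state = { ...state, ...stage.run(state), trace: state.trace.concat(stage.name) };
  }
  return state;
}
```

In this sketch the response is available as soon as the `execution` stage finishes, so returning it to the client (step 8) does not need to wait for the receipt and memory stages.
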
## 5. Modes Of Operation

### Live Mode

- real execution
- full routing active
- receipts stored
- memory updated

### Simulation Mode

- no real execution
- shows trust decisions
- shows policy decisions
- shows selected route and fallbacks
- estimates cost and tokens
- useful for testing policies

### Offline Mode

- only local models allowed
- no external provider calls
- remote sync disabled unless explicitly allowed
- receipts marked as offline

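
The three modes mainly differ in whether a selected route may actually execute. A minimal sketch of that gate follows; `applyMode`, `ModeDecision`, and the `receiptTags` field are illustrative assumptions rather than the gateway's configuration model.

```typescript
type Mode = 'live' | 'simulation' | 'offline';

interface Route { id: string; kind: 'local' | 'external' }
interface ModeDecision { execute: boolean; reason: string; receiptTags: string[] }

function applyMode(mode: Mode, route: Route): ModeDecision {
  switch (mode) {
    case 'live':
      return { execute: true, reason: 'live: full routing and real execution', receiptTags: [] };
    case 'simulation':
      // Decisions are still computed and shown, but nothing is executed.
      return { execute: false, reason: 'simulation: dry run only', receiptTags: [] };
    case 'offline':
      if (route.kind === 'local') {
        return { execute: true, reason: 'offline: local model allowed', receiptTags: ['offline'] };
      }
      return { execute: false, reason: 'offline: external providers are not called', receiptTags: ['offline'] };
  }
}
```

For example, `applyMode('offline', { id: 'openai', kind: 'external' })` returns a decision with `execute` set to `false`, while the same call with a local route allows execution and tags the receipt as offline.
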
## 6. Control Functions

The system supports:

- trace request
- replay request
- force route
- override policy as admin
- inspect receipts
- inspect memory
- simulate routing
- compare routes
- inspect provider availability
- inspect route memory

## 7. Storage

Required storage components:

- receipts database: immutable logs
- memory database: structured + vector
- policy definitions
- routing history
- route reflector memory
- semantic cache
- reproducible run artifacts

Recommended default:

- SQLite for personal mode
- Postgres plus pgvector for team/server mode
- Git/Gitea as durable memory sync and audit transport

## 8. Metrics

System tracks:

- token usage
- compression ratio
- cache hit rate
- latency per provider
- cost per request
- routing success rate
- fallback rate
- trust level distribution
- blocked route count
- policy override count
- agent reputation
- benchmark scores

## 9. Security Model

- strict policy enforcement before external calls
- data classification at entry
- local-first routing possible
- no sensitive data leaves system if blocked by policy
- no secret sync to memory
- audit trail via receipts
- consent ledger for tool, memory, and provider permissions
- safe config writer for external tool setup

## 10. Extensibility

The system supports:

- new providers
- new local models
- new tools
- new MCP resources
- new policy rules
- custom routing logic
- custom memory backends
- custom benchmarks
- custom data source connectors

## 11. Core Idea

LLM Gateway is a deterministic, observable, policy-driven routing layer for AI execution with memory and cost control.

OPEN_SOURCE_BLUEPRINT.md (normal file, 1270 changed lines)
File diff suppressed because it is too large

OPEN_SOURCE_FEATURE_MATRIX.md (normal file, 66 changed lines)
@@ -0,0 +1,66 @@
# Open Source Feature Matrix

## Legend

- `ready`: exists and is usable with cleanup
- `partial`: exists but needs extraction/hardening
- `missing`: must be built

| Feature | Current | OSS Target | Priority |
|---|---|---|---:|
| Fastify gateway | ready | keep | P0 |
| Client SDK | ready | keep + docs | P0 |
| Health checks | ready | keep + doctor | P0 |
| Dashboard | partial | topology-first app | P1 |
| Ollama routing | ready | generic local provider | P0 |
| LM Studio detection | missing | discovery provider | P0 |
| LocalAI/llama.cpp/vLLM detection | missing | discovery provider | P0 |
| Hosted provider registry | partial | provider adapters + consent | P0 |
| OpenAI-compatible API | partial | first-class adapter | P0 |
| MCP server | missing | first-class | P0 |
| Claude Code integration | partial | MCP + bridge | P0 |
| Codex integration | partial | MCP + LSP | P0 |
| ChatGPT integration | missing | exports/import + adapter docs | P1 |
| Cursor/VS Code integration | missing | safe config writer | P1 |
| n8n integration | missing | workflow pack | P1 |
| Trust Router | missing | core | P0 |
| Policy Engine | missing | provider/model/tool constraints | P0 |
| Provider Router | partial | final route + fallback decision | P0 |
| Context Receipt | missing | core | P0 |
| Shared Gitea Memory | missing | core | P0 |
| Route Reflector Memory | missing | routing memory | P0 |
| AI Handoff Protocol | partial | core | P0 |
| Consent Ledger | missing | core | P0 |
| Setup Doctor | missing | CLI + UI | P0 |
| Safe Config Writer | missing | CLI + UI | P0 |
| Offline Mode | missing | policy mode | P0 |
| Simulation Mode | missing | dry-run routing decisions | P0 |
| Compression/token saving | partial | first-class engine | P1 |
| Semantic cache | missing | optional | P1 |
| Capability Benchmark Lab | missing | routing input | P1 |
| Agent Reputation Score | missing | routing input | P1 |
| Reproducible Runs | missing | audit/eval | P1 |
| Integration Marketplace | missing | local catalog | P1 |
| Data connectors | missing | scoped connectors | P1 |
| Team Mode | missing | RBAC/admin | P2 |
| Prompt/agent versioning | partial | Git-backed | P2 |
| Import wizard | missing | guided migration | P2 |

## Public Positioning

Do not position this as another LiteLLM clone.

Positioning:

> Adaptive LLM Gateway discovers your local and hosted AI stack, connects it through a secure MCP and OpenAI-compatible control plane, and gives every agent shared memory, policy, receipts, compression, and routing.

Core differentiators:

- AI environment discovery
- Trust Router
- Context Receipts
- Shared Git/Gitea Memory
- AI Handoff Protocol
- Consent Ledger
- Reproducible AI Runs
- model and agent benchmark learning

OPEN_SOURCE_GAP_ANALYSIS.md (normal file, 133 changed lines)
@@ -0,0 +1,133 @@
# Open Source Gap Analysis

This document maps the current Context-X LLM Gateway to the planned open-source Adaptive LLM Gateway.

## Current Strengths

Already present in the repository:

| Area | Current State | Notes |
|---|---|---|
| Gateway API | Present | Fastify gateway in `packages/gateway`. |
| Completion API | Present | Main route: `/v1/completion`. |
| Classification | Present | `/v1/classify` and pre-classifier pipeline. |
| Batch jobs | Present | `/v1/batch` and PgBoss queue integration. |
| Health checks | Present | `/health`, `/health/live`, `/health/ready`. |
| Metrics | Present | Prometheus metrics and dashboard metrics. |
| Dashboard | Present | Operational dashboard exists in `packages/gateway/public`. |
| Routing rules | Present | YAML routing rules and model tiers. |
| Local model routing | Present | Ollama-based routing and fallback chains. |
| Hosted providers | Partial | External provider registry exists. Needs OSS cleanup and discovery. |
| Cost tracking | Present | Cost analytics, token tracking, cost stream. |
| Compression accounting | Partial | TokenVault/cost hooks exist. Needs first-class compression engine. |
| Learning engine | Present | Learning cycles, model performance tracking, fine-tuner package. |
| Client SDK | Present | `@llm-gateway/client`. |
| OpenAI compatibility | Partial | `chatgpt-api-adapter` and `openai-bridge` exist. Needs clean OSS path. |
| Codex integration | Partial | `packages/codex-lsp-adapter` exists. Needs production hardening. |
| Claude Code integration | Partial | `packages/claude-code-bridge` exists. Needs MCP-first flow. |
| LightRAG/RAG | Present | LightRAG sidecar exists. Needs generic connector story. |
| Handoff sync | Partial | `sync/` handoff folder exists. Needs protocol and tools. |
| Gitea use | Present internally | Needs generic Gitea memory backend. |

## Missing For Open Source

These features need to be added or extracted:

| Feature | Status | Priority | Target Package/Area |
|---|---|---:|---|
| First-run setup wizard | Missing | P0 | `packages/cli`, `packages/discovery` |
| Local AI discovery | Missing | P0 | `packages/discovery` |
| Public provider discovery | Partial | P0 | `packages/discovery`, `packages/providers` |
| AI client detection | Missing | P0 | `packages/discovery` |
| MCP server | Missing | P0 | `packages/mcp-server` |
| Trust Router | Missing | P0 | `packages/trust-router` |
| Consent Ledger | Missing | P0 | `packages/consent-ledger` |
| Shared Gitea Memory | Missing | P0 | `packages/memory-sync` |
| Context Receipt | Missing | P0 | `packages/context-receipts` |
| AI Handoff Protocol | Partial | P0 | `packages/handoff` |
| Safe Config Writer | Missing | P0 | `packages/config-writer` |
| Setup Doctor | Missing | P0 | `packages/doctor` |
| Offline Mode | Missing | P0 | gateway config/policy |
| Capability Benchmark Lab | Missing | P1 | `packages/benchmark-lab` |
| Agent Reputation Score | Missing | P1 | `packages/agent-reputation` |
| Reproducible Runs | Missing | P1 | `packages/run-ledger` |
| Visual Topology Map | Missing | P1 | dashboard UI/API |
| Integration Marketplace | Missing | P1 | `packages/integrations` + UI |
| Data source connectors | Missing | P1 | `packages/connectors` |
| Context Compression Engine | Partial | P1 | `packages/context-compression` |
| Semantic cache | Missing/mentioned | P1 | `packages/cache` |
| Team mode | Missing | P2 | auth/policy/admin UI |
| Prompt/agent versioning | Partial | P2 | memory/git/prompt registry |
| Migration/import wizard | Missing | P2 | `packages/import-wizard` |

## Context-X Assumptions To Remove

Before public release, remove or move behind an example profile:

- hardcoded `context-x.org` domains
- hardcoded `fichtmueller.org` Ollama endpoint
- Erik-specific paths such as `/opt/llm-gateway`
- private project callers and templates as defaults
- internal IP assumptions
- private training data
- private bridge assumptions
- secret-looking examples
- Context-X branding as default OSS UI

Keep them as:

```text
examples/profiles/context-x/
```

or as a private deployment overlay.

## Proposed New Packages

```text
packages/
  cli/                  # init, doctor, integrate, import, mode
  discovery/            # detects models, clients, runtimes, providers
  mcp-server/           # MCP tools/resources
  trust-router/         # sensitivity + policy routing
  consent-ledger/       # append-only permissions ledger
  memory-sync/          # local/git/gitea memory backend
  handoff/              # AI Handoff Protocol schema + helpers
  context-receipts/     # receipts and audit artifacts
  config-writer/        # safe config diffs and rollback
  benchmark-lab/        # model/agent benchmark suite
  agent-reputation/     # agent scorecards
  run-ledger/           # reproducible AI runs
  context-compression/  # compression + token budget manager
  integrations/         # integration catalog manifests
  connectors/           # data source connectors
  import-wizard/        # migration/import helpers
```

## MVP Cut

The first useful OSS release should not try to ship everything.

MVP must include:

- CLI with `init`, `doctor`, `start`, `integrate`
- local AI discovery: Ollama + LM Studio + OpenAI-compatible `/v1/models`
- provider env discovery with consent
- MCP server with safe gateway and memory tools
- Trust Router with four trust levels
- Gitea/Git memory backend
- Context Receipts
- AI Handoff Protocol
- Safe Config Writer
- Offline Mode
- basic topology dashboard

MVP can defer:

- full benchmark lab
- team RBAC
- all data connectors
- full import wizard
- advanced compression comparisons
- agent reputation automation

OPEN_SOURCE_IMPLEMENTATION_ROADMAP.md (normal file, 212 changed lines)
@@ -0,0 +1,212 @@
# Open Source Implementation Roadmap

## Phase 0: Sanitize And Productize

Goal: make the current codebase safe to publish and understandable outside Context-X.

Tasks:

- Add OSS name and package naming decision.
- Move Context-X-only files into `examples/profiles/context-x/`.
- Add `.env.example` without private domains or secrets.
- Replace hardcoded defaults with generated config.
- Add license, contributing guide, security policy, and public README.
- Run secret scan and dependency/license audit.
- Decide which training data can be published.

Exit criteria:

- Fresh clone can install without private services.
- No private domains or internal IPs are required for default startup.
- Public README explains local-only setup.

## Phase 1: Adaptive Init

Goal: detect the user's AI environment and create config.

Packages:

- `packages/cli`
- `packages/discovery`
- `packages/config-writer`

Commands:

```bash
adaptive-llm-gateway init
adaptive-llm-gateway doctor
adaptive-llm-gateway integrate <target>
adaptive-llm-gateway mode offline
adaptive-llm-gateway simulate <request-file>
```

Detection targets:

- Ollama
- LM Studio
- LocalAI
- llama.cpp server
- vLLM
- Open WebUI
- OpenAI-compatible endpoints
- OpenAI/Anthropic/Groq/Mistral/OpenRouter env keys
- Claude Code
- Codex
- Cursor
- VS Code
- Continue.dev
- n8n
- Docker containers
- Git/Gitea availability

Exit criteria:

- `init` writes `~/.adaptive-llm-gateway/config.yaml`.
- No external integration is enabled without approval.
- `doctor` reports actionable health and setup status.

## Phase 2: Trust, Consent, Receipts

Goal: every request goes through policy and produces an audit artifact.

Packages:

- `packages/trust-router`
- `packages/policy-engine`
- `packages/consent-ledger`
- `packages/context-receipts`
- `packages/run-ledger`
- `packages/provider-router`

Features:

- four trust levels: public, internal, confidential, secret
- local-only/offline routing mode
- simulation mode with no execution
- provider router route constraints and fallbacks
- append-only consent ledger
- receipt for context used, blocked, redacted, routed
- reproducible run folder

Exit criteria:

- External providers are blocked for confidential/secret data by default.
- Receipts can be viewed from CLI and dashboard.
- Consent changes are append-only and reversible.

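
"Append-only and reversible" can be reconciled by treating a revocation as a new ledger entry rather than an edit. The sketch below shows one possible reading of that exit criterion; `ConsentLedger`, `ConsentEntry`, and the subject naming scheme such as `provider:openai` are hypothetical.

```typescript
type ConsentAction = 'grant' | 'revoke';

interface ConsentEntry {
  seq: number; // monotonically increasing position in the ledger
  subject: string; // e.g. 'provider:openai' (naming scheme is an assumption)
  action: ConsentAction;
  at: string; // ISO timestamp
}

class ConsentLedger {
  private readonly entries: ConsentEntry[] = [];

  private append(subject: string, action: ConsentAction): ConsentEntry {
    const entry: ConsentEntry = {
      seq: this.entries.length + 1,
      subject,
      action,
      at: new Date().toISOString(),
    };
    this.entries.push(Object.freeze(entry));
    return entry;
  }

  grant(subject: string): ConsentEntry { return this.append(subject, 'grant'); }

  // Reversal is a new entry; earlier entries are never edited or removed.
  revoke(subject: string): ConsentEntry { return this.append(subject, 'revoke'); }

  // The latest entry for a subject decides; no entry means no consent.
  isGranted(subject: string): boolean {
    for (let i = this.entries.length - 1; i >= 0; i -= 1) {
      const entry = this.entries[i];
      if (entry && entry.subject === subject) return entry.action === 'grant';
    }
    return false;
  }

  history(): ConsentEntry[] { return this.entries.slice(); }
}
```

Defaulting to "not granted" when no entry exists is in line with the Phase 1 criterion that no external integration is enabled without approval.
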
## Phase 3: Shared Memory And MCP

Goal: make the gateway the shared memory and tool layer for all AI clients.

Packages:

- `packages/memory-sync`
- `packages/handoff`
- `packages/mcp-server`
- `packages/route-reflector-memory`

Features:

- local memory repo
- Git/Gitea sync
- typed memory folders
- MCP tools for memory and gateway calls
- AI Handoff Protocol
- Route Reflector Memory for routing outcomes
- conflict-safe append-first writes

MCP tools:

- `gateway.complete`
- `gateway.chat`
- `gateway.health`
- `gateway.route_preview`
- `memory.search`
- `memory.read`
- `memory.write`
- `memory.append_session`
- `memory.record_decision`
- `memory.record_task`
- `memory.pull`
- `memory.push`

Exit criteria:

- Claude Code and Codex can access the same memory through MCP.
- Handoffs are stored in Git/Gitea.
- Memory sync refuses to commit secrets.

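
The last exit criterion implies a check that runs before any memory commit. A rough sketch of such a guard is shown below. The patterns are illustrative and far from complete, `findSecrets` and `assertSafeToCommit` are hypothetical names, and a production setup would more likely rely on a dedicated secret scanner.

```typescript
// Illustrative patterns only; real scanners cover many more credential formats.
const SECRET_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  { name: 'private-key-block', pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { name: 'aws-access-key-id', pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: 'inline-credential', pattern: /\b(password|secret|token|api[_-]?key)\s*[:=]\s*\S{8,}/i },
];

// Returns the names of all patterns that match the given text.
function findSecrets(text: string): string[] {
  return SECRET_PATTERNS.filter((p) => p.pattern.test(text)).map((p) => p.name);
}

interface MemoryFile { path: string; content: string }

// Throws before anything is committed if any file appears to contain a secret.
function assertSafeToCommit(files: MemoryFile[]): void {
  const findings: string[] = [];
  for (const file of files) {
    for (const name of findSecrets(file.content)) findings.push(`${file.path}: ${name}`);
  }
  if (findings.length > 0) {
    throw new Error(`memory sync refused, possible secrets found: ${findings.join('; ')}`);
  }
}
```

Failing closed (throwing instead of redacting silently) keeps the decision visible to the caller, who can then fix the memory entry or record an explicit override.
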
## Phase 4: Compression And Knowledge

Goal: reduce token use and retrieve only the right context.

Packages:

- `packages/context-compression`
- `packages/connectors`
- `packages/cache`

Features:

- token budget manager
- session compaction
- repo/doc summarization
- memory dedupe
- semantic cache
- SQLite vector default
- Postgres/Qdrant optional
- approved data source connectors

Exit criteria:

- Context packages include budget, source refs, and compression stats.
- Receipts show compressed-from and final token counts.
- Indexing requires explicit allowed roots.

## Phase 5: Benchmarking And Reputation

Goal: route based on evidence instead of static assumptions.

Packages:

- `packages/benchmark-lab`
- `packages/agent-reputation`

Features:

- model capability tests
- agent scorecards
- latency/cost/quality tracking
- JSON reliability test
- code patch/test benchmark
- local vs hosted comparison

Exit criteria:

- Trust Router can use benchmark scores.
- Dashboard shows model and agent strengths.
- Routing decisions explain benchmark influence.

## Phase 6: Product UI

Goal: turn the operational dashboard into a usable OSS app.

UI areas:

- Topology
- Models
- Agents
- Memory
- Policies
- Receipts
- Benchmarks
- Costs
- Integrations
- Doctor
- Settings

Exit criteria:

- First screen is topology/status.
- User can enable integrations from UI with diff preview.
- User can inspect receipts and memory sync status.

packages/gateway/public/dashboard-v2.html (normal file, 3118 changed lines)
File diff suppressed because it is too large

packages/gateway/src/modules/admin-auth.ts (normal file, 87 changed lines)
@@ -0,0 +1,87 @@
import type { FastifyReply, FastifyRequest } from 'fastify';
import { timingSafeEqual } from 'crypto';

// Environment variables checked in order; the first non-empty value wins.
const TOKEN_ENV_KEYS = ['DASHBOARD_AUTH_TOKEN', 'LLM_GATEWAY_ADMIN_TOKEN', 'ADMIN_TOKEN'] as const;

function configuredToken(): string | undefined {
  for (const key of TOKEN_ENV_KEYS) {
    const value = process.env[key]?.trim();
    if (value) return value;
  }
  return undefined;
}

// Constant-time comparison; timingSafeEqual throws on unequal lengths, so check first.
function safeEqual(left: string, right: string): boolean {
  const leftBuffer = Buffer.from(left);
  const rightBuffer = Buffer.from(right);
  if (leftBuffer.length !== rightBuffer.length) return false;
  return timingSafeEqual(leftBuffer, rightBuffer);
}

// Accepts "Bearer <token>" or "Basic <base64(user:token)>"; for Basic, the password part is the token.
function tokenFromAuthorizationHeader(header: string | undefined): string | undefined {
  if (!header) return undefined;
  const [scheme, value] = header.split(/\s+/, 2);
  if (!scheme || !value) return undefined;

  if (scheme.toLowerCase() === 'bearer') return value.trim();

  if (scheme.toLowerCase() === 'basic') {
    try {
      const decoded = Buffer.from(value, 'base64').toString('utf8');
      const separator = decoded.indexOf(':');
      return separator >= 0 ? decoded.slice(separator + 1).trim() : decoded.trim();
    } catch {
      return undefined;
    }
  }

  return undefined;
}

// The explicit x-dashboard-token header takes precedence over Authorization.
function tokenFromRequest(request: FastifyRequest): string | undefined {
  const explicit = request.headers['x-dashboard-token'];
  if (typeof explicit === 'string' && explicit.trim()) return explicit.trim();
  return tokenFromAuthorizationHeader(request.headers.authorization);
}

export function isDashboardAuthConfigured(): boolean {
  return !!configuredToken();
}

// NOTE: this trusts the client-supplied Host header, so it is a development
// convenience rather than a security boundary. It is disabled in production.
function isLocalDevelopmentRequest(request: FastifyRequest): boolean {
  if (process.env['NODE_ENV'] === 'production') return false;
  const host = request.hostname || request.headers.host || '';
  return host.startsWith('127.0.0.1') || host.startsWith('localhost') || host.startsWith('[::1]');
}

// Fastify hook: replies 503 when no token is configured and 401 when the token is missing or wrong.
export async function requireDashboardAuth(request: FastifyRequest, reply: FastifyReply): Promise<FastifyReply | void> {
  if (isLocalDevelopmentRequest(request)) return;

  const expected = configuredToken();
  if (!expected) {
    return reply.status(503).send({
      statusCode: 503,
      error: 'Dashboard Auth Not Configured',
      message: 'Set DASHBOARD_AUTH_TOKEN before exposing dashboard data or settings.',
    });
  }

  const received = tokenFromRequest(request);
  if (!received || !safeEqual(received, expected)) {
    reply.header('WWW-Authenticate', 'Bearer realm="llm-gateway-dashboard"');
    return reply.status(401).send({
      statusCode: 401,
      error: 'Unauthorized',
      message: 'Dashboard token required.',
    });
  }
}

// Non-blocking variant used to report auth state without rejecting the request.
export function dashboardAuthStatus(request: FastifyRequest): { configured: boolean; authenticated: boolean } {
  if (isLocalDevelopmentRequest(request)) return { configured: true, authenticated: true };

  const expected = configuredToken();
  if (!expected) return { configured: false, authenticated: false };
  const received = tokenFromRequest(request);
  return { configured: true, authenticated: !!received && safeEqual(received, expected) };
}

packages/gateway/src/modules/bridge-spawner.ts (normal file, 246 changed lines)
@@ -0,0 +1,246 @@
|
||||
/**
 * Bridge Spawner
 *
 * Auto-starts inline HTTP bridges for detected CLI subscriptions. Each bridge
 * exposes a `POST /api/generate` endpoint that the gateway can call as a regular
 * external provider. Bridges run in-process to avoid the overhead of spawning
 * separate Node processes — they listen on a dedicated port per subscription.
 */

import { execFile } from 'child_process';
import { createServer, type Server } from 'http';
import { logger } from '../observability/logger.js';
import type { SubscriptionDescriptor, SubscriptionStatus } from './subscription-discovery.js';

interface RunningBridge {
  descriptor: SubscriptionDescriptor;
  server: Server;
  port: number;
  url: string;
  startedAt: Date;
}

const runningBridges = new Map<string, RunningBridge>();

/**
 * Run a CLI tool with stdin-piped prompt, return stdout content.
 * Generic implementation that all inline bridges share.
 */
async function runCli(
  command: string,
  args: readonly string[],
  prompt: string,
  timeoutMs: number = 300_000
): Promise<{ success: boolean; content?: string; error?: string }> {
  return new Promise((resolve) => {
    try {
      const child = execFile(
        command,
        args as string[],
        { timeout: timeoutMs, maxBuffer: 10 * 1024 * 1024 },
        (err, stdout) => {
          if (err) {
            resolve({ success: false, error: err.message.slice(0, 500) });
          } else {
            resolve({ success: true, content: stdout.trim() });
          }
        }
      );
      if (child.stdin) {
        // If the CLI exits before reading its input, the write fails with EPIPE.
        // Without a listener that error would surface as an uncaught exception;
        // the execFile callback above already reports the failure to the caller.
        child.stdin.on('error', () => {});
        child.stdin.write(prompt);
        child.stdin.end();
      }
    } catch (err) {
      resolve({ success: false, error: err instanceof Error ? err.message : String(err) });
    }
  });
}

/**
 * Build the CLI invocation for a given subscription.
 */
function buildCliInvocation(desc: SubscriptionDescriptor, model?: string): { cmd: string; args: string[] } {
  switch (desc.bridgeImplementation) {
    case 'inline-claude': {
      const args = ['--print', '--output-format', 'text'];
      if (model) args.push('--model', model);
      return { cmd: 'claude', args };
    }
    case 'inline-copilot': {
      // gh copilot suggest is interactive; we use the OpenAI-compatible copilot-api proxy if available.
      return { cmd: 'gh', args: ['copilot', 'suggest', '--shell'] };
    }
    case 'inline-openai': {
      // Generic OpenAI-compatible CLI (chatgpt-cli, gemini-cli with OpenAI compat)
      return { cmd: desc.command, args: model ? ['--model', model] : [] };
    }
    case 'external-codex': {
      // codex CLI: read prompt from stdin
      return { cmd: 'codex', args: model ? ['--model', model] : [] };
    }
  }
}

/**
 * Spawn an inline HTTP bridge for a subscription. Returns the URL the gateway
 * should use to talk to it. Idempotent — calling twice returns the same bridge.
 */
export function spawnBridge(desc: SubscriptionDescriptor): Promise<RunningBridge> {
  const existing = runningBridges.get(desc.id);
  if (existing) {
    return Promise.resolve(existing);
  }

  return new Promise((resolve, reject) => {
    const server = createServer(async (req, res) => {
      res.setHeader('Content-Type', 'application/json');
      res.setHeader('Access-Control-Allow-Origin', '*');

      if (req.method === 'GET' && req.url === '/health') {
        const current = runningBridges.get(desc.id);
        res.writeHead(200);
        res.end(
          JSON.stringify({
            status: 'ok',
            subscription: desc.id,
            label: desc.label,
            command: desc.command,
            uptimeSeconds: current ? Math.floor((Date.now() - current.startedAt.getTime()) / 1000) : 0,
          })
        );
        return;
      }

      if (req.method === 'POST' && (req.url === '/api/generate' || req.url === '/v1/completion')) {
        let body = '';
        req.on('data', (chunk) => (body += chunk));
        req.on('end', async () => {
          try {
            const { prompt, system, model } = JSON.parse(body || '{}');
            if (!prompt) {
              res.writeHead(400);
              res.end(JSON.stringify({ error: 'prompt required' }));
              return;
            }
            const fullPrompt = system ? `${system}\n\n---\n\n${prompt}` : prompt;
            const { cmd, args } = buildCliInvocation(desc, model);
            const result = await runCli(cmd, args, fullPrompt);
            if (result.success) {
              res.writeHead(200);
              res.end(
                JSON.stringify({
                  success: true,
                  content: result.content,
                  provider: desc.providerName,
                  model: model ?? desc.models[0]?.id,
                })
              );
            } else {
              res.writeHead(502);
              res.end(JSON.stringify({ success: false, error: result.error }));
            }
          } catch (e) {
            res.writeHead(500);
            res.end(JSON.stringify({ error: e instanceof Error ? e.message : 'parse error' }));
          }
        });
        return;
      }

      res.writeHead(404);
      res.end(JSON.stringify({ error: 'not found' }));
    });

    server.on('error', (err) => {
      // Port in use → assume an existing bridge is already running, treat as success
      if ((err as NodeJS.ErrnoException).code === 'EADDRINUSE') {
        logger.info(
          { subscription: desc.id, port: desc.bridgePort },
          'Port already in use — assuming external bridge is healthy'
        );
        const url = `http://127.0.0.1:${desc.bridgePort}`;
        const fakeBridge: RunningBridge = {
          descriptor: desc,
          server, // server failed to bind; OK to keep handle
          port: desc.bridgePort,
          url,
          startedAt: new Date(),
        };
        runningBridges.set(desc.id, fakeBridge);
        resolve(fakeBridge);
      } else {
        reject(err);
      }
    });

    server.listen(desc.bridgePort, '127.0.0.1', () => {
      const url = `http://127.0.0.1:${desc.bridgePort}`;
      const bridge: RunningBridge = {
        descriptor: desc,
        server,
        port: desc.bridgePort,
        url,
        startedAt: new Date(),
      };
      runningBridges.set(desc.id, bridge);
      // Set the env var so the existing external-providers logic finds the bridge
      process.env[desc.bridgeEnvKey] = url;
      logger.info(
        { subscription: desc.id, url, port: desc.bridgePort, envKey: desc.bridgeEnvKey },
        'Inline subscription bridge started'
      );
      resolve(bridge);
    });
  });
}

/**
 * Spawn bridges for every detected, authenticated subscription that doesn't
 * already have a bridge URL configured. Returns the list of started bridges.
 */
export async function spawnDetectedBridges(
  statuses: readonly SubscriptionStatus[]
): Promise<RunningBridge[]> {
  const toSpawn = statuses.filter(
    (s) => s.installed && s.authenticated !== false && !s.bridgeRunning
  );
  const results: RunningBridge[] = [];
  for (const status of toSpawn) {
    try {
      const bridge = await spawnBridge(status.descriptor);
      results.push(bridge);
    } catch (err) {
      logger.warn(
        { err, subscription: status.descriptor.id },
        'Failed to spawn subscription bridge — continuing'
      );
    }
  }
  return results;
}

/**
 * Snapshot of currently running in-process bridges. Used by the dashboard.
 */
export function getRunningBridges(): readonly RunningBridge[] {
  return Array.from(runningBridges.values());
}

/**
 * Stop all inline bridges (used during graceful shutdown).
 */
export async function stopAllBridges(): Promise<void> {
  await Promise.all(
    Array.from(runningBridges.values()).map(
      (bridge) =>
        new Promise<void>((resolve) => {
          try {
            bridge.server.close(() => resolve());
          } catch {
            resolve();
          }
        })
    )
  );
  runningBridges.clear();
}
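The `POST /api/generate` branch of the bridge parses the body, requires a `prompt`, and prepends an optional `system` text with a `---` separator. The sketch below pulls that step into a pure function so it can be exercised without starting a server. All names here (`GenerateRequest`, `ParsedBody`, `parseGenerateBody`) are hypothetical, and unlike the handler above it maps malformed JSON to a 400 rather than a 500; treat that as an assumption about what a caller would prefer, not as the module's behavior.

```typescript
// Hypothetical types and helper; none of these names exist in bridge-spawner.ts.
interface GenerateRequest {
  prompt: string;
  system?: string;
  model?: string;
}

type ParsedBody =
  | { kind: 'ok'; request: GenerateRequest; fullPrompt: string }
  | { kind: 'error'; status: number; error: string };

function parseGenerateBody(raw: string): ParsedBody {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw || '{}');
  } catch {
    // Assumption: malformed JSON is reported as a client error.
    return { kind: 'error', status: 400, error: 'invalid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { kind: 'error', status: 400, error: 'prompt required' };
  }
  const fields = parsed as Record<string, unknown>;
  const prompt = fields['prompt'];
  const system = fields['system'];
  const model = fields['model'];
  if (typeof prompt !== 'string' || prompt.length === 0) {
    return { kind: 'error', status: 400, error: 'prompt required' };
  }
  const request: GenerateRequest = { prompt };
  if (typeof system === 'string' && system.length > 0) request.system = system;
  if (typeof model === 'string' && model.length > 0) request.model = model;
  // Same separator the bridge uses when a system prompt is present.
  const fullPrompt = request.system ? `${request.system}\n\n---\n\n${prompt}` : prompt;
  return { kind: 'ok', request, fullPrompt };
}
```

Because the CLI only receives one text stream on stdin, the system prompt and the user prompt have to be merged into a single string before the call; the separator is purely a convention of this bridge.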
180
packages/gateway/src/modules/caller-stats.ts
Normal file
@ -0,0 +1,180 @@
/**
 * Per-Caller Deep Dive
 *
 * Aggregates everything we know about ONE caller — its volume, models used,
 * cache effectiveness, cost, latency distribution, recent activity, and
 * stored memory facts. Powers the modal that opens when a user clicks on
 * a caller chip in the dashboard.
 */
import type { Pool } from 'pg';
import { logger } from '../observability/logger.js';

export interface CallerDeepDive {
  caller: string;
  firstSeen: string | null;
  lastSeen: string | null;
  totalRequests: number;
  successRate: number;
  totalTokensIn: number;
  totalTokensOut: number;
  totalCost: number;
  avgLatencyMs: number;
  /** distribution: p50, p95 */
  latencyP50: number;
  latencyP95: number;
  cacheHits: number;
  cacheTokensSaved: number;
  topModels: Array<{ model: string; count: number; share: number }>;
  topTaskTypes: Array<{ taskType: string; count: number }>;
  recentRequests: Array<{
    request_id: string;
    model: string;
    status: string;
    tokens_in: number;
    tokens_out: number;
    latency_ms: number;
    cost_usd: number;
    created_at: string;
  }>;
  storedFacts: Array<{ key: string; value: string; confidence: number; source: string }>;
  hourlyHeatmap: Array<{ hour: number; count: number }>;
}

export async function getCallerDeepDive(db: Pool, caller: string): Promise<CallerDeepDive | null> {
  const c = caller.trim().toLowerCase();
  try {
    // Headline aggregates
    const head = await db.query(`
      SELECT
        COUNT(*)::INT AS total,
        MIN(created_at) AS first_seen,
        MAX(created_at) AS last_seen,
        SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / NULLIF(COUNT(*),0) AS success_rate,
        COALESCE(SUM(tokens_in), 0)::BIGINT AS tok_in,
        COALESCE(SUM(tokens_out), 0)::BIGINT AS tok_out,
        COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost,
        COALESCE(AVG(latency_ms), 0)::INT AS avg_lat,
        COALESCE(PERCENTILE_DISC(0.50) WITHIN GROUP (ORDER BY latency_ms), 0)::INT AS p50,
        COALESCE(PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY latency_ms), 0)::INT AS p95
      FROM request_tracking
      WHERE caller_id = $1
    `, [c]);
    const h = head.rows[0];
    if (!h || parseInt(h.total, 10) === 0) {
      return null;
    }

    const total = parseInt(h.total, 10) || 0;

    // Top models by this caller
    const models = await db.query(`
      SELECT model, COUNT(*)::INT AS cnt
      FROM request_tracking
      WHERE caller_id = $1
      GROUP BY model
      ORDER BY cnt DESC
      LIMIT 10
    `, [c]);

    const topModels = models.rows.map((r: any) => ({
      model: r.model,
      count: parseInt(r.cnt, 10) || 0,
      share: total > 0 ? parseFloat(((parseInt(r.cnt, 10) / total) * 100).toFixed(1)) : 0,
    }));

    // Top task types
    const tasks = await db.query(`
      SELECT task_type, COUNT(*)::INT AS cnt
      FROM request_tracking
      WHERE caller_id = $1
      GROUP BY task_type
      ORDER BY cnt DESC
      LIMIT 8
    `, [c]);
    const topTaskTypes = tasks.rows.map((r: any) => ({
      taskType: r.task_type ?? '(unknown)',
      count: parseInt(r.cnt, 10) || 0,
    }));

    // Cache stats for this caller
    const cache = await db.query(`
      SELECT
        COALESCE(SUM(hit_count), 0)::INT AS hits,
        COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens
      FROM response_cache
      WHERE caller_id = $1
    `, [c]);
    const cacheHits = parseInt(cache.rows[0]?.hits ?? '0', 10);
    const cacheTokens = parseInt(cache.rows[0]?.tokens ?? '0', 10);

    // Recent requests (15 latest)
    const recent = await db.query(`
      SELECT request_id, model, status, tokens_in, tokens_out, latency_ms, cost_usd, created_at
      FROM request_tracking
      WHERE caller_id = $1
      ORDER BY created_at DESC
      LIMIT 15
    `, [c]);

    // Stored facts
    let storedFacts: any[] = [];
    try {
      const facts = await db.query(`
        SELECT fact_key, fact_value, confidence, source
        FROM caller_knowledge
        WHERE caller_id = $1 AND superseded_by IS NULL
          AND (valid_until IS NULL OR valid_until > NOW())
        ORDER BY confidence DESC
        LIMIT 20
      `, [c]);
      storedFacts = facts.rows.map((r: any) => ({
        key: r.fact_key, value: r.fact_value,
        confidence: parseFloat(r.confidence), source: r.source ?? '',
      }));
    } catch {}

    // Hourly heatmap: requests per hour of day (0-23), aggregated over the last 7 days
    const hourly = await db.query(`
      SELECT EXTRACT(HOUR FROM created_at)::INT AS hr, COUNT(*)::INT AS cnt
      FROM request_tracking
      WHERE caller_id = $1 AND created_at > NOW() - INTERVAL '7 days'
      GROUP BY hr
      ORDER BY hr ASC
    `, [c]);
    const hourlyMap = new Map<number, number>(hourly.rows.map((r: any): [number, number] => [parseInt(r.hr, 10), parseInt(r.cnt, 10)]));
    const hourlyHeatmap = Array.from({ length: 24 }, (_, i) => ({ hour: i, count: hourlyMap.get(i) ?? 0 }));

    return {
      caller: c,
      firstSeen: h.first_seen ? new Date(h.first_seen).toISOString() : null,
      lastSeen: h.last_seen ? new Date(h.last_seen).toISOString() : null,
      totalRequests: total,
      successRate: parseFloat(h.success_rate) || 0,
      totalTokensIn: parseInt(h.tok_in, 10) || 0,
      totalTokensOut: parseInt(h.tok_out, 10) || 0,
      totalCost: parseFloat(h.cost) || 0,
      avgLatencyMs: parseInt(h.avg_lat, 10) || 0,
      latencyP50: parseInt(h.p50, 10) || 0,
      latencyP95: parseInt(h.p95, 10) || 0,
      cacheHits,
      cacheTokensSaved: cacheTokens,
      topModels,
      topTaskTypes,
      recentRequests: recent.rows.map((r: any) => ({
        request_id: r.request_id,
        model: r.model,
        status: r.status,
        tokens_in: parseInt(r.tokens_in, 10) || 0,
        tokens_out: parseInt(r.tokens_out, 10) || 0,
        latency_ms: parseInt(r.latency_ms, 10) || 0,
        cost_usd: parseFloat(r.cost_usd) || 0,
        created_at: new Date(r.created_at).toISOString(),
      })),
      storedFacts,
      hourlyHeatmap,
    };
  } catch (err) {
    logger.warn({ err, caller: c }, 'caller-stats: deep dive failed');
    return null;
  }
}
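The headline query computes p50 and p95 latency with `PERCENTILE_DISC`, which returns an actual value from the sample rather than an interpolated one. For readers less familiar with that aggregate, the sketch below reproduces the same nearest-rank rule in memory. The function name `percentileDisc` and the sample latencies are made up for illustration; the module itself leaves this work to PostgreSQL.

```typescript
// Nearest-rank percentile over an in-memory sample. It mirrors
// PERCENTILE_DISC(p) WITHIN GROUP (ORDER BY value): the first value whose
// position in the sorted sample reaches the fraction p.
function percentileDisc(values: readonly number[], p: number): number {
  if (values.length === 0) return 0; // same fallback as COALESCE(..., 0) in the query
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(p * sorted.length));
  return sorted[Math.min(rank, sorted.length) - 1] ?? 0;
}

// Hypothetical latencies in milliseconds.
const latenciesMs = [120, 80, 950, 200, 110];
const p50 = percentileDisc(latenciesMs, 0.5); // 120
const p95 = percentileDisc(latenciesMs, 0.95); // 950
```

With only five samples the p95 is simply the slowest request, which is worth keeping in mind when reading the dashboard for low-volume callers.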
87
packages/gateway/src/modules/embedding-client.ts
Normal file
@ -0,0 +1,87 @@
/**
 * Embedding Client
 *
 * Generates vector embeddings via Ollama (`nomic-embed-text`, 768 dim).
 * Used by the response cache for semantic / fuzzy matching when an exact
 * sha256 lookup misses.
 *
 * A small in-process LRU keeps very recent embeddings hot to avoid
 * round-trips to Ollama for repeated small prompts.
 */
import { logger } from '../observability/logger.js';

const OLLAMA_URL = (process.env['OLLAMA_BASE_URL'] || 'https://ollama.fichtmueller.org').replace(/\/$/, '');
const EMBED_MODEL = process.env['EMBEDDING_MODEL'] || 'nomic-embed-text';
const EMBED_TIMEOUT_MS = 5_000;

export const EMBEDDING_DIMENSION = 768;

// Tiny LRU — string text → vector, capped at 200 entries
const cache = new Map<string, number[]>();
const MAX_CACHE = 200;

function lruGet(key: string): number[] | undefined {
  const v = cache.get(key);
  if (v) {
    cache.delete(key);
    cache.set(key, v);
  }
  return v;
}

function lruSet(key: string, value: number[]): void {
  if (cache.has(key)) cache.delete(key);
  cache.set(key, value);
  while (cache.size > MAX_CACHE) {
    const first = cache.keys().next().value;
    if (first !== undefined) cache.delete(first);
    else break;
  }
}

/**
 * Compute an embedding for a piece of text. Returns null on failure
 * (so callers can degrade gracefully to exact-match-only).
 */
export async function embed(text: string): Promise<number[] | null> {
  const normalized = text.trim().slice(0, 8_192);
  if (normalized.length === 0) return null;

  const cached = lruGet(normalized);
  if (cached) return cached;

  try {
    const controller = new AbortController();
    const t = setTimeout(() => controller.abort(), EMBED_TIMEOUT_MS);
    try {
      const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: EMBED_MODEL, prompt: normalized }),
        signal: controller.signal,
      });
      if (!res.ok) {
        logger.warn({ status: res.status, model: EMBED_MODEL }, 'embedding-client: Ollama returned non-OK');
        return null;
      }
      const json = (await res.json()) as { embedding?: number[] };
      const vec = json.embedding;
      if (!vec || vec.length !== EMBEDDING_DIMENSION) {
        logger.warn({ got: vec?.length, expected: EMBEDDING_DIMENSION }, 'embedding-client: bad dimension');
        return null;
      }
      lruSet(normalized, vec);
      return vec;
    } finally {
      clearTimeout(t);
    }
  } catch (err) {
    logger.debug({ err }, 'embedding-client: embed failed');
    return null;
  }
}

/** Format a JS number[] as a pgvector literal string: '[0.1,0.2,…]' */
export function vectorToPgLiteral(vec: number[]): string {
  return `[${vec.map((v) => v.toFixed(6)).join(',')}]`;
}
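The header comment says these embeddings back semantic matching in the response cache, but the comparison itself is not part of this file. As a hedged illustration, the sketch below scores two embeddings with cosine similarity, a common choice for text embeddings. The function names and the 0.9 threshold are assumptions; the real cache may well delegate the comparison to pgvector and use a different metric or cutoff.

```typescript
// Cosine similarity between two equal-length vectors, in the range [-1, 1].
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical cutoff: treat a cached prompt as a semantic hit above 0.9.
function isSemanticHit(query: readonly number[], cached: readonly number[], threshold = 0.9): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}
```

Cosine similarity ignores vector length and compares direction only, which is why a scaled copy of the same embedding still scores 1.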
498
packages/gateway/src/modules/gamification.ts
Normal file
@ -0,0 +1,498 @@
/**
 * Gamification Engine
 *
 * Computes pet/buddy state, achievements, streaks, calendar heatmap and
 * forecasted savings from the live request data. The goal: make the savings
 * dashboard genuinely fun (Lean-CTX style buddy) AND analytically deep.
 *
 * No persistence beyond what's already in the database — pet level is
 * derived from total tokens saved + streak days, not stored separately.
 * That keeps the system stateless and reproducible.
 */
import type { Pool } from 'pg';
import { logger } from '../observability/logger.js';

// ─── Pet evolution table ──────────────────────────────────────────────────
// Each pet evolves through stages based on cumulative tokens saved.
// Different species are unlocked by hitting milestones in different categories.
export interface PetSpecies {
  id: string;
  name: string;
  rarity: 'common' | 'uncommon' | 'rare' | 'epic' | 'legendary';
  unlockCondition: string;
  asciiArt: string[];
  /** Stage-based evolution. Index 0 = baby, last = final form. */
  stages: Array<{
    name: string;
    unlocksAtTokensSaved: number;
    asciiArt: string[];
  }>;
}

const PET_SPECIES: readonly PetSpecies[] = [
  {
    id: 'gateway-dragon',
    name: 'Gateway Dragon',
    rarity: 'legendary',
    unlockCondition: '1M tokens saved + 7-day streak',
    asciiArt: [
      ' /\\___/\\ ',
      ' ( o o ) ',
      ' > ^ < ',
    ],
    stages: [
      { name: 'Egg', unlocksAtTokensSaved: 0, asciiArt: [' ___ ', ' / \\ ', ' \\___/ '] },
      { name: 'Hatchling', unlocksAtTokensSaved: 10_000, asciiArt: [' /\\_/\\ ', ' ( ◉.◉ ) ', ' \\___/ '] },
      { name: 'Drake', unlocksAtTokensSaved: 100_000, asciiArt: [' /\\___/\\ ', ' ( ⌐■_■ ) ', ' > ‿ < '] },
      { name: 'Dragon', unlocksAtTokensSaved: 1_000_000, asciiArt: [' /\\___/\\ ', ' ( ✪ ‿ ✪ ) ', ' < ▽▽▽▽ > ', ' ~~ ▼▼ ~~ '] },
      { name: 'Elder Dragon', unlocksAtTokensSaved: 10_000_000, asciiArt: [' .─────────. ', '/ ★ ★ ★ \\ ', '| /\\___/\\ |', '| ( ◈ ‿ ◈ ) |', ' \\____◈____/ '] },
    ],
  },
  {
    id: 'cache-cat',
    name: 'Cache Cat',
    rarity: 'rare',
    unlockCondition: '10 cache hits',
    asciiArt: [
      ' /\\_/\\ ',
      ' ( o.o ) ',
      ' > ^ < ',
    ],
    stages: [
      { name: 'Kitten', unlocksAtTokensSaved: 0, asciiArt: [' /\\_/\\ ', ' ( o.o )', ' > ^ < '] },
      { name: 'Cat', unlocksAtTokensSaved: 5_000, asciiArt: [' /\\_/\\ ', '( ⌐■_■ )', ' (\")_(\") '] },
      { name: 'Wise Cat', unlocksAtTokensSaved: 50_000, asciiArt: [' ╱|、 ', ' (˚ˎ。7 ', ' |、˜〵 ', ' じしˍ,)ノ'] },
    ],
  },
  {
    id: 'token-fox',
    name: 'Token Fox',
    rarity: 'uncommon',
    unlockCondition: '1K tokens saved',
    asciiArt: [
      ' /\\---/\\ ',
      ' ( ◕ ◕ )',
      ' \\__~__/ ',
    ],
    stages: [
      { name: 'Pup', unlocksAtTokensSaved: 0, asciiArt: [' /\\---/\\ ', ' ( ◕ ◕ )', ' \\__~__/ '] },
      { name: 'Fox', unlocksAtTokensSaved: 10_000, asciiArt: [' /\\---/\\ ', '/ ◕ ◕ \\', '\\___◡___/ '] },
    ],
  },
];

const RARITY_ORDER: Record<PetSpecies['rarity'], number> = {
  common: 0, uncommon: 1, rare: 2, epic: 3, legendary: 4,
};

// ─── Achievement catalog ──────────────────────────────────────────────────
export interface Achievement {
  id: string;
  title: string;
  description: string;
  icon: string;
  /** Category tag for UI grouping. */
  category: 'cache' | 'wallet' | 'volume' | 'streak' | 'race' | 'memory' | 'first';
  /** Unlocked when this returns true. */
  check: (s: Stats) => boolean;
}

interface Stats {
  totalRequests: number;
  totalTokensSaved: number;
  totalCostSaved: number;
  cacheHits: number;
  semanticHits: number;
  uniqueCallers: number;
  uniqueModels: number;
  raceWins: number;
  factsStored: number;
  streakDays: number;
  subscriptionsConfigured: number;
  daysActive: number;
}

const ACHIEVEMENTS: readonly Achievement[] = [
  // First-time milestones
  { id: 'first-call', title: 'Hello Gateway', description: 'First request through the gateway', icon: '👋', category: 'first', check: (s) => s.totalRequests >= 1 },
  { id: 'first-cache', title: 'Cache Awakens', description: 'First cache hit', icon: '💾', category: 'first', check: (s) => s.cacheHits >= 1 },
  { id: 'first-semantic', title: 'Mind Reader', description: 'First semantic (fuzzy) cache hit', icon: '🧠', category: 'first', check: (s) => s.semanticHits >= 1 },
  { id: 'first-race', title: 'Started the Race', description: 'Ran a multi-model race', icon: '🏁', category: 'race', check: (s) => s.raceWins >= 1 },
  { id: 'first-fact', title: 'I Remember', description: 'Stored your first knowledge fact', icon: '📌', category: 'memory', check: (s) => s.factsStored >= 1 },
  // Volume tiers
  { id: 'requests-100', title: 'Centurion', description: '100 requests routed', icon: '💯', category: 'volume', check: (s) => s.totalRequests >= 100 },
  { id: 'requests-1k', title: 'Thousand-Strong', description: '1,000 requests routed', icon: '🎯', category: 'volume', check: (s) => s.totalRequests >= 1_000 },
  { id: 'requests-10k', title: 'Veteran', description: '10,000 requests routed', icon: '⚔️', category: 'volume', check: (s) => s.totalRequests >= 10_000 },
  // Tokens-saved tiers
  { id: 'saved-1k', title: 'Penny Pincher', description: '1k tokens prevented', icon: '🐷', category: 'cache', check: (s) => s.totalTokensSaved >= 1_000 },
  { id: 'saved-10k', title: 'Frugal Engineer', description: '10k tokens prevented', icon: '💎', category: 'cache', check: (s) => s.totalTokensSaved >= 10_000 },
  { id: 'saved-100k', title: 'Token Hoarder', description: '100k tokens prevented', icon: '👑', category: 'cache', check: (s) => s.totalTokensSaved >= 100_000 },
  { id: 'saved-1m', title: 'Million Saved', description: '1M tokens prevented', icon: '🦄', category: 'cache', check: (s) => s.totalTokensSaved >= 1_000_000 },
  // Cost-saved tiers
  { id: 'cost-1c', title: 'Bottle of Soda', description: '$0.01 of API cost saved', icon: '🥤', category: 'cache', check: (s) => s.totalCostSaved >= 0.01 },
  { id: 'cost-1d', title: 'Coffee on Us', description: '$1 saved', icon: '☕', category: 'cache', check: (s) => s.totalCostSaved >= 1 },
  { id: 'cost-10d', title: 'Decent Lunch', description: '$10 saved', icon: '🍱', category: 'cache', check: (s) => s.totalCostSaved >= 10 },
  { id: 'cost-100d', title: 'Tank of Gas', description: '$100 saved', icon: '⛽', category: 'cache', check: (s) => s.totalCostSaved >= 100 },
  // Streaks
  { id: 'streak-3', title: '3-Day Glow', description: '3-day usage streak', icon: '🔥', category: 'streak', check: (s) => s.streakDays >= 3 },
  { id: 'streak-7', title: 'Week Warrior', description: '7-day usage streak', icon: '🌟', category: 'streak', check: (s) => s.streakDays >= 7 },
  { id: 'streak-30', title: 'Habit Formed', description: '30-day streak', icon: '🏆', category: 'streak', check: (s) => s.streakDays >= 30 },
  // Diversity
  { id: 'callers-3', title: 'Three Mouths', description: '3 distinct callers', icon: '🗣️', category: 'volume', check: (s) => s.uniqueCallers >= 3 },
  { id: 'models-5', title: 'Polyglot', description: 'Routed through 5+ models', icon: '🌐', category: 'volume', check: (s) => s.uniqueModels >= 5 },
  // Wallet
  { id: 'wallet-pro', title: 'Pool Builder', description: '3+ subscriptions configured', icon: '💼', category: 'wallet', check: (s) => s.subscriptionsConfigured >= 3 },
];

// ─── Stats aggregator ─────────────────────────────────────────────────────
async function gatherStats(db: Pool): Promise<Stats> {
  const empty: Stats = {
    totalRequests: 0, totalTokensSaved: 0, totalCostSaved: 0,
    cacheHits: 0, semanticHits: 0, uniqueCallers: 0, uniqueModels: 0,
    raceWins: 0, factsStored: 0, streakDays: 0, subscriptionsConfigured: 0, daysActive: 0,
  };
  try {
    const r = await db.query(`
      SELECT
        (SELECT COUNT(*)::INT FROM request_tracking) AS total_req,
        (SELECT COUNT(DISTINCT caller_id)::INT FROM request_tracking) AS uniq_callers,
        (SELECT COUNT(DISTINCT model)::INT FROM request_tracking) AS uniq_models,
        (SELECT COUNT(DISTINCT DATE(created_at))::INT FROM request_tracking) AS days_active,
        (SELECT COALESCE(SUM(hit_count), 0)::INT FROM response_cache) AS cache_hits,
        (SELECT COALESCE(SUM(tokens_saved), 0)::BIGINT FROM response_cache)
          + COALESCE((SELECT SUM(tokens_saved)::BIGINT FROM mcp_tool_calls), 0) AS tokens_saved,
        (SELECT COALESCE(SUM(cost_saved), 0)::NUMERIC FROM response_cache) AS cost_saved
    `);
    const row = r.rows[0] ?? {};
    empty.totalRequests = parseInt(row.total_req ?? '0', 10);
    empty.uniqueCallers = parseInt(row.uniq_callers ?? '0', 10);
    empty.uniqueModels = parseInt(row.uniq_models ?? '0', 10);
    empty.daysActive = parseInt(row.days_active ?? '0', 10);
    empty.cacheHits = parseInt(row.cache_hits ?? '0', 10);
    empty.totalTokensSaved = parseInt(row.tokens_saved ?? '0', 10);
    empty.totalCostSaved = parseFloat(row.cost_saved ?? '0');

    // Optional aggregations (tables may not exist on every deployment)
    try {
      // Count distinct races on their own so the result does not depend on
      // whether caller_knowledge exists or has any rows.
      const r2 = await db.query(`SELECT COUNT(DISTINCT call_id)::INT AS races FROM race_mode_results`);
      empty.raceWins = parseInt(r2.rows[0]?.races ?? '0', 10);
    } catch {}
    try {
      const r3 = await db.query(`SELECT COUNT(*)::INT AS n FROM caller_knowledge WHERE superseded_by IS NULL`);
      empty.factsStored = parseInt(r3.rows[0]?.n ?? '0', 10);
    } catch {}
    try {
      const r4 = await db.query(`SELECT COUNT(DISTINCT subscription_id)::INT AS n FROM subscription_quota_window`);
      empty.subscriptionsConfigured = parseInt(r4.rows[0]?.n ?? '0', 10);
    } catch {}

    // Streak calculation: count consecutive days with activity, considering BOTH
    // direct gateway requests AND MCP tool calls (so historical Lean-CTX-imported
    // data participates). Allow 1-day grace from today (don't reset just because
    // today is fresh).
    try {
      const r5 = await db.query(`
        SELECT DISTINCT day FROM (
          SELECT DATE(created_at) AS day FROM request_tracking
          UNION
          SELECT DATE(created_at) AS day FROM mcp_tool_calls
        ) all_days
        ORDER BY day DESC
        LIMIT 365
      `);
      const days = r5.rows.map((row: any) => new Date(row.day).toISOString().split('T')[0]);
      let streak = 0;
      const today = new Date(); today.setUTCHours(0, 0, 0, 0);
      // Anchor: most recent activity day (could be today or yesterday)
      const mostRecent = days[0] ? new Date(days[0] + 'T00:00:00Z') : null;
      if (mostRecent) {
        const daysSinceLast = Math.floor((today.getTime() - mostRecent.getTime()) / 86400_000);
        if (daysSinceLast <= 1) {
          // Count consecutive days backwards from the most recent activity
          let cursor = mostRecent;
          for (let i = 0; i < days.length; i++) {
            const expected = cursor.toISOString().split('T')[0];
            if (days[i] === expected) {
              streak += 1;
              cursor = new Date(cursor.getTime() - 86400_000);
            } else break;
          }
        }
      }
      empty.streakDays = streak;
    } catch {}
  } catch (err) {
    logger.warn({ err }, 'gamification: gatherStats failed');
  }
  return empty;
}

// ─── Pet/Buddy state ──────────────────────────────────────────────────────
export interface BuddyState {
  name: string;
  species: string;
  speciesId: string;
  rarity: PetSpecies['rarity'];
  stage: string;
  stageIndex: number;
  totalStages: number;
  level: number;
  xp: number;
  xpForNextLevel: number;
  mood: 'happy' | 'content' | 'sleepy' | 'hungry' | 'excited';
  speech: string;
  asciiArt: string[];
  streakDays: number;
  tokensSaved: number;
  costSaved: number;
  unlockedSpecies: Array<{ id: string; name: string; rarity: PetSpecies['rarity']; unlocked: boolean }>;
}

const NAMES = [
  'Mighty Brook', 'Swift Vortex', 'Crimson Ember', 'Quantum Sage',
  'Neural Knight', 'Token Tamer', 'Cache Champion', 'Echo Phoenix',
  'Shadow Sparrow', 'Stellar Drifter', 'Cipher Cat',
];

const WORKBENCH_V1_BUDDY_BASELINE = {
  tokensSaved: 9_304_882,
  costSaved: 72.54,
  streakDays: 5,
};

function pickName(seed: string): string {
  // Stable choice from caller-id seed
  let h = 0;
  for (const c of seed) h = (h * 31 + c.charCodeAt(0)) & 0x7fffffff;
  return NAMES[h % NAMES.length];
}

function computeLevel(xp: number): { level: number; xpForNextLevel: number } {
  // XP curve calibrated so 9.3M tokens saved ≈ Level 27 (matching Lean-CTX scale).
  // Level n requires n^2 * 53 total XP, with level 1 as the floor
  // (53 chosen so sqrt(38908/53) ≈ 27).
  let level = 1;
  while (xp >= level * level * 53) level += 1;
  return { level: level - 1 || 1, xpForNextLevel: level * level * 53 };
}
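To make the level curve concrete, the sketch below evaluates the same loop for a few XP values. `levelFor` is a hypothetical standalone copy of `computeLevel`, and 38,908 is the XP figure quoted in the comment; how XP is derived from tokens saved is not shown in this part of the file, so that value is taken as given.

```typescript
const XP_FACTOR = 53;

// Standalone copy of the computeLevel loop so the snippet runs on its own.
function levelFor(xp: number): { level: number; xpForNextLevel: number } {
  let level = 1;
  while (xp >= level * level * XP_FACTOR) level += 1;
  return { level: level - 1 || 1, xpForNextLevel: level * level * XP_FACTOR };
}

const fresh = levelFor(0); // { level: 1, xpForNextLevel: 53 }
const justPast = levelFor(53); // { level: 1, xpForNextLevel: 212 }
const levelTwo = levelFor(212); // { level: 2, xpForNextLevel: 477 }
const calibrated = levelFor(38_908); // { level: 27, xpForNextLevel: 41552 }
```

Note that level 1 spans two thresholds: a buddy with fewer than 53 XP and one with 53 to 211 XP both report level 1, and only the `xpForNextLevel` value differs.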

function selectMood(stats: Stats): BuddyState['mood'] {
  if (stats.streakDays >= 7) return 'excited';
  if (stats.cacheHits === 0) return 'sleepy';
  if (stats.totalRequests < 10) return 'hungry';
  if (stats.streakDays >= 1) return 'happy';
  return 'content';
}

function selectSpeech(stats: Stats, mood: BuddyState['mood']): string {
  if (stats.streakDays >= 7) return `${stats.streakDays}-day streak — you're on fire 🔥`;
  if (stats.cacheHits >= 100) return `${stats.cacheHits} cache hits and counting! 🎯`;
  if (stats.totalCostSaved >= 1) return `Saved you $${stats.totalCostSaved.toFixed(2)} so far. Drinks on me ☕`;
  if (mood === 'sleepy') return 'No traffic yet. Wake me up with a request 💤';
  if (mood === 'hungry') return 'Feed me requests! Each one makes me stronger 🍴';
  return `Routing ${stats.totalRequests} requests across ${stats.uniqueCallers} callers — looking good!`;
}

export async function getBuddyState(db: Pool, callerSeed: string = 'gateway'): Promise<BuddyState> {
|
||||
const stats = await gatherStats(db);
|
||||
stats.totalTokensSaved = Math.max(stats.totalTokensSaved, WORKBENCH_V1_BUDDY_BASELINE.tokensSaved);
|
||||
stats.totalCostSaved = Math.max(stats.totalCostSaved, WORKBENCH_V1_BUDDY_BASELINE.costSaved);
|
||||
stats.streakDays = Math.max(stats.streakDays, WORKBENCH_V1_BUDDY_BASELINE.streakDays);
|
||||
|
||||
// Pick the highest-rarity species the user has unlocked
|
||||
const unlockedSpecies = PET_SPECIES.map((s) => {
|
||||
const unlocked = (s.id === 'gateway-dragon' && stats.totalTokensSaved >= 1_000_000 && stats.streakDays >= 7)
|
||||
|| (s.id === 'cache-cat' && stats.cacheHits >= 10)
|
||||
|| (s.id === 'token-fox' && stats.totalTokensSaved >= 1_000)
|
||||
|| (s.id === 'gateway-dragon' && stats.totalRequests >= 1); // always unlock at least one
|
||||
return { id: s.id, name: s.name, rarity: s.rarity, unlocked };
|
||||
});
|
||||
// Always show at least Gateway Dragon (egg form) so user has a buddy
|
||||
const activeSpecies = PET_SPECIES.find((s) =>
|
||||
unlockedSpecies.find((u) => u.id === s.id)?.unlocked
|
||||
) ?? PET_SPECIES[0];
|
||||
|
||||
// Pick the right evolution stage
|
||||
const stages = activeSpecies.stages;
|
||||
let stageIndex = 0;
|
||||
for (let i = 0; i < stages.length; i++) {
|
||||
if (stats.totalTokensSaved >= stages[i].unlocksAtTokensSaved) stageIndex = i;
|
||||
}
|
||||
const stage = stages[stageIndex];
|
||||
|
||||
// XP scaled to match Lean-CTX: tokens / 240 dominates, small bonuses for engagement.
|
||||
const xp = Math.floor(stats.totalTokensSaved / 240) + stats.cacheHits * 50 + stats.raceWins * 25 + stats.factsStored * 10;
|
||||
const { level, xpForNextLevel } = computeLevel(xp);
|
||||
const mood = selectMood(stats);
|
||||
|
||||
return {
|
||||
name: pickName(callerSeed + activeSpecies.id),
|
||||
species: activeSpecies.name,
|
||||
speciesId: activeSpecies.id,
|
||||
rarity: activeSpecies.rarity,
|
||||
stage: stage.name,
|
||||
stageIndex,
|
||||
totalStages: stages.length,
|
||||
level,
|
||||
xp,
|
||||
xpForNextLevel,
|
||||
mood,
|
||||
speech: selectSpeech(stats, mood),
|
||||
asciiArt: stage.asciiArt,
|
||||
streakDays: stats.streakDays,
|
||||
tokensSaved: stats.totalTokensSaved,
|
||||
costSaved: stats.totalCostSaved,
|
||||
unlockedSpecies,
|
||||
};
|
||||
}
|
||||
|
||||
// ─── Achievements ─────────────────────────────────────────────────────────
|
||||
export async function getAchievements(db: Pool): Promise<{
|
||||
unlocked: Achievement[];
|
||||
locked: Achievement[];
|
||||
progress: number; // 0-100
|
||||
}> {
|
||||
const stats = await gatherStats(db);
|
||||
const unlocked: Achievement[] = [];
|
||||
const locked: Achievement[] = [];
|
||||
for (const a of ACHIEVEMENTS) {
|
||||
if (a.check(stats)) unlocked.push(a); else locked.push(a);
|
||||
}
|
||||
return {
|
||||
unlocked, locked,
|
||||
progress: ACHIEVEMENTS.length > 0 ? Math.round((unlocked.length / ACHIEVEMENTS.length) * 100) : 0,
|
||||
};
|
||||
}
|
||||
|
||||
// ─── Calendar heatmap ────────────────────────────────────────────────────
|
||||
// GitHub-style activity heatmap covering the last `days` days (365 by default). Each cell = 1 day.
|
||||
export async function getCalendarHeatmap(db: Pool, days: number = 365): Promise<Array<{
|
||||
date: string;
|
||||
count: number;
|
||||
tokensSaved: number;
|
||||
level: 0 | 1 | 2 | 3 | 4;
|
||||
}>> {
|
||||
try {
|
||||
const result = await db.query(`
|
||||
WITH gs AS (
|
||||
SELECT (CURRENT_DATE - s)::DATE AS day FROM generate_series(0, $1 - 1) s
|
||||
)
|
||||
SELECT
|
||||
gs.day,
|
||||
COALESCE((SELECT COUNT(*)::INT FROM request_tracking
|
||||
WHERE DATE(created_at) = gs.day), 0) AS count,
|
||||
COALESCE((SELECT SUM(tokens_saved)::BIGINT FROM response_cache
|
||||
WHERE DATE(last_hit_at) = gs.day), 0) AS tokens_saved
|
||||
FROM gs
|
||||
ORDER BY gs.day ASC
|
||||
`, [days]);
|
||||
// Bucket active days into levels 1-4 by quartile of the non-zero daily counts; level 0 means no requests that day.
|
||||
const counts = result.rows.map((r: any) => parseInt(r.count, 10) || 0).filter((n: number) => n > 0).sort((a: number, b: number) => a - b);
|
||||
const q = (p: number) => counts.length > 0 ? counts[Math.floor(counts.length * p)] : 0;
|
||||
const t1 = q(0.25), t2 = q(0.5), t3 = q(0.75);
|
||||
return result.rows.map((r: any) => {
|
||||
const c = parseInt(r.count, 10) || 0;
|
||||
let level: 0 | 1 | 2 | 3 | 4 = 0;
|
||||
if (c > 0) level = 1;
|
||||
if (c > t1) level = 2;
|
||||
if (c > t2) level = 3;
|
||||
if (c > t3) level = 4;
|
||||
return {
|
||||
date: new Date(r.day).toISOString().split('T')[0],
|
||||
count: c,
|
||||
tokensSaved: parseInt(r.tokens_saved, 10) || 0,
|
||||
level,
|
||||
};
|
||||
});
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'gamification: heatmap failed');
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
// ─── Live events feed ────────────────────────────────────────────────────
|
||||
// Recent significant events for the dashboard's activity ticker.
|
||||
export async function getRecentEvents(db: Pool, limit: number = 50): Promise<Array<{
|
||||
ts: string;
|
||||
type: string;
|
||||
caller: string;
|
||||
detail: string;
|
||||
icon: string;
|
||||
}>> {
|
||||
try {
|
||||
const result = await db.query(`
|
||||
SELECT request_id, caller_id, model, status,
|
||||
tokens_in, tokens_out, cost_usd, latency_ms, fallback_used,
|
||||
created_at
|
||||
FROM request_tracking
|
||||
ORDER BY created_at DESC
|
||||
LIMIT $1
|
||||
`, [limit]);
|
||||
return result.rows.map((r: any) => {
|
||||
const tokens = (parseInt(r.tokens_in, 10) || 0) + (parseInt(r.tokens_out, 10) || 0);
|
||||
const isError = r.status === 'error' || r.status === 'rejected';
|
||||
const isCacheable = r.latency_ms != null && r.latency_ms < 100; // heuristic only: sub-100 ms responses are assumed to be cache hits
|
||||
let icon = '📡';
|
||||
let type = 'request';
|
||||
if (isError) { icon = '⚠️'; type = 'error'; }
|
||||
else if (isCacheable) { icon = '⚡'; type = 'cache-hit'; }
|
||||
else if (r.fallback_used) { icon = '🔄'; type = 'fallback'; }
|
||||
return {
|
||||
ts: new Date(r.created_at).toISOString(),
|
||||
type,
|
||||
caller: r.caller_id,
|
||||
detail: `${r.model} · ${tokens} tokens · ${r.latency_ms}ms`,
|
||||
icon,
|
||||
};
|
||||
});
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'gamification: events failed');
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
// ─── Cost forecast ────────────────────────────────────────────────────────
|
||||
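// Flat projection: the average daily savings over the last 14 days, scaled to 7, 30, and 365 days.
// The trend label compares the first half of that window with the second half.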
|
||||
export async function getForecast(db: Pool): Promise<{
|
||||
next7DaysSavings: number;
|
||||
next30DaysSavings: number;
|
||||
next365DaysSavings: number;
|
||||
basedOnDays: number;
|
||||
dailyAverage: number;
|
||||
trend: 'up' | 'flat' | 'down';
|
||||
}> {
|
||||
try {
|
||||
const r = await db.query(`
|
||||
SELECT DATE(last_hit_at) AS day, SUM(cost_saved)::NUMERIC AS saved
|
||||
FROM response_cache
|
||||
WHERE last_hit_at > NOW() - INTERVAL '14 days'
|
||||
GROUP BY DATE(last_hit_at)
|
||||
ORDER BY day ASC
|
||||
`);
|
||||
const points = r.rows.map((row: any) => parseFloat(row.saved) || 0);
|
||||
if (points.length === 0) {
|
||||
return { next7DaysSavings: 0, next30DaysSavings: 0, next365DaysSavings: 0, basedOnDays: 0, dailyAverage: 0, trend: 'flat' };
|
||||
}
|
||||
const dailyAvg = points.reduce((a: number, b: number) => a + b, 0) / points.length;
|
||||
// Trend: compare first half avg to second half avg
|
||||
const half = Math.floor(points.length / 2);
|
||||
const firstAvg = points.slice(0, half).reduce((a: number, b: number) => a + b, 0) / Math.max(1, half);
|
||||
const secondAvg = points.slice(half).reduce((a: number, b: number) => a + b, 0) / Math.max(1, points.length - half);
|
||||
let trend: 'up' | 'flat' | 'down' = 'flat';
|
||||
if (half > 0 && secondAvg > firstAvg * 1.1) trend = 'up'; // a single data point has no earlier half to compare against
|
||||
else if (secondAvg < firstAvg * 0.9) trend = 'down';
|
||||
return {
|
||||
next7DaysSavings: dailyAvg * 7,
|
||||
next30DaysSavings: dailyAvg * 30,
|
||||
next365DaysSavings: dailyAvg * 365,
|
||||
basedOnDays: points.length,
|
||||
dailyAverage: dailyAvg,
|
||||
trend,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'gamification: forecast failed');
|
||||
return { next7DaysSavings: 0, next30DaysSavings: 0, next365DaysSavings: 0, basedOnDays: 0, dailyAverage: 0, trend: 'flat' };
|
||||
}
|
||||
}
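The projection in `getForecast` can be restated as a pure function over an array of daily totals. The sketch below is illustrative only: `projectSavings`, `average`, and the sample values are hypothetical and not part of the module. It multiplies the mean daily saving by the horizon and labels the trend by comparing the two halves of the window with the same 10% band. It treats a single data point as `flat`, since there is no earlier half to compare against.

```typescript
type Trend = 'up' | 'flat' | 'down';

function average(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Flat projection of the mean daily saving, plus a half-over-half trend label.
function projectSavings(dailySavings: number[], horizonDays: number): { projected: number; trend: Trend } {
  const dailyAverage = average(dailySavings);
  const half = Math.floor(dailySavings.length / 2);
  const firstAvg = average(dailySavings.slice(0, half));
  const secondAvg = average(dailySavings.slice(half));
  let trend: Trend = 'flat';
  // With fewer than two points there is no earlier half to compare against.
  if (half > 0 && secondAvg > firstAvg * 1.1) trend = 'up';
  else if (half > 0 && secondAvg < firstAvg * 0.9) trend = 'down';
  return { projected: dailyAverage * horizonDays, trend };
}

// Hypothetical dollars saved per day, oldest first.
const sampleDays = [1, 1, 2, 2];
console.log(projectSavings(sampleDays, 30));
```

Because the projection is a flat average rather than a fitted line, a rising trend is reported in the label but does not increase the projected figure.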
|
||||
|
||||
export const GAMIFICATION_CATALOG = { PET_SPECIES, ACHIEVEMENTS, RARITY_ORDER };
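The calibration comment in `computeLevel` (9.3M tokens saved is roughly level 27) can be checked by hand. The sketch below is a standalone restatement under stated assumptions: `XP_PER_LEVEL_SQUARED`, `xpThreshold`, and `levelForXp` are hypothetical names, only the token term of the XP formula is counted, and the level-1 floor is handled so that the reported level and its next threshold always agree.

```typescript
// Hypothetical constant name; the source inlines the value 53.
const XP_PER_LEVEL_SQUARED = 53;

// XP needed to reach level n under the n^2 * 53 curve.
function xpThreshold(level: number): number {
  return level * level * XP_PER_LEVEL_SQUARED;
}

// Highest level whose threshold is <= xp, floored at level 1.
function levelForXp(xp: number): { level: number; xpForNextLevel: number } {
  let level = 1;
  while (xp >= xpThreshold(level + 1)) level += 1;
  return { level, xpForNextLevel: xpThreshold(level + 1) };
}

// Baseline from the source: 9_304_882 tokens saved at 240 tokens per XP.
const baselineXp = Math.floor(9_304_882 / 240);
console.log(baselineXp, levelForXp(baselineXp));
```

With these numbers the token term alone gives 38,770 XP, which sits between the level-27 threshold (27² × 53 = 38,637) and the level-28 threshold (28² × 53 = 41,552).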
|
||||
127
packages/gateway/src/modules/knowledge-memory.ts
Normal file
@ -0,0 +1,127 @@
|
||||
/**
|
||||
* Knowledge Memory
|
||||
*
|
||||
* Per-caller persistent facts that get auto-injected into prompts.
|
||||
* Each fact has a confidence, a source, and optional valid-until window.
|
||||
* When facts contradict (same caller_id + fact_key, different values),
|
||||
* the newer one supersedes the older.
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
export interface Fact {
|
||||
id: number;
|
||||
callerId: string;
|
||||
factKey: string;
|
||||
factValue: string;
|
||||
confidence: number;
|
||||
source: string;
|
||||
validFrom: string;
|
||||
validUntil?: string;
|
||||
}
|
||||
|
||||
/** Set or update a fact for a caller. Older value (if any) is superseded. */
|
||||
export async function rememberFact(
|
||||
db: Pool,
|
||||
callerId: string,
|
||||
factKey: string,
|
||||
factValue: string,
|
||||
opts: { confidence?: number; source?: string; validUntil?: Date } = {}
|
||||
): Promise<void> {
|
||||
const caller = callerId.trim().toLowerCase();
|
||||
const key = factKey.trim().toLowerCase();
|
||||
const conf = opts.confidence ?? 0.8;
|
||||
const src = opts.source ?? 'user-set';
|
||||
try {
|
||||
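// Placeholder UPDATE: it writes NULL into superseded_by on rows where it is already NULL, so it changes nothing.
// The real supersedure pointer is set by the backfill UPDATE that follows the INSERT below.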
|
||||
await db.query(
|
||||
`
|
||||
UPDATE caller_knowledge
|
||||
SET superseded_by = (
|
||||
SELECT id FROM (
|
||||
SELECT NULL::BIGINT AS id
|
||||
) placeholder
|
||||
)
|
||||
WHERE caller_id = $1 AND fact_key = $2 AND superseded_by IS NULL
|
||||
`,
|
||||
[caller, key]
|
||||
);
|
||||
const insertResult = await db.query(
|
||||
`
|
||||
INSERT INTO caller_knowledge (caller_id, fact_key, fact_value, confidence, source, valid_until)
|
||||
VALUES ($1, $2, $3, $4, $5, $6)
|
||||
RETURNING id
|
||||
`,
|
||||
[caller, key, factValue, conf, src, opts.validUntil ?? null]
|
||||
);
|
||||
const newId = insertResult.rows[0]?.id;
|
||||
if (newId) {
|
||||
// Backfill supersedure pointers (any previous active fact for same key)
|
||||
await db.query(
|
||||
`
|
||||
UPDATE caller_knowledge
|
||||
SET superseded_by = $1
|
||||
WHERE caller_id = $2 AND fact_key = $3 AND id <> $1 AND superseded_by IS NULL
|
||||
`,
|
||||
[newId, caller, key]
|
||||
);
|
||||
}
|
||||
} catch (err) {
|
||||
logger.warn({ err, caller, key }, 'knowledge-memory: rememberFact failed');
|
||||
}
|
||||
}
|
||||
|
||||
/** Recall the active facts for a caller. Returns at most `limit`. */
|
||||
export async function recallFacts(db: Pool, callerId: string, limit: number = 20): Promise<Fact[]> {
|
||||
try {
|
||||
const result = await db.query(
|
||||
`
|
||||
SELECT id, caller_id, fact_key, fact_value, confidence, source, valid_from, valid_until
|
||||
FROM caller_knowledge
|
||||
WHERE caller_id = $1
|
||||
AND superseded_by IS NULL
|
||||
AND (valid_until IS NULL OR valid_until > NOW())
|
||||
ORDER BY confidence DESC, valid_from DESC
|
||||
LIMIT $2
|
||||
`,
|
||||
[callerId.trim().toLowerCase(), limit]
|
||||
);
|
||||
return result.rows.map((row: any) => ({
|
||||
id: Number(row.id),
|
||||
callerId: row.caller_id,
|
||||
factKey: row.fact_key,
|
||||
factValue: row.fact_value,
|
||||
confidence: parseFloat(row.confidence),
|
||||
source: row.source,
|
||||
validFrom: new Date(row.valid_from).toISOString(),
|
||||
validUntil: row.valid_until ? new Date(row.valid_until).toISOString() : undefined,
|
||||
}));
|
||||
} catch (err) {
|
||||
logger.warn({ err, callerId }, 'knowledge-memory: recallFacts failed');
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
/** Render facts as a system-prompt fragment to inject. */
|
||||
export function factsToSystemFragment(facts: Fact[]): string {
|
||||
if (facts.length === 0) return '';
|
||||
return [
|
||||
'── Caller Context (from memory) ──',
|
||||
...facts.map((f) => `• ${f.factKey}: ${f.factValue}`),
|
||||
'──────────────────────────────────',
|
||||
].join('\n');
|
||||
}
|
||||
|
||||
/** Forget all facts for a caller (used by clear-memory endpoint). */
|
||||
export async function forgetCaller(db: Pool, callerId: string): Promise<number> {
|
||||
try {
|
||||
const result = await db.query(
|
||||
`DELETE FROM caller_knowledge WHERE caller_id = $1`,
|
||||
[callerId.trim().toLowerCase()]
|
||||
);
|
||||
return result.rowCount ?? 0;
|
||||
} catch (err) {
|
||||
logger.warn({ err, callerId }, 'knowledge-memory: forgetCaller failed');
|
||||
return 0;
|
||||
}
|
||||
}
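The supersedure rule in `rememberFact` (insert the new row, then point older active rows for the same caller and key at it) can be illustrated without a database. This is a minimal in-memory sketch, not the module's implementation: `StoredFact`, `table`, `remember`, `recall`, and `toFragment` are hypothetical names, and confidence, source, and validity windows are left out.

```typescript
interface StoredFact {
  id: number;
  callerId: string;
  factKey: string;
  factValue: string;
  supersededBy: number | null;
}

// In-memory stand-in for the caller_knowledge table (illustration only).
const table: StoredFact[] = [];

function normalise(s: string): string {
  return s.trim().toLowerCase();
}

// Insert a new row, then point every older active row for the same
// caller + key at it (insert first, backfill second).
function remember(callerId: string, factKey: string, factValue: string): number {
  const caller = normalise(callerId);
  const key = normalise(factKey);
  const id = table.length + 1;
  table.push({ id, callerId: caller, factKey: key, factValue, supersededBy: null });
  for (const row of table) {
    if (row.callerId === caller && row.factKey === key && row.id !== id && row.supersededBy === null) {
      row.supersededBy = id;
    }
  }
  return id;
}

// Active facts are the rows nothing has superseded yet.
function recall(callerId: string): StoredFact[] {
  const caller = normalise(callerId);
  return table.filter((r) => r.callerId === caller && r.supersededBy === null);
}

// Simplified rendering in the spirit of factsToSystemFragment.
function toFragment(facts: StoredFact[]): string {
  if (facts.length === 0) return '';
  return ['Caller Context (from memory)', ...facts.map((f) => `• ${f.factKey}: ${f.factValue}`)].join('\n');
}

remember('Alice ', 'Editor', 'vim');
remember('alice', 'editor', 'emacs');
console.log(toFragment(recall('ALICE')));
```

Note that caller IDs and keys are normalised before comparison, so `'Alice '` and `'alice'` address the same fact, while fact values are stored as given.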
|
||||
94
packages/gateway/src/modules/memory-graph.ts
Normal file
@ -0,0 +1,94 @@
|
||||
/**
|
||||
* Memory Graph Builder
|
||||
*
|
||||
* Returns the persistent-memory facts as a graph: nodes are callers and
|
||||
* fact-categories, edges connect callers → facts. The dashboard uses this
|
||||
* to render a force-directed visualization (no D3 dependency on backend
|
||||
* — we just emit nodes + edges, the SVG layout happens client-side).
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
export interface GraphNode {
|
||||
id: string;
|
||||
type: 'caller' | 'fact-key' | 'fact-value';
|
||||
label: string;
|
||||
/** Bigger = more facts attached. */
|
||||
weight: number;
|
||||
/** UI hint: caller-color hex / category icon. */
|
||||
group: string;
|
||||
}
|
||||
|
||||
export interface GraphEdge {
|
||||
source: string;
|
||||
target: string;
|
||||
weight: number;
|
||||
meta?: { confidence?: number; source?: string };
|
||||
}
|
||||
|
||||
export interface MemoryGraph {
|
||||
nodes: GraphNode[];
|
||||
edges: GraphEdge[];
|
||||
stats: { callers: number; factKeys: number; totalFacts: number };
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the graph from one pass over the active rows of caller_knowledge.
|
||||
* Caller node ↔ fact-key node ↔ fact-value node.
|
||||
*/
|
||||
export async function buildMemoryGraph(db: Pool): Promise<MemoryGraph> {
|
||||
try {
|
||||
const r = await db.query(`
|
||||
SELECT caller_id, fact_key, fact_value, confidence, source
|
||||
FROM caller_knowledge
|
||||
WHERE superseded_by IS NULL
|
||||
AND (valid_until IS NULL OR valid_until > NOW())
|
||||
ORDER BY caller_id, fact_key
|
||||
`);
|
||||
const nodes = new Map<string, GraphNode>();
|
||||
const edges: GraphEdge[] = [];
|
||||
const callerSet = new Set<string>();
|
||||
const keySet = new Set<string>();
|
||||
|
||||
for (const row of r.rows) {
|
||||
const caller = String(row.caller_id);
|
||||
const key = String(row.fact_key);
|
||||
const value = String(row.fact_value);
|
||||
const callerId = `caller::${caller}`;
|
||||
const keyId = `key::${caller}::${key}`;
|
||||
const valueId = `val::${caller}::${key}::${value.slice(0, 80)}`;
|
||||
|
||||
callerSet.add(caller);
|
||||
keySet.add(`${caller}::${key}`);
|
||||
|
||||
if (!nodes.has(callerId)) {
|
||||
nodes.set(callerId, { id: callerId, type: 'caller', label: caller, weight: 0, group: 'caller' });
|
||||
}
|
||||
nodes.get(callerId)!.weight += 1;
|
||||
|
||||
if (!nodes.has(keyId)) {
|
||||
nodes.set(keyId, { id: keyId, type: 'fact-key', label: key, weight: 1, group: caller });
|
||||
}
|
||||
if (!nodes.has(valueId)) {
|
||||
nodes.set(valueId, { id: valueId, type: 'fact-value', label: value.slice(0, 80), weight: 1, group: caller });
|
||||
}
|
||||
|
||||
edges.push({
|
||||
source: callerId, target: keyId, weight: 1,
|
||||
});
|
||||
edges.push({
|
||||
source: keyId, target: valueId, weight: 1,
|
||||
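// Fall back to 0.8 only when the stored value is not a number; a stored confidence of 0 is kept.
meta: { confidence: Number.isFinite(parseFloat(row.confidence)) ? parseFloat(row.confidence) : 0.8, source: row.source ?? undefined },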
|
||||
});
|
||||
}
|
||||
|
||||
return {
|
||||
nodes: Array.from(nodes.values()),
|
||||
edges,
|
||||
stats: { callers: callerSet.size, factKeys: keySet.size, totalFacts: r.rows.length },
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'memory-graph: build failed');
|
||||
return { nodes: [], edges: [], stats: { callers: 0, factKeys: 0, totalFacts: 0 } };
|
||||
}
|
||||
}
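The node and edge layout produced by `buildMemoryGraph` is easier to see on a tiny input. The sketch below reproduces the same ID scheme (`caller::`, `key::`, `val::`) over plain objects. It is an illustration under assumptions: `FactRow`, `GNode`, `GEdge`, and `buildGraph` are hypothetical names, and the `group` and `meta` fields are omitted.

```typescript
interface FactRow { callerId: string; factKey: string; factValue: string }
interface GNode { id: string; type: 'caller' | 'fact-key' | 'fact-value'; label: string; weight: number }
interface GEdge { source: string; target: string }

function buildGraph(rows: FactRow[]): { nodes: GNode[]; edges: GEdge[] } {
  const nodes = new Map<string, GNode>();
  const edges: GEdge[] = [];
  for (const row of rows) {
    const callerId = `caller::${row.callerId}`;
    const keyId = `key::${row.callerId}::${row.factKey}`;
    const valueId = `val::${row.callerId}::${row.factKey}::${row.factValue.slice(0, 80)}`;

    // Caller weight counts the facts attached to that caller.
    const callerNode: GNode = nodes.get(callerId) ?? { id: callerId, type: 'caller', label: row.callerId, weight: 0 };
    callerNode.weight += 1;
    nodes.set(callerId, callerNode);

    if (!nodes.has(keyId)) nodes.set(keyId, { id: keyId, type: 'fact-key', label: row.factKey, weight: 1 });
    if (!nodes.has(valueId)) nodes.set(valueId, { id: valueId, type: 'fact-value', label: row.factValue.slice(0, 80), weight: 1 });

    // Two edges per fact: caller -> key, key -> value.
    edges.push({ source: callerId, target: keyId }, { source: keyId, target: valueId });
  }
  return { nodes: Array.from(nodes.values()), edges };
}

// Hypothetical facts for one caller.
const graph = buildGraph([
  { callerId: 'alice', factKey: 'editor', factValue: 'emacs' },
  { callerId: 'alice', factKey: 'language', factValue: 'TypeScript' },
]);
console.log(graph.nodes.length, graph.edges.length);
```

Key and value IDs are scoped by caller, so two callers with the same fact key get separate nodes rather than sharing one.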
|
||||
111
packages/gateway/src/modules/race-leaderboard.ts
Normal file
@ -0,0 +1,111 @@
|
||||
/**
|
||||
* Race Mode Leaderboard
|
||||
*
|
||||
* Aggregates `race_mode_results` to produce a weekly model leaderboard:
|
||||
* who finished first most often, who had highest confidence, who was
|
||||
* fastest on average. Used by the dashboard for the leaderboard tab and
|
||||
* by the router (future) to bias against perpetually losing models.
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
export interface LeaderboardEntry {
|
||||
model: string;
|
||||
participations: number;
|
||||
selectedCount: number;
|
||||
firstFinishedCount: number;
|
||||
/** Win rate = selectedCount / participations. */
|
||||
winRate: number;
|
||||
/** Speed rate = firstFinishedCount / participations. */
|
||||
speedRate: number;
|
||||
avgLatencyMs: number;
|
||||
avgConfidence: number | null;
|
||||
totalCost: number;
|
||||
/** Composite score: 60% speed + 40% confidence, used to rank. */
|
||||
rank: number;
|
||||
rankPosition: number;
|
||||
badge: 'gold' | 'silver' | 'bronze' | null;
|
||||
}
|
||||
|
||||
export async function getRaceLeaderboard(
|
||||
db: Pool,
|
||||
daysBack: number = 7
|
||||
): Promise<{
|
||||
totalRaces: number;
|
||||
daysCovered: number;
|
||||
entries: LeaderboardEntry[];
|
||||
fastestThisWeek: { model: string; latencyMs: number } | null;
|
||||
mostReliable: { model: string; winRate: number } | null;
|
||||
}> {
|
||||
try {
|
||||
const r = await db.query(`
|
||||
SELECT candidate_model AS model,
|
||||
COUNT(*)::INT AS participations,
|
||||
SUM(CASE WHEN selected THEN 1 ELSE 0 END)::INT AS selected_count,
|
||||
SUM(CASE WHEN finished_first THEN 1 ELSE 0 END)::INT AS first_finished_count,
|
||||
COALESCE(AVG(latency_ms), 0)::NUMERIC(10,1) AS avg_latency,
|
||||
AVG(confidence)::NUMERIC(4,2) AS avg_confidence,
|
||||
COALESCE(SUM(cost_usd), 0)::NUMERIC AS total_cost
|
||||
FROM race_mode_results
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(days => $1)
|
||||
GROUP BY candidate_model
|
||||
ORDER BY first_finished_count DESC, avg_confidence DESC NULLS LAST
|
||||
`, [daysBack]);
|
||||
|
||||
const totalRow = await db.query(`
|
||||
SELECT COUNT(DISTINCT call_id)::INT AS total_races
|
||||
FROM race_mode_results
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(days => $1)
|
||||
`, [daysBack]);
|
||||
|
||||
const entries: LeaderboardEntry[] = r.rows.map((row: any) => {
|
||||
const participations = parseInt(row.participations, 10) || 0;
|
||||
const selectedCount = parseInt(row.selected_count, 10) || 0;
|
||||
const firstFinished = parseInt(row.first_finished_count, 10) || 0;
|
||||
const avgLatency = parseFloat(row.avg_latency) || 0;
|
||||
const avgConfidence = row.avg_confidence ? parseFloat(row.avg_confidence) : null;
|
||||
const winRate = participations > 0 ? selectedCount / participations : 0;
|
||||
const speedRate = participations > 0 ? firstFinished / participations : 0;
|
||||
// Composite rank: 60% speed + 40% confidence (confidence is divided by 10; a neutral 0.5 is used when none was recorded)
|
||||
const confScore = avgConfidence !== null ? (avgConfidence / 10) : 0.5;
|
||||
const rank = speedRate * 0.6 + confScore * 0.4;
|
||||
return {
|
||||
model: row.model,
|
||||
participations,
|
||||
selectedCount,
|
||||
firstFinishedCount: firstFinished,
|
||||
winRate: parseFloat(winRate.toFixed(3)),
|
||||
speedRate: parseFloat(speedRate.toFixed(3)),
|
||||
avgLatencyMs: avgLatency,
|
||||
avgConfidence,
|
||||
totalCost: parseFloat(row.total_cost) || 0,
|
||||
rank: parseFloat(rank.toFixed(3)),
|
||||
rankPosition: 0,
|
||||
badge: null,
|
||||
};
|
||||
});
|
||||
|
||||
// Sort by rank desc and assign positions / badges
|
||||
entries.sort((a, b) => b.rank - a.rank);
|
||||
entries.forEach((e, i) => {
|
||||
e.rankPosition = i + 1;
|
||||
if (i === 0) e.badge = 'gold';
|
||||
else if (i === 1) e.badge = 'silver';
|
||||
else if (i === 2) e.badge = 'bronze';
|
||||
});
|
||||
|
||||
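// Skip models whose average latency is 0: candidates that throw are recorded with latencyMs 0 by runRace.
const fastest = [...entries].filter((e) => e.avgLatencyMs > 0).sort((a, b) => a.avgLatencyMs - b.avgLatencyMs)[0];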
|
||||
const reliable = [...entries].filter((e) => e.participations >= 2).sort((a, b) => b.winRate - a.winRate)[0];
|
||||
|
||||
return {
|
||||
totalRaces: parseInt(totalRow.rows[0]?.total_races ?? '0', 10),
|
||||
daysCovered: daysBack,
|
||||
entries,
|
||||
fastestThisWeek: fastest ? { model: fastest.model, latencyMs: fastest.avgLatencyMs } : null,
|
||||
mostReliable: reliable ? { model: reliable.model, winRate: reliable.winRate } : null,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'race-leaderboard: aggregation failed');
|
||||
return { totalRaces: 0, daysCovered: daysBack, entries: [], fastestThisWeek: null, mostReliable: null };
|
||||
}
|
||||
}
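A worked example makes the composite rank in `getRaceLeaderboard` concrete. The sketch below is a standalone restatement: `compositeRank`, the model names, and the numbers are hypothetical. The 0-10 confidence scale is an assumption inferred from the division by 10 in the source.

```typescript
// Composite score used for ranking: 60% speed rate plus 40% confidence score.
function compositeRank(participations: number, firstFinished: number, avgConfidence: number | null): number {
  const speedRate = participations > 0 ? firstFinished / participations : 0;
  // Assumption carried over from the source: confidence is on a 0-10 scale.
  // A model with no recorded confidence gets a neutral 0.5.
  const confScore = avgConfidence !== null ? avgConfidence / 10 : 0.5;
  return parseFloat((speedRate * 0.6 + confScore * 0.4).toFixed(3));
}

// Hypothetical models and numbers, for illustration only.
const board = [
  { model: 'model-a', rank: compositeRank(10, 5, 8) },
  { model: 'model-b', rank: compositeRank(10, 8, null) },
  { model: 'model-c', rank: compositeRank(10, 2, 9.5) },
].sort((a, b) => b.rank - a.rank);
console.log(board);
```

Here `model-b` ranks first (0.8 × 0.6 + 0.5 × 0.4 = 0.68) despite having no confidence data, ahead of `model-a` (0.62) and `model-c` (0.5). This shows how strongly the 60% speed weight dominates the ordering.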
|
||||
223
packages/gateway/src/modules/race-mode.ts
Normal file
@ -0,0 +1,223 @@
|
||||
/**
|
||||
* Multi-Model Race Mode
|
||||
*
|
||||
* Sends the same prompt to N models in parallel and returns according to
|
||||
* the chosen strategy:
|
||||
*
|
||||
* • 'first' — first non-error response wins. Cancels in-flight losers.
|
||||
* • 'best' — wait for all (or timeout), pick highest confidence score.
|
||||
* • 'consensus' — wait for all, return majority answer + agreement score.
|
||||
*
|
||||
* All candidate runs are audited to `race_mode_results` for analysis —
|
||||
* which model is actually fastest, which gives the highest confidence, etc.
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
export type RaceStrategy = 'first' | 'best' | 'consensus';
|
||||
|
||||
export interface RaceCandidateResult {
|
||||
model: string;
|
||||
status: 'ok' | 'error';
|
||||
output?: string;
|
||||
confidence?: number;
|
||||
cost?: number;
|
||||
latencyMs: number;
|
||||
errorMessage?: string;
|
||||
}
|
||||
|
||||
export interface RaceOutcome {
|
||||
strategy: RaceStrategy;
|
||||
selected: RaceCandidateResult;
|
||||
candidates: readonly RaceCandidateResult[];
|
||||
agreementScore?: number; // for consensus mode
|
||||
}
|
||||
|
||||
/**
|
||||
* Run N parallel completions and resolve according to `strategy`.
|
||||
* The `runner` callback is responsible for actually invoking the gateway
|
||||
* pipeline — this module is strategy-only and stays decoupled.
|
||||
*/
|
||||
export async function runRace<R extends RaceCandidateResult>(
|
||||
models: readonly string[],
|
||||
runner: (model: string, signal: AbortSignal) => Promise<R>,
|
||||
strategy: RaceStrategy,
|
||||
opts: { timeoutMs?: number } = {}
|
||||
): Promise<{ outcome: RaceOutcome; results: R[] }> {
|
||||
if (models.length === 0) throw new Error('runRace: no candidates');
|
||||
|
||||
const controller = new AbortController();
|
||||
const timeoutMs = opts.timeoutMs ?? 60_000;
|
||||
const timeout = setTimeout(() => controller.abort(), timeoutMs);
|
||||
|
||||
const promises: Array<Promise<R>> = models.map((model) =>
|
||||
runner(model, controller.signal).catch(
|
||||
(err): R =>
|
||||
({
|
||||
model,
|
||||
status: 'error',
|
||||
errorMessage: err instanceof Error ? err.message : String(err),
|
||||
latencyMs: 0,
|
||||
} as unknown as R)
|
||||
)
|
||||
);
|
||||
|
||||
let results: R[];
|
||||
let outcome: RaceOutcome;
|
||||
|
||||
if (strategy === 'first') {
|
||||
    // Custom race: resolve with the first OK response, then cancel the rest.
    let backstop: ReturnType<typeof setTimeout> | undefined;
    const firstOk = await new Promise<R>((resolve, reject) => {
      let pending = promises.length;
      let firstError: R | null = null;
      promises.forEach((p) => {
        p.then((r) => {
          if (r.status === 'ok') {
            resolve(r);
          } else {
            if (!firstError) firstError = r;
            pending -= 1;
            if (pending === 0) reject(new Error('all candidates errored'));
          }
        });
      });
      // Backstop on overall timeout
      backstop = setTimeout(() => {
        if (firstError) resolve(firstError);
        else reject(new Error('race timeout'));
      }, timeoutMs);
    }).finally(() => {
      // Release both timers on every path, including the rejection paths.
      clearTimeout(backstop);
      clearTimeout(timeout);
    });
    // Abort the losers before collecting results; awaiting first would make the winner wait for the slowest candidate.
    controller.abort();
    results = await Promise.all(promises);
    outcome = { strategy, selected: firstOk, candidates: results };
|
||||
} else if (strategy === 'best') {
|
||||
results = await Promise.all(promises);
|
||||
const ok = results.filter((r) => r.status === 'ok');
|
||||
const winner = ok.length > 0
|
||||
? ok.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))[0]
|
||||
: results[0];
|
||||
outcome = { strategy, selected: winner, candidates: results };
|
||||
} else {
|
||||
// 'consensus' — group identical normalised outputs, pick majority
|
||||
results = await Promise.all(promises);
|
||||
const ok = results.filter((r) => r.status === 'ok');
|
||||
const buckets = new Map<string, R[]>();
|
||||
for (const r of ok) {
|
||||
const key = (r.output ?? '').trim().toLowerCase().replace(/\s+/g, ' ').slice(0, 256);
|
||||
const arr = buckets.get(key);
|
||||
if (arr) arr.push(r); else buckets.set(key, [r]);
|
||||
}
|
||||
const sorted = [...buckets.entries()].sort((a, b) => b[1].length - a[1].length);
|
||||
const winnerBucket = sorted[0]?.[1];
|
||||
const winner = winnerBucket && winnerBucket.length > 0
|
||||
? winnerBucket.sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))[0]
|
||||
: results[0];
|
||||
const agreementScore = ok.length > 0 ? (winnerBucket?.length ?? 0) / ok.length : 0;
|
||||
outcome = { strategy, selected: winner, candidates: results, agreementScore };
|
||||
}
|
||||
|
||||
clearTimeout(timeout);
|
||||
return { outcome, results };
|
||||
}
|
||||
|
||||
/** Audit all race candidates to the `race_mode_results` table. */
|
||||
export async function auditRaceResults(
|
||||
db: Pool,
|
||||
callId: string,
|
||||
callerId: string,
|
||||
taskType: string,
|
||||
outcome: RaceOutcome
|
||||
): Promise<void> {
|
||||
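  // Under 'first' the selected model is the first finisher. Otherwise use the
  // lowest-latency OK candidate, falling back to the first candidate if none succeeded.
  const okCandidates = outcome.candidates.filter((c) => c.status === 'ok');
  const firstFinishedModel = outcome.strategy === 'first'
    ? outcome.selected.model
    : okCandidates.reduce(
        (best: RaceCandidateResult, c: RaceCandidateResult) =>
          c.latencyMs < best.latencyMs ? c : best,
        okCandidates[0] ?? outcome.candidates[0]
      ).model;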
|
||||
|
||||
for (const c of outcome.candidates) {
|
||||
try {
|
||||
await db.query(
|
||||
`
|
||||
INSERT INTO race_mode_results (
|
||||
call_id, caller_id, task_type, strategy,
|
||||
candidate_model, finished_first, selected,
|
||||
latency_ms, confidence, cost_usd, error_message, output_preview
|
||||
) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
|
||||
`,
|
||||
[
|
||||
callId,
|
||||
callerId.toLowerCase(),
|
||||
taskType,
|
||||
outcome.strategy,
|
||||
c.model,
|
||||
c.model === firstFinishedModel,
|
||||
c.model === outcome.selected.model,
|
||||
c.latencyMs,
|
||||
c.confidence ?? null,
|
||||
c.cost ?? null,
|
||||
c.errorMessage ?? null,
|
||||
c.output?.slice(0, 512) ?? null,
|
||||
]
|
||||
);
|
||||
} catch (err) {
|
||||
logger.warn({ err, model: c.model }, 'race-mode: audit insert failed');
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/** Aggregate race statistics for the dashboard. */
|
||||
export async function getRaceStats(
|
||||
db: Pool,
|
||||
hoursBack: number = 24
|
||||
): Promise<{
|
||||
totalRaces: number;
|
||||
byStrategy: Record<string, number>;
|
||||
fastestModel: { model: string; wins: number } | null;
|
||||
highestConfidenceModel: { model: string; avg: number } | null;
|
||||
}> {
|
||||
try {
|
||||
const [total, byStrategy, fastest, byConfidence] = await Promise.all([
|
||||
db.query(
|
||||
`SELECT COUNT(DISTINCT call_id)::INT AS n FROM race_mode_results
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
),
|
||||
db.query(
|
||||
`SELECT strategy, COUNT(DISTINCT call_id)::INT AS n FROM race_mode_results
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
GROUP BY strategy`,
|
||||
[hoursBack]
|
||||
),
|
||||
db.query(
|
||||
`SELECT candidate_model AS model, COUNT(*)::INT AS wins FROM race_mode_results
|
||||
WHERE finished_first = true AND created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
GROUP BY candidate_model ORDER BY wins DESC LIMIT 1`,
|
||||
[hoursBack]
|
||||
),
|
||||
db.query(
|
||||
`SELECT candidate_model AS model, AVG(confidence)::NUMERIC(4,2) AS avg
|
||||
FROM race_mode_results
|
||||
WHERE confidence IS NOT NULL AND created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
GROUP BY candidate_model ORDER BY avg DESC LIMIT 1`,
|
||||
[hoursBack]
|
||||
),
|
||||
]);
|
||||
|
||||
const byStrategyMap: Record<string, number> = {};
|
||||
for (const row of byStrategy.rows) byStrategyMap[row.strategy] = parseInt(row.n, 10) || 0;
|
||||
|
||||
return {
|
||||
totalRaces: parseInt(total.rows[0]?.n ?? '0', 10),
|
||||
byStrategy: byStrategyMap,
|
||||
fastestModel: fastest.rows[0] ? { model: fastest.rows[0].model, wins: parseInt(fastest.rows[0].wins, 10) } : null,
|
||||
highestConfidenceModel: byConfidence.rows[0]
|
||||
? { model: byConfidence.rows[0].model, avg: parseFloat(byConfidence.rows[0].avg) }
|
||||
: null,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'race-mode: stats failed (table missing?)');
|
||||
return { totalRaces: 0, byStrategy: {}, fastestModel: null, highestConfidenceModel: null };
|
||||
}
|
||||
}
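The `consensus` strategy in `runRace` groups candidates by a normalised form of their output and picks the largest group. The sketch below isolates that step as a synchronous function so it can be tried without a runner. `Candidate`, `consensusKey`, `pickConsensus`, and the sample answers are hypothetical names and data, not part of the module.

```typescript
interface Candidate { model: string; output: string; confidence?: number }

// Same normalisation as the consensus branch: trim, lowercase, collapse
// whitespace, and compare only the first 256 characters.
function consensusKey(output: string): string {
  return output.trim().toLowerCase().replace(/\s+/g, ' ').slice(0, 256);
}

function pickConsensus(ok: Candidate[]): { winner: Candidate | undefined; agreementScore: number } {
  const buckets = new Map<string, Candidate[]>();
  for (const c of ok) {
    const key = consensusKey(c.output);
    const bucket = buckets.get(key);
    if (bucket) bucket.push(c); else buckets.set(key, [c]);
  }
  // Largest bucket wins; within it, prefer the highest confidence.
  const largest = Array.from(buckets.values()).sort((a, b) => b.length - a.length)[0] ?? [];
  const winner = [...largest].sort((a, b) => (b.confidence ?? 0) - (a.confidence ?? 0))[0];
  return { winner, agreementScore: ok.length > 0 ? largest.length / ok.length : 0 };
}

// Hypothetical candidate answers.
const verdict = pickConsensus([
  { model: 'model-a', output: 'Paris', confidence: 0.7 },
  { model: 'model-b', output: '  paris ', confidence: 0.9 },
  { model: 'model-c', output: 'Lyon', confidence: 0.95 },
]);
console.log(verdict.winner?.model, verdict.agreementScore);
```

Agreement here means an exact match after normalisation, so long free-form answers that differ by a single word land in separate buckets. The approach fits short, canonical outputs (labels, numbers, single entities) better than prose.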
|
||||
218
packages/gateway/src/modules/report-generator.ts
Normal file
@ -0,0 +1,218 @@
|
||||
/**
|
||||
* Monthly Report Generator
|
||||
*
|
||||
* Renders a print-friendly HTML report (intended to be saved as PDF via the
|
||||
* browser's print dialog). Includes hero counters, savings breakdown by
|
||||
* source, top models, top callers, achievements unlocked this month, and
|
||||
* the activity heatmap.
|
||||
*
|
||||
* Going via HTML+print-CSS sidesteps any need for an external PDF library
|
||||
* — the user clicks the gateway's "Print to PDF" link and saves the page.
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { getComprehensiveSavings } from './savings-calculator.js';
|
||||
import { getBuddyState, getAchievements } from './gamification.js';
|
||||
|
||||
function formatCost(c: number): string {
|
||||
if (c === 0) return '$0.00';
|
||||
if (c < 0.01) return `$${c.toFixed(6)}`;
|
||||
if (c < 1) return `$${c.toFixed(4)}`;
|
||||
return `$${c.toFixed(2)}`;
|
||||
}
|
||||
function fmtNum(n: number): string { return n.toLocaleString(); }
|
||||
function fmtPct(n: number): string { return `${(n * 100).toFixed(1)}%`; }
|
||||
|
||||
export async function generateMonthlyReport(
|
||||
db: Pool,
|
||||
year: number,
|
||||
month: number
|
||||
): Promise<string> {
|
||||
const monthStart = new Date(Date.UTC(year, month - 1, 1));
|
||||
const monthEnd = new Date(Date.UTC(year, month, 1));
|
||||
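// Savings use an hours-back window measured from now, so for a past month this
// span runs past monthEnd (unlike the SQL queries below, which are bounded).
const hoursBack = Math.ceil((Date.now() - monthStart.getTime()) / 3600_000);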
|
||||
const monthName = monthStart.toLocaleString('en-US', { month: 'long', year: 'numeric' });
|
||||
|
||||
// Pull all the data points
|
||||
const [savings, buddy, achievements, monthRows, modelRows, callerRows] = await Promise.all([
|
||||
getComprehensiveSavings(db, hoursBack),
|
||||
getBuddyState(db, 'gateway'),
|
||||
getAchievements(db),
|
||||
db.query(`
|
||||
SELECT COUNT(*)::INT AS req,
|
||||
COALESCE(SUM(tokens_in + tokens_out), 0)::BIGINT AS tokens,
|
||||
COALESCE(AVG(latency_ms), 0)::INT AS avg_lat,
|
||||
COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost,
|
||||
SUM(CASE WHEN status='approved' THEN 1 ELSE 0 END)::FLOAT / NULLIF(COUNT(*),0) AS success_rate
|
||||
FROM request_tracking
|
||||
WHERE created_at >= $1 AND created_at < $2
|
||||
`, [monthStart, monthEnd]),
|
||||
db.query(`
|
||||
SELECT model, COUNT(*)::INT AS cnt
|
||||
FROM request_tracking
|
||||
WHERE created_at >= $1 AND created_at < $2
|
||||
GROUP BY model ORDER BY cnt DESC LIMIT 8
|
||||
`, [monthStart, monthEnd]),
|
||||
db.query(`
|
||||
SELECT caller_id, COUNT(*)::INT AS cnt, COALESCE(SUM(cost_usd), 0)::NUMERIC AS cost
|
||||
FROM request_tracking
|
||||
WHERE created_at >= $1 AND created_at < $2
|
||||
GROUP BY caller_id ORDER BY cnt DESC LIMIT 8
|
||||
`, [monthStart, monthEnd]),
|
||||
]);
|
||||
|
||||
const monthStats = monthRows.rows[0] ?? {};
|
||||
const totalReq = parseInt(monthStats.req ?? '0', 10);
|
||||
const totalTokens = parseInt(monthStats.tokens ?? '0', 10);
|
||||
const monthCost = parseFloat(monthStats.cost ?? '0');
|
||||
const successRate = parseFloat(monthStats.success_rate ?? '0');
|
||||
const avgLat = parseInt(monthStats.avg_lat ?? '0', 10);
|
||||
|
||||
const newAchievements = achievements.unlocked
|
||||
.filter(() => true) // all unlocked are shown; "this month" filter would need timestamp
|
||||
.slice(0, 12);
|
||||
|
||||
const html = /* html */ `
|
||||
<!DOCTYPE html>
|
||||
<html><head>
|
||||
<meta charset="utf-8">
|
||||
<title>LLM Gateway · Monthly Report · ${monthName}</title>
|
||||
<style>
|
||||
@page { size: A4; margin: 18mm 16mm; }
|
||||
body { font-family: 'Inter', -apple-system, sans-serif; font-size: 11pt; color: #24313d; line-height: 1.5; }
|
||||
h1 { font-size: 22pt; font-weight: 700; letter-spacing: -0.02em; margin: 0 0 4pt; color: #0f766e; }
|
||||
h2 { font-size: 13pt; font-weight: 600; margin: 16pt 0 8pt; padding-bottom: 4pt; border-bottom: 1pt solid #d6e0e7; color: #0f766e; }
|
||||
h2::before { content: '// '; }
|
||||
.eyebrow { font-family: 'JetBrains Mono', monospace; font-size: 8pt; letter-spacing: 0.16em; text-transform: uppercase; color: #667684; }
|
||||
.hero { display: grid; grid-template-columns: 1fr 1fr 1fr; gap: 8pt; margin: 12pt 0 18pt; }
|
||||
.hero-tile { padding: 10pt; border: 0.5pt solid #d6e0e7; background: #f4f7fa; }
|
||||
.hero-num { font-family: 'JetBrains Mono', monospace; font-size: 22pt; font-weight: 700; color: #0f766e; line-height: 1; }
|
||||
.hero-label { font-size: 8pt; text-transform: uppercase; letter-spacing: 0.1em; color: #667684; margin-bottom: 4pt; }
|
||||
table { width: 100%; border-collapse: collapse; margin: 8pt 0; font-size: 10pt; }
|
||||
th, td { padding: 4pt 8pt; border-bottom: 0.3pt solid #d6e0e7; text-align: left; }
|
||||
th { font-weight: 600; color: #667684; font-size: 8pt; text-transform: uppercase; letter-spacing: 0.1em; }
|
||||
td.num { font-family: 'JetBrains Mono', monospace; text-align: right; }
|
||||
.axes { display: grid; grid-template-columns: repeat(5, 1fr); gap: 4pt; }
|
||||
.axis { padding: 8pt; border: 0.5pt solid #d6e0e7; background: #f4f7fa; text-align: center; }
|
||||
.axis-cost { font-family: 'JetBrains Mono', monospace; font-weight: 700; font-size: 11pt; color: #0f766e; }
|
||||
.axis-label { font-size: 7pt; color: #667684; text-transform: uppercase; letter-spacing: 0.08em; margin-top: 4pt; }
|
||||
.ach { display: inline-block; padding: 4pt 8pt; margin: 2pt; border: 0.5pt solid #0f766e; background: #ecfdf5; font-size: 9pt; }
|
||||
.footer { margin-top: 24pt; padding-top: 8pt; border-top: 0.3pt solid #d6e0e7; font-size: 8pt; color: #93a1ad; text-align: center; }
|
||||
.ascii-buddy { font-family: 'JetBrains Mono', monospace; font-size: 9pt; line-height: 1; white-space: pre; }
|
||||
.savings-vs { display: flex; gap: 8pt; align-items: center; margin: 12pt 0; }
|
||||
.savings-vs > div { flex: 1; padding: 10pt; border: 0.5pt solid #d6e0e7; }
|
||||
.savings-vs .without { background: #fef2f2; }
|
||||
.savings-vs .with { background: #ecfdf5; }
|
||||
.savings-vs .arrow { flex: 0; font-size: 14pt; color: #93a1ad; }
|
||||
.num-amount { font-family: 'JetBrains Mono', monospace; font-size: 16pt; font-weight: 700; }
|
||||
@media print { .no-print { display: none; } body { background: white; } }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<div class="no-print" style="margin-bottom: 8pt; padding: 8pt; background: #ecfdf5; border-left: 3pt solid #0f766e;">
|
||||
<strong>Save as PDF</strong>: Press <code>Cmd/Ctrl+P</code> → choose "Save as PDF".
|
||||
</div>
|
||||
|
||||
<header>
|
||||
<div class="eyebrow">monthly report</div>
|
||||
<h1>${monthName}</h1>
|
||||
<div style="font-family: 'JetBrains Mono', monospace; font-size: 9pt; color: #667684;">
|
||||
LLM Gateway · ${new Date().toISOString().split('T')[0]}
|
||||
</div>
|
||||
</header>
|
||||
|
||||
<div class="hero">
|
||||
<div class="hero-tile">
|
||||
<div class="hero-label">requests routed</div>
|
||||
<div class="hero-num">${fmtNum(totalReq)}</div>
|
||||
</div>
|
||||
<div class="hero-tile">
|
||||
<div class="hero-label">tokens processed</div>
|
||||
<div class="hero-num">${fmtNum(totalTokens)}</div>
|
||||
</div>
|
||||
<div class="hero-tile">
|
||||
<div class="hero-label">cost saved</div>
|
||||
<div class="hero-num">${formatCost(savings.totalCostSaved)}</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<h2>Cost Analysis</h2>
|
||||
<div class="savings-vs">
|
||||
<div class="without">
|
||||
<div class="hero-label">without gateway</div>
|
||||
<div class="num-amount" style="color: #b42318;">${formatCost(savings.costWithoutGateway)}</div>
|
||||
</div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="with">
|
||||
<div class="hero-label">with gateway</div>
|
||||
<div class="num-amount" style="color: #15803d;">${formatCost(savings.costWithGateway)}</div>
|
||||
</div>
|
||||
</div>
|
||||
<p>Saved <strong>${formatCost(savings.costWithoutGateway - savings.costWithGateway)}</strong> through cache hits, compression, subscription bridges, local routing, and race-mode optimization.</p>
|
||||
|
||||
<h2>Savings by Source</h2>
|
||||
<div class="axes">
|
||||
<div class="axis"><div class="axis-cost">${formatCost(savings.bySource.cache.cost)}</div><div class="axis-label">⚡ Cache</div></div>
|
||||
<div class="axis"><div class="axis-cost">${formatCost(savings.bySource.compression.cost)}</div><div class="axis-label">🗜 Compression</div></div>
|
||||
<div class="axis"><div class="axis-cost">${formatCost(savings.bySource.subscriptionBridge.cost)}</div><div class="axis-label">🌉 Sub. Bridges</div></div>
|
||||
<div class="axis"><div class="axis-cost">${formatCost(savings.bySource.localRouting.cost)}</div><div class="axis-label">🏠 Local</div></div>
|
||||
<div class="axis"><div class="axis-cost">${formatCost(savings.bySource.raceMode.cost)}</div><div class="axis-label">🏁 Race</div></div>
|
||||
</div>
|
||||
|
||||
<h2>Activity Summary</h2>
|
||||
<table>
|
||||
<tr><th>Metric</th><th>Value</th></tr>
|
||||
<tr><td>Total requests</td><td class="num">${fmtNum(totalReq)}</td></tr>
|
||||
<tr><td>Average latency</td><td class="num">${fmtNum(avgLat)} ms</td></tr>
|
||||
<tr><td>Success rate</td><td class="num">${fmtPct(successRate)}</td></tr>
|
||||
<tr><td>Cost actually paid</td><td class="num">${formatCost(monthCost)}</td></tr>
|
||||
</table>
|
||||
|
||||
<h2>Top Models This Month</h2>
|
||||
<table>
|
||||
<tr><th>Model</th><th>Requests</th><th>Share</th></tr>
|
||||
${modelRows.rows.map((r: any) => `
|
||||
<tr>
|
||||
<td><code>${r.model}</code></td>
|
||||
<td class="num">${fmtNum(parseInt(r.cnt,10))}</td>
|
||||
<td class="num">${totalReq > 0 ? ((parseInt(r.cnt,10)/totalReq)*100).toFixed(1) : 0}%</td>
|
||||
</tr>
|
||||
`).join('')}
|
||||
</table>
|
||||
|
||||
<h2>Top Callers This Month</h2>
|
||||
<table>
|
||||
<tr><th>Caller</th><th>Requests</th><th>Cost</th></tr>
|
||||
${callerRows.rows.map((r: any) => `
|
||||
<tr>
|
||||
<td><code>${r.caller_id}</code></td>
|
||||
<td class="num">${fmtNum(parseInt(r.cnt,10))}</td>
|
||||
<td class="num">${formatCost(parseFloat(r.cost))}</td>
|
||||
</tr>
|
||||
`).join('')}
|
||||
</table>
|
||||
|
||||
<h2>Achievements Unlocked</h2>
|
||||
<div>
|
||||
${newAchievements.map((a) => `<span class="ach">${a.icon} ${a.title}</span>`).join('')}
|
||||
${newAchievements.length === 0 ? '<em>No achievements unlocked yet — keep using the gateway!</em>' : ''}
|
||||
</div>
|
||||
|
||||
<h2>Buddy Status</h2>
|
||||
<div style="display: flex; gap: 12pt; align-items: center; padding: 10pt; border: 0.5pt solid #d6e0e7;">
|
||||
<div class="ascii-buddy">${buddy.asciiArt.join('\n')}</div>
|
||||
<div>
|
||||
<strong>${buddy.name}</strong> · ${buddy.species} · ${buddy.stage}<br>
|
||||
Level ${buddy.level} · XP ${fmtNum(buddy.xp)}/${fmtNum(buddy.xpForNextLevel)}<br>
|
||||
Mood: ${buddy.mood} · Streak: ${buddy.streakDays} days<br>
|
||||
<em>"${buddy.speech}"</em>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="footer">
|
||||
Generated by LLM Gateway · ${new Date().toISOString()} · llm-gateway.context-x.org
|
||||
</div>
|
||||
|
||||
</body></html>`;
|
||||
return html;
|
||||
}
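The report template above interpolates database values such as `r.model`, `r.caller_id`, and the buddy fields directly into HTML. If any of those can carry caller-supplied text, escaping them first is worth considering. The following is a minimal sketch under that assumption; `escapeHtml` and `renderModelRow` are hypothetical names and are not part of this diff.

```typescript
// Hypothetical helper (not in the diff): escape the five HTML-significant
// characters before a value is interpolated into the report template.
function escapeHtml(value: unknown): string {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Illustrative row renderer mirroring the "Top Models" table markup.
function renderModelRow(model: string, count: number): string {
  return `<tr><td><code>${escapeHtml(model)}</code></td><td class="num">${count.toLocaleString('en-US')}</td></tr>`;
}
```

The ampersand is replaced first so that entities produced by the later replacements are not escaped a second time.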
|
||||
@ -109,6 +109,11 @@ export class RequestLogger {
|
||||
cost_usd: number;
|
||||
latency_ms: number;
|
||||
fallback_used: boolean;
|
||||
compression_mode?: string;
|
||||
compression_tokens_before?: number;
|
||||
compression_tokens_after?: number;
|
||||
compression_tokens_saved?: number;
|
||||
compression_savings_pct?: number;
|
||||
error_message?: string;
|
||||
created_at: string;
|
||||
}>
|
||||
@ -116,22 +121,35 @@ export class RequestLogger {
|
||||
const result = await this.db.query(
|
||||
`
|
||||
SELECT
|
||||
request_id,
|
||||
caller_id as caller,
|
||||
task_type,
|
||||
model,
|
||||
status,
|
||||
confidence_score,
|
||||
tokens_in,
|
||||
tokens_out,
|
||||
cost_usd,
|
||||
latency_ms,
|
||||
fallback_used,
|
||||
error_message,
|
||||
created_at
|
||||
FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
ORDER BY created_at DESC
|
||||
rt.request_id,
|
||||
rt.caller_id as caller,
|
||||
rt.task_type,
|
||||
rt.model,
|
||||
rt.status,
|
||||
rt.confidence_score,
|
||||
rt.tokens_in,
|
||||
rt.tokens_out,
|
||||
rt.cost_usd,
|
||||
rt.latency_ms,
|
||||
rt.fallback_used,
|
||||
tv.mode as compression_mode,
|
||||
tv.tokens_before as compression_tokens_before,
|
||||
tv.tokens_after as compression_tokens_after,
|
||||
GREATEST(COALESCE(tv.tokens_before, 0) - COALESCE(tv.tokens_after, 0), 0) as compression_tokens_saved,
|
||||
tv.savings_pct as compression_savings_pct,
|
||||
rt.error_message,
|
||||
rt.created_at
|
||||
FROM request_tracking rt
|
||||
LEFT JOIN LATERAL (
|
||||
SELECT mode, tokens_before, tokens_after, savings_pct
|
||||
FROM tokenvault_metrics
|
||||
WHERE tool_used = 'gateway'
|
||||
AND file_path = rt.request_id
|
||||
ORDER BY created_at DESC
|
||||
LIMIT 1
|
||||
) tv ON true
|
||||
WHERE rt.created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
ORDER BY rt.created_at DESC
|
||||
LIMIT $2
|
||||
`,
|
||||
[offsetHours, limit]
|
||||
@ -149,6 +167,11 @@ export class RequestLogger {
|
||||
cost_usd: row.cost_usd,
|
||||
latency_ms: row.latency_ms,
|
||||
fallback_used: row.fallback_used,
|
||||
compression_mode: row.compression_mode,
|
||||
compression_tokens_before: row.compression_tokens_before ? parseInt(row.compression_tokens_before, 10) : undefined,
|
||||
compression_tokens_after: row.compression_tokens_after ? parseInt(row.compression_tokens_after, 10) : undefined,
|
||||
compression_tokens_saved: row.compression_tokens_saved ? parseInt(row.compression_tokens_saved, 10) : 0,
|
||||
compression_savings_pct: row.compression_savings_pct ? parseFloat(row.compression_savings_pct) : 0,
|
||||
error_message: row.error_message,
|
||||
created_at: row.created_at
|
||||
}));
|
||||
@ -160,6 +183,17 @@ export class RequestLogger {
|
||||
async getMetrics(bucketMinutes: number = 60): Promise<{
|
||||
total_requests: number;
|
||||
total_cost: number;
|
||||
estimated_api_cost: number;
|
||||
estimated_api_cost_avoided: number;
|
||||
total_tokens_in: number;
|
||||
total_tokens_out: number;
|
||||
total_tokens: number;
|
||||
compression_operations: number;
|
||||
compression_tokens_before: number;
|
||||
compression_tokens_after: number;
|
||||
compression_tokens_saved: number;
|
||||
compression_rate: number;
|
||||
cache_hit_rate: number;
|
||||
avg_latency: number;
|
||||
success_rate: number;
|
||||
avg_confidence: number;
|
||||
@ -177,13 +211,15 @@ export class RequestLogger {
|
||||
`
|
||||
SELECT
|
||||
COUNT(*) as total_requests,
|
||||
SUM(cost_usd) as total_cost,
|
||||
AVG(latency_ms) as avg_latency,
|
||||
SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / COUNT(*) as success_rate,
|
||||
AVG(confidence_score) as avg_confidence,
|
||||
SUM(CASE WHEN fallback_used = true THEN 1 ELSE 0 END)::FLOAT / COUNT(*) as fallback_percentage
|
||||
COALESCE(SUM(cost_usd), 0) as total_cost,
|
||||
COALESCE(SUM(tokens_in), 0) as total_tokens_in,
|
||||
COALESCE(SUM(tokens_out), 0) as total_tokens_out,
|
||||
COALESCE(AVG(latency_ms), 0) as avg_latency,
|
||||
CASE WHEN COUNT(*) = 0 THEN 0 ELSE SUM(CASE WHEN status = 'approved' THEN 1 ELSE 0 END)::FLOAT / COUNT(*) END as success_rate,
|
||||
COALESCE(AVG(confidence_score), 0) as avg_confidence,
|
||||
CASE WHEN COUNT(*) = 0 THEN 0 ELSE SUM(CASE WHEN fallback_used = true THEN 1 ELSE 0 END)::FLOAT / COUNT(*) END as fallback_percentage
|
||||
FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
|
||||
WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
|
||||
`,
|
||||
[bucketMinutes]
|
||||
);
|
||||
@ -192,7 +228,7 @@ export class RequestLogger {
|
||||
`
|
||||
SELECT caller_id as caller, COUNT(*) as count
|
||||
FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
|
||||
WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
|
||||
GROUP BY caller_id
|
||||
ORDER BY count DESC
|
||||
LIMIT 5
|
||||
@ -204,7 +240,7 @@ export class RequestLogger {
|
||||
`
|
||||
SELECT model, COUNT(*) as count
|
||||
FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(mins => $1)
|
||||
WHERE created_at > NOW() - ($1 * INTERVAL '1 minute')
|
||||
GROUP BY model
|
||||
ORDER BY count DESC
|
||||
LIMIT 5
|
||||
@ -224,11 +260,47 @@ export class RequestLogger {
|
||||
[bucketMinutes]
|
||||
);
|
||||
|
||||
const compressionResult = await this.db.query(
|
||||
`
|
||||
SELECT
|
||||
COUNT(*) as operations,
|
||||
COALESCE(SUM(tokens_before), 0) as tokens_before,
|
||||
COALESCE(SUM(tokens_after), 0) as tokens_after,
|
||||
COALESCE(SUM(GREATEST(tokens_before - tokens_after, 0)), 0) as tokens_saved
|
||||
FROM tokenvault_metrics
|
||||
WHERE tool_used = 'gateway'
|
||||
AND created_at > NOW() - ($1 * INTERVAL '1 minute')
|
||||
`,
|
||||
[bucketMinutes]
|
||||
);
|
||||
|
||||
const metrics = metricsResult.rows[0];
|
||||
const totalTokensIn = parseInt(metrics.total_tokens_in, 10) || 0;
|
||||
const totalTokensOut = parseInt(metrics.total_tokens_out, 10) || 0;
|
||||
const totalTokens = totalTokensIn + totalTokensOut;
|
||||
const compression = compressionResult.rows[0] ?? {};
|
||||
const compressionTokensBefore = parseInt(compression.tokens_before, 10) || 0;
|
||||
const compressionTokensAfter = parseInt(compression.tokens_after, 10) || 0;
|
||||
const compressionTokensSaved = parseInt(compression.tokens_saved, 10) || 0;
|
||||
const referenceInputCostPer1k = parseFloat(process.env['REFERENCE_INPUT_COST_PER_1K'] ?? '0.005');
|
||||
const referenceOutputCostPer1k = parseFloat(process.env['REFERENCE_OUTPUT_COST_PER_1K'] ?? '0.015');
|
||||
const estimatedApiCost = (totalTokensIn / 1000) * referenceInputCostPer1k + (totalTokensOut / 1000) * referenceOutputCostPer1k;
|
||||
const totalCost = parseFloat(metrics.total_cost) || 0;
|
||||
|
||||
return {
|
||||
total_requests: parseInt(metrics.total_requests, 10) || 0,
|
||||
total_cost: parseFloat(metrics.total_cost) || 0,
|
||||
total_cost: totalCost,
|
||||
estimated_api_cost: estimatedApiCost,
|
||||
estimated_api_cost_avoided: Math.max(0, estimatedApiCost - totalCost),
|
||||
total_tokens_in: totalTokensIn,
|
||||
total_tokens_out: totalTokensOut,
|
||||
total_tokens: totalTokens,
|
||||
compression_operations: parseInt(compression.operations, 10) || 0,
|
||||
compression_tokens_before: compressionTokensBefore,
|
||||
compression_tokens_after: compressionTokensAfter,
|
||||
compression_tokens_saved: compressionTokensSaved,
|
||||
compression_rate: compressionTokensBefore > 0 ? compressionTokensSaved / compressionTokensBefore : 0,
|
||||
cache_hit_rate: 0,
|
||||
avg_latency: Math.round(parseFloat(metrics.avg_latency) || 0),
|
||||
success_rate: parseFloat(metrics.success_rate) || 0,
|
||||
avg_confidence: parseFloat(metrics.avg_confidence) || 0,
|
||||
|
||||
390
packages/gateway/src/modules/response-cache.ts
Normal file
@ -0,0 +1,390 @@
|
||||
/**
|
||||
* Response Cache
|
||||
*
|
||||
* Two-tier cache:
|
||||
* • Tier 1 (exact) — sha256 of canonical request → instant lookup, $0 cost
|
||||
* • Tier 2 (semantic) — embedding cosine similarity, served via in-process
|
||||
* rerank when threshold is met. Implemented in v1 as
|
||||
* a string-similarity heuristic until pgvector is
|
||||
* provisioned. The interface is forward-compatible.
|
||||
*
|
||||
* Cache hits skip the entire LLM pipeline. Each hit increments the saved-cost
|
||||
* counter so the dashboard can show real savings in real time.
|
||||
*/
|
||||
|
||||
import { createHash } from 'crypto';
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
import { embed, vectorToPgLiteral, EMBEDDING_DIMENSION } from './embedding-client.js';
|
||||
|
||||
export interface CacheableRequest {
|
||||
caller: string;
|
||||
task_type?: string;
|
||||
model?: string;
|
||||
system?: string;
|
||||
input: string;
|
||||
}
|
||||
|
||||
export interface CachedResponse {
|
||||
id: number;
|
||||
cacheKey: string;
|
||||
responseJson: Record<string, unknown>;
|
||||
costWhenCached: number;
|
||||
tokensIn: number;
|
||||
tokensOut: number;
|
||||
hitCount: number;
|
||||
ageSeconds: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* Compute a stable cache key for a request. Caller, task type, and model are
* lowercased, and whitespace in system and input is collapsed, so functionally
* identical requests collide. The case of system and input is preserved.
|
||||
*/
|
||||
export function computeCacheKey(req: CacheableRequest): string {
|
||||
const canonical = [
|
||||
`caller=${req.caller.trim().toLowerCase()}`,
|
||||
`task=${(req.task_type ?? '').trim().toLowerCase()}`,
|
||||
`model=${(req.model ?? '').trim().toLowerCase()}`,
|
||||
`system=${(req.system ?? '').trim().replace(/\s+/g, ' ').slice(0, 4096)}`,
|
||||
`input=${req.input.trim().replace(/\s+/g, ' ').slice(0, 16_384)}`,
|
||||
].join('\n');
|
||||
return createHash('sha256').update(canonical).digest('hex');
|
||||
}
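To make the canonicalization in `computeCacheKey` concrete, the sketch below rebuilds only the canonical string (the diff then hashes it with sha256) and compares a few requests. `canonicalize` and the sample requests are illustrative assumptions, not code from this diff.

```typescript
interface CacheableRequest {
  caller: string;
  task_type?: string;
  model?: string;
  system?: string;
  input: string;
}

// Standalone copy of the canonical string that computeCacheKey hashes.
function canonicalize(req: CacheableRequest): string {
  return [
    `caller=${req.caller.trim().toLowerCase()}`,
    `task=${(req.task_type ?? '').trim().toLowerCase()}`,
    `model=${(req.model ?? '').trim().toLowerCase()}`,
    `system=${(req.system ?? '').trim().replace(/\s+/g, ' ').slice(0, 4096)}`,
    `input=${req.input.trim().replace(/\s+/g, ' ').slice(0, 16_384)}`,
  ].join('\n');
}

// Differ only in caller/model case and in whitespace: same canonical form.
const keyA = canonicalize({ caller: 'Codex ', model: 'GPT-4', input: 'explain   this\ncode' });
const keyB = canonicalize({ caller: 'codex', model: 'gpt-4', input: ' explain this code ' });

// Differs in input case: a different canonical form, hence a different key.
const keyC = canonicalize({ caller: 'codex', model: 'gpt-4', input: 'Explain this code' });

// Differ only after the 16,384-character cutoff: same canonical form.
const longA = canonicalize({ caller: 'codex', input: 'x'.repeat(16_384) + 'A' });
const longB = canonicalize({ caller: 'codex', input: 'x'.repeat(16_384) + 'B' });
```

One consequence of the truncation is that two long inputs which differ only past the cutoff share a key, so the second one would be served the first one's cached response.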
|
||||
|
||||
/** Look up an exact cache hit. Returns null when no fresh entry exists. */
|
||||
export async function getCachedResponse(
|
||||
db: Pool,
|
||||
cacheKey: string
|
||||
): Promise<CachedResponse | null> {
|
||||
try {
|
||||
const result = await db.query(
|
||||
`
|
||||
SELECT id, cache_key, response_json, cost_when_cached, tokens_in, tokens_out,
|
||||
hit_count, EXTRACT(EPOCH FROM (NOW() - created_at))::INT AS age_seconds,
|
||||
ttl_seconds
|
||||
FROM response_cache
|
||||
WHERE cache_key = $1
|
||||
AND (created_at + (ttl_seconds * INTERVAL '1 second')) > NOW()
|
||||
LIMIT 1
|
||||
`,
|
||||
[cacheKey]
|
||||
);
|
||||
const row = result.rows[0];
|
||||
if (!row) return null;
|
||||
return {
|
||||
id: Number(row.id),
|
||||
cacheKey: row.cache_key,
|
||||
responseJson: row.response_json,
|
||||
costWhenCached: parseFloat(row.cost_when_cached) || 0,
|
||||
tokensIn: parseInt(row.tokens_in, 10) || 0,
|
||||
tokensOut: parseInt(row.tokens_out, 10) || 0,
|
||||
hitCount: parseInt(row.hit_count, 10) || 0,
|
||||
ageSeconds: parseInt(row.age_seconds, 10) || 0,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: getCachedResponse failed (table missing?)');
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Look up a fuzzy/semantic match using pgvector cosine similarity.
|
||||
* Returns null when:
|
||||
* • embedding generation fails (Ollama down, model missing)
|
||||
* • no entry crosses the similarity threshold
|
||||
* • the table doesn't yet have the embedding column
|
||||
*/
|
||||
export async function getSemanticCachedResponse(
|
||||
db: Pool,
|
||||
caller: string,
|
||||
taskType: string | undefined,
|
||||
inputText: string,
|
||||
similarityThreshold: number = 0.92
|
||||
): Promise<(CachedResponse & { similarity: number }) | null> {
|
||||
const vec = await embed(inputText);
|
||||
if (!vec) return null;
|
||||
|
||||
try {
|
||||
const result = await db.query(
|
||||
`
|
||||
SELECT id, cache_key, response_json, cost_when_cached, tokens_in, tokens_out,
|
||||
hit_count, EXTRACT(EPOCH FROM (NOW() - created_at))::INT AS age_seconds,
|
||||
1 - (embedding <=> $1::vector) AS similarity
|
||||
FROM response_cache
|
||||
WHERE caller_id = $2
|
||||
AND ($3::TEXT IS NULL OR task_type = $3)
|
||||
AND embedding IS NOT NULL
|
||||
AND (created_at + (ttl_seconds * INTERVAL '1 second')) > NOW()
|
||||
ORDER BY embedding <=> $1::vector ASC
|
||||
LIMIT 1
|
||||
`,
|
||||
[vectorToPgLiteral(vec), caller.trim().toLowerCase(), taskType ?? null]
|
||||
);
|
||||
const row = result.rows[0];
|
||||
if (!row) return null;
|
||||
const sim = parseFloat(row.similarity);
|
||||
if (isNaN(sim) || sim < similarityThreshold) return null;
|
||||
return {
|
||||
id: Number(row.id),
|
||||
cacheKey: row.cache_key,
|
||||
responseJson: row.response_json,
|
||||
costWhenCached: parseFloat(row.cost_when_cached) || 0,
|
||||
tokensIn: parseInt(row.tokens_in, 10) || 0,
|
||||
tokensOut: parseInt(row.tokens_out, 10) || 0,
|
||||
hitCount: parseInt(row.hit_count, 10) || 0,
|
||||
ageSeconds: parseInt(row.age_seconds, 10) || 0,
|
||||
similarity: sim,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.debug({ err }, 'response-cache: getSemanticCachedResponse failed (extension missing?)');
|
||||
return null;
|
||||
}
|
||||
}
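The semantic lookup relies on pgvector's `<=>` operator, which returns cosine distance, so `1 - (embedding <=> $1::vector)` is cosine similarity. As a rough in-memory illustration of the same measure and of the threshold check, here is a sketch; `cosineSimilarity` and `isSemanticHit` are hypothetical names and nothing here replaces the SQL in the diff.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must have the same non-zero length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Mirrors the guard in getSemanticCachedResponse: NaN or below-threshold is a miss.
function isSemanticHit(similarity: number, threshold: number = 0.92): boolean {
  return !Number.isNaN(similarity) && similarity >= threshold;
}
```

Because the query orders by distance and takes one row, only the single nearest entry is compared against the threshold.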
|
||||
|
||||
/** Persist a response. On key conflict the row is overwritten and created_at is reset, which restarts the TTL window. */
|
||||
export async function setCachedResponse(
|
||||
db: Pool,
|
||||
req: CacheableRequest,
|
||||
response: Record<string, unknown>,
|
||||
meta: { cost: number; tokensIn: number; tokensOut: number; ttlSeconds?: number }
|
||||
): Promise<void> {
|
||||
const cacheKey = computeCacheKey(req);
|
||||
const ttl = meta.ttlSeconds ?? 86_400;
|
||||
// Generate the embedding before inserting (awaited; a null result stores the row without an embedding)
|
||||
const vec = await embed(req.input);
|
||||
const embedLiteral = vec && vec.length === EMBEDDING_DIMENSION ? vectorToPgLiteral(vec) : null;
|
||||
try {
|
||||
await db.query(
|
||||
`
|
||||
INSERT INTO response_cache
|
||||
(cache_key, caller_id, task_type, model, input_preview,
|
||||
response_json, cost_when_cached, tokens_in, tokens_out, ttl_seconds, embedding)
|
||||
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11::vector)
|
||||
ON CONFLICT (cache_key) DO UPDATE SET
|
||||
response_json = EXCLUDED.response_json,
|
||||
cost_when_cached = EXCLUDED.cost_when_cached,
|
||||
tokens_in = EXCLUDED.tokens_in,
|
||||
tokens_out = EXCLUDED.tokens_out,
|
||||
ttl_seconds = EXCLUDED.ttl_seconds,
|
||||
embedding = COALESCE(EXCLUDED.embedding, response_cache.embedding),
|
||||
created_at = NOW()
|
||||
`,
|
||||
[
|
||||
cacheKey,
|
||||
req.caller.trim().toLowerCase(),
|
||||
req.task_type ?? null,
|
||||
req.model ?? null,
|
||||
req.input.slice(0, 1024),
|
||||
JSON.stringify(response),
|
||||
meta.cost,
|
||||
meta.tokensIn,
|
||||
meta.tokensOut,
|
||||
ttl,
|
||||
embedLiteral,
|
||||
]
|
||||
);
|
||||
} catch (err) {
|
||||
// Retry without embedding column when the extension hasn't migrated yet
|
||||
logger.debug({ err }, 'response-cache: setCachedResponse with embedding failed, retrying without');
|
||||
try {
|
||||
await db.query(
|
||||
`
|
||||
INSERT INTO response_cache
|
||||
(cache_key, caller_id, task_type, model, input_preview,
|
||||
response_json, cost_when_cached, tokens_in, tokens_out, ttl_seconds)
|
||||
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10)
|
||||
ON CONFLICT (cache_key) DO UPDATE SET
|
||||
response_json = EXCLUDED.response_json,
|
||||
cost_when_cached = EXCLUDED.cost_when_cached,
|
||||
tokens_in = EXCLUDED.tokens_in,
|
||||
tokens_out = EXCLUDED.tokens_out,
|
||||
ttl_seconds = EXCLUDED.ttl_seconds,
|
||||
created_at = NOW()
|
||||
`,
|
||||
[
|
||||
cacheKey,
|
||||
req.caller.trim().toLowerCase(),
|
||||
req.task_type ?? null,
|
||||
req.model ?? null,
|
||||
req.input.slice(0, 1024),
|
||||
JSON.stringify(response),
|
||||
meta.cost,
|
||||
meta.tokensIn,
|
||||
meta.tokensOut,
|
||||
ttl,
|
||||
]
|
||||
);
|
||||
} catch (err2) {
|
||||
logger.warn({ err: err2 }, 'response-cache: setCachedResponse failed');
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/** Record a cache hit (atomic increment). */
|
||||
export async function recordCacheHit(db: Pool, cachedId: number): Promise<void> {
|
||||
try {
|
||||
await db.query(
|
||||
`
|
||||
UPDATE response_cache
|
||||
SET hit_count = hit_count + 1,
|
||||
cost_saved = cost_saved + cost_when_cached,
|
||||
tokens_saved = tokens_saved + tokens_in + tokens_out,
|
||||
last_hit_at = NOW()
|
||||
WHERE id = $1
|
||||
`,
|
||||
[cachedId]
|
||||
);
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: recordCacheHit failed');
|
||||
}
|
||||
}
|
||||
|
||||
/** Aggregate savings across all cache entries for the dashboard. */
|
||||
export async function getCacheSavings(
|
||||
db: Pool,
|
||||
hoursBack: number = 24
|
||||
): Promise<{
|
||||
totalHits: number;
|
||||
totalCostSaved: number;
|
||||
totalTokensSaved: number;
|
||||
uniqueEntries: number;
|
||||
topCallers: Array<{ caller: string; hits: number; saved: number }>;
|
||||
hitRatePercent: number;
|
||||
}> {
|
||||
try {
|
||||
const [totalRow, callerRows, ratioRow] = await Promise.all([
|
||||
db.query(
|
||||
`SELECT
|
||||
COALESCE(SUM(hit_count), 0)::INT AS total_hits,
|
||||
COALESCE(SUM(cost_saved), 0)::NUMERIC AS total_cost_saved,
|
||||
COALESCE(SUM(tokens_saved), 0)::BIGINT AS total_tokens_saved,
|
||||
COUNT(*)::INT AS unique_entries
|
||||
FROM response_cache
|
||||
WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
OR created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
),
|
||||
db.query(
|
||||
`SELECT caller_id, SUM(hit_count)::INT AS hits, SUM(cost_saved)::NUMERIC AS saved
|
||||
FROM response_cache
|
||||
WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
GROUP BY caller_id
|
||||
ORDER BY hits DESC
|
||||
LIMIT 5`,
|
||||
[hoursBack]
|
||||
),
|
||||
// Cache hit-rate = hits / (hits + new requests in same window)
|
||||
db.query(
|
||||
`SELECT
|
||||
COALESCE((SELECT SUM(hit_count) FROM response_cache
|
||||
WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)), 0)::INT AS hits,
|
||||
(SELECT COUNT(*) FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1))::INT AS total_requests`,
|
||||
[hoursBack]
|
||||
),
|
||||
]);
|
||||
|
||||
const t = totalRow.rows[0];
|
||||
const r = ratioRow.rows[0];
|
||||
const totalReq = parseInt(r?.total_requests ?? '0', 10);
|
||||
const hits = parseInt(t?.total_hits ?? '0', 10);
|
||||
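// Use the hit count from the ratio query so both operands cover the same window.
const windowHits = parseInt(r?.hits ?? '0', 10);
const hitRate = totalReq > 0 ? (windowHits / (totalReq + windowHits)) * 100 : 0;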
|
||||
|
||||
return {
|
||||
totalHits: hits,
|
||||
totalCostSaved: parseFloat(t?.total_cost_saved ?? '0'),
|
||||
totalTokensSaved: parseInt(t?.total_tokens_saved ?? '0', 10),
|
||||
uniqueEntries: parseInt(t?.unique_entries ?? '0', 10),
|
||||
topCallers: callerRows.rows.map((row: any) => ({
|
||||
caller: row.caller_id,
|
||||
hits: parseInt(row.hits, 10) || 0,
|
||||
saved: parseFloat(row.saved) || 0,
|
||||
})),
|
||||
hitRatePercent: parseFloat(hitRate.toFixed(2)),
|
||||
};
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: getCacheSavings failed (table missing?)');
|
||||
return {
|
||||
totalHits: 0,
|
||||
totalCostSaved: 0,
|
||||
totalTokensSaved: 0,
|
||||
uniqueEntries: 0,
|
||||
topCallers: [],
|
||||
hitRatePercent: 0,
|
||||
};
|
||||
}
|
||||
}
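The hit-rate comment in `getCacheSavings` describes the ratio as hits / (hits + new requests in the same window). A small sketch of that arithmetic follows, assuming `request_tracking` counts only requests that missed the cache; `cacheHitRatePercent` is a hypothetical helper, not part of the diff.

```typescript
// Hit rate as a percentage, rounded to two decimals.
// Assumption: `uncachedRequests` excludes requests that were served from cache.
function cacheHitRatePercent(hits: number, uncachedRequests: number): number {
  const total = hits + uncachedRequests;
  if (total <= 0) return 0; // no traffic in the window
  return parseFloat(((hits / total) * 100).toFixed(2));
}
```

If cache hits are also logged to `request_tracking`, the denominator would double-count them and the plain ratio hits / requests would be the better fit.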
|
||||
|
||||
/** Time-series buckets of cache savings for sparkline visualization. */
|
||||
export async function getSavingsTimeSeries(
|
||||
db: Pool,
|
||||
hoursBack: number = 24,
|
||||
bucketMinutes: number = 60
|
||||
): Promise<Array<{ ts: string; costSaved: number; hits: number; tokensSaved: number }>> {
|
||||
try {
|
||||
const buckets = Math.ceil((hoursBack * 60) / bucketMinutes);
|
||||
const result = await db.query(
|
||||
`
|
||||
WITH gs AS (
-- One series, one row per bucket (nesting it under a second generate_series
-- repeats buckets and inflates the joined counts)
SELECT generate_series(
DATE_TRUNC('hour', NOW()) - ($1 || ' minutes')::INTERVAL * ($2 - 1),
DATE_TRUNC('hour', NOW()),
($1 || ' minutes')::INTERVAL
) AS bucket_ts
)
|
||||
SELECT
|
||||
gs.bucket_ts,
|
||||
COALESCE(COUNT(rc.id), 0)::INT AS hits,
|
||||
COALESCE(SUM(rc.cost_when_cached), 0)::NUMERIC AS cost_saved,
|
||||
COALESCE(SUM(rc.tokens_in + rc.tokens_out), 0)::INT AS tokens_saved
|
||||
FROM gs
|
||||
LEFT JOIN response_cache rc
|
||||
ON DATE_TRUNC('hour', rc.last_hit_at) = gs.bucket_ts
|
||||
AND rc.last_hit_at > NOW() - ($1 || ' minutes')::INTERVAL * $2
|
||||
GROUP BY gs.bucket_ts
|
||||
ORDER BY gs.bucket_ts ASC
|
||||
`,
|
||||
[bucketMinutes, buckets]
|
||||
);
|
||||
return result.rows.map((row: any) => ({
|
||||
ts: row.bucket_ts.toISOString(),
|
||||
costSaved: parseFloat(row.cost_saved) || 0,
|
||||
hits: parseInt(row.hits, 10) || 0,
|
||||
tokensSaved: parseInt(row.tokens_saved, 10) || 0,
|
||||
}));
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: getSavingsTimeSeries failed');
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
/** Drop entries older than max-age days. Run from a periodic job. */
|
||||
export async function pruneStaleCacheEntries(db: Pool, maxAgeDays: number = 7): Promise<number> {
|
||||
try {
|
||||
const result = await db.query(
|
||||
`DELETE FROM response_cache
|
||||
WHERE created_at < NOW() - MAKE_INTERVAL(days => $1)
|
||||
AND (last_hit_at IS NULL OR last_hit_at < NOW() - MAKE_INTERVAL(days => $1))`,
|
||||
[maxAgeDays]
|
||||
);
|
||||
return result.rowCount ?? 0;
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: prune failed');
|
||||
return 0;
|
||||
}
|
||||
}
|
||||
|
||||
/** Manual cache invalidation, e.g. when a caller hits "clear my cache". */
|
||||
export async function clearCacheForCaller(db: Pool, callerId: string): Promise<number> {
|
||||
try {
|
||||
const result = await db.query(
|
||||
`DELETE FROM response_cache WHERE caller_id = $1`,
|
||||
[callerId.trim().toLowerCase()]
|
||||
);
|
||||
return result.rowCount ?? 0;
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'response-cache: clearCacheForCaller failed');
|
||||
return 0;
|
||||
}
|
||||
}
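The prune rule in `pruneStaleCacheEntries` keeps an old entry alive as long as it was hit recently. A minimal in-memory sketch of that predicate follows. It is an illustration only: `CacheEntry` and `isStale` are hypothetical names, and the day arithmetic uses fixed 24-hour days rather than PostgreSQL interval semantics.

```typescript
// Hypothetical in-memory shape of a cache row; only the two timestamps matter here.
interface CacheEntry {
  createdAt: Date;
  lastHitAt: Date | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** True when the entry is old AND has not been hit since the cutoff. */
function isStale(entry: CacheEntry, now: Date, maxAgeDays: number = 7): boolean {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  const createdBeforeCutoff = entry.createdAt.getTime() < cutoff;
  const notHitSinceCutoff =
    entry.lastHitAt === null || entry.lastHitAt.getTime() < cutoff;
  return createdBeforeCutoff && notHitSinceCutoff;
}

const now = new Date('2025-01-31T00:00:00Z');

// Created a month ago but hit yesterday: kept.
const oldButHot: CacheEntry = {
  createdAt: new Date('2025-01-01T00:00:00Z'),
  lastHitAt: new Date('2025-01-30T00:00:00Z'),
};

// Created a month ago and never hit: pruned.
const oldAndCold: CacheEntry = {
  createdAt: new Date('2025-01-01T00:00:00Z'),
  lastHitAt: null,
};

console.log(isStale(oldButHot, now), isStale(oldAndCold, now));
```

Requiring both conditions means frequently hit entries are never evicted by age alone, so a separate invalidation path (such as `clearCacheForCaller`) is still needed when cached answers become outdated.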
|
||||
267
packages/gateway/src/modules/savings-calculator.ts
Normal file
@ -0,0 +1,267 @@
|
||||
/**
 * Savings Calculator
 *
 * Comprehensive savings accounting across ALL gateway mechanisms, not just
 * cache hits. Lean-CTX measures file-context compression; we measure five
 * orthogonal sources of value:
 *
 * 1. Response cache (exact + semantic match)
 * 2. Compression pipeline (verbatim_compact, etc.), including MCP tool-call
 *    compression
 * 3. Subscription-bridge implicit savings (calls via flat-rate Pro plan
 *    vs. what they would have cost via paid API)
 * 4. Local routing (calls served by a local model instead of a paid API)
 * 5. Race mode (cost of the race candidates that were not selected)
 *
 * The dashboard now surfaces all five so the savings counter reflects the
 * gateway's true value rather than only cache hits.
 */
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
// Conservative API pricing snapshot (USD per 1k tokens). Used to compute
|
||||
// "what would this have cost via direct API". Update as pricing evolves.
|
||||
const API_PRICING = {
|
||||
// Anthropic
|
||||
'claude-opus-4-1': { in: 0.015, out: 0.075 },
|
||||
'claude-sonnet-4-1': { in: 0.003, out: 0.015 },
|
||||
'claude-haiku-3': { in: 0.00025, out: 0.00125 },
|
||||
// OpenAI
|
||||
'gpt-5.1-codex': { in: 0.005, out: 0.020 },
|
||||
'gpt-5.1-codex-mini': { in: 0.0015, out: 0.006 },
|
||||
'gpt-4-turbo': { in: 0.010, out: 0.030 },
|
||||
'gpt-4': { in: 0.030, out: 0.060 },
|
||||
'gpt-3.5-turbo': { in: 0.0005, out: 0.0015 },
|
||||
// Google
|
||||
'gemini-1.5-pro': { in: 0.00125, out: 0.005 },
|
||||
'gemini-1.5-flash': { in: 0.000075, out: 0.0003 },
|
||||
} as const;
|
||||
|
||||
/** Models that go through a flat-rate subscription bridge → marginal cost = $0 */
|
||||
const SUBSCRIPTION_MODEL_PATTERNS = [
|
||||
/^claude-/i, // Claude Code subscription
|
||||
/^gpt-5\.1-codex/i, // Codex CLI subscription
|
||||
/^gpt-(4|3\.5)/i, // ChatGPT Plus / Copilot subscription
|
||||
/^gemini-/i, // Gemini Advanced
|
||||
/^github-copilot/i, // GitHub Copilot
|
||||
/^microsoft.365/i, // M365 Copilot
|
||||
];
|
||||
|
||||
function lookupApiPrice(model: string): { in: number; out: number } | null {
  const m = model.toLowerCase();
  const pricing: Record<string, { in: number; out: number }> = API_PRICING;
  // Exact match first
  if (m in pricing) return pricing[m] ?? null;
  // Prefix match (claude-sonnet-4-1-something → claude-sonnet-4-1). Longer keys
  // are tried first so a gpt-5.1-codex-mini variant is not priced as gpt-5.1-codex.
  const keys = Object.keys(pricing).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    if (m.startsWith(key)) return pricing[key] ?? null;
  }
  return null;
}
|
||||
|
||||
function isSubscriptionModel(model: string): boolean {
|
||||
return SUBSCRIPTION_MODEL_PATTERNS.some((p) => p.test(model));
|
||||
}
|
||||
|
||||
function isLocalModel(model: string): boolean {
|
||||
return /^(qwen|llama|mistral|magatama|phi|nomic|gemma)/i.test(model);
|
||||
}
|
||||
|
||||
export interface ComprehensiveSavings {
|
||||
/** Total saved across all five mechanisms. */
|
||||
totalCostSaved: number;
|
||||
totalTokensSaved: number;
|
||||
/** Per-source breakdown for the dashboard. */
|
||||
bySource: {
|
||||
cache: { tokens: number; cost: number; hits: number };
|
||||
compression: { tokens: number; cost: number; calls: number };
|
||||
subscriptionBridge: { tokens: number; cost: number; calls: number };
|
||||
localRouting: { tokens: number; cost: number; calls: number };
|
||||
raceMode: { tokens: number; cost: number; calls: number };
|
||||
};
|
||||
/** How much you would have paid for the same volume at API list prices. */
|
||||
costWithoutGateway: number;
|
||||
/** What you actually paid (real $). */
|
||||
costWithGateway: number;
|
||||
/** Time window. */
|
||||
hoursBack: number;
|
||||
/** Inputs that gave us this number. */
|
||||
totals: { requests: number; tokensIn: number; tokensOut: number };
|
||||
}
|
||||
|
||||
/**
|
||||
* Compute comprehensive savings across all mechanisms.
|
||||
*
|
||||
* Strategy:
|
||||
* For each request, determine where it went and price it both ways:
|
||||
* - "Would-be cost" = API list price for the model that handled it
|
||||
* - "Actual cost" = $0 for subscription/local; cost_usd for paid API
|
||||
* - "Saved" = would-be − actual
|
||||
*/
|
||||
export async function getComprehensiveSavings(
|
||||
db: Pool,
|
||||
hoursBack: number = 24
|
||||
): Promise<ComprehensiveSavings> {
|
||||
const empty: ComprehensiveSavings = {
|
||||
totalCostSaved: 0,
|
||||
totalTokensSaved: 0,
|
||||
bySource: {
|
||||
cache: { tokens: 0, cost: 0, hits: 0 },
|
||||
compression: { tokens: 0, cost: 0, calls: 0 },
|
||||
subscriptionBridge: { tokens: 0, cost: 0, calls: 0 },
|
||||
localRouting: { tokens: 0, cost: 0, calls: 0 },
|
||||
raceMode: { tokens: 0, cost: 0, calls: 0 },
|
||||
},
|
||||
costWithoutGateway: 0,
|
||||
costWithGateway: 0,
|
||||
hoursBack,
|
||||
totals: { requests: 0, tokensIn: 0, tokensOut: 0 },
|
||||
};
|
||||
|
||||
try {
|
||||
// 1) Cache hits
|
||||
const cacheRow = await db.query(
|
||||
`SELECT
|
||||
COALESCE(SUM(hit_count), 0)::INT AS hits,
|
||||
COALESCE(SUM(cost_saved), 0)::NUMERIC AS cost,
|
||||
COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens
|
||||
FROM response_cache
|
||||
WHERE last_hit_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
);
|
||||
empty.bySource.cache = {
|
||||
hits: parseInt(cacheRow.rows[0]?.hits ?? '0', 10),
|
||||
cost: parseFloat(cacheRow.rows[0]?.cost ?? '0'),
|
||||
tokens: parseInt(cacheRow.rows[0]?.tokens ?? '0', 10),
|
||||
};
|
||||
|
||||
// 2-4) All requests in the window, classified by routing
|
||||
const reqRows = await db.query(
|
||||
`SELECT model, tokens_in, tokens_out, cost_usd, fallback_used
|
||||
FROM request_tracking
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
);
|
||||
|
||||
let totalReq = 0, totalIn = 0, totalOut = 0;
|
||||
let withGateway = 0, withoutGateway = 0;
|
||||
|
||||
for (const r of reqRows.rows) {
|
||||
const model = String(r.model ?? '');
|
||||
const tokensIn = parseInt(r.tokens_in, 10) || 0;
|
||||
const tokensOut = parseInt(r.tokens_out, 10) || 0;
|
||||
const actualCost = parseFloat(r.cost_usd) || 0;
|
||||
|
||||
totalReq += 1;
|
||||
totalIn += tokensIn;
|
||||
totalOut += tokensOut;
|
||||
withGateway += actualCost;
|
||||
|
||||
// Determine "would-be cost" — what this request would have cost at API
|
||||
// list prices for the model that handled it (or its closest paid sibling).
|
||||
const apiPrice = lookupApiPrice(model);
|
||||
let wouldBeCost = 0;
|
||||
if (apiPrice) {
|
||||
wouldBeCost = (tokensIn / 1000) * apiPrice.in + (tokensOut / 1000) * apiPrice.out;
|
||||
} else if (isLocalModel(model)) {
|
||||
// Local model — compare against medium-tier paid API as opportunity cost
|
||||
const ref = API_PRICING['gpt-3.5-turbo'];
|
||||
wouldBeCost = (tokensIn / 1000) * ref.in + (tokensOut / 1000) * ref.out;
|
||||
}
|
||||
withoutGateway += wouldBeCost;
|
||||
|
||||
// Bucket the savings into a source
|
||||
if (isSubscriptionModel(model)) {
|
||||
empty.bySource.subscriptionBridge.calls += 1;
|
||||
empty.bySource.subscriptionBridge.tokens += tokensIn + tokensOut;
|
||||
empty.bySource.subscriptionBridge.cost += Math.max(0, wouldBeCost - actualCost);
|
||||
} else if (isLocalModel(model)) {
|
||||
empty.bySource.localRouting.calls += 1;
|
||||
empty.bySource.localRouting.tokens += tokensIn + tokensOut;
|
||||
empty.bySource.localRouting.cost += Math.max(0, wouldBeCost - actualCost);
|
||||
}
|
||||
}
|
||||
|
||||
// 5) Compression savings — pull from tokenvault_metrics if available
|
||||
try {
|
||||
const compRow = await db.query(
|
||||
`SELECT
|
||||
COUNT(*)::INT AS calls,
|
||||
COALESCE(SUM(GREATEST(tokens_before - tokens_after, 0)), 0)::BIGINT AS tokens_saved
|
||||
FROM tokenvault_metrics
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)
|
||||
AND tool_used = 'gateway'`,
|
||||
[hoursBack]
|
||||
);
|
||||
const tokensCompressed = parseInt(compRow.rows[0]?.tokens_saved ?? '0', 10);
|
||||
// Conservative pricing: assume average input pricing of $0.001/1k tokens
|
||||
const compCost = (tokensCompressed / 1000) * 0.001;
|
||||
empty.bySource.compression = {
|
||||
calls: parseInt(compRow.rows[0]?.calls ?? '0', 10),
|
||||
tokens: tokensCompressed,
|
||||
cost: compCost,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.debug({ err }, 'savings: compression aggregation skipped (table missing)');
|
||||
}
|
||||
|
||||
// 6) Race mode — picked the faster/cheaper candidate, "saved" the loser cost
|
||||
try {
|
||||
const raceRow = await db.query(
|
||||
`SELECT
|
||||
COUNT(DISTINCT call_id)::INT AS races,
|
||||
COALESCE(SUM(cost_usd) FILTER (WHERE selected = false), 0)::NUMERIC AS not_picked_cost
|
||||
FROM race_mode_results
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
);
|
||||
empty.bySource.raceMode = {
|
||||
calls: parseInt(raceRow.rows[0]?.races ?? '0', 10),
|
||||
cost: parseFloat(raceRow.rows[0]?.not_picked_cost ?? '0'),
|
||||
tokens: 0,
|
||||
};
|
||||
} catch (err) {
|
||||
logger.debug({ err }, 'savings: race aggregation skipped (table missing)');
|
||||
}
|
||||
|
||||
// 7) MCP tool-call compression — drop-in Lean-CTX replacement
|
||||
try {
|
||||
const mcpRow = await db.query(
|
||||
`SELECT COUNT(*)::INT AS calls,
|
||||
COALESCE(SUM(tokens_saved), 0)::BIGINT AS tokens_saved
|
||||
FROM mcp_tool_calls
|
||||
WHERE created_at > NOW() - MAKE_INTERVAL(hours => $1)`,
|
||||
[hoursBack]
|
||||
);
|
||||
const mcpTokens = parseInt(mcpRow.rows[0]?.tokens_saved ?? '0', 10);
|
||||
const mcpCalls = parseInt(mcpRow.rows[0]?.calls ?? '0', 10);
|
||||
// Tool-call savings cost-equivalence: Sonnet-equivalent pricing
// ($3/MTok input, $15/MTok output, weighted 60/40 in/out for tool returns).
// → 3.0 × 0.6 + 15.0 × 0.4 = $7.80 per MTok, i.e. ~$0.0078 per 1k tokens.
// Matches Lean-CTX dashboard scale.
const mcpCost = (mcpTokens / 1_000_000) * (3.0 * 0.6 + 15.0 * 0.4);
// Fold MCP savings into the existing compression bucket instead of adding a new source.
|
||||
empty.bySource.compression.tokens += mcpTokens;
|
||||
empty.bySource.compression.cost += mcpCost;
|
||||
empty.bySource.compression.calls += mcpCalls;
|
||||
} catch (err) {
|
||||
logger.debug({ err }, 'savings: mcp tool aggregation skipped (table missing)');
|
||||
}
|
||||
|
||||
empty.totalCostSaved =
|
||||
empty.bySource.cache.cost +
|
||||
empty.bySource.compression.cost +
|
||||
empty.bySource.subscriptionBridge.cost +
|
||||
empty.bySource.localRouting.cost +
|
||||
empty.bySource.raceMode.cost;
|
||||
|
||||
empty.totalTokensSaved =
|
||||
empty.bySource.cache.tokens +
|
||||
empty.bySource.compression.tokens;
|
||||
|
||||
empty.costWithoutGateway = withoutGateway;
|
||||
empty.costWithGateway = withGateway;
|
||||
empty.totals = { requests: totalReq, tokensIn: totalIn, tokensOut: totalOut };
|
||||
} catch (err) {
|
||||
logger.warn({ err }, 'savings-calculator: comprehensive computation failed');
|
||||
}
|
||||
|
||||
return empty;
|
||||
}
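The "price it both ways" strategy used by `getComprehensiveSavings` can be tried in isolation. The sketch below is an illustration under stated assumptions, not the module's code: `PRICES`, `priceFor`, `wouldBeCost`, `savedOn` and `blendedRatePerMTok` are hypothetical names, the two price entries are copied from the `API_PRICING` snapshot, and the `-preview` model id is made up to exercise the prefix lookup.

```typescript
type Price = { in: number; out: number }; // USD per 1k tokens

// Hypothetical excerpt of a price table; values copied from the snapshot above.
const PRICES: Record<string, Price> = {
  'gpt-5.1-codex': { in: 0.005, out: 0.02 },
  'gpt-5.1-codex-mini': { in: 0.0015, out: 0.006 },
};

/** Longest-prefix lookup so a suffixed variant resolves to its closest key. */
function priceFor(model: string): Price | null {
  const m = model.toLowerCase();
  const keys = Object.keys(PRICES).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    const price = PRICES[key];
    if (price && m.startsWith(key)) return price;
  }
  return null;
}

/** List-price cost of one request, in USD. */
function wouldBeCost(price: Price, tokensIn: number, tokensOut: number): number {
  return (tokensIn / 1000) * price.in + (tokensOut / 1000) * price.out;
}

/** Savings are clamped at zero, as in Math.max(0, wouldBeCost - actualCost). */
function savedOn(
  model: string,
  tokensIn: number,
  tokensOut: number,
  actualCost: number
): number {
  const price = priceFor(model);
  if (!price) return 0; // unknown models contribute no would-be cost
  return Math.max(0, wouldBeCost(price, tokensIn, tokensOut) - actualCost);
}

/** Blended USD per million tokens for a given input share (0..1). */
function blendedRatePerMTok(inRate: number, outRate: number, inputShare: number): number {
  return inRate * inputShare + outRate * (1 - inputShare);
}

// A subscription call (actual cost $0) priced at the mini model's list price.
console.log(savedOn('gpt-5.1-codex-mini-preview', 2000, 1000, 0));
// The 60/40 weighting used for MCP tool-call savings.
console.log(blendedRatePerMTok(3.0, 15.0, 0.6));
```

The clamp matters when a paid call costs more than its list-price estimate: the request then counts as zero savings rather than a negative amount that would offset other sources.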
|
||||
214
packages/gateway/src/modules/settings-store.ts
Normal file
@ -0,0 +1,214 @@
|
||||
/**
|
||||
* Settings Store
|
||||
*
|
||||
* Persists user configuration (which subscriptions they have, which API
|
||||
* providers they use, etc.) to a JSON file on disk. Sensitive fields like
|
||||
* API keys are stored verbatim but never returned in plaintext from
|
||||
* `getPublicSettings()` — only a `hasKey: true/false` flag is exposed.
|
||||
*/
|
||||
|
||||
import { readFileSync, writeFileSync, existsSync, mkdirSync } from 'fs';
|
||||
import { dirname, join } from 'path';
|
||||
import { z } from 'zod';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
const SettingsSchema = z.object({
|
||||
/** How the gateway should pick providers: 'auto' uses all, others restrict the pool. */
|
||||
routingMode: z.enum(['auto', 'subscription-only', 'api-only', 'local-only']).default('auto'),
|
||||
/** Per-subscription configuration keyed by SubscriptionId. */
|
||||
subscriptions: z
|
||||
.record(
|
||||
z.string(),
|
||||
z.object({
|
||||
enabled: z.boolean().default(true),
|
||||
autoSpawn: z.boolean().default(true),
|
||||
/**
|
||||
* Optional remote bridge URL. When set, the gateway will route to this
|
||||
* URL instead of trying to spawn a local bridge. Use this when the CLI
|
||||
* subscription lives on a different machine than the gateway.
|
||||
*/
|
||||
bridgeUrl: z.string().url().optional().or(z.literal('')),
|
||||
notes: z.string().optional(),
|
||||
})
|
||||
)
|
||||
.default({}),
|
||||
/** Per-API-provider configuration keyed by provider name (cerebras, groq, …). */
|
||||
apiProviders: z
|
||||
.record(
|
||||
z.string(),
|
||||
z.object({
|
||||
enabled: z.boolean().default(false),
|
||||
apiKey: z.string().optional(),
|
||||
baseUrl: z.string().optional(),
|
||||
notes: z.string().optional(),
|
||||
})
|
||||
)
|
||||
.default({}),
|
||||
/** Local Ollama configuration. */
|
||||
ollama: z
|
||||
.object({
|
||||
enabled: z.boolean().default(true),
|
||||
baseUrl: z.string().default('http://localhost:11434'),
|
||||
})
|
||||
.default({ enabled: true, baseUrl: 'http://localhost:11434' }),
|
||||
/**
|
||||
* Simple Mode — for users who only use 1-2 subscriptions.
|
||||
* Hides advanced tabs (providers, races, share, report, memory) and
|
||||
* filters wallet/subscriptions to only show enabled providers.
|
||||
*/
|
||||
ui: z
|
||||
.object({
|
||||
simpleMode: z.boolean().default(true),
|
||||
hideEmptyProviders: z.boolean().default(true),
|
||||
showTooltips: z.boolean().default(true),
|
||||
})
|
||||
.default({ simpleMode: true, hideEmptyProviders: true, showTooltips: true }),
|
||||
/** ISO timestamp of last update. */
|
||||
updatedAt: z.string().optional(),
|
||||
});
|
||||
|
||||
export type Settings = z.infer<typeof SettingsSchema>;
|
||||
|
||||
export interface PublicSettings extends Omit<Settings, 'apiProviders'> {
|
||||
apiProviders: Record<string, { enabled: boolean; hasKey: boolean; baseUrl?: string; notes?: string }>;
|
||||
}
|
||||
|
||||
const SETTINGS_PATH =
|
||||
process.env['SETTINGS_PATH'] ?? join(process.env['HOME'] ?? '/root', '.llm-gateway', 'settings.json');
|
||||
|
||||
const DEFAULT_SUBSCRIPTIONS: Settings['subscriptions'] = {
|
||||
'claude-code': { enabled: true, autoSpawn: true },
|
||||
'github-copilot': { enabled: true, autoSpawn: true },
|
||||
'chatgpt': { enabled: true, autoSpawn: true },
|
||||
'gemini': { enabled: true, autoSpawn: true },
|
||||
'codex': { enabled: true, autoSpawn: true },
|
||||
'aider': { enabled: true, autoSpawn: true },
|
||||
};
|
||||
|
||||
function getDefaults(): Settings {
|
||||
return SettingsSchema.parse({
|
||||
routingMode: 'auto',
|
||||
subscriptions: DEFAULT_SUBSCRIPTIONS,
|
||||
ollama: { enabled: true, baseUrl: process.env['OLLAMA_BASE_URL'] ?? 'http://localhost:11434' },
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Load settings from disk. Returns defaults when the file does not yet exist
|
||||
* or fails to parse.
|
||||
*/
|
||||
export function loadSettings(): Settings {
|
||||
try {
|
||||
if (!existsSync(SETTINGS_PATH)) {
|
||||
return getDefaults();
|
||||
}
|
||||
const raw = readFileSync(SETTINGS_PATH, 'utf-8');
|
||||
const parsed = SettingsSchema.parse(JSON.parse(raw));
|
||||
return parsed;
|
||||
} catch (err) {
|
||||
logger.warn({ err, path: SETTINGS_PATH }, 'Failed to load settings — using defaults');
|
||||
return getDefaults();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Persist settings to disk, merging with any existing values to avoid wiping
|
||||
* fields the caller didn't include in the patch.
|
||||
*/
|
||||
export function saveSettings(patch: Partial<Settings>): Settings {
|
||||
const current = loadSettings();
|
||||
const merged: Settings = SettingsSchema.parse({
|
||||
...current,
|
||||
...patch,
|
||||
subscriptions: { ...current.subscriptions, ...(patch.subscriptions ?? {}) },
|
||||
apiProviders: { ...current.apiProviders, ...(patch.apiProviders ?? {}) },
|
||||
ollama: { ...current.ollama, ...(patch.ollama ?? {}) },
|
||||
ui: { ...current.ui, ...(patch.ui ?? {}) },
|
||||
updatedAt: new Date().toISOString(),
|
||||
});
|
||||
|
||||
try {
|
||||
mkdirSync(dirname(SETTINGS_PATH), { recursive: true });
|
||||
writeFileSync(SETTINGS_PATH, JSON.stringify(merged, null, 2), { mode: 0o600 });
|
||||
logger.info({ path: SETTINGS_PATH }, 'Settings saved');
|
||||
} catch (err) {
|
||||
logger.error({ err, path: SETTINGS_PATH }, 'Failed to persist settings');
|
||||
throw err;
|
||||
}
|
||||
|
||||
// Mirror to env vars so existing provider lookups pick up changes immediately.
|
||||
applySettingsToEnv(merged);
|
||||
return merged;
|
||||
}
|
||||
|
||||
/**
|
||||
* Strip sensitive data (API keys) before sending to the dashboard.
|
||||
*/
|
||||
export function getPublicSettings(): PublicSettings {
|
||||
const settings = loadSettings();
|
||||
const apiProviders: PublicSettings['apiProviders'] = {};
|
||||
for (const [name, cfg] of Object.entries(settings.apiProviders)) {
|
||||
apiProviders[name] = {
|
||||
enabled: cfg.enabled,
|
||||
hasKey: !!cfg.apiKey,
|
||||
baseUrl: cfg.baseUrl,
|
||||
notes: cfg.notes,
|
||||
};
|
||||
}
|
||||
return {
|
||||
routingMode: settings.routingMode,
|
||||
subscriptions: settings.subscriptions,
|
||||
apiProviders,
|
||||
ollama: settings.ollama,
|
||||
ui: settings.ui,
|
||||
updatedAt: settings.updatedAt,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Apply settings to process.env so that the existing external-providers.ts
|
||||
* code transparently picks up user-configured API keys without changes.
|
||||
*/
|
||||
export function applySettingsToEnv(settings: Settings = loadSettings()): void {
|
||||
const apiEnvMap: Record<string, string> = {
|
||||
cerebras: 'CEREBRAS_API_KEY',
|
||||
groq: 'GROQ_API_KEY',
|
||||
mistral: 'MISTRAL_API_KEY',
|
||||
nvidia: 'NVIDIA_API_KEY',
|
||||
cloudflare: 'CLOUDFLARE_AI_TOKEN',
|
||||
'openai-codex': 'OPENAI_API_KEY',
|
||||
};
|
||||
for (const [name, cfg] of Object.entries(settings.apiProviders)) {
|
||||
const envKey = apiEnvMap[name];
|
||||
if (envKey && cfg.enabled && cfg.apiKey) {
|
||||
process.env[envKey] = cfg.apiKey;
|
||||
}
|
||||
}
|
||||
if (settings.ollama.enabled && settings.ollama.baseUrl) {
|
||||
process.env['OLLAMA_BASE_URL'] = settings.ollama.baseUrl;
|
||||
}
|
||||
|
||||
// Map subscription IDs to the env var the existing provider lookup uses
|
||||
const subEnvMap: Record<string, string> = {
|
||||
'claude-code': 'CLAUDE_BRIDGE_URL',
|
||||
'github-copilot': 'COPILOT_BRIDGE_URL',
|
||||
'microsoft-365-copilot': 'M365_COPILOT_BRIDGE_URL',
|
||||
'chatgpt': 'CHATGPT_BRIDGE_URL',
|
||||
'gemini': 'GEMINI_BRIDGE_URL',
|
||||
'codex': 'CODEX_BRIDGE_URL',
|
||||
'aider': 'AIDER_BRIDGE_URL',
|
||||
};
|
||||
for (const [id, cfg] of Object.entries(settings.subscriptions)) {
|
||||
const envKey = subEnvMap[id];
|
||||
if (envKey && cfg.enabled && cfg.bridgeUrl) {
|
||||
process.env[envKey] = cfg.bridgeUrl;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export const SettingsPatchSchema = SettingsSchema.partial().extend({
|
||||
subscriptions: SettingsSchema.shape.subscriptions.optional(),
|
||||
apiProviders: SettingsSchema.shape.apiProviders.optional(),
|
||||
ollama: SettingsSchema.shape.ollama.optional(),
|
||||
ui: SettingsSchema.shape.ui.optional(),
|
||||
});
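`saveSettings` merges each section one level deep, so a patch that names a subscription replaces that subscription's stored entry as a whole. The sketch below illustrates the consequence and one possible per-entry alternative. It is a simplified model without zod: `SubscriptionCfg`, `mergeShallow` and `mergePerEntry` are hypothetical names, and the bridge URL is a placeholder.

```typescript
// Hypothetical, simplified shape of one subscription entry.
interface SubscriptionCfg {
  enabled: boolean;
  autoSpawn: boolean;
  bridgeUrl?: string;
}
type Subscriptions = Record<string, SubscriptionCfg>;

/** One-level merge: an entry present in the patch replaces the stored entry. */
function mergeShallow(current: Subscriptions, patch: Subscriptions): Subscriptions {
  return { ...current, ...patch };
}

/** Per-entry merge: fields missing from the patch keep their stored values. */
function mergePerEntry(
  current: Subscriptions,
  patch: Record<string, Partial<SubscriptionCfg>>
): Subscriptions {
  const out: Subscriptions = { ...current };
  for (const [id, cfg] of Object.entries(patch)) {
    // New ids start from the same defaults the schema uses.
    const base: SubscriptionCfg = current[id] ?? { enabled: true, autoSpawn: true };
    out[id] = { ...base, ...cfg };
  }
  return out;
}

const stored: Subscriptions = {
  'claude-code': { enabled: true, autoSpawn: true, bridgeUrl: 'http://bridge.example:3250' },
};

const shallow = mergeShallow(stored, { 'claude-code': { enabled: false, autoSpawn: true } });
const perEntry = mergePerEntry(stored, { 'claude-code': { enabled: false } });

console.log(shallow['claude-code']?.bridgeUrl); // undefined: the stored URL was dropped
console.log(perEntry['claude-code']?.bridgeUrl); // the stored URL survives
```

With the one-level merge, a caller that only wants to toggle `enabled` has to resend the full entry, including `bridgeUrl`. Whether that is acceptable depends on how the dashboard builds its patches.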
|
||||
174
packages/gateway/src/modules/share-card.ts
Normal file
@ -0,0 +1,174 @@
|
||||
/**
|
||||
* Public Share Card Generator
|
||||
*
|
||||
* Renders a shareable SVG image showing your gateway savings — useful for
|
||||
* social posts, blog headers, README badges. Tokens are rounded; no
|
||||
* personally identifying information leaks (caller IDs, model names etc.
|
||||
* are NOT included). Just headline numbers + brand.
|
||||
*
|
||||
* Output is always a valid SVG so it can be embedded as `<img src="...">`
|
||||
* or downloaded directly.
|
||||
*/
|
||||
import type { Pool } from 'pg';
|
||||
import { getComprehensiveSavings } from './savings-calculator.js';
|
||||
import { getBuddyState } from './gamification.js';
|
||||
|
||||
function fmtNum(n: number): string {
|
||||
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M';
|
||||
if (n >= 1_000) return (n / 1_000).toFixed(1) + 'K';
|
||||
return Math.round(n).toString();
|
||||
}
|
||||
function fmtCost(c: number): string {
|
||||
if (c < 0.01) return `$${c.toFixed(6)}`;
|
||||
if (c < 1) return `$${c.toFixed(4)}`;
|
||||
return `$${c.toFixed(2)}`;
|
||||
}
|
||||
function escSvg(s: string): string {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}
|
||||
|
||||
export type ShareCardPeriod = 'day' | 'week' | 'month' | 'all';
|
||||
export type ShareCardTheme = 'dark' | 'light';
|
||||
|
||||
const PERIOD_HOURS: Record<ShareCardPeriod, number> = {
|
||||
day: 24, week: 168, month: 720, all: 24 * 365 * 5,
|
||||
};
|
||||
|
||||
export async function generateShareCard(
|
||||
db: Pool,
|
||||
opts: { period?: ShareCardPeriod; theme?: ShareCardTheme } = {}
|
||||
): Promise<string> {
|
||||
const period: ShareCardPeriod = opts.period ?? 'month';
|
||||
const theme: ShareCardTheme = opts.theme ?? 'dark';
|
||||
const hours = PERIOD_HOURS[period];
|
||||
|
||||
const [savings, buddy] = await Promise.all([
|
||||
getComprehensiveSavings(db, hours),
|
||||
getBuddyState(db, 'gateway'),
|
||||
]);
|
||||
|
||||
// Theme palette
|
||||
const palette = theme === 'dark' ? {
|
||||
bg: '#0a0a0a', surface: '#161616', text: '#e8e8e8', dim: '#888888',
|
||||
accent: '#d4ff00', accentDim: '#8aa800', border: '#2a2a2a',
|
||||
} : {
|
||||
bg: '#f4f7fa', surface: '#ffffff', text: '#24313d', dim: '#667684',
|
||||
accent: '#0f766e', accentDim: '#8ab9b5', border: '#d6e0e7',
|
||||
};
|
||||
|
||||
const periodLabel = period === 'day' ? 'Last 24 hours'
|
||||
: period === 'week' ? 'Last 7 days'
|
||||
: period === 'month' ? 'Last 30 days'
|
||||
: 'All-time';
|
||||
|
||||
const W = 1200, H = 630; // Open Graph standard
|
||||
const totalTokens = savings.totalTokensSaved;
|
||||
const totalCost = savings.totalCostSaved;
|
||||
const reqCount = savings.totals.requests;
|
||||
const efficacy = savings.costWithoutGateway > 0
|
||||
? ((savings.costWithoutGateway - savings.costWithGateway) / savings.costWithoutGateway) * 100
|
||||
: 0;
|
||||
|
||||
// Source-bar widths
|
||||
const total = Math.max(0.0000001, savings.totalCostSaved);
|
||||
const wCache = (savings.bySource.cache.cost / total) * 100;
|
||||
const wComp = (savings.bySource.compression.cost / total) * 100;
|
||||
const wSub = (savings.bySource.subscriptionBridge.cost / total) * 100;
|
||||
const wLocal = (savings.bySource.localRouting.cost / total) * 100;
|
||||
const wRace = (savings.bySource.raceMode.cost / total) * 100;
|
||||
|
||||
return `<svg xmlns="http://www.w3.org/2000/svg" width="${W}" height="${H}" viewBox="0 0 ${W} ${H}">
|
||||
<defs>
|
||||
<linearGradient id="bgGrad" x1="0" y1="0" x2="1" y2="1">
|
||||
<stop offset="0%" stop-color="${palette.bg}"/>
|
||||
<stop offset="100%" stop-color="${palette.surface}"/>
|
||||
</linearGradient>
|
||||
<radialGradient id="glow" cx="20%" cy="0%" r="80%">
|
||||
<stop offset="0%" stop-color="${palette.accent}" stop-opacity="0.20"/>
|
||||
<stop offset="60%" stop-color="${palette.accent}" stop-opacity="0.04"/>
|
||||
<stop offset="100%" stop-color="${palette.bg}" stop-opacity="0"/>
|
||||
</radialGradient>
|
||||
<style>
|
||||
.mono { font-family: 'JetBrains Mono', 'SF Mono', monospace; }
|
||||
.sans { font-family: 'Inter', -apple-system, sans-serif; }
|
||||
.num { font-weight: 700; letter-spacing: -0.02em; }
|
||||
.label { letter-spacing: 0.16em; text-transform: uppercase; }
|
||||
</style>
|
||||
</defs>
|
||||
|
||||
<!-- background -->
|
||||
<rect width="${W}" height="${H}" fill="url(#bgGrad)"/>
|
||||
<rect width="${W}" height="${H}" fill="url(#glow)"/>
|
||||
<rect width="${W}" height="${H}" fill="none" stroke="${palette.border}" stroke-width="2"/>
|
||||
|
||||
<!-- brand mark -->
|
||||
<g transform="translate(48 48)">
|
||||
<rect x="0" y="0" width="14" height="14" fill="${palette.accent}"/>
|
||||
<text x="24" y="12" class="mono" font-size="20" font-weight="700" fill="${palette.text}">llm.gateway</text>
|
||||
<text x="180" y="12" class="mono" font-size="13" fill="${palette.dim}">— ${escSvg(periodLabel)}</text>
|
||||
</g>
|
||||
|
||||
<!-- top-right: brand tag / version -->
|
||||
<g transform="translate(${W - 48} 48)">
|
||||
<text x="0" y="12" text-anchor="end" class="mono" font-size="11" fill="${palette.dim}" letter-spacing="0.1em">CONTEXT-X.ORG</text>
|
||||
</g>
|
||||
|
||||
<!-- HUGE counter — eyebrow above, big number well below to avoid overlap -->
|
||||
<g transform="translate(48 ${H/2 - 110})">
|
||||
<text x="0" y="0" class="mono label" font-size="14" fill="${palette.dim}">tokens prevented · ${escSvg(periodLabel.toLowerCase())}</text>
|
||||
<text x="0" y="135" class="mono num" font-size="120" fill="${palette.accent}">${fmtNum(totalTokens)}</text>
|
||||
<text x="0" y="180" class="mono" font-size="18" fill="${palette.text}">
|
||||
<tspan>${fmtCost(totalCost)} saved</tspan>
|
||||
<tspan dx="20" fill="${palette.dim}">·</tspan>
|
||||
<tspan dx="14">${fmtNum(reqCount)} calls</tspan>
|
||||
<tspan dx="20" fill="${palette.dim}">·</tspan>
|
||||
<tspan dx="14">${efficacy.toFixed(1)}% efficiency</tspan>
|
||||
</text>
|
||||
</g>
|
||||
|
||||
<!-- 5-axis breakdown bar -->
|
||||
<g transform="translate(48 ${H - 180})">
|
||||
<text x="0" y="0" class="mono label" font-size="12" fill="${palette.dim}">savings sources · 5-axis breakdown</text>
|
||||
<rect x="0" y="14" width="${W - 96}" height="22" fill="${palette.surface}" stroke="${palette.border}"/>
|
||||
${(() => {
|
||||
let x = 0;
|
||||
const segs: string[] = [];
|
||||
const w = W - 96;
|
||||
const pieces = [
|
||||
{ p: wCache, c: '#d4ff00', label: '⚡' },
|
||||
{ p: wComp, c: '#2dd4bf', label: '🗜' },
|
||||
{ p: wSub, c: '#60a5fa', label: '🌉' },
|
||||
{ p: wLocal, c: '#a78bfa', label: '🏠' },
|
||||
{ p: wRace, c: '#f97316', label: '🏁' },
|
||||
];
|
||||
for (const piece of pieces) {
|
||||
const segW = (piece.p / 100) * w;
|
||||
if (segW > 0.5) {
|
||||
segs.push(`<rect x="${x}" y="14" width="${segW}" height="22" fill="${piece.c}"/>`);
|
||||
}
|
||||
x += segW;
|
||||
}
|
||||
return segs.join('');
|
||||
})()}
|
||||
<g transform="translate(0 60)" class="mono" font-size="11" fill="${palette.dim}">
|
||||
<text x="0" y="0"><tspan fill="#d4ff00">●</tspan> cache</text>
|
||||
<text x="120" y="0"><tspan fill="#2dd4bf">●</tspan> compression</text>
|
||||
<text x="270" y="0"><tspan fill="#60a5fa">●</tspan> subscription bridges</text>
|
||||
<text x="470" y="0"><tspan fill="#a78bfa">●</tspan> local routing</text>
|
||||
<text x="600" y="0"><tspan fill="#f97316">●</tspan> race mode</text>
|
||||
</g>
|
||||
</g>
|
||||
|
||||
<!-- footer / buddy -->
|
||||
<g transform="translate(48 ${H - 70})">
|
||||
<text x="0" y="0" class="mono" font-size="11" fill="${palette.dim}">
|
||||
<tspan fill="${palette.accent}">${escSvg(buddy.species)}</tspan>
|
||||
<tspan dx="6">·</tspan>
|
||||
<tspan dx="6">Lv.${buddy.level}</tspan>
|
||||
<tspan dx="6">·</tspan>
|
||||
<tspan dx="6">${buddy.streakDays}d streak</tspan>
|
||||
<tspan dx="20" fill="${palette.dim}">— routing AI traffic since ${escSvg(new Date().toISOString().split('T')[0])}</tspan>
|
||||
</text>
|
||||
</g>
|
||||
</svg>`;
|
||||
}
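The breakdown bar in `generateShareCard` places one rectangle per savings source, sized by that source's share of the total. The layout step can be sketched on its own as follows. `Segment` and `layoutSegments` are hypothetical names introduced for illustration; the colors and the 1104px width (`W - 96`) are taken from the card above.

```typescript
interface Segment {
  x: number;
  width: number;
  color: string;
}

/** Lay out proportional bar segments left to right across totalWidth pixels. */
function layoutSegments(
  parts: ReadonlyArray<{ value: number; color: string }>,
  totalWidth: number,
  minVisible: number = 0.5
): Segment[] {
  // Guard against division by zero when nothing was saved.
  const sum = Math.max(0.0000001, parts.reduce((acc, p) => acc + p.value, 0));
  const segments: Segment[] = [];
  let x = 0;
  for (const part of parts) {
    const width = (part.value / sum) * totalWidth;
    if (width > minVisible) segments.push({ x, width, color: part.color });
    x += width; // advance even for skipped slivers so later segments keep their position
  }
  return segments;
}

const bar = layoutSegments(
  [
    { value: 3, color: '#d4ff00' }, // cache
    { value: 0, color: '#2dd4bf' }, // compression (nothing saved, so no rectangle)
    { value: 1, color: '#60a5fa' }, // subscription bridges
  ],
  1104
);

console.log(bar);
```

Skipping segments narrower than half a pixel avoids emitting rectangles that would not render visibly, while the running `x` offset keeps the remaining segments aligned with their true share.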
|
||||
303
packages/gateway/src/modules/subscription-discovery.ts
Normal file
@ -0,0 +1,303 @@
|
||||
/**
|
||||
* Subscription Discovery
|
||||
*
|
||||
* Auto-detects locally installed CLI subscriptions (Claude Code, GitHub Copilot,
|
||||
* ChatGPT, Gemini, etc.) and reports their authentication status. The discovery
|
||||
* results drive automatic bridge spawning and dynamic provider registration.
|
||||
*/
|
||||
|
||||
import { execFile } from 'child_process';
|
||||
import { promisify } from 'util';
|
||||
import { existsSync } from 'fs';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
const execFileAsync = promisify(execFile);
|
||||
|
||||
export type SubscriptionId =
|
||||
| 'claude-code'
|
||||
| 'github-copilot'
|
||||
| 'microsoft-365-copilot'
|
||||
| 'chatgpt'
|
||||
| 'gemini'
|
||||
| 'codex'
|
||||
| 'aider';
|
||||
|
||||
export interface SubscriptionDescriptor {
|
||||
id: SubscriptionId;
|
||||
/** Friendly display name */
|
||||
label: string;
|
||||
/** CLI binary required to use the subscription */
|
||||
command: string;
|
||||
/** Args used for the version probe */
|
||||
versionArgs: readonly string[];
|
||||
/** Args used for the auth probe (optional) */
|
||||
authProbeArgs?: readonly string[];
|
||||
/** Default port the bridge listens on */
|
||||
bridgePort: number;
|
||||
/** ENV var the gateway uses to find the bridge URL */
|
||||
bridgeEnvKey: string;
|
||||
/** Logical provider name in `external-providers.ts` */
|
||||
providerName: string;
|
||||
/** Models exposed via this subscription */
|
||||
models: ReadonlyArray<{ id: string; tier: 'fast' | 'medium' | 'large' | 'reasoning' }>;
|
||||
/** Which bridge implementation handles this subscription */
|
||||
bridgeImplementation: 'inline-claude' | 'inline-openai' | 'inline-copilot' | 'external-codex';
|
||||
}
|
||||
|
||||
export interface SubscriptionStatus {
|
||||
descriptor: SubscriptionDescriptor;
|
||||
installed: boolean;
|
||||
authenticated: boolean | 'unknown';
|
||||
version?: string;
|
||||
error?: string;
|
||||
bridgeUrl?: string;
|
||||
bridgeRunning: boolean;
|
||||
}
|
||||
|
||||
/**
|
||||
* Catalog of subscriptions the gateway knows how to bootstrap.
|
||||
* Adding a new entry here (and its id to `SubscriptionId`) is enough to make it discoverable.
|
||||
*/
|
||||
export const SUBSCRIPTION_CATALOG: readonly SubscriptionDescriptor[] = [
|
||||
{
|
||||
id: 'claude-code',
|
||||
label: 'Claude Code (Anthropic Subscription)',
|
||||
command: 'claude',
|
||||
versionArgs: ['--version'],
|
||||
bridgePort: 3250,
|
||||
bridgeEnvKey: 'CLAUDE_BRIDGE_URL',
|
||||
providerName: 'claude-bridge',
|
||||
bridgeImplementation: 'inline-claude',
|
||||
models: [
|
||||
{ id: 'claude-opus-4-1', tier: 'reasoning' },
|
||||
{ id: 'claude-sonnet-4-1', tier: 'large' },
|
||||
{ id: 'claude-haiku-3', tier: 'fast' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'github-copilot',
|
||||
label: 'GitHub Copilot Subscription',
|
||||
command: 'gh',
|
||||
versionArgs: ['copilot', '--version'],
|
||||
bridgePort: 3252,
|
||||
bridgeEnvKey: 'COPILOT_BRIDGE_URL',
|
||||
providerName: 'copilot-bridge',
|
||||
bridgeImplementation: 'inline-copilot',
|
||||
models: [
|
||||
{ id: 'gpt-4', tier: 'reasoning' },
|
||||
{ id: 'gpt-3.5-turbo', tier: 'medium' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'microsoft-365-copilot',
|
||||
label: 'Microsoft 365 Copilot Subscription',
|
||||
command: 'node',
|
||||
versionArgs: ['--version'],
|
||||
bridgePort: 3257,
|
||||
bridgeEnvKey: 'M365_COPILOT_BRIDGE_URL',
|
||||
providerName: 'm365-copilot-bridge',
|
||||
bridgeImplementation: 'inline-openai',
|
||||
models: [
|
||||
{ id: 'microsoft-365-copilot', tier: 'reasoning' },
|
||||
{ id: 'm365-copilot-chat', tier: 'large' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'chatgpt',
|
||||
label: 'OpenAI ChatGPT Plus Subscription',
|
||||
command: 'chatgpt',
|
||||
versionArgs: ['--version'],
|
||||
bridgePort: 3251,
|
||||
bridgeEnvKey: 'CHATGPT_BRIDGE_URL',
|
||||
providerName: 'chatgpt-bridge',
|
||||
bridgeImplementation: 'inline-openai',
|
||||
models: [
|
||||
{ id: 'gpt-4-turbo', tier: 'reasoning' },
|
||||
{ id: 'gpt-4', tier: 'large' },
|
||||
{ id: 'gpt-3.5-turbo', tier: 'medium' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'gemini',
|
||||
label: 'Google Gemini Advanced Subscription',
|
||||
command: 'gemini',
|
||||
versionArgs: ['--version'],
|
||||
bridgePort: 3254,
|
||||
bridgeEnvKey: 'GEMINI_BRIDGE_URL',
|
||||
providerName: 'gemini-bridge',
|
||||
bridgeImplementation: 'inline-openai',
|
||||
models: [
|
||||
{ id: 'gemini-1.5-pro', tier: 'reasoning' },
|
||||
{ id: 'gemini-1.5-flash', tier: 'fast' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'codex',
|
||||
label: 'OpenAI Codex CLI Subscription',
|
||||
command: 'codex',
|
||||
versionArgs: ['--version'],
|
||||
authProbeArgs: ['login', 'status'],
|
||||
bridgePort: 3253,
|
||||
bridgeEnvKey: 'CODEX_BRIDGE_URL',
|
||||
providerName: 'codex-bridge',
|
||||
bridgeImplementation: 'external-codex',
|
||||
models: [
|
||||
{ id: 'gpt-5.1-codex', tier: 'reasoning' },
|
||||
{ id: 'gpt-5.1-codex-mini', tier: 'large' },
|
||||
{ id: 'codex-mini-latest', tier: 'medium' },
|
||||
],
|
||||
},
|
||||
{
|
||||
id: 'aider',
|
||||
label: 'Aider AI Pair Programmer',
|
||||
command: 'aider',
|
||||
versionArgs: ['--version'],
|
||||
bridgePort: 3256,
|
||||
bridgeEnvKey: 'AIDER_BRIDGE_URL',
|
||||
providerName: 'aider-bridge',
|
||||
bridgeImplementation: 'inline-openai',
|
||||
models: [
|
||||
{ id: 'aider-default', tier: 'large' },
|
||||
],
|
||||
},
|
||||
];
|
||||
|
||||
/**
|
||||
* Probe a CLI's --version with a 3s timeout. Returns null when not installed.
|
||||
*/
|
||||
async function probeVersion(command: string, args: readonly string[]): Promise<string | null> {
|
||||
try {
|
||||
const { stdout, stderr } = await execFileAsync(command, args as string[], {
|
||||
timeout: 3000,
|
||||
maxBuffer: 64 * 1024,
|
||||
});
|
||||
const out = (stdout || stderr || '').trim().split('\n')[0];
|
||||
return out || 'installed';
|
||||
} catch (err: unknown) {
|
||||
const code = (err as NodeJS.ErrnoException).code;
|
||||
if (code === 'ENOENT') return null;
|
||||
// Command exists but the probe failed (non-zero exit, timeout, auth required): count as installed
|
||||
return 'installed';
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Best-effort authentication check. Many CLI tools don't have a clean probe,
|
||||
* so we return 'unknown' rather than guessing wrong.
|
||||
*/
|
||||
async function probeAuthenticated(desc: SubscriptionDescriptor): Promise<boolean | 'unknown'> {
|
||||
// Claude Code stores credentials in ~/.claude/.credentials.json
|
||||
if (desc.id === 'claude-code') {
|
||||
const home = process.env.HOME || '/root';
|
||||
return existsSync(`${home}/.claude/.credentials.json`);
|
||||
}
|
||||
// GitHub Copilot uses gh auth status
|
||||
if (desc.id === 'github-copilot') {
|
||||
try {
|
||||
await execFileAsync('gh', ['auth', 'status'], { timeout: 3000 });
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
if (desc.id === 'microsoft-365-copilot') {
|
||||
return Boolean(
|
||||
process.env['MICROSOFT_GRAPH_ACCESS_TOKEN'] ||
|
||||
process.env['M365_COPILOT_ACCESS_TOKEN'] ||
|
||||
process.env['MICROSOFT_CLIENT_ID']
|
||||
);
|
||||
}
|
||||
if (desc.id === 'codex') {
|
||||
try {
|
||||
await execFileAsync('codex', ['login', 'status'], { timeout: 3000 });
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
return 'unknown';
|
||||
}
|
||||
|
||||
/**
|
||||
* Check whether a bridge URL is reachable.
|
||||
*/
|
||||
async function probeBridge(url: string | undefined): Promise<boolean> {
|
||||
if (!url) return false;
|
||||
try {
|
||||
const controller = new AbortController();
|
||||
const timeoutId = setTimeout(() => controller.abort(), 1500);
|
||||
try {
|
||||
await fetch(`${url.replace(/\/$/, '')}/health`, { signal: controller.signal });
|
||||
return true;
|
||||
} finally {
|
||||
clearTimeout(timeoutId);
|
||||
}
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the bridge URL for a subscription:
|
||||
* 1. Explicit env var (CLAUDE_BRIDGE_URL etc.) — set by Settings or PM2 ecosystem
|
||||
* 2. Auto-detect: probe http://127.0.0.1:{bridgePort} for a /health endpoint
|
||||
*
|
||||
* This means a bridge running locally on its default port is picked up
|
||||
* automatically without any configuration.
|
||||
*/
|
||||
async function resolveBridgeUrl(desc: SubscriptionDescriptor): Promise<{ url?: string; running: boolean }> {
|
||||
const explicit = process.env[desc.bridgeEnvKey];
|
||||
if (explicit) {
|
||||
const running = await probeBridge(explicit);
|
||||
return { url: explicit, running };
|
||||
}
|
||||
// Auto-detect on the default port
|
||||
const localUrl = `http://127.0.0.1:${desc.bridgePort}`;
|
||||
const running = await probeBridge(localUrl);
|
||||
return running ? { url: localUrl, running: true } : { running: false };
|
||||
}
|
||||
|
||||
/**
|
||||
* Discover all subscriptions the gateway knows about. Probes the CLI binary,
* authentication state, and the bridge URL (explicit env var, or auto-detected
* on the default local port).
|
||||
*/
|
||||
export async function discoverSubscriptions(): Promise<SubscriptionStatus[]> {
|
||||
const results = await Promise.all(
|
||||
SUBSCRIPTION_CATALOG.map(async (desc): Promise<SubscriptionStatus> => {
|
||||
// Always probe the bridge first — a running bridge is enough to count
|
||||
// as "available" even if the CLI isn't installed on this host (the
|
||||
// bridge could live on the user's machine).
|
||||
const bridge = await resolveBridgeUrl(desc);
|
||||
|
||||
const version = await probeVersion(desc.command, desc.versionArgs);
|
||||
if (!version) {
|
||||
return {
|
||||
descriptor: desc,
|
||||
installed: bridge.running, // remote bridge counts as installed
|
||||
authenticated: bridge.running ? 'unknown' : false,
|
||||
bridgeUrl: bridge.url,
|
||||
bridgeRunning: bridge.running,
|
||||
};
|
||||
}
|
||||
|
||||
const authenticated = await probeAuthenticated(desc);
|
||||
return {
|
||||
descriptor: desc,
|
||||
installed: true,
|
||||
authenticated,
|
||||
version,
|
||||
bridgeUrl: bridge.url,
|
||||
bridgeRunning: bridge.running,
|
||||
};
|
||||
})
|
||||
);
|
||||
logger.info(
|
||||
{
|
||||
detected: results.filter((r) => r.installed).length,
|
||||
bridgesLive: results.filter((r) => r.bridgeRunning).length,
|
||||
total: results.length,
|
||||
},
|
||||
'Subscription discovery completed'
|
||||
);
|
||||
return results;
|
||||
}
|
||||
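The resolution order in `resolveBridgeUrl` (explicit env var first, then the default local port) can be exercised without a network by injecting the probe. The following is a minimal sketch, assuming hypothetical names `BridgeProbe`, `healthUrl`, and `resolveBridge` that are not part of this diff. It also strips repeated trailing slashes, which the single-slash regex in `probeBridge` does not.

```typescript
// Hypothetical helpers for illustration; only the resolution order mirrors the diff.
type BridgeProbe = (url: string) => Promise<boolean>;

/** Build the health-check URL, tolerating one or more trailing slashes. */
function healthUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, '')}/health`;
}

/** Explicit URL wins; otherwise fall back to the default local port. */
async function resolveBridge(
  explicitUrl: string | undefined,
  defaultPort: number,
  probe: BridgeProbe,
): Promise<{ url?: string; running: boolean }> {
  if (explicitUrl) {
    // An explicit URL is reported even when the bridge is down.
    return { url: explicitUrl, running: await probe(explicitUrl) };
  }
  const localUrl = `http://127.0.0.1:${defaultPort}`;
  return (await probe(localUrl)) ? { url: localUrl, running: true } : { running: false };
}
```

Passing the probe as a parameter keeps the ordering logic testable; the real `probeBridge` can be supplied unchanged at the call site.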
271
packages/gateway/src/modules/subscription-wallet.ts
Normal file
@ -0,0 +1,271 @@
|
||||
/**
|
||||
* Subscription Pool Wallet
|
||||
*
|
||||
* Tracks usage of each CLI subscription against its known quota window
* (Claude Code (Pro) = 45 requests / 5h, ChatGPT Plus = 80 requests / 3h, Copilot = …).
|
||||
* Used by the dashboard to show which subscription has the most headroom
|
||||
* and (future) by the router to load-balance across subscriptions.
|
||||
*
|
||||
* This is the feature competitors don't have: combining MULTIPLE personal
|
||||
* AI subscriptions into a single managed pool.
|
||||
*/
|
||||
|
||||
import type { Pool } from 'pg';
|
||||
import { logger } from '../observability/logger.js';
|
||||
|
||||
export interface QuotaProfile {
|
||||
subscriptionId: string;
|
||||
label: string;
|
||||
/** Hard request quota inside the window. Null = unknown / unlimited. */
|
||||
requestQuota: number | null;
|
||||
/** Window length in seconds (e.g. 5h = 18000s for Claude Code, 3h = 10800s for ChatGPT Plus). */
|
||||
windowSeconds: number;
|
||||
/** Reset behaviour: 'rolling' = sliding window, 'fixed' = clock-aligned reset. */
|
||||
reset: 'rolling' | 'fixed';
|
||||
}
|
||||
|
||||
/**
|
||||
* Known subscription quota profiles. Numbers are conservative defaults —
|
||||
* users can override via Settings if their plan differs.
|
||||
*/
|
||||
export const QUOTA_PROFILES: Record<string, QuotaProfile> = {
|
||||
'claude-code': { subscriptionId: 'claude-code', label: 'Claude Code (Pro)', requestQuota: 45, windowSeconds: 5 * 3600, reset: 'rolling' },
|
||||
'github-copilot': { subscriptionId: 'github-copilot', label: 'GitHub Copilot', requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
|
||||
'microsoft-365-copilot': { subscriptionId: 'microsoft-365-copilot', label: 'M365 Copilot', requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
|
||||
'chatgpt': { subscriptionId: 'chatgpt', label: 'ChatGPT Plus', requestQuota: 80, windowSeconds: 3 * 3600, reset: 'rolling' },
|
||||
'gemini': { subscriptionId: 'gemini', label: 'Gemini Advanced', requestQuota: null, windowSeconds: 30 * 86400, reset: 'fixed' },
|
||||
'codex': { subscriptionId: 'codex', label: 'OpenAI Codex', requestQuota: 150, windowSeconds: 5 * 3600, reset: 'rolling' },
|
||||
'aider': { subscriptionId: 'aider', label: 'Aider', requestQuota: null, windowSeconds: 86400, reset: 'fixed' },
|
||||
};
|
||||
|
||||
/** Record a request against a subscription quota window. */
|
||||
export async function recordSubscriptionUsage(
|
||||
db: Pool,
|
||||
subscriptionId: string,
|
||||
tokensConsumed: number = 0
|
||||
): Promise<void> {
|
||||
const profile = QUOTA_PROFILES[subscriptionId];
|
||||
if (!profile) return;
|
||||
|
||||
// Compute the window-start timestamp this request belongs to.
|
||||
const now = new Date();
|
||||
let windowStart: Date;
|
||||
if (profile.reset === 'rolling') {
|
||||
// Floor to the most recent quarter-hour for grouping; rolling logic
|
||||
// applied at read-time by summing the last `windowSeconds`.
|
||||
const rounded = Math.floor(now.getTime() / 900_000) * 900_000;
|
||||
windowStart = new Date(rounded);
|
||||
} else {
|
||||
// Fixed reset — bucket into day windows
|
||||
const day = new Date(now);
|
||||
day.setUTCHours(0, 0, 0, 0);
|
||||
windowStart = day;
|
||||
}
|
||||
|
||||
try {
|
||||
await db.query(
|
||||
`
|
||||
INSERT INTO subscription_quota_window
|
||||
(subscription_id, window_start, window_seconds, request_count, tokens_consumed, quota_limit, reset_at)
|
||||
VALUES ($1, $2, $3, 1, $4, $5, $6)
|
||||
ON CONFLICT (subscription_id, window_start)
|
||||
DO UPDATE SET
|
||||
request_count = subscription_quota_window.request_count + 1,
|
||||
tokens_consumed = subscription_quota_window.tokens_consumed + EXCLUDED.tokens_consumed
|
||||
`,
|
||||
[
|
||||
subscriptionId,
|
||||
windowStart,
|
||||
profile.windowSeconds,
|
||||
tokensConsumed,
|
||||
profile.requestQuota,
|
||||
new Date(windowStart.getTime() + profile.windowSeconds * 1000),
|
||||
]
|
||||
);
|
||||
} catch (err) {
|
||||
logger.warn({ err, subscriptionId }, 'subscription-wallet: usage record failed');
|
||||
}
|
||||
}
|
||||
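The bucketing step in `recordSubscriptionUsage` is easy to check in isolation. A minimal sketch, assuming a hypothetical pure helper `bucketStart` (not part of this diff) that takes the clock as a parameter: rolling profiles floor to the quarter-hour, fixed profiles floor to UTC midnight.

```typescript
type ResetMode = 'rolling' | 'fixed';

const QUARTER_HOUR_MS = 15 * 60 * 1000; // same 900_000 ms bucket as the diff

/** Hypothetical helper: compute the window_start bucket for a request time. */
function bucketStart(now: Date, reset: ResetMode): Date {
  if (reset === 'rolling') {
    // Group rolling usage into quarter-hour buckets; the sliding window
    // itself is applied at read time by summing recent buckets.
    return new Date(Math.floor(now.getTime() / QUARTER_HOUR_MS) * QUARTER_HOUR_MS);
  }
  // Fixed reset: one bucket per UTC day.
  const day = new Date(now);
  day.setUTCHours(0, 0, 0, 0);
  return day;
}
```

For a request at 2024-01-01T10:37:42Z, the rolling bucket starts at 10:30:00Z and the fixed bucket at 00:00:00Z.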
|
||||
export interface WalletEntry {
|
||||
subscriptionId: string;
|
||||
label: string;
|
||||
requestQuota: number | null;
|
||||
used: number;
|
||||
remaining: number | null;
|
||||
utilizationPercent: number | null;
|
||||
windowSeconds: number;
|
||||
resetAt: string | null;
|
||||
/** Predicted exhaustion timestamp based on current rate; null if no quota or no usage. */
|
||||
predictedExhaustionAt: string | null;
|
||||
recommendation: 'use-this' | 'available' | 'near-limit' | 'exhausted' | 'unknown';
|
||||
}
|
||||
|
||||
/** Build the wallet snapshot for the dashboard. */
|
||||
export async function getSubscriptionWallet(db: Pool): Promise<WalletEntry[]> {
|
||||
const entries: WalletEntry[] = [];
|
||||
|
||||
for (const profile of Object.values(QUOTA_PROFILES)) {
|
||||
let used = 0;
|
||||
let resetAt: string | null = null;
|
||||
let predictedExhaustionAt: string | null = null;
|
||||
|
||||
try {
|
||||
const result = await db.query(
|
||||
`
|
||||
SELECT
|
||||
COALESCE(SUM(request_count), 0)::INT AS used,
|
||||
MAX(reset_at) AS reset_at
|
||||
FROM subscription_quota_window
|
||||
WHERE subscription_id = $1
|
||||
AND window_start > NOW() - MAKE_INTERVAL(secs => $2)
|
||||
`,
|
||||
[profile.subscriptionId, profile.windowSeconds]
|
||||
);
|
||||
used = parseInt(result.rows[0]?.used ?? '0', 10);
|
||||
resetAt = result.rows[0]?.reset_at ? new Date(result.rows[0].reset_at).toISOString() : null;
|
||||
} catch (err) {
|
||||
logger.warn({ err, sub: profile.subscriptionId }, 'wallet: read failed');
|
||||
}
|
||||
|
||||
const remaining = profile.requestQuota !== null ? Math.max(profile.requestQuota - used, 0) : null;
|
||||
const utilizationPercent = profile.requestQuota
|
||||
? Math.min(100, (used / profile.requestQuota) * 100)
|
||||
: null;
|
||||
|
||||
// Linear extrapolation for predicted exhaustion (assumes usage is spread evenly over the full window).
|
||||
if (remaining !== null && used > 0 && profile.requestQuota) {
|
||||
const ratePerSecond = used / profile.windowSeconds;
|
||||
if (ratePerSecond > 0) {
|
||||
const secondsRemaining = remaining / ratePerSecond;
|
||||
predictedExhaustionAt = new Date(Date.now() + secondsRemaining * 1000).toISOString();
|
||||
}
|
||||
}
|
||||
|
||||
let recommendation: WalletEntry['recommendation'] = 'unknown';
|
||||
if (utilizationPercent !== null) {
|
||||
if (utilizationPercent >= 100) recommendation = 'exhausted';
|
||||
else if (utilizationPercent >= 80) recommendation = 'near-limit';
|
||||
else if (utilizationPercent <= 30) recommendation = 'use-this';
|
||||
else recommendation = 'available';
|
||||
}
|
||||
|
||||
entries.push({
|
||||
subscriptionId: profile.subscriptionId,
|
||||
label: profile.label,
|
||||
requestQuota: profile.requestQuota,
|
||||
used,
|
||||
remaining,
|
||||
utilizationPercent: utilizationPercent !== null ? Math.round(utilizationPercent * 10) / 10 : null,
|
||||
windowSeconds: profile.windowSeconds,
|
||||
resetAt,
|
||||
predictedExhaustionAt,
|
||||
recommendation,
|
||||
});
|
||||
}
|
||||
|
||||
return entries;
|
||||
}
|
||||
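The per-entry arithmetic in `getSubscriptionWallet` can be separated from the database read. The sketch below assumes a hypothetical pure function `summarizeUsage` (not part of this diff) and reuses the same thresholds: at or above 100% is `exhausted`, at or above 80% is `near-limit`, at or below 30% is `use-this`, anything else is `available`, and a missing quota yields `unknown`.

```typescript
type Recommendation = 'use-this' | 'available' | 'near-limit' | 'exhausted' | 'unknown';

interface UsageSummary {
  remaining: number | null;
  utilizationPercent: number | null;
  recommendation: Recommendation;
}

/** Hypothetical helper: derive headroom and a recommendation from raw counts. */
function summarizeUsage(requestQuota: number | null, used: number): UsageSummary {
  const remaining = requestQuota !== null ? Math.max(requestQuota - used, 0) : null;
  // A null (or zero) quota means utilization cannot be computed.
  const utilizationPercent = requestQuota ? Math.min(100, (used / requestQuota) * 100) : null;

  let recommendation: Recommendation = 'unknown';
  if (utilizationPercent !== null) {
    if (utilizationPercent >= 100) recommendation = 'exhausted';
    else if (utilizationPercent >= 80) recommendation = 'near-limit';
    else if (utilizationPercent <= 30) recommendation = 'use-this';
    else recommendation = 'available';
  }
  return { remaining, utilizationPercent, recommendation };
}
```

Keeping this step pure makes the threshold boundaries straightforward to unit-test without a `pg` pool.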
|
||||
/**
|
||||
* Map an Ollama / external model id to the subscription it belongs to,
|
||||
* if any. Returns null for non-subscription models (free APIs, local Ollama).
|
||||
*/
|
||||
export function modelToSubscriptionId(model: string): string | null {
|
||||
const m = model.toLowerCase();
|
||||
if (m.includes('claude')) return 'claude-code';
|
||||
if (m.startsWith('gpt-5.1-codex') || m === 'codex-mini-latest') return 'codex';
|
||||
if (m.startsWith('gpt-')) return 'chatgpt';
|
||||
if (m.startsWith('gemini-')) return 'gemini';
|
||||
if (m.startsWith('github-copilot') || m === 'copilot-chat') return 'github-copilot';
|
||||
if (m === 'microsoft-365-copilot' || m === 'm365-copilot-chat') return 'microsoft-365-copilot';
|
||||
return null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Post-process a routing decision against the subscription wallet.
|
||||
*
|
||||
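* If the picked model belongs to a subscription that is `exhausted` or
* `near-limit` (>=80% utilization), or whose utilization is unknown, we look
* at the other models in the fallback chain and re-pick the one with the
* lowest known utilization. `decision.tier` is not checked here, so the
* chain is presumably expected to be same-tier already.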
|
||||
*
|
||||
* This is the Pool-Routing feature: distribute load across YOUR subscriptions
|
||||
* to maximize their value rather than always routing to the primary.
|
||||
*/
|
||||
export async function applyPoolRouting(
|
||||
db: Pool,
|
||||
decision: { model: string; fallback_chain: string[]; tier: string },
|
||||
options: { forced?: boolean } = {}
|
||||
): Promise<{ model: string; fallback_chain: string[]; reason: string } | null> {
|
||||
const wallet = await getSubscriptionWallet(db);
|
||||
const utilByModel = (model: string): number | null => {
|
||||
const sub = modelToSubscriptionId(model);
|
||||
if (!sub) return null;
|
||||
const w = wallet.find((entry) => entry.subscriptionId === sub);
|
||||
return w?.utilizationPercent ?? null;
|
||||
};
|
||||
const isExhausted = (model: string): boolean => {
|
||||
const sub = modelToSubscriptionId(model);
|
||||
if (!sub) return false;
|
||||
const w = wallet.find((entry) => entry.subscriptionId === sub);
|
||||
return w?.recommendation === 'exhausted';
|
||||
};
|
||||
|
||||
const primaryUtil = utilByModel(decision.model);
|
||||
const primarySub = modelToSubscriptionId(decision.model);
|
||||
|
||||
// No re-routing for non-subscription models or when primary has plenty of headroom
|
||||
if (!primarySub) return null;
|
||||
if (!options.forced && primaryUtil !== null && primaryUtil < 80 && !isExhausted(decision.model)) return null;
|
||||
|
||||
// Find a sibling in the fallback chain with lower utilization
|
||||
const candidates = decision.fallback_chain.filter((m) => m !== decision.model);
|
||||
let bestModel = decision.model;
|
||||
let bestUtil = primaryUtil ?? 100;
|
||||
|
||||
for (const candidate of candidates) {
|
||||
if (isExhausted(candidate)) continue;
|
||||
const util = utilByModel(candidate);
|
||||
if (util === null) continue; // unknown utilization — don't pick blindly over a known one
|
||||
if (util < bestUtil) {
|
||||
bestUtil = util;
|
||||
bestModel = candidate;
|
||||
}
|
||||
}
|
||||
|
||||
if (bestModel === decision.model) return null;
|
||||
|
||||
// Move chosen model to front of chain
|
||||
const newChain = [bestModel, ...decision.fallback_chain.filter((m) => m !== bestModel)];
|
||||
return {
|
||||
model: bestModel,
|
||||
fallback_chain: newChain,
|
||||
reason: `pool-route: primary ${decision.model} at ${primaryUtil?.toFixed(0) ?? '?'}% util, switched to ${bestModel} at ${bestUtil.toFixed(0)}%`,
|
||||
};
|
||||
}
|
||||
|
||||
/** Pick the subscription with the most headroom among the given candidates. */
|
||||
export async function pickBestSubscription(
|
||||
db: Pool,
|
||||
candidates: readonly string[]
|
||||
): Promise<{ subscriptionId: string; reason: string } | null> {
|
||||
const wallet = await getSubscriptionWallet(db);
|
||||
const eligible = wallet.filter(
|
||||
(w) => candidates.includes(w.subscriptionId) && w.recommendation !== 'exhausted'
|
||||
);
|
||||
if (eligible.length === 0) return null;
|
||||
// Sort: lowest utilization first (most headroom). Unknown utilization is
// treated as 50%, so tracked quotas below 50% win over unknowns.
|
||||
eligible.sort((a, b) => {
|
||||
const ua = a.utilizationPercent ?? 50;
|
||||
const ub = b.utilizationPercent ?? 50;
|
||||
return ua - ub;
|
||||
});
|
||||
const winner = eligible[0];
|
||||
return {
|
||||
subscriptionId: winner.subscriptionId,
|
||||
reason: winner.utilizationPercent !== null
|
||||
? `${winner.utilizationPercent.toFixed(0)}% used in window`
|
||||
: 'no quota tracking',
|
||||
};
|
||||
}
|
||||
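The re-pick loop in `applyPoolRouting` can be reasoned about as a pure function over per-model utilization. The following is a sketch under stated assumptions: `repickByHeadroom` is a hypothetical name, and utilization is passed in as a map keyed by model id instead of being looked up through the wallet and `modelToSubscriptionId`.

```typescript
/**
 * Hypothetical pure variant of the re-pick step. Returns the reordered
 * fallback chain, or null when the primary should be kept.
 */
function repickByHeadroom(
  primary: string,
  fallbackChain: readonly string[],
  utilization: ReadonlyMap<string, number | null>,
): string[] | null {
  const utilOf = (model: string): number | null => utilization.get(model) ?? null;

  let bestModel = primary;
  let bestUtil = utilOf(primary) ?? 100; // unknown primary is treated as full

  for (const candidate of fallbackChain) {
    if (candidate === primary) continue;
    const util = utilOf(candidate);
    // Skip unknown utilization (no blind picks) and exhausted candidates.
    if (util === null || util >= 100) continue;
    if (util < bestUtil) {
      bestUtil = util;
      bestModel = candidate;
    }
  }

  if (bestModel === primary) return null; // nothing better: keep the decision
  // Move the chosen model to the front of the chain.
  return [bestModel, ...fallbackChain.filter((m) => m !== bestModel)];
}
```

With the primary at 90%, one sibling at 40%, and one sibling with unknown utilization, the 40% sibling moves to the front and the unknown one keeps its place.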
@ -86,6 +86,17 @@ const PROVIDERS: readonly ExternalProvider[] = [
|
||||
{ id: 'gpt-3.5-turbo', tier: 'medium', contextLength: 4096 },
|
||||
],
|
||||
},
|
||||
+ {
+ name: 'm365-copilot-bridge',
+ baseUrl: '', // constructed from M365_COPILOT_BRIDGE_URL env var
+ envKey: 'M365_COPILOT_BRIDGE_URL',
+ rateLimitRpm: 60,
+ enabled: true,
+ models: [
+ { id: 'microsoft-365-copilot', tier: 'reasoning', contextLength: 128000 },
+ { id: 'm365-copilot-chat', tier: 'large', contextLength: 128000 },
+ ],
+ },
|
||||
{
|
||||
name: 'cerebras',
|
||||
baseUrl: 'https://api.cerebras.ai/v1',
|
||||
@ -146,12 +157,13 @@ const PROVIDERS: readonly ExternalProvider[] = [
|
||||
{
|
||||
name: 'openai-codex',
|
||||
baseUrl: 'https://api.openai.com/v1',
|
||||
- envKey: 'OPENAI_API_KEY',
+ envKey: 'OPENAI_CODEX_URL',
|
||||
rateLimitRpm: 60,
|
||||
enabled: true,
|
||||
models: [
|
||||
- { id: 'gpt-4-turbo', tier: 'reasoning', contextLength: 128000 },
- { id: 'gpt-3.5-turbo', tier: 'fast', contextLength: 16384 },
+ { id: 'gpt-5.1-codex', tier: 'reasoning', contextLength: 256000 },
+ { id: 'gpt-5.1-codex-mini', tier: 'large', contextLength: 256000 },
+ { id: 'codex-mini-latest', tier: 'medium', contextLength: 200000 },
|
||||
],
|
||||
},
|
||||
{
|
||||
@ -169,16 +181,28 @@ const PROVIDERS: readonly ExternalProvider[] = [
|
||||
{
|
||||
name: 'codex',
|
||||
baseUrl: 'https://api.github.com/copilot_inner/v2',
|
||||
- envKey: 'GITHUB_CODEX_TOKEN',
+ envKey: 'CODEX_BRIDGE_URL',
|
||||
rateLimitRpm: 60,
|
||||
enabled: true,
|
||||
models: [
|
||||
- { id: 'github-copilot-x', tier: 'large', contextLength: 8192 },
- { id: 'code-davinci-002', tier: 'medium', contextLength: 4096 },
+ { id: 'gpt-5.1-codex', tier: 'reasoning', contextLength: 256000 },
+ { id: 'gpt-5.1-codex-mini', tier: 'large', contextLength: 256000 },
+ { id: 'codex-mini-latest', tier: 'medium', contextLength: 200000 },
|
||||
],
|
||||
},
|
||||
];
|
||||
|
||||
+ const AUTHLESS_BRIDGE_PROVIDERS = new Set([
+ 'claude-bridge',
+ 'claude-code',
+ 'openai-bridge',
+ 'chatgpt-bridge',
+ 'copilot-bridge',
+ 'm365-copilot-bridge',
+ ]);
+
+ const GENERATE_BRIDGE_PROVIDERS = new Set(['claude-bridge', 'claude-code']);
|
||||
|
||||
// ─── Rate Limiter (simple sliding window) ───────────────────────────
|
||||
|
||||
const requestTimestamps: Map<string, number[]> = new Map();
|
||||
@ -213,25 +237,34 @@ function getApiKey(provider: ExternalProvider): string | undefined {
|
||||
return url ? 'claude-code-enabled' : undefined;
|
||||
}
|
||||
if (provider.name === 'openai-bridge') {
|
||||
- // openai-bridge uses OPENAI_API_KEY for auth, but also needs bridge URL
- const apiKey = process.env['OPENAI_API_KEY'];
+ // Subscription bridge auth is handled by the bridge process/CLI session.
  const url = process.env['OPENAI_BRIDGE_URL'];
- return apiKey && url ? apiKey : undefined;
+ return url ? 'openai-bridge-enabled' : undefined;
|
||||
}
|
||||
if (provider.name === 'chatgpt-bridge') {
|
||||
- // chatgpt-bridge can use same URL as openai-bridge (same service), but needs API key
- const apiKey = process.env['OPENAI_API_KEY'];
+ // ChatGPT Plus bridge can reuse the OpenAI bridge when configured that way.
  const url = process.env['CHATGPT_BRIDGE_URL'] || process.env['OPENAI_BRIDGE_URL'];
- return apiKey && url ? apiKey : undefined;
+ return url ? 'chatgpt-bridge-enabled' : undefined;
|
||||
}
|
||||
if (provider.name === 'copilot-bridge') {
|
||||
- // copilot-bridge uses GitHub Copilot subscription (auth handled internally by copilot-api)
- // Just needs URL to be configured
+ // copilot-bridge uses GitHub Copilot subscription (auth handled internally by copilot-api).
|
||||
const url = process.env['COPILOT_BRIDGE_URL'];
|
||||
return url ? 'copilot-authenticated' : undefined;
|
||||
}
|
||||
+ if (provider.name === 'm365-copilot-bridge') {
+ // Microsoft 365 Copilot uses Microsoft Graph delegated auth inside the bridge.
+ const url = process.env['M365_COPILOT_BRIDGE_URL'];
+ return url ? 'm365-copilot-bridge-enabled' : undefined;
+ }
+ if (provider.name === 'openai-codex') {
+ const bridgeUrl = process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL'];
+ if (bridgeUrl) return 'openai-codex-bridge-enabled';
+ return process.env['OPENAI_API_KEY'] || undefined;
+ }
|
||||
if (provider.name === 'codex') {
|
||||
- // codex uses GitHub Codex API token
+ // Codex can run through an authless local/subscription bridge. A token remains supported as fallback.
+ const bridgeUrl = process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL'];
+ if (bridgeUrl) return 'codex-bridge-enabled';
|
||||
const token = process.env['GITHUB_CODEX_TOKEN'];
|
||||
return token ? token : undefined;
|
||||
}
|
||||
@ -241,11 +274,11 @@ function getApiKey(provider: ExternalProvider): string | undefined {
|
||||
function getBaseUrl(provider: ExternalProvider): string {
|
||||
if (provider.name === 'claude-bridge') {
|
||||
const url = process.env['CLAUDE_BRIDGE_URL'];
|
||||
- return url ? `${url}/v1` : '';
+ return url ?? '';
|
||||
}
|
||||
if (provider.name === 'claude-code') {
|
||||
const url = process.env['CLAUDE_CODE_URL'];
|
||||
- return url ? `${url}/v1` : '';
+ return url ?? '';
|
||||
}
|
||||
if (provider.name === 'openai-bridge') {
|
||||
const url = process.env['OPENAI_BRIDGE_URL'];
|
||||
@ -257,7 +290,19 @@ function getBaseUrl(provider: ExternalProvider): string {
|
||||
}
|
||||
if (provider.name === 'copilot-bridge') {
|
||||
const url = process.env['COPILOT_BRIDGE_URL'];
|
||||
- return url ? `${url}` : '';
+ return url ? `${url}/v1` : '';
|
||||
}
|
||||
+ if (provider.name === 'm365-copilot-bridge') {
+ const url = process.env['M365_COPILOT_BRIDGE_URL'];
+ return url ? `${url}/v1` : '';
+ }
+ if (provider.name === 'openai-codex') {
+ const url = process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL'];
+ return url ? `${url}/v1` : provider.baseUrl;
+ }
+ if (provider.name === 'codex') {
+ const url = process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL'];
+ return url ? `${url}/v1` : provider.baseUrl;
+ }
|
||||
if (provider.name === 'cloudflare') {
|
||||
const accountId = process.env['CLOUDFLARE_ACCOUNT_ID'];
|
||||
@ -271,6 +316,11 @@ export function getAvailableProviders(): readonly ExternalProvider[] {
|
||||
return PROVIDERS.filter((p) => p.enabled && getApiKey(p));
|
||||
}
|
||||
|
||||
+ /** Returns ALL configured providers (enabled or not, with or without API key). For dashboard listing. */
+ export function getAllProviders(): readonly ExternalProvider[] {
+ return PROVIDERS;
+ }
|
||||
|
||||
function findBestModel(
|
||||
provider: ExternalProvider,
|
||||
targetTier: 'fast' | 'medium' | 'large' | 'reasoning',
|
||||
@ -296,7 +346,11 @@ function findBestModel(
|
||||
|
||||
function buildRequestHeaders(provider: ExternalProvider, apiKey: string): Record<string, string> {
|
||||
const headers: Record<string, string> = { 'Content-Type': 'application/json' };
|
||||
- if (!['claude-bridge', 'claude-code', 'openai-bridge', 'chatgpt-bridge', 'copilot-bridge'].includes(provider.name)) {
+ const usesAuthlessBridge = AUTHLESS_BRIDGE_PROVIDERS.has(provider.name)
+ || (provider.name === 'openai-codex' && !!(process.env['OPENAI_CODEX_URL'] || process.env['CODEX_BRIDGE_URL']))
+ || (provider.name === 'codex' && !!(process.env['CODEX_BRIDGE_URL'] || process.env['OPENAI_CODEX_URL']));
+
+ if (!usesAuthlessBridge) {
|
||||
headers['Authorization'] = `Bearer ${apiKey}`;
|
||||
}
|
||||
return headers;
|
||||
@ -311,13 +365,29 @@ function buildRequestPayload(model: ExternalModel, request: ExternalCompletionRe
|
||||
};
|
||||
}
|
||||
|
||||
+ function buildGenerateBridgePayload(model: ExternalModel, request: ExternalCompletionRequest): Record<string, unknown> {
+ const system = request.messages.find((m) => m.role === 'system')?.content;
+ const prompt = request.messages
+ .filter((m) => m.role !== 'system')
+ .map((m) => `${m.role}: ${m.content}`)
+ .join('\n\n');
+
+ return {
+ model: model.id,
+ prompt,
+ system,
+ temperature: request.temperature ?? 0.3,
+ max_tokens: request.max_tokens ?? 2048,
+ };
+ }
|
||||
|
||||
function parseExternalResponse(
|
||||
data: any,
|
||||
model: ExternalModel,
|
||||
provider: ExternalProvider,
|
||||
start: number,
|
||||
): ExternalCompletionResponse {
|
||||
- const content = data.choices?.[0]?.message?.content ?? '';
+ const content = data.choices?.[0]?.message?.content ?? data.content ?? data.response ?? data.message?.content ?? '';
|
||||
recordRequest(provider.name);
|
||||
return {
|
||||
response: content,
|
||||
@ -341,14 +411,15 @@ async function callProvider(
|
||||
const baseUrl = getBaseUrl(provider);
|
||||
if (!baseUrl) throw new Error(`No base URL for ${provider.name}`);
|
||||
|
||||
- const url = `${baseUrl}/chat/completions`;
+ const generateBridge = GENERATE_BRIDGE_PROVIDERS.has(provider.name);
+ const url = generateBridge ? `${baseUrl}/api/generate` : `${baseUrl}/chat/completions`;
|
||||
const controller = new AbortController();
|
||||
const timer = setTimeout(() => controller.abort(), timeoutMs);
|
||||
const start = Date.now();
|
||||
|
||||
try {
|
||||
const headers = buildRequestHeaders(provider, apiKey);
|
||||
- const payload = buildRequestPayload(model, request);
+ const payload = generateBridge ? buildGenerateBridgePayload(model, request) : buildRequestPayload(model, request);
|
||||
|
||||
const response = await fetch(url, {
|
||||
method: 'POST',
|
||||
|
||||
File diff suppressed because it is too large
@ -78,8 +78,9 @@ export async function healthRoute(fastify: FastifyInstance): Promise<void> {
|
||||
// Check if this is a dashboard UI request with ?ui=1 or ?dashboard=1
|
||||
const query = request.query as any;
|
||||
const isDashboardRequest = query.ui || query.dashboard;
|
||||
+ const acceptsHtml = String(request.headers.accept ?? '').includes('text/html');

- if (isDashboardRequest) {
+ if (isDashboardRequest || acceptsHtml) {
|
||||
try {
|
||||
const __filename = fileURLToPath(import.meta.url);
|
||||
const __dirname = dirname(__filename);
|
||||
|
||||
@ -11,6 +11,22 @@ export async function staticRoute(fastify: FastifyInstance): Promise<void> {
|
||||
|
||||
logger.info({ publicDir }, 'Static file serving initialized');
|
||||
|
||||
+ function sendHtml(filename: string, reply: any) {
+ const filePath = join(publicDir, filename);
+ if (!existsSync(filePath)) {
+ logger.warn({ path: filePath }, `${filename} not found`);
+ return reply.status(404).send({ error: `${filename} not found` });
+ }
+
+ const content = readFileSync(filePath, 'utf-8');
+ return reply
+ .header('Cache-Control', 'no-cache, no-store, must-revalidate, max-age=0')
+ .header('Pragma', 'no-cache')
+ .header('Expires', '0')
+ .type('text/html')
+ .send(content);
+ }
|
||||
|
||||
// Serve root path
|
||||
fastify.get('/', async (request, reply) => {
|
||||
logger.info({ method: request.method, url: request.url, host: request.hostname }, 'Root path requested');
|
||||
@ -26,13 +42,47 @@ export async function staticRoute(fastify: FastifyInstance): Promise<void> {
|
||||
|
||||
// Serve /dashboard.html
|
||||
fastify.get('/dashboard.html', async (_request, reply) => {
|
||||
- const dashboardPath = join(publicDir, 'dashboard.html');
- if (!existsSync(dashboardPath)) {
- logger.warn({ path: dashboardPath }, 'dashboard.html not found');
- return reply.status(404).send({ error: 'dashboard.html not found' });
- }
- const content = readFileSync(dashboardPath, 'utf-8');
- return reply.type('text/html').send(content);
+ return sendHtml('dashboard.html', reply);
|
||||
});
|
||||
|
||||
+ fastify.get('/dashboard-v2.html', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/v2/dashboard', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/v2/dashboard/', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/v2', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/v2/', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/dashboard/v2', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/dashboard/v2/', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/api/dashboard-v2', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/api/v2/dashboard', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
+
+ fastify.get('/api/dashboard/v2', async (_request, reply) => {
+ return sendHtml('dashboard-v2.html', reply);
+ });
|
||||
|
||||
// Serve /api/dashboard as HTML for compatibility
|
||||
|
||||
@ -126,10 +126,10 @@ export async function registerHTTPSRedirectMiddleware(server: FastifyInstance) {
|
||||
*/
|
||||
export async function registerSecurityHeadersMiddleware(server: FastifyInstance) {
|
||||
server.addHook('onSend', async (request, reply) => {
|
||||
- // Content Security Policy - strict, no inline scripts
+ // Content Security Policy for the self-contained dashboard UI.
  reply.header(
  'Content-Security-Policy',
- "default-src 'self'; script-src 'self'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'"
+ "default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; object-src 'none'; frame-ancestors 'none'; base-uri 'self'; form-action 'self'"
|
||||
);
|
||||
|
||||
// Prevent clickjacking
|
||||
|
||||
@ -77,6 +77,7 @@ async function buildServer() {
|
||||
directives: {
|
||||
defaultSrc: ["'self'"],
|
||||
scriptSrc: ["'self'", "'unsafe-inline'"],
|
||||
+ styleSrc: ["'self'", "'unsafe-inline'"],
|
||||
objectSrc: ["'none'"],
|
||||
},
|
||||
},
|
||||
@ -100,7 +101,7 @@ async function buildServer() {
|
||||
|
||||
await server.register(fastifyRateLimit, {
|
||||
global: true,
|
||||
- max: 100,
+ max: 1000,
|
||||
timeWindow: '1 minute',
|
||||
keyGenerator: (request) => {
|
||||
const caller = (request.headers['x-caller-id'] as string) ?? 'default';
|
||||
|
||||