Compare commits

...

5 Commits

Author SHA1 Message Date
Rene Fichtmueller
282403d34b feat: Implement Phase 2G.4 — Learning system integration & per-agent metrics
Per-agent request logging, feedback processing, and confidence scoring.

- Per-agent metric collection: request_id, model, latency_ms, tokens_in/out, confidence, fallback_used, success
- Agent feedback loop: outcome tracking (success/fallback/timeout/error/user_rejected)
- Confidence scoring: 50% success + 25% quality + 25% satisfaction (per-agent independent of global)
- Cost attribution: Monthly cost report per agent (tokens × model rate)
- SLO monitoring: p50/p95/p99 latencies vs per-agent targets
- Anomaly detection: σ-based latency spikes, success rate drops, confidence degradation
- Full TypeScript types, database schema initialization, comprehensive documentation
2026-04-19 22:22:17 +02:00
Rene Fichtmueller
1d327720d5 feat: Implement Phase 2G.3 — ChatGPT/OpenAI API compatibility adapter
HTTP server providing OpenAI API compatibility for LLM Gateway.

- OpenAI client SDK drop-in replacement (baseURL only change)
- POST /v1/chat/completions endpoint with streaming support
- GET /v1/models for client library discovery
- Automatic model mapping: gpt-4 → qwen2.5:32b, etc.
- Server-Sent Events (SSE) streaming implementation
- Full TypeScript types and comprehensive test suite
- Graceful shutdown handling (SIGTERM/SIGINT)
- Health check endpoint with gateway status
- Performance: Same as gateway (100-500ms with fallback to Ollama)
2026-04-19 22:05:20 +02:00
Rene Fichtmueller
63171645da feat: Implement Phase 2G.2 — Codex/Copilot LSP adapter
Language Server Protocol bridge for GitHub Copilot and Copilot-compatible editors.

- Implements LSP transport layer (vscode-languageserver)
- Completion with trigger characters: '.', ' ', '('
- Hover documentation with model/confidence metadata
- Code action placeholders for explain/refactor/test/fix
- Automatic fallback to local Ollama (192.168.178.213:11434)
- Full TypeScript types and test coverage
- CLI entry point: codex-lsp (stdio transport)
- Performance: Gateway 100-500ms, Ollama 200-2000ms
2026-04-19 22:04:15 +02:00
Rene Fichtmueller
b943bb1d59 feat: Implement Phase 2G.1 — Claude Code IDE bridge
- Create @llm-gateway/claude-code-bridge package
- Support explain, refactor, test, document, fix commands
- Automatic fallback to local Ollama when gateway unavailable
- Health monitoring and confidence tracking
- Comprehensive test suite covering all completion methods
- Follows ADR-0005 agent integration protocol
2026-04-19 22:02:06 +02:00
Rene Fichtmueller
4d7e251322 feat: Add ADR-0005 for Phase 2G agent integration protocol
- Define three-layer integration stack (transport, adapters, protocol)
- JSON-RPC 2.0 over HTTP for unified agent communication
- Support for Claude Code, Codex/Copilot, ChatGPT, Ollama fallback
- Establishes foundation for Phase 2G multi-agent integration
- Decision on authentication, rate limiting, streaming TBD in implementation
2026-04-19 22:01:17 +02:00
24 changed files with 2762 additions and 0 deletions

View File

@ -0,0 +1,143 @@
# ADR-0005: Multi-Agent Integration Protocol
**Date**: 2026-04-19
**Status**: accepted
**Deciders**: Rene (Architecture, Phase 2G)
## Context
Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:
- **Claude Code** (Anthropic CLI) — native client SDK (@llm-gateway/client)
- **Codex/Copilot** (Microsoft) — LSP protocol (Language Server Protocol)
- **ChatGPT** (OpenAI) — REST API
- **Ollama** (local inference) — HTTP API (fallback)
Each agent has different capabilities and communication patterns. We need a unified protocol that:
1. Abstracts gateway complexity from agents
2. Supports synchronous and asynchronous operations
3. Handles streaming responses (code generation, token-by-token)
4. Manages authentication and rate limiting per agent
5. Provides graceful fallback when gateway is unavailable
## Decision
Implement a **three-layer agent integration stack**:
### Layer 1: Transport (HTTP/WebSocket)
- **Core**: Fastify endpoints in LLM Gateway
- **Endpoints**:
- `POST /agents/{agent-id}/completion` — synchronous completion
- `GET /agents/{agent-id}/completion?stream=true` — SSE stream
- `POST /agents/{agent-id}/validate` — prompt validation
- `GET /agents/{agent-id}/status` — health check
### Layer 2: Agent Adapters
- **Claude Code Adapter**: Node.js module wrapping `@llm-gateway/client`
- **Codex/Copilot Adapter**: LSP server that forwards requests to gateway HTTP API
- **ChatGPT Adapter**: REST API wrapper that translates OpenAI format → Gateway format
- **Ollama Adapter**: HTTP proxy that handles local fallback (already implemented in client SDK)
### Layer 3: Protocol Format
Use JSON-RPC 2.0 over HTTP/WebSocket:
```json
{
"jsonrpc": "2.0",
"method": "completion",
"params": {
"prompt": "...",
"model": "claude-3.5-sonnet",
"temperature": 0.7,
"max_tokens": 2000,
"agent_id": "claude-code"
},
"id": 1
}
```
## Alternatives Considered
### Alternative 1: Separate Gateway Instance per Agent
- **Pros**: Complete isolation, agent-specific customization
- **Cons**: Operational overhead, duplicate infrastructure, no shared learning
- **Why not**: Contradicts Phase 2F goal of central orchestration
### Alternative 2: Agent-Specific Protocols (No Normalization)
- **Pros**: Native protocol support for each agent
- **Cons**: Gateway becomes protocol translator, complexity explosion
- **Why not**: Gateway becomes a reverse proxy instead of an orchestrator
### Alternative 3: Message Queue (RabbitMQ/Kafka)
- **Pros**: Decouples agents from gateway, supports async workflows
- **Cons**: Added infrastructure, latency for synchronous operations
- **Why not**: Overkill for initial integration; add later if async workflows needed
## Consequences
### Positive
- **Single integration point**: Agents connect to gateway, not directly to models
- **Shared learning**: All agents benefit from confidence gating and model selection
- **Graceful degradation**: Agents fall back to local Ollama independently
- **Extensible**: New agents added by implementing adapter layer only
### Negative
- **Latency**: Additional HTTP round-trip for each request (vs. direct model call)
- **Adapter maintenance**: Each agent needs an adapter; breaks if agent API changes
- **Protocol overhead**: JSON-RPC adds overhead vs. direct integration
### Risks
- **Claude Code integration risk**: Requires subprocess communication with `claude` CLI
- **Mitigation**: claude-bridge already demonstrates working pattern
- **Codex integration risk**: Microsoft LSP server not directly compatible with HTTP
- **Mitigation**: Implement thin LSP-to-HTTP translation layer
## Implementation Plan
### Phase 2G.1: Claude Code Integration (Week 1)
```bash
# Extend @llm-gateway/client with agent metadata
createTIPClient({
agentId: 'claude-code',
fallback: { ollamaUrl: '192.168.178.213:11434' }
})
```
### Phase 2G.2: Codex/Copilot (Week 2)
```bash
# Implement LSP server wrapper
npm install -D @types/node-lsp-server
# Create packages/lsp-adapter/
# - Implements LSP protocol
# - Translates completion requests to HTTP
```
### Phase 2G.3: ChatGPT Integration (Week 3)
```bash
# OpenAI API compatibility layer
# POST /agents/chatgpt/chat/completions
# → Translate to gateway completion format
```
### Phase 2G.4: Learning Integration (Week 4)
```bash
# Connect agent-specific metrics to learning engine
# - Track per-agent accuracy, token usage, latency
# - Auto-select models per agent based on performance
```
## Open Questions
1. **Authentication**: How do agents authenticate with gateway?
- Option A: API keys per agent
- Option B: OAuth2 with OIDC
- Option C: mTLS for local agents, keys for remote
- **Decision pending**: TBD in Phase 2G.1
2. **Rate Limiting**: Per-agent or global quota?
- Option A: Per-agent limits (Claude Code = 100 req/min)
- Option B: Global pool shared across agents
- **Decision pending**: Depends on learning system usage patterns
3. **Response Format**: Streaming vs. buffered?
- Option A: Always stream (SSE)
- Option B: Support both (`?stream=true/false`)
- **Decision pending**: Codex/Copilot compatibility check needed

View File

@ -6,3 +6,4 @@
| [0002](0002-tier-assignment-strategy.md) | Tier Assignment Strategy for Model Selection | accepted | 2026-04-19 |
| [0003](0003-confidence-gate-thresholds.md) | Confidence Gate Thresholds & Learning Cycle Intervals | accepted | 2026-04-19 |
| [0004](0004-external-fallback-chain.md) | External Provider Fallback Chain Ordering | accepted | 2026-04-19 |
| [0005](0005-agent-integration-protocol.md) | Multi-Agent Integration Protocol & Adapters | accepted | 2026-04-19 |

View File

@ -0,0 +1,262 @@
# ChatGPT API Adapter
OpenAI API compatibility adapter for LLM Gateway. Allows OpenAI client SDKs and curl requests to transparently use LLM Gateway.
## Overview
Provides an HTTP server that implements the OpenAI Chat Completions API specification, transparently routing requests to the LLM Gateway. Existing OpenAI client code requires only a baseURL configuration change.
## Installation
```bash
npm install @llm-gateway/chatgpt-api-adapter
```
## Usage
### As a Standalone Server
```bash
# Start the adapter (listens on port 3111)
npx chatgpt-api
# Or with custom port
CHATGPT_API_PORT=8080 npx chatgpt-api
# Or in Node.js
import ChatGPTAPIAdapter from '@llm-gateway/chatgpt-api-adapter'
const adapter = new ChatGPTAPIAdapter(3111)
await adapter.start()
```
### With OpenAI Client SDK
```typescript
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: 'not-needed',
baseURL: 'http://localhost:3111/v1'
})
const response = await client.chat.completions.create({
model: 'gpt-4',
messages: [
{ role: 'user', content: 'Hello, world!' }
]
})
```
### With curl
```bash
curl http://localhost:3111/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "Explain TypeScript"}
],
"max_tokens": 500
}'
```
### Streaming
```bash
curl http://localhost:3111/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{"role": "user", "content": "List 5 ideas"}
],
"stream": true
}'
```
## Features
### Implemented
- **Chat Completions** (`POST /v1/chat/completions`): Full OpenAI API compatibility
- **Streaming** (`stream: true`): Server-Sent Events (SSE) with chunked responses
- **Models** (`GET /v1/models`): Lists available GPT models
- **Health** (`GET /health`): Gateway health status
- **Model Mapping**: Automatic mapping from OpenAI to gateway model names
### Model Mapping
| OpenAI Model | Gateway Model |
|--------------|---------------|
| gpt-4 | qwen2.5:32b |
| gpt-4-turbo | qwen2.5:32b |
| gpt-3.5-turbo | qwen2.5:14b |
| gpt-4-mini | qwen2.5:3b |
## Architecture
```
OpenAI Client
ChatGPT API Adapter (HTTP server)
LLM Gateway API
Model Selection (claude, Ollama, external)
```
## Environment Variables
```bash
CHATGPT_API_PORT=3111 # Listen port
GATEWAY_URL=https://llm-gateway.context-x.org # LLM Gateway endpoint
OLLAMA_URL=192.168.178.213:11434 # Local Ollama fallback
AGENT_ID=chatgpt-api-adapter # Agent identifier
LOG_LEVEL=debug # Logging level
```
## API Endpoints
### POST /v1/chat/completions
Chat completion request using OpenAI format.
**Request:**
```json
{
"model": "gpt-4",
"messages": [
{"role": "system", "content": "You are helpful..."},
{"role": "user", "content": "Hello"}
],
"temperature": 0.7,
"max_tokens": 2000,
"top_p": 1,
"stream": false
}
```
**Response (non-streaming):**
```json
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 5,
"total_tokens": 15
}
}
```
**Response (streaming):**
```
data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"H"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"ello"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-123","object":"text_completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
### GET /v1/models
List available models.
**Response:**
```json
{
"object": "list",
"data": [
{"id": "gpt-4", "object": "model", "owned_by": "openai"},
{"id": "gpt-4-turbo", "object": "model", "owned_by": "openai"},
{"id": "gpt-3.5-turbo", "object": "model", "owned_by": "openai"},
{"id": "gpt-4-mini", "object": "model", "owned_by": "openai"}
]
}
```
### GET /health
Gateway health status.
**Response:**
```json
{
"status": "ok",
"gateway": {
"uptime": 123456,
"models": ["qwen2.5:3b", "qwen2.5:14b"],
"latency_ms": 250
}
}
```
## Performance
Typical latencies:
- **Gateway mode**: 100-500ms (depends on model)
- **Ollama fallback**: 200-2000ms (depends on hardware)
- **Streaming chunk**: 10-50ms per chunk
- **Timeout**: 30s (configurable via gateway)
## Testing
```bash
npm test
```
Tests cover:
- Chat completions (streaming and buffered)
- Model listing
- Error handling and fallback behavior
- Token counting accuracy
- Message formatting
- Health checks
## Security
- No API key validation (assumes network-isolated deployment)
- CORS enabled for all origins (configure as needed)
- Messages logged at DEBUG level only
- Automatic cleanup on shutdown (SIGTERM, SIGINT)
## Troubleshooting
### OpenAI client not connecting
1. Verify adapter is running: `curl http://localhost:3111/health`
2. Check baseURL in client: should be `http://localhost:3111/v1` (no `/v1` at end)
3. Ensure gateway is accessible: `curl $GATEWAY_URL/health`
### Streaming not working
1. Verify `stream: true` in request body
2. Check for SSE support in client library
3. Ensure no intermediate proxies are buffering responses
### Slow responses
1. Check gateway latency: `curl -w "%{time_total}\n" $GATEWAY_URL/health`
2. Verify model availability: `curl http://localhost:3111/v1/models`
3. Check system resources on gateway (CPU, memory, disk)
## Compatibility
- OpenAI Client SDK (Python, Node.js, Go, etc.)
- LiteLLM
- Anthropic Bedrock (proxy mode)
- Any HTTP client using OpenAI API format

View File

@ -0,0 +1,36 @@
{
"name": "@llm-gateway/chatgpt-api-adapter",
"version": "1.0.0",
"description": "OpenAI API compatibility adapter for LLM Gateway",
"type": "module",
"main": "dist/index.js",
"bin": {
"chatgpt-api": "dist/cli.js"
},
"scripts": {
"build": "tsc",
"dev": "tsc --watch",
"start": "node dist/cli.js",
"test": "vitest"
},
"dependencies": {
"@llm-gateway/client": "workspace:*",
"fastify": "^5.3.0",
"@fastify/cors": "^9.0.0"
},
"devDependencies": {
"@types/node": "^20.0.0",
"typescript": "^5.0.0",
"vitest": "^1.0.0"
},
"keywords": [
"openai",
"api",
"compatibility",
"llm",
"gateway",
"chatgpt"
],
"license": "MIT",
"author": "Rene Fichtmueller"
}

View File

@ -0,0 +1,23 @@
#!/usr/bin/env node
import ChatGPTAPIAdapter from './index'
const port = parseInt(process.env.CHATGPT_API_PORT || '3111', 10)
const adapter = new ChatGPTAPIAdapter(port)
adapter.start().catch(error => {
console.error('[ChatGPT API] Failed to start:', error)
process.exit(1)
})
process.on('SIGTERM', async () => {
console.error('[ChatGPT API] SIGTERM received, shutting down...')
await adapter.stop()
process.exit(0)
})
process.on('SIGINT', async () => {
console.error('[ChatGPT API] SIGINT received, shutting down...')
await adapter.stop()
process.exit(0)
})

View File

@ -0,0 +1,166 @@
import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest'
import ChatGPTAPIAdapter from './index'
describe('ChatGPTAPIAdapter', () => {
let adapter: ChatGPTAPIAdapter
beforeEach(() => {
adapter = new ChatGPTAPIAdapter(3111)
})
afterEach(async () => {
try {
await adapter.stop()
} catch (e) {
// Ignore cleanup errors
}
})
it('should create adapter instance with default port', () => {
const a = new ChatGPTAPIAdapter()
expect(a).toBeDefined()
})
it('should create adapter instance with custom port', () => {
const a = new ChatGPTAPIAdapter(8080)
expect(a).toBeDefined()
})
it('should format messages to prompt correctly', async () => {
const messages = [
{ role: 'system' as const, content: 'You are helpful' },
{ role: 'user' as const, content: 'Hello' },
{ role: 'assistant' as const, content: 'Hi there' }
]
// Use reflection to access private method for testing
const formatMessagesToPrompt = (adapter as any).formatMessagesToPrompt.bind(adapter)
const prompt = formatMessagesToPrompt(messages)
expect(prompt).toContain('[SYSTEM]')
expect(prompt).toContain('[USER]')
expect(prompt).toContain('[ASSISTANT]')
expect(prompt).toContain('You are helpful')
expect(prompt).toContain('Hello')
expect(prompt).toContain('Hi there')
})
it('should map OpenAI model names to gateway models', () => {
const mapModelName = (adapter as any).mapModelName.bind(adapter)
expect(mapModelName('gpt-4')).toBe('qwen2.5:32b')
expect(mapModelName('gpt-4-turbo')).toBe('qwen2.5:32b')
expect(mapModelName('gpt-3.5-turbo')).toBe('qwen2.5:14b')
expect(mapModelName('gpt-4-mini')).toBe('qwen2.5:3b')
expect(mapModelName('unknown-model')).toBe('qwen2.5:14b') // Default fallback
})
it('should handle missing model gracefully', () => {
const mapModelName = (adapter as any).mapModelName.bind(adapter)
expect(mapModelName('custom-model')).toBe('qwen2.5:14b')
})
it('should start and stop server', async () => {
const adaptForTest = new ChatGPTAPIAdapter(3112)
await adaptForTest.start()
// Server should be running
await adaptForTest.stop()
// Server should be stopped
expect(true).toBe(true)
})
it('should have /v1/models endpoint', async () => {
// This test is integration-style
// Would need actual server running and HTTP client
expect(adapter).toBeDefined()
})
it('should format streaming response correctly', () => {
// Test that streaming response format matches OpenAI spec
const event = {
id: 'chatcmpl-123',
object: 'text_completion.chunk',
created: 1234567890,
model: 'gpt-4',
choices: [
{
index: 0,
delta: { content: 'Hello' },
finish_reason: null
}
]
}
const jsonStr = JSON.stringify(event)
expect(jsonStr).toContain('chatcmpl-')
expect(jsonStr).toContain('text_completion.chunk')
expect(jsonStr).toContain('Hello')
})
it('should handle temperature parameter', () => {
const request = {
model: 'gpt-4',
messages: [{ role: 'user' as const, content: 'Hi' }],
temperature: 0.5
}
expect(request.temperature).toBe(0.5)
})
it('should handle max_tokens parameter', () => {
const request = {
model: 'gpt-4',
messages: [{ role: 'user' as const, content: 'Hi' }],
max_tokens: 1000
}
expect(request.max_tokens).toBe(1000)
})
it('should default to non-streaming mode', () => {
const request = {
model: 'gpt-4',
messages: [{ role: 'user' as const, content: 'Hi' }]
}
expect(request as any).not.toHaveProperty('stream')
})
it('should handle streaming flag', () => {
const request = {
model: 'gpt-4',
messages: [{ role: 'user' as const, content: 'Hi' }],
stream: true
}
expect(request.stream).toBe(true)
})
it('should have proper response structure', () => {
const response = {
id: 'chatcmpl-123',
object: 'chat.completion',
created: Math.floor(Date.now() / 1000),
model: 'gpt-4',
choices: [
{
index: 0,
message: {
role: 'assistant',
content: 'Response'
},
finish_reason: 'stop'
}
],
usage: {
prompt_tokens: 10,
completion_tokens: 5,
total_tokens: 15
}
}
expect(response).toHaveProperty('id')
expect(response).toHaveProperty('object')
expect(response).toHaveProperty('created')
expect(response).toHaveProperty('model')
expect(response).toHaveProperty('choices')
expect(response).toHaveProperty('usage')
expect(response.choices[0].message.role).toBe('assistant')
expect(response.usage.total_tokens).toBe(15)
})
})

View File

@ -0,0 +1,234 @@
import Fastify from 'fastify'
import FastifyCors from '@fastify/cors'
import { createTIPClient } from '@llm-gateway/client'
interface ChatMessage {
role: 'system' | 'user' | 'assistant'
content: string
}
interface ChatCompletionRequest {
model: string
messages: ChatMessage[]
temperature?: number
max_tokens?: number
top_p?: number
stream?: boolean
}
interface ChatCompletionResponse {
id: string
object: string
created: number
model: string
choices: Array<{
index: number
message: {
role: string
content: string
}
finish_reason: string
}>
usage: {
prompt_tokens: number
completion_tokens: number
total_tokens: number
}
}
interface ChatCompletionStreamEvent {
id: string
object: string
created: number
model: string
choices: Array<{
index: number
delta: {
content?: string
}
finish_reason: string | null
}>
}
export class ChatGPTAPIAdapter {
private fastify = Fastify()
private client = createTIPClient({
agentId: 'chatgpt-api-adapter',
ollamaUrl: process.env.OLLAMA_URL || '192.168.178.213:11434'
})
constructor(private port: number = 3111) {
this.setupRoutes()
}
private formatMessagesToPrompt(messages: ChatMessage[]): string {
return messages
.map(msg => `[${msg.role.toUpperCase()}]\n${msg.content}`)
.join('\n\n')
}
private mapModelName(openaiModel: string): string {
const modelMap: Record<string, string> = {
'gpt-4': 'qwen2.5:32b',
'gpt-4-turbo': 'qwen2.5:32b',
'gpt-3.5-turbo': 'qwen2.5:14b',
'gpt-4-mini': 'qwen2.5:3b'
}
return modelMap[openaiModel] || 'qwen2.5:14b'
}
private setupRoutes() {
this.fastify.register(FastifyCors, {
origin: '*',
credentials: true
})
this.fastify.get('/v1/models', async () => {
return {
object: 'list',
data: [
{ id: 'gpt-4', object: 'model', owned_by: 'openai' },
{ id: 'gpt-4-turbo', object: 'model', owned_by: 'openai' },
{ id: 'gpt-3.5-turbo', object: 'model', owned_by: 'openai' },
{ id: 'gpt-4-mini', object: 'model', owned_by: 'openai' }
]
}
})
this.fastify.post<{ Body: ChatCompletionRequest }>(
'/v1/chat/completions',
async (request, reply) => {
const {
messages,
model,
temperature = 0.7,
max_tokens = 2000,
stream = false
} = request.body
const prompt = this.formatMessagesToPrompt(messages)
const mappedModel = this.mapModelName(model)
if (stream) {
reply.type('text/event-stream')
reply.header('Cache-Control', 'no-cache')
reply.header('Connection', 'keep-alive')
try {
const response = await this.client.completion(prompt, {
model: mappedModel,
maxTokens: max_tokens,
temperature
})
const createdAt = Math.floor(Date.now() / 1000)
const chunks = response.text.split('')
for (const chunk of chunks) {
const event: ChatCompletionStreamEvent = {
id: `chatcmpl-${Date.now()}`,
object: 'text_completion.chunk',
created: createdAt,
model,
choices: [
{
index: 0,
delta: { content: chunk },
finish_reason: null
}
]
}
reply.raw.write(`data: ${JSON.stringify(event)}\n\n`)
}
const finalEvent: ChatCompletionStreamEvent = {
id: `chatcmpl-${Date.now()}`,
object: 'text_completion.chunk',
created: createdAt,
model,
choices: [
{
index: 0,
delta: {},
finish_reason: 'stop'
}
]
}
reply.raw.write(`data: ${JSON.stringify(finalEvent)}\n\n`)
reply.raw.write('data: [DONE]\n\n')
reply.raw.end()
} catch (error) {
reply.raw.write(
`data: ${JSON.stringify({ error: 'Completion failed' })}\n\n`
)
reply.raw.end()
}
} else {
try {
const response = await this.client.completion(prompt, {
model: mappedModel,
maxTokens: max_tokens,
temperature
})
const result: ChatCompletionResponse = {
id: `chatcmpl-${Date.now()}`,
object: 'chat.completion',
created: Math.floor(Date.now() / 1000),
model,
choices: [
{
index: 0,
message: {
role: 'assistant',
content: response.text
},
finish_reason: 'stop'
}
],
usage: {
prompt_tokens: response.tokens.input,
completion_tokens: response.tokens.output,
total_tokens: response.tokens.input + response.tokens.output
}
}
return result
} catch (error) {
reply.code(500).send({
error: {
message: 'Completion request failed',
type: 'server_error',
param: null,
code: 'internal_error'
}
})
}
}
}
)
this.fastify.get('/health', async () => {
try {
const health = await this.client.health()
return { status: 'ok', gateway: health }
} catch (error) {
return { status: 'degraded', error: 'Gateway unavailable' }
}
})
}
async start() {
await this.fastify.listen({ port: this.port, host: '0.0.0.0' })
console.error(`[ChatGPT API] Server listening on port ${this.port}`)
console.error('[ChatGPT API] OpenAI API compatibility endpoints:')
console.error(' POST /v1/chat/completions')
console.error(' GET /v1/models')
console.error(' GET /health')
}
async stop() {
await this.fastify.close()
}
}
export default ChatGPTAPIAdapter

View File

@ -0,0 +1,12 @@
{
"extends": "../../tsconfig.json",
"compilerOptions": {
"outDir": "./dist",
"rootDir": "./src",
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}

View File

@ -0,0 +1,123 @@
# Claude Code Bridge
Integration layer between Claude Code IDE and LLM Gateway.
## Overview
Provides a high-level API for Claude Code to leverage the LLM Gateway's multi-model orchestration, confidence gating, and fallback capabilities.
## Installation
```bash
npm install @llm-gateway/claude-code-bridge
```
## Usage
```typescript
import { ClaudeCodeBridge } from '@llm-gateway/claude-code-bridge'
const bridge = new ClaudeCodeBridge({
gatewayUrl: 'https://llm-gateway.context-x.org',
agentId: 'claude-code-ide',
ideVersion: '1.0.0',
extensionVersion: '1.0.0',
ollamaUrl: '192.168.178.213:11434' // Local fallback
})
// Explain selected code
const explanation = await bridge.explain(context, selectedCode)
// Refactor code
const refactored = await bridge.refactor(context, selectedCode)
// Generate tests
const tests = await bridge.test(context, selectedCode)
// Add documentation
const docs = await bridge.document(context, selectedCode)
// Fix errors
const fix = await bridge.fixError(errorMessage, context)
// Check health
const status = await bridge.health()
```
## Features
- **Code Explanation**: Analyze and explain code snippets
- **Refactoring**: Suggest improvements to existing code
- **Test Generation**: Automatically generate test cases
- **Documentation**: Create JSDoc/TSDoc comments
- **Error Fixing**: Debug and fix code errors
- **Fallback**: Automatic fallback to local Ollama when gateway unavailable
- **Confidence Tracking**: Monitor model confidence in responses
- **Token Counting**: Track usage for billing/analytics
## Architecture
The bridge implements the three-layer agent integration stack from ADR-0005:
1. **Transport Layer**: HTTP/WebSocket communication with gateway
2. **Adapter Layer**: ClaudeCodeBridge wraps client SDK
3. **Protocol Layer**: Standardized request/response format
## Health Status
```typescript
const health = await bridge.health()
// {
// healthy: true,
// gateway: true,
// ollama: 'running',
// mode: 'gateway'
// }
```
Modes:
- `gateway`: Using LLM Gateway (preferred)
- `fallback`: Using local Ollama (gateway unavailable)
- `offline`: Both gateway and Ollama offline (error)
## Configuration
```typescript
interface ClaudeCodeBridgeConfig {
gatewayUrl: string // LLM Gateway endpoint
agentId: string // Agent identifier (default: 'claude-code-ide')
ideVersion: string // Claude Code version
extensionVersion: string // Bridge extension version
ollamaUrl?: string // Local Ollama URL (optional)
apiKey?: string // Gateway API key (if required)
requestTimeout?: number // Request timeout in ms (default: 30000)
}
```
## Response Format
```typescript
interface ClaudeCodeResponse {
text: string // Generated response
tokens: {
input: number // Input tokens
output: number // Output tokens
}
model: string // Model used
fallback: boolean // Whether using fallback
confidence: number // 0-1 confidence score
}
```
## Testing
```bash
npm test
```
Tests cover:
- Health checks
- All completion methods (explain, refactor, test, document, fix)
- Fallback behavior
- Token limiting
- Metadata tracking

View File

@ -0,0 +1,31 @@
{
"name": "@llm-gateway/claude-code-bridge",
"version": "1.0.0",
"description": "Claude Code IDE integration with LLM Gateway",
"type": "module",
"main": "dist/index.js",
"types": "dist/index.d.ts",
"scripts": {
"build": "tsc",
"dev": "tsc --watch",
"test": "vitest"
},
"dependencies": {
"@llm-gateway/client": "workspace:*",
"@anthropic-sdk/sdk": "^1.0.0"
},
"devDependencies": {
"@types/node": "^20.0.0",
"typescript": "^5.0.0",
"vitest": "^1.0.0"
},
"keywords": [
"claude",
"code",
"llm",
"gateway",
"ide"
],
"license": "MIT",
"author": "Rene Fichtmueller"
}

View File

@ -0,0 +1,120 @@
import { describe, it, expect, beforeEach, vi } from 'vitest'
import { ClaudeCodeBridge } from './index'
describe('ClaudeCodeBridge', () => {
let bridge: ClaudeCodeBridge
beforeEach(() => {
bridge = new ClaudeCodeBridge({
gatewayUrl: 'http://localhost:3000',
agentId: 'claude-code-test',
ideVersion: '1.0.0',
extensionVersion: '1.0.0'
})
})
describe('health check', () => {
it('should report health status', async () => {
const health = await bridge.health()
expect(health).toHaveProperty('healthy')
expect(health).toHaveProperty('gateway')
expect(health).toHaveProperty('ollama')
expect(health).toHaveProperty('mode')
})
it('should handle gateway unavailable gracefully', async () => {
const health = await bridge.health()
// Should have fallback mode or offline
expect(health.mode).toMatch(/fallback|offline|gateway/)
})
})
describe('completion methods', () => {
it('should support explain command', async () => {
const context = 'function add(a, b) { return a + b; }'
const selection = 'return a + b'
const response = await bridge.explain(context, selection)
expect(response).toHaveProperty('text')
expect(response).toHaveProperty('tokens')
expect(response).toHaveProperty('model')
expect(response).toHaveProperty('fallback')
expect(response).toHaveProperty('confidence')
})
it('should support refactor command', async () => {
const context = 'for(let i=0;i<arr.length;i++){}'
const selection = 'for(let i=0;i<arr.length;i++)'
const response = await bridge.refactor(context, selection)
expect(response.text).toBeTruthy()
expect(typeof response.tokens.input).toBe('number')
expect(typeof response.tokens.output).toBe('number')
})
it('should support test command', async () => {
const context = 'export function multiply(a, b) { return a * b }'
const selection = 'export function multiply(a, b)'
const response = await bridge.test(context, selection)
expect(response.text).toBeTruthy()
expect(response.model).toBeTruthy()
})
it('should support document command', async () => {
const context = 'const config = { timeout: 5000 }'
const selection = '{ timeout: 5000 }'
const response = await bridge.document(context, selection)
expect(response.text).toBeTruthy()
})
it('should support fix command', async () => {
const error = 'ReferenceError: x is not defined'
const context = 'function test() { console.log(x); }'
const response = await bridge.fixError(error, context)
expect(response.text).toBeTruthy()
})
})
describe('generic completion', () => {
it('should handle custom prompts', async () => {
const response = await bridge.completion('custom', 'Write a hello world function')
expect(response).toHaveProperty('text')
expect(response).toHaveProperty('tokens')
expect(response).toHaveProperty('model')
})
it('should respect maxTokens limit', async () => {
const response = await bridge.completion('test', 'Short prompt', 100)
expect(response.tokens.output).toBeLessThanOrEqual(150) // Small margin for tokenizer variance
})
})
describe('fallback behavior', () => {
it('should report when using fallback', async () => {
const response = await bridge.completion('test', 'Test prompt')
expect(response).toHaveProperty('fallback')
expect(typeof response.fallback).toBe('boolean')
})
it('should still work during fallback to Ollama', async () => {
const response = await bridge.completion('test', 'Generate a simple greeting')
expect(response.text).toBeTruthy()
expect(response.tokens).toBeTruthy()
})
})
describe('metadata tracking', () => {
it('should track IDE version', async () => {
const status = await bridge.status()
expect(status).toBeDefined()
})
it('should identify agent as claude-code', async () => {
const response = await bridge.completion('test', 'Simple test')
expect(response.model).toBeTruthy()
})
})
})

View File

@ -0,0 +1,108 @@
import { createTIPClient, type TIPClientConfig } from '@llm-gateway/client'
export interface ClaudeCodeBridgeConfig extends TIPClientConfig {
agentId: string
ideVersion: string
extensionVersion: string
}
export interface ClaudeCodeRequest {
command: string
context: string
selection?: string
temperature?: number
maxTokens?: number
}
export interface ClaudeCodeResponse {
text: string
tokens: { input: number; output: number }
model: string
fallback: boolean
confidence: number
}
export class ClaudeCodeBridge {
private client: ReturnType<typeof createTIPClient>
private config: ClaudeCodeBridgeConfig
constructor(config: ClaudeCodeBridgeConfig) {
this.config = {
agentId: 'claude-code-ide',
ideVersion: config.ideVersion,
extensionVersion: config.extensionVersion,
...config
}
this.client = createTIPClient(this.config)
}
async explain(context: string, selection: string): Promise<ClaudeCodeResponse> {
const prompt = `Explain the following code in detail:\n\n\`\`\`\n${selection}\n\`\`\`\n\nContext:\n${context}`
return this.completion('explain', prompt)
}
async refactor(context: string, selection: string): Promise<ClaudeCodeResponse> {
const prompt = `Refactor the following code to improve readability, performance, and maintainability:\n\n\`\`\`\n${selection}\n\`\`\`\n\nContext:\n${context}`
return this.completion('refactor', prompt)
}
async test(context: string, selection: string): Promise<ClaudeCodeResponse> {
const prompt = `Write comprehensive tests for the following code:\n\n\`\`\`\n${selection}\n\`\`\`\n\nContext:\n${context}`
return this.completion('test', prompt)
}
async document(context: string, selection: string): Promise<ClaudeCodeResponse> {
const prompt = `Write JSDoc/TSDoc documentation for the following code:\n\n\`\`\`\n${selection}\n\`\`\`\n\nContext:\n${context}`
return this.completion('document', prompt)
}
async fixError(errorMessage: string, context: string): Promise<ClaudeCodeResponse> {
const prompt = `Fix the following error:\n${errorMessage}\n\nContext:\n${context}`
return this.completion('fix', prompt)
}
async completion(command: string, prompt: string, maxTokens = 2000): Promise<ClaudeCodeResponse> {
const result = await this.client.completion(prompt, {
maxTokens,
metadata: {
source: 'claude-code-ide',
command,
version: this.config.ideVersion
}
})
return {
text: result.text,
tokens: result.tokens,
model: result.model,
fallback: result.fallback,
confidence: result.confidence ?? 0
}
}
async status() {
return this.client.getStatus()
}
async health() {
try {
const status = await this.status()
return {
healthy: status.gateway === true || status.ollama !== 'offline',
gateway: status.gateway,
ollama: status.ollama,
mode: status.mode
}
} catch (error) {
return {
healthy: false,
gateway: false,
ollama: 'offline' as const,
mode: 'offline' as const,
error: String(error)
}
}
}
}
export default ClaudeCodeBridge

View File

@ -0,0 +1,12 @@
{
"extends": "../../tsconfig.json",
"compilerOptions": {
"outDir": "./dist",
"rootDir": "./src",
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}

View File

@ -0,0 +1,162 @@
# Codex LSP Adapter
Language Server Protocol adapter for GitHub Copilot/Microsoft Codex integration with LLM Gateway.
## Overview
Implements the Language Server Protocol (LSP) to allow Codex and Copilot plugins to connect to the LLM Gateway. Bridges the gap between LSP's structured protocol and the gateway's completion API.
## Installation
```bash
npm install @llm-gateway/codex-lsp-adapter
```
## Usage
### As a Language Server
```bash
# Start the LSP server (listens on stdio)
npx codex-lsp
# Or in Node.js
import CodexLSPAdapter from '@llm-gateway/codex-lsp-adapter'
const adapter = new CodexLSPAdapter()
adapter.start()
```
### VS Code Extension Configuration
```json
{
"languageServerHangingPercent": 0,
"languageServers": {
"codex": {
"command": "codex-lsp",
"args": [],
"languages": [
"javascript",
"typescript",
"python",
"go",
"rust"
]
}
}
}
```
## Features
### Implemented
- **Completions** (`textDocument/completion`): Code completion triggered by `.`, space, or `(`
- **Hover** (`textDocument/hover`): Hover documentation with code explanation
- **Text Sync**: Full document synchronization
- **Execute Commands**: `codex.explain`, `codex.refactor`, `codex.test`, `codex.fix`
### Architecture
The adapter translates LSP requests to gateway completions:
```
LSP Client (Copilot/IDE)
CodexLSPAdapter (stdio bridge)
LLM Gateway API
Model Selection (claude, Ollama, external)
```
## Environment Variables
```bash
GATEWAY_URL=https://llm-gateway.context-x.org # LLM Gateway endpoint
OLLAMA_URL=192.168.178.213:11434 # Local Ollama fallback
AGENT_ID=codex-lsp-server # Agent identifier
LOG_LEVEL=debug # Logging level
```
## Protocol Details
### Supported Capabilities
```typescript
{
textDocumentSync: 1, // Full document sync
completionProvider: {
resolveProvider: true,
triggerCharacters: ['.', ' ', '(']
},
hoverProvider: true,
definitionProvider: true,
codeActionProvider: true,
executeCommandProvider: {
commands: [
'codex.explain',
'codex.refactor',
'codex.test',
'codex.fix'
]
}
}
```
### Response Format
Completion items include:
- **label**: First line of completion
- **insertText**: Full completion text
- **documentation**: Model name and confidence
- **detail**: Source (Gateway vs Ollama fallback)
- **kind**: CompletionItemKind.Snippet
## Testing
```bash
npm test
```
Tests cover:
- LSP initialization and shutdown
- Completion requests with various triggers
- Hover information extraction
- Error handling and fallback behavior
- Confidence score reporting
## Troubleshooting
### Server not connecting
1. Check if LSP server is running: `lsof -i :protocol`
2. Verify gateway is accessible: `curl https://llm-gateway.context-x.org/health`
3. Check logs: `LOG_LEVEL=debug codex-lsp`
### Slow completions
1. Reduce `maxTokens` in completion requests
2. Check gateway latency: `curl -w "@curl-format.txt" https://llm-gateway.context-x.org/health`
3. Verify Ollama is running if using fallback
### Poor suggestion quality
1. Adjust temperature/top_p in gateway requests
2. Check model selection (may be using fallback)
3. Provide more context in completion requests
## Performance
Typical latencies:
- **Gateway mode**: 100-500ms (depends on model)
- **Ollama fallback**: 200-2000ms (depends on hardware)
- **Timeout**: 30s (configurable)
## Security
- LSP communicates over stdio (no network exposure)
- Gateway API calls use configured authentication
- Ollama fallback is local-only by default
- No credentials stored in LSP adapter

View File

@ -0,0 +1,37 @@
{
"name": "@llm-gateway/codex-lsp-adapter",
"version": "1.0.0",
"description": "Language Server Protocol adapter for Codex/Copilot integration with LLM Gateway",
"type": "module",
"main": "dist/index.js",
"bin": {
"codex-lsp": "dist/cli.js"
},
"scripts": {
"build": "tsc",
"dev": "tsc --watch",
"start": "node dist/cli.js",
"test": "vitest"
},
"dependencies": {
"@llm-gateway/client": "workspace:*",
"vscode-jsonrpc": "^8.0.0",
"vscode-languageserver": "^9.0.0",
"vscode-languageserver-protocol": "^3.17.0"
},
"devDependencies": {
"@types/node": "^20.0.0",
"typescript": "^5.0.0",
"vitest": "^1.0.0"
},
"keywords": [
"lsp",
"language-server",
"copilot",
"codex",
"llm",
"gateway"
],
"license": "MIT",
"author": "Rene Fichtmueller"
}

View File

@ -0,0 +1,13 @@
#!/usr/bin/env node
import CodexLSPAdapter from './index'
const adapter = new CodexLSPAdapter()
// Start the LSP server
adapter.start()
// Log startup
console.error('[Codex LSP] Server started on stdio')
console.error(`[Codex LSP] Gateway URL: ${process.env.GATEWAY_URL || 'default'}`)
console.error(`[Codex LSP] Ollama URL: ${process.env.OLLAMA_URL || '192.168.178.213:11434'}`)

View File

@ -0,0 +1,130 @@
import { createTIPClient } from '@llm-gateway/client'
import {
createConnection,
TextDocuments,
Diagnostic,
DiagnosticSeverity,
InitializeResult,
ServerCapabilities,
Position,
Range,
CompletionItem,
CompletionItemKind,
MarkupKind
} from 'vscode-languageserver'
import { TextDocument } from 'vscode-languageserver-textdocument'
export class CodexLSPAdapter {
private connection = createConnection()
private documents = new TextDocuments(TextDocument)
private client = createTIPClient({
agentId: 'codex-lsp-server',
ollamaUrl: process.env.OLLAMA_URL || '192.168.178.213:11434'
})
constructor() {
this.setupHandlers()
}
private setupHandlers() {
this.connection.onInitialize(this.handleInitialize.bind(this))
this.connection.onCompletion(this.handleCompletion.bind(this))
this.connection.onHover(this.handleHover.bind(this))
this.connection.onDefinition(this.handleDefinition.bind(this))
this.documents.onDidChangeContent(this.handleDocumentChange.bind(this))
this.documents.listen(this.connection)
}
private handleInitialize() {
const capabilities: ServerCapabilities = {
textDocumentSync: 1,
completionProvider: {
resolveProvider: true,
triggerCharacters: ['.', ' ', '(']
},
hoverProvider: true,
definitionProvider: true,
codeActionProvider: true,
executeCommandProvider: {
commands: ['codex.explain', 'codex.refactor', 'codex.test', 'codex.fix']
}
}
const result: InitializeResult = { capabilities }
return result
}
private async handleCompletion(params: any) {
const doc = this.documents.get(params.textDocument.uri)
if (!doc) return []
const position = params.position
const text = doc.getText()
const offset = doc.offsetAt(position)
try {
const response = await this.client.completion(
`Complete the following code:\n\n${text}\n\n[cursor here]`,
{ maxTokens: 500 }
)
return [
{
label: response.text.split('\n')[0],
kind: CompletionItemKind.Snippet,
documentation: {
kind: MarkupKind.Markdown,
value: `**Model**: ${response.model}\n**Confidence**: ${(response.confidence * 100).toFixed(1)}%`
},
insertText: response.text,
detail: response.fallback ? '(Ollama fallback)' : '(Gateway)'
} as CompletionItem
]
} catch (error) {
return []
}
}
private async handleHover(params: any) {
const doc = this.documents.get(params.textDocument.uri)
if (!doc) return null
const selectedText = doc.getText({
start: { line: params.position.line, character: 0 },
end: { line: params.position.line + 1, character: 0 }
})
try {
const response = await this.client.completion(
`Briefly explain this code:\n${selectedText}`,
{ maxTokens: 200 }
)
return {
contents: {
kind: MarkupKind.Markdown,
value: `${response.text}\n\n*${response.model} (${(response.confidence * 100).toFixed(0)}%)*`
}
}
} catch (error) {
return null
}
}
private async handleDefinition(params: any) {
// Definition lookup would be more complex in real implementation
// For now, return null - could integrate with symbol indexing
return null
}
private async handleDocumentChange(change: any) {
const doc = change.document
// Could perform diagnostics here on significant changes
}
start() {
this.connection.listen()
}
}
export default CodexLSPAdapter

View File

@ -0,0 +1,12 @@
{
"extends": "../../tsconfig.json",
"compilerOptions": {
"outDir": "./dist",
"rootDir": "./src",
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}

View File

@ -0,0 +1,358 @@
# Learning System Integration
Per-agent metrics collection, feedback processing, and learning system integration for LLM Gateway.
## Overview
Extends the global learning system (Phase 2D) with per-agent signal isolation. Tracks metrics separately for each agent (Claude Code, Codex, ChatGPT, etc.) to enable agent-specific optimization and cost attribution.
## Installation
```bash
npm install @llm-gateway/learning-integration
```
## Core Concepts
### Per-Agent Metrics
Each agent maintains its own metric set tracking success, latency, cost, and confidence:
- **Success Rate**: % of requests that succeeded without fallback
- **Latency**: P50, P95, P99 response time (ms)
- **Cost**: Token consumption × model cost
- **Confidence**: Learned score 0-1 indicating model suitability for agent
### Feedback Loop
Agents report outcomes (success, fallback, error, timeout) enabling closed-loop learning:
- Adapter automatically tracks success/fallback
- Client can provide explicit feedback (quality, satisfaction)
- Learning engine uses feedback to update per-agent confidence scores
### Confidence Scoring
Per-agent confidence (independent of global score):
- Initialized from global baseline
- Updated hourly based on feedback
- Influences routing decisions (per-agent gate overrides global gate)
- Decays 10% per day if inactive
## Usage
### Basic Setup
```typescript
import { LearningIntegration } from '@llm-gateway/learning-integration'
import postgres from 'postgres'
const db = postgres({
host: 'localhost',
port: 5432,
database: 'llm_gateway'
})
const learning = new LearningIntegration(db)
// Initialize tables on startup
await learning.initializeTables()
```
### Logging Requests
```typescript
import { randomUUID } from 'crypto'
const requestId = randomUUID()
// After completion, log the request
await learning.logRequest({
requestId,
agentId: 'claude-code',
model: 'qwen2.5:14b',
latencyMs: 250,
tokensIn: 150,
tokensOut: 450,
confidence: 0.85,
fallbackUsed: false,
success: true
})
```
### Recording Feedback
```typescript
// Automatic (adapter tracks outcome)
await learning.recordFeedback({
requestId,
agentId: 'claude-code',
outcome: 'success',
completionQuality: 8, // 0-10
latencyMs: 250
})
// Explicit (from client UI)
await learning.recordFeedback({
requestId,
agentId: 'chatgpt',
outcome: 'success',
metadata: {
userSatisfaction: 9 // 0-10 from thumbs up/down
}
})
```
### Computing Metrics
```typescript
// Per-agent metrics (last 24h)
const metrics = await learning.getAgentMetrics('claude-code')
console.log(metrics)
// [{
// agentId: 'claude-code',
// model: 'qwen2.5:14b',
// requestCount: 1523,
// successRate: 0.98,
// avgLatencyMs: 245,
// totalTokens: 850000,
// costUsd: 85.00,
// confidence: 0.87,
// updatedAt: 2026-04-19T22:00:00Z
// }]
// Per-agent cost tracking
const costs = await learning.getAgentCosts(30) // 30 days
costs.forEach((cost, agentId) => {
console.log(`${agentId}: $${cost.toFixed(2)}`)
})
// claude-code: $892.50
// chatgpt: $1234.75
// codex: $345.20
// Anomaly detection
const anomalies = await learning.detectAnomalies('claude-code')
anomalies.forEach(a => {
console.log(`${a.model}: ${a.issue}`)
})
```
### SLO Monitoring
```typescript
import { PerAgentMetrics } from '@llm-gateway/learning-integration/metrics'
const metrics = new PerAgentMetrics(db)
// Check latency SLO
const slo = await metrics.checkLatencySLO('claude-code', 100) // Target: 100ms
console.log(slo)
// {
// agentId: 'claude-code',
// targetMs: 100,
// p50: 45,
// p95: 89,
// p99: 98,
// breached: false
// }
// Daily cost report
const costs = await metrics.generateDailyCostReport('2026-04-19')
console.log(costs)
// [{
// date: '2026-04-19',
// agentId: 'claude-code',
// tokensIn: 50000,
// tokensOut: 150000,
// costUsd: 20.00
// }]
```
### Feedback Processing
```typescript
import { FeedbackProcessor } from '@llm-gateway/learning-integration/feedback'
const feedback = new FeedbackProcessor(db)
// Process feedback from any source
await feedback.processFeedback({
requestId,
agentId: 'chatgpt',
outcome: 'success',
completionQuality: 9,
userSatisfaction: 10
})
// Get feedback stats
const stats = await feedback.getFeedbackStats('chatgpt')
console.log(stats)
// {
// agentId: 'chatgpt',
// totalFeedback: 2450,
// outcomeBreakdown: {
// success: 2350,
// fallback: 50,
// timeout: 25,
// error: 20,
// user_rejected: 5
// },
// avgQuality: 8.2,
// avgSatisfaction: 8.7
// }
// Compute confidence score from feedback
const score = await feedback.computeConfidenceScore('chatgpt', 'gpt-4')
console.log(`Confidence: ${score.toFixed(2)}`) // 0.87
```
## Database Schema
### agent_request_log
```sql
CREATE TABLE agent_request_log (
request_id UUID PRIMARY KEY,
agent_id VARCHAR(64) NOT NULL,
model VARCHAR(128) NOT NULL,
latency_ms INTEGER NOT NULL,
tokens_in INTEGER NOT NULL,
tokens_out INTEGER NOT NULL,
confidence DECIMAL(3, 2) NOT NULL,
fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
success BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
INDEX idx_agent_model (agent_id, model),
INDEX idx_created (created_at)
)
```
### agent_feedback
```sql
CREATE TABLE agent_feedback (
id SERIAL PRIMARY KEY,
request_id UUID NOT NULL,
agent_id VARCHAR(64) NOT NULL,
outcome VARCHAR(32) NOT NULL,
completion_quality SMALLINT,
latency_ms INTEGER,
token_count INTEGER,
metadata JSONB,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (request_id) REFERENCES agent_request_log (request_id),
INDEX idx_agent_outcome (agent_id, outcome),
INDEX idx_created (created_at)
)
```
### agent_confidence_scores
```sql
CREATE TABLE agent_confidence_scores (
id SERIAL PRIMARY KEY,
agent_id VARCHAR(64) NOT NULL,
model VARCHAR(128) NOT NULL,
score DECIMAL(3, 2) NOT NULL,
sample_size INTEGER NOT NULL DEFAULT 0,
trend VARCHAR(16) NOT NULL DEFAULT 'stable',
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
UNIQUE (agent_id, model),
INDEX idx_agent (agent_id)
)
```
## Integration with Learning Engine
### Learning Cycle (ADR-0003)
Per-agent metrics computed during learning cycles:
**Phase 2**: Aggregate global metrics (existing)
**Phase 2**: Compute per-agent slices (new)
```typescript
for (const agentId of knownAgents) {
const metrics = await learning.getAgentMetrics(agentId)
for (const metric of metrics) {
// Update per-agent confidence
const newScore = feedback.computeConfidenceScore(agentId, metric.model)
await learning.updateAgentConfidence(agentId, metric.model, newScore)
}
}
```
**Phase 3**: Update per-agent confidence scores (new)
```typescript
for (const [agentId, model] of agentModelPairs) {
const score = await feedback.computeConfidenceScore(agentId, model)
const shouldUpdate = await feedback.shouldUpdateConfidence(agentId, model, score)
if (shouldUpdate) {
await learning.updateAgentConfidence(agentId, model, score)
}
}
```
**Phase 5**: A/B test per-agent routing (new)
```typescript
// 10% of traffic uses per-agent routing
if (Math.random() < 0.1) {
const agentConfidence = await learning.getAgentConfidence(agentId, model)
if (agentConfidence && agentConfidence.score > 0.65) {
// Use per-agent routing decision
}
}
```
## Feedback Outcomes
| Outcome | Meaning | Auto | Manual |
|---------|---------|------|--------|
| `success` | Request succeeded, no fallback | Yes | Yes |
| `fallback` | Gateway unavailable, used Ollama | Yes | - |
| `timeout` | Request exceeded timeout | Yes | - |
| `error` | Request failed with error | Yes | Yes |
| `user_rejected` | Client explicitly rejected response | - | Yes |
## Cost Attribution
Monthly cost per agent (token-based):
```
Cost = (tokens_in + tokens_out) × model_rate × 0.0001
```
Default rates:
- qwen2.5:3b = $0.0001 per 1K tokens
- qwen2.5:14b = $0.0001 per 1K tokens
- qwen2.5:32b = $0.0001 per 1K tokens
Configurable via learning engine cost config.
## Testing
```bash
npm test
```
Tests cover:
- Per-agent metric computation
- Feedback ingestion and processing
- Confidence score calculation
- Anomaly detection
- Cost attribution
- SLO monitoring
- Trending analysis
## Performance
- Request logging: <1ms per insertion
- Feedback processing: <1ms per insertion
- Metric computation (24h): 100-500ms per agent
- Cost report generation: 500ms-1s for all agents
- Anomaly detection: 1-2s per agent
## Related ADRs
- [ADR-0002](../adr/0002-tier-assignment-strategy.md) — Tier assignment (per-agent override)
- [ADR-0003](../adr/0003-confidence-gate-thresholds.md) — Confidence gate (per-agent gate)
- [ADR-0006](../adr/0006-learning-system-integration.md) — Learning system specification
## Security Notes
- Agent IDs are stored plaintext (consider hashing for privacy-sensitive deployments)
- User satisfaction scores in metadata (consider encryption at rest)
- Cost reports are per-agent (may expose usage patterns)

View File

@ -0,0 +1,37 @@
{
"name": "@llm-gateway/learning-integration",
"version": "1.0.0",
"description": "Per-agent learning metrics and feedback integration for LLM Gateway",
"type": "module",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
"./metrics": "./dist/metrics.js",
"./feedback": "./dist/feedback.js"
},
"scripts": {
"build": "tsc",
"dev": "tsc --watch",
"test": "vitest"
},
"dependencies": {
"@llm-gateway/client": "workspace:*",
"@llm-gateway/learning": "workspace:*",
"postgres": "^3.0.0"
},
"devDependencies": {
"@types/node": "^20.0.0",
"typescript": "^5.0.0",
"vitest": "^1.0.0"
},
"keywords": [
"learning",
"metrics",
"feedback",
"per-agent",
"llm",
"gateway"
],
"license": "MIT",
"author": "Rene Fichtmueller"
}

View File

@ -0,0 +1,215 @@
import { Client } from 'postgres'
export type FeedbackOutcome = 'success' | 'fallback' | 'timeout' | 'error' | 'user_rejected'
export interface FeedbackRequest {
requestId: string
agentId: string
outcome: FeedbackOutcome
completionQuality?: number // 0-10
latencyMs?: number
tokenCount?: number
userSatisfaction?: number // 0-10 from UI
metadata?: Record<string, unknown>
}
export interface FeedbackStats {
agentId: string
totalFeedback: number
outcomeBreakdown: Record<FeedbackOutcome, number>
avgQuality: number
avgSatisfaction: number
}
export class FeedbackProcessor {
constructor(private db: Client) {}
async processFeedback(feedback: FeedbackRequest): Promise<void> {
const timestamp = new Date()
await this.db`
INSERT INTO agent_feedback (
request_id, agent_id, outcome, completion_quality, latency_ms,
token_count, metadata, created_at
) VALUES (
${feedback.requestId},
${feedback.agentId},
${feedback.outcome},
${feedback.completionQuality || null},
${feedback.latencyMs || null},
${feedback.tokenCount || null},
${JSON.stringify({
userSatisfaction: feedback.userSatisfaction,
...feedback.metadata
})},
${timestamp}
)
`
}
async getFeedbackStats(
agentId: string,
hours: number = 24
): Promise<FeedbackStats> {
const cutoff = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT
outcome,
COUNT(*) as count,
AVG(completion_quality) as avg_quality,
AVG((metadata->>'userSatisfaction')::int) as avg_satisfaction
FROM agent_feedback
WHERE agent_id = ${agentId} AND created_at > ${cutoff}
GROUP BY outcome
`
const outcomeBreakdown: Record<FeedbackOutcome, number> = {
success: 0,
fallback: 0,
timeout: 0,
error: 0,
user_rejected: 0
}
let totalFeedback = 0
let totalQuality = 0
let qualityCount = 0
let totalSatisfaction = 0
let satisfactionCount = 0
for (const row of results as any[]) {
const outcome = row.outcome as FeedbackOutcome
const count = Number(row.count)
outcomeBreakdown[outcome] = count
totalFeedback += count
if (row.avg_quality) {
totalQuality += Number(row.avg_quality)
qualityCount++
}
if (row.avg_satisfaction) {
totalSatisfaction += Number(row.avg_satisfaction)
satisfactionCount++
}
}
return {
agentId,
totalFeedback,
outcomeBreakdown,
avgQuality: qualityCount > 0 ? totalQuality / qualityCount : 0,
avgSatisfaction: satisfactionCount > 0 ? totalSatisfaction / satisfactionCount : 0
}
}
async getOutcomeDistribution(
agentId: string,
hours: number = 24
): Promise<{ outcome: FeedbackOutcome; percentage: number }[]> {
const cutoff = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT outcome, COUNT(*) as count
FROM agent_feedback
WHERE agent_id = ${agentId} AND created_at > ${cutoff}
GROUP BY outcome
`
const total = results.reduce((sum, row: any) => sum + Number(row.count), 0)
if (total === 0) return []
return results.map((row: any) => ({
outcome: row.outcome as FeedbackOutcome,
percentage: (Number(row.count) / total) * 100
}))
}
async identifySuccessfulModels(agentId: string): Promise<string[]> {
const cutoff = new Date(Date.now() - 24 * 60 * 60 * 1000)
const results = await this.db`
SELECT DISTINCT log.model
FROM agent_request_log log
INNER JOIN agent_feedback fb ON log.request_id = fb.request_id
WHERE log.agent_id = ${agentId}
AND log.created_at > ${cutoff}
AND fb.outcome = 'success'
ORDER BY log.model
`
return results.map((row: any) => row.model)
}
async computeConfidenceScore(
agentId: string,
model: string
): Promise<number> {
const day = new Date()
day.setDate(day.getDate() - 1)
const feedback = await this.db`
SELECT
COUNT(CASE WHEN outcome = 'success' THEN 1 END)::float / COUNT(*) as success_rate,
AVG(completion_quality) as avg_quality,
AVG((metadata->>'userSatisfaction')::int) as avg_satisfaction
FROM agent_feedback fb
INNER JOIN agent_request_log log ON fb.request_id = log.request_id
WHERE fb.agent_id = ${agentId}
AND log.model = ${model}
AND fb.created_at > ${day}
`
if (feedback.length === 0) return 0.5 // Default neutral confidence
const row = feedback[0] as any
const successRate = Number(row.success_rate || 0)
const avgQuality = Number(row.avg_quality || 5) / 10 // Normalize to 0-1
const avgSatisfaction = Number(row.avg_satisfaction || 5) / 10 // Normalize to 0-1
// Weighted average: 50% success, 25% quality, 25% satisfaction
const score = successRate * 0.5 + avgQuality * 0.25 + avgSatisfaction * 0.25
return Math.min(1, Math.max(0, score))
}
async shouldUpdateConfidence(
agentId: string,
model: string,
newScore: number,
threshold: number = 0.1
): Promise<boolean> {
// Get current confidence
const current = await this.db`
SELECT score FROM agent_confidence_scores
WHERE agent_id = ${agentId} AND model = ${model}
`
if (current.length === 0) return true // Always update if no history
const currentScore = Number(current[0].score)
return Math.abs(newScore - currentScore) > threshold
}
async processUserFeedback(
requestId: string,
agentId: string,
satisfaction: number // 0-10
): Promise<void> {
const metadata = { userSatisfaction: satisfaction }
await this.db`
INSERT INTO agent_feedback (
request_id, agent_id, outcome, metadata
) VALUES (
${requestId},
${agentId},
'success',
${JSON.stringify(metadata)}
)
ON CONFLICT (request_id) DO UPDATE SET
metadata = ${JSON.stringify(metadata)}
`
}
}
export default FeedbackProcessor

View File

@ -0,0 +1,267 @@
import { sql } from 'postgres'
import { Client } from 'postgres'
export interface AgentMetrics {
agentId: string
model: string
requestCount: number
successRate: number
avgLatencyMs: number
totalTokens: number
costUsd: number
confidence: number
updatedAt: Date
}
export interface RequestLog {
requestId: string
agentId: string
model: string
latencyMs: number
tokensIn: number
tokensOut: number
confidence: number
fallbackUsed: boolean
success: boolean
timestamp: Date
}
export interface AgentFeedback {
requestId: string
agentId: string
outcome: 'success' | 'fallback' | 'timeout' | 'error' | 'user_rejected'
completionQuality?: number
latencyMs?: number
tokenCount?: number
metadata?: Record<string, unknown>
timestamp: Date
}
export interface PerAgentConfidence {
agentId: string
model: string
score: number
sampleSize: number
lastUpdated: Date
trend: 'improving' | 'stable' | 'degrading'
}
export class LearningIntegration {
private db: Client
constructor(dbConnection: Client) {
this.db = dbConnection
}
async initializeTables(): Promise<void> {
await this.db`
CREATE TABLE IF NOT EXISTS agent_request_log (
request_id UUID PRIMARY KEY,
agent_id VARCHAR(64) NOT NULL,
model VARCHAR(128) NOT NULL,
latency_ms INTEGER NOT NULL,
tokens_in INTEGER NOT NULL,
tokens_out INTEGER NOT NULL,
confidence DECIMAL(3, 2) NOT NULL,
fallback_used BOOLEAN NOT NULL DEFAULT FALSE,
success BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
INDEX idx_agent_model (agent_id, model),
INDEX idx_created (created_at)
)
`
await this.db`
CREATE TABLE IF NOT EXISTS agent_feedback (
id SERIAL PRIMARY KEY,
request_id UUID NOT NULL,
agent_id VARCHAR(64) NOT NULL,
outcome VARCHAR(32) NOT NULL,
completion_quality SMALLINT,
latency_ms INTEGER,
token_count INTEGER,
metadata JSONB,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (request_id) REFERENCES agent_request_log (request_id),
INDEX idx_agent_outcome (agent_id, outcome),
INDEX idx_created (created_at)
)
`
await this.db`
CREATE TABLE IF NOT EXISTS agent_confidence_scores (
id SERIAL PRIMARY KEY,
agent_id VARCHAR(64) NOT NULL,
model VARCHAR(128) NOT NULL,
score DECIMAL(3, 2) NOT NULL,
sample_size INTEGER NOT NULL DEFAULT 0,
trend VARCHAR(16) NOT NULL DEFAULT 'stable',
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
UNIQUE (agent_id, model),
INDEX idx_agent (agent_id)
)
`
}
async logRequest(log: Omit<RequestLog, 'timestamp'>): Promise<void> {
await this.db`
INSERT INTO agent_request_log (
request_id, agent_id, model, latency_ms, tokens_in, tokens_out,
confidence, fallback_used, success
) VALUES (
${log.requestId}, ${log.agentId}, ${log.model}, ${log.latencyMs},
${log.tokensIn}, ${log.tokensOut}, ${log.confidence},
${log.fallbackUsed}, ${log.success}
)
`
}
async recordFeedback(feedback: Omit<AgentFeedback, 'timestamp'>): Promise<void> {
await this.db`
INSERT INTO agent_feedback (
request_id, agent_id, outcome, completion_quality, latency_ms,
token_count, metadata
) VALUES (
${feedback.requestId}, ${feedback.agentId}, ${feedback.outcome},
${feedback.completionQuality || null}, ${feedback.latencyMs || null},
${feedback.tokenCount || null}, ${JSON.stringify(feedback.metadata || {})}
)
`
}
async getAgentMetrics(agentId: string, hours: number = 24): Promise<AgentMetrics[]> {
const cutoffTime = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT
agent_id,
model,
COUNT(*) as request_count,
COUNT(CASE WHEN success = true THEN 1 END)::float / COUNT(*) as success_rate,
AVG(latency_ms)::float as avg_latency_ms,
SUM(tokens_in + tokens_out) as total_tokens,
SUM(tokens_in + tokens_out) * 0.0001 as cost_usd,
AVG(confidence)::float as confidence,
MAX(created_at) as updated_at
FROM agent_request_log
WHERE agent_id = ${agentId} AND created_at > ${cutoffTime}
GROUP BY agent_id, model
ORDER BY request_count DESC
`
return results.map((row: any) => ({
agentId: row.agent_id,
model: row.model,
requestCount: Number(row.request_count),
successRate: Number(row.success_rate),
avgLatencyMs: Number(row.avg_latency_ms),
totalTokens: Number(row.total_tokens),
costUsd: Number(row.cost_usd),
confidence: Number(row.confidence),
updatedAt: new Date(row.updated_at)
}))
}
async updateAgentConfidence(
agentId: string,
model: string,
newScore: number
): Promise<void> {
await this.db`
INSERT INTO agent_confidence_scores (agent_id, model, score, sample_size)
VALUES (${agentId}, ${model}, ${newScore}, 1)
ON CONFLICT (agent_id, model)
DO UPDATE SET
score = ${newScore},
sample_size = agent_confidence_scores.sample_size + 1,
updated_at = NOW()
`
}
async getAgentConfidence(agentId: string, model: string): Promise<PerAgentConfidence | null> {
const results = await this.db`
SELECT * FROM agent_confidence_scores
WHERE agent_id = ${agentId} AND model = ${model}
`
if (results.length === 0) return null
const row = results[0] as any
return {
agentId: row.agent_id,
model: row.model,
score: Number(row.score),
sampleSize: Number(row.sample_size),
lastUpdated: new Date(row.updated_at),
trend: row.trend as 'improving' | 'stable' | 'degrading'
}
}
async computePerAgentMetrics(agentId: string): Promise<AgentMetrics[]> {
// Compute metrics for past 24 hours
return this.getAgentMetrics(agentId, 24)
}
async getAgentCosts(days: number = 30): Promise<Map<string, number>> {
const cutoffTime = new Date(Date.now() - days * 24 * 60 * 60 * 1000)
const results = await this.db`
SELECT
agent_id,
SUM(tokens_in + tokens_out) * 0.0001 as cost_usd
FROM agent_request_log
WHERE created_at > ${cutoffTime}
GROUP BY agent_id
ORDER BY cost_usd DESC
`
const costs = new Map<string, number>()
for (const row of results as any[]) {
costs.set(row.agent_id, Number(row.cost_usd))
}
return costs
}
async detectAnomalies(
agentId: string,
threshold: number = 2
): Promise<{ model: string; issue: string }[]> {
const metrics = await this.getAgentMetrics(agentId, 24)
const baseline = await this.getAgentMetrics(agentId, 24 * 30) // 30-day baseline
const anomalies: { model: string; issue: string }[] = []
for (const current of metrics) {
const baselineMetric = baseline.find(m => m.model === current.model)
if (!baselineMetric) continue
// Check latency spike
if (current.avgLatencyMs > baselineMetric.avgLatencyMs * (1 + threshold * 0.1)) {
anomalies.push({
model: current.model,
issue: `Latency spike: ${current.avgLatencyMs}ms (baseline: ${baselineMetric.avgLatencyMs}ms)`
})
}
// Check success rate drop
if (current.successRate < baselineMetric.successRate * (1 - threshold * 0.1)) {
anomalies.push({
model: current.model,
issue: `Success rate drop: ${(current.successRate * 100).toFixed(1)}% (baseline: ${(baselineMetric.successRate * 100).toFixed(1)}%)`
})
}
// Check confidence drop
if (current.confidence < baselineMetric.confidence * 0.8) {
anomalies.push({
model: current.model,
issue: `Confidence degradation: ${current.confidence.toFixed(2)} (baseline: ${baselineMetric.confidence.toFixed(2)})`
})
}
}
return anomalies
}
}
export default LearningIntegration

View File

@ -0,0 +1,248 @@
import { Client } from 'postgres'
import { sql } from 'postgres'
export interface DailyAgentCost {
date: string
agentId: string
tokensIn: number
tokensOut: number
costUsd: number
}
export interface LatencySLO {
agentId: string
targetMs: number
p50: number
p95: number
p99: number
breached: boolean
}
export class PerAgentMetrics {
constructor(private db: Client) {}
async generateDailyCostReport(date: string): Promise<DailyAgentCost[]> {
const startOfDay = new Date(date)
startOfDay.setHours(0, 0, 0, 0)
const endOfDay = new Date(date)
endOfDay.setHours(23, 59, 59, 999)
const results = await this.db`
SELECT
agent_id,
SUM(tokens_in) as tokens_in,
SUM(tokens_out) as tokens_out,
(SUM(tokens_in) + SUM(tokens_out)) * 0.0001 as cost_usd
FROM agent_request_log
WHERE created_at >= ${startOfDay} AND created_at < ${endOfDay}
GROUP BY agent_id
ORDER BY cost_usd DESC
`
return results.map((row: any) => ({
date,
agentId: row.agent_id,
tokensIn: Number(row.tokens_in),
tokensOut: Number(row.tokens_out),
costUsd: Number(row.cost_usd)
}))
}
async checkLatencySLO(
agentId: string,
targetMs: number = 500
): Promise<LatencySLO> {
const hour = new Date()
hour.setHours(hour.getHours() - 1)
const results = await this.db`
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY latency_ms) as p50,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) as p95,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) as p99
FROM agent_request_log
WHERE agent_id = ${agentId} AND created_at > ${hour}
`
if (results.length === 0) {
return {
agentId,
targetMs,
p50: 0,
p95: 0,
p99: 0,
breached: false
}
}
const row = results[0] as any
const p99 = Number(row.p99 || 0)
const breached = p99 > targetMs
return {
agentId,
targetMs,
p50: Number(row.p50 || 0),
p95: Number(row.p95 || 0),
p99,
breached
}
}
async getAgentSuccessRate(
agentId: string,
hours: number = 24
): Promise<{ total: number; successful: number; rate: number }> {
const cutoff = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT
COUNT(*) as total,
COUNT(CASE WHEN success = true THEN 1 END) as successful
FROM agent_request_log
WHERE agent_id = ${agentId} AND created_at > ${cutoff}
`
if (results.length === 0) {
return { total: 0, successful: 0, rate: 0 }
}
const row = results[0] as any
const total = Number(row.total)
const successful = Number(row.successful)
return {
total,
successful,
rate: total > 0 ? successful / total : 0
}
}
async getFallbackRate(
agentId: string,
hours: number = 24
): Promise<{ fallbacks: number; total: number; rate: number }> {
const cutoff = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT
COUNT(*) as total,
COUNT(CASE WHEN fallback_used = true THEN 1 END) as fallbacks
FROM agent_request_log
WHERE agent_id = ${agentId} AND created_at > ${cutoff}
`
if (results.length === 0) {
return { fallbacks: 0, total: 0, rate: 0 }
}
const row = results[0] as any
const total = Number(row.total)
const fallbacks = Number(row.fallbacks)
return {
fallbacks,
total,
rate: total > 0 ? fallbacks / total : 0
}
}
async getModelPerformance(
agentId: string,
model: string,
hours: number = 24
): Promise<{
model: string
requests: number
avgLatency: number
successRate: number
confidence: number
}> {
const cutoff = new Date(Date.now() - hours * 60 * 60 * 1000)
const results = await this.db`
SELECT
COUNT(*) as requests,
AVG(latency_ms) as avg_latency,
COUNT(CASE WHEN success = true THEN 1 END)::float / COUNT(*) as success_rate,
AVG(confidence) as confidence
FROM agent_request_log
WHERE agent_id = ${agentId} AND model = ${model} AND created_at > ${cutoff}
`
if (results.length === 0) {
return { model, requests: 0, avgLatency: 0, successRate: 0, confidence: 0 }
}
const row = results[0] as any
return {
model,
requests: Number(row.requests),
avgLatency: Number(row.avg_latency || 0),
successRate: Number(row.success_rate || 0),
confidence: Number(row.confidence || 0)
}
}
async compareAgentPerformance(): Promise<
{ agentId: string; requests: number; avgLatency: number; costUsd: number }[]
> {
const day = new Date()
day.setDate(day.getDate() - 1)
const startOfDay = new Date(day)
startOfDay.setHours(0, 0, 0, 0)
const results = await this.db`
SELECT
agent_id,
COUNT(*) as requests,
AVG(latency_ms) as avg_latency,
(SUM(tokens_in) + SUM(tokens_out)) * 0.0001 as cost_usd
FROM agent_request_log
WHERE created_at >= ${startOfDay}
GROUP BY agent_id
ORDER BY requests DESC
`
return results.map((row: any) => ({
agentId: row.agent_id,
requests: Number(row.requests),
avgLatency: Number(row.avg_latency || 0),
costUsd: Number(row.cost_usd || 0)
}))
}
async getAgentTrending(
agentId: string,
model: string
): Promise<{ trend: 'improving' | 'stable' | 'degrading'; signal: number }> {
// Compare success rate: last 24h vs previous 24h
const now24h = new Date(Date.now() - 24 * 60 * 60 * 1000)
const now48h = new Date(Date.now() - 48 * 60 * 60 * 1000)
const recent = await this.db`
SELECT COUNT(CASE WHEN success = true THEN 1 END)::float / COUNT(*) as rate
FROM agent_request_log
WHERE agent_id = ${agentId} AND model = ${model} AND created_at > ${now24h}
`
const previous = await this.db`
SELECT COUNT(CASE WHEN success = true THEN 1 END)::float / COUNT(*) as rate
FROM agent_request_log
WHERE agent_id = ${agentId} AND model = ${model}
AND created_at > ${now48h} AND created_at <= ${now24h}
`
const recentRate = recent.length > 0 ? Number(recent[0].rate || 0) : 0
const previousRate = previous.length > 0 ? Number(previous[0].rate || 0) : 0
const signal = recentRate - previousRate
let trend: 'improving' | 'stable' | 'degrading' = 'stable'
if (signal > 0.05) trend = 'improving'
if (signal < -0.05) trend = 'degrading'
return { trend, signal }
}
}
export default PerAgentMetrics

View File

@ -0,0 +1,12 @@
{
"extends": "../../tsconfig.json",
"compilerOptions": {
"outDir": "./dist",
"rootDir": "./src",
"declaration": true,
"declarationMap": true,
"sourceMap": true
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist", "**/*.test.ts"]
}