llm-gateway/docs/adr/0005-agent-integration-protocol.md

# ADR-0005: Multi-Agent Integration Protocol

**Date**: 2026-04-19
**Status**: accepted
**Deciders**: Rene (Architecture, Phase 2G)

## Context

Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:
- **Claude Code** (Anthropic CLI): native client SDK (`@llm-gateway/client`)
- **Codex/Copilot** (Microsoft): Language Server Protocol (LSP)
- **ChatGPT** (OpenAI): REST API
- **Ollama** (local inference): HTTP API (fallback)

Each agent has different capabilities and communication patterns. We need a unified protocol that:
1. Abstracts gateway complexity from agents
2. Supports synchronous and asynchronous operations
3. Handles streaming responses (code generation, token-by-token)
4. Manages authentication and rate limiting per agent
5. Provides graceful fallback when the gateway is unavailable
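
Requirement 4 could be enforced as a single admission check that runs before any adapter is invoked. The sketch below is illustrative only and rests on several assumptions. Agents are assumed to send `Authorization: Bearer <key>`. Keys are assumed to live in an in-memory `Map`. Each agent is assumed to get its own token bucket. The names `AgentGate`, `TokenBucket`, and `admit` are hypothetical, and authentication and quota scope are still listed under Open Questions.

```typescript
// Hypothetical admission check: per-agent API key + per-agent token bucket.
// Header format, key storage, and limits are placeholders, not decisions.

function constantTimeEqual(a: string, b: string): boolean {
  // Only the length can leak through timing; contents are always fully compared.
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;

  constructor(capacity: number, now: number) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.last = now;
  }

  // Refills `capacity` tokens per minute (timestamps in ms); takes one if available.
  tryTake(now: number): boolean {
    // Multiply before dividing so whole fractions of a minute stay exact.
    const refill = ((now - this.last) * this.capacity) / 60_000;
    this.tokens = Math.min(this.capacity, this.tokens + refill);
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

type Admission = "ok" | "unauthorized" | "rate_limited";

class AgentGate {
  private readonly buckets = new Map<string, TokenBucket>();
  private readonly keys: Map<string, string>;
  private readonly perMinute: number;

  constructor(keys: Map<string, string>, perMinute: number) {
    this.keys = keys;
    this.perMinute = perMinute;
  }

  // `agentId` comes from the URL path; the key must belong to that agent.
  admit(agentId: string, authorization: string | undefined, now: number): Admission {
    const match = /^Bearer (\S+)$/.exec(authorization ?? "");
    const expected = this.keys.get(agentId);
    if (!match || expected === undefined || !constantTimeEqual(match[1], expected)) {
      return "unauthorized";
    }
    let bucket = this.buckets.get(agentId);
    if (!bucket) {
      bucket = new TokenBucket(this.perMinute, now);
      this.buckets.set(agentId, bucket);
    }
    return bucket.tryTake(now) ? "ok" : "rate_limited";
  }
}
```

Keying buckets by agent id gives per-agent quotas. Keying them on a constant would turn the same class into a shared global pool. A production version would store hashed keys rather than plaintext.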

## Decision

Implement a **three-layer agent integration stack**:

### Layer 1: Transport (HTTP/WebSocket)
- **Core**: Fastify endpoints in LLM Gateway
- **Endpoints**:
  - `POST /agents/{agent-id}/completion` — synchronous completion
  - `GET /agents/{agent-id}/completion?stream=true` — SSE stream
  - `POST /agents/{agent-id}/validate` — prompt validation
  - `GET /agents/{agent-id}/status` — health check
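
The endpoint scheme above can be expressed as a small classifier. In the gateway itself, Fastify's router would do this work with an `:agentId` path parameter. This standalone sketch only illustrates how method, path, and the `stream` query flag select an endpoint. `matchRoute` and the lowercase-slug format for agent ids are assumptions.

```typescript
// Hypothetical sketch: map (method, url) onto the four Layer 1 endpoints.

type EndpointKind = "completion" | "stream" | "validate" | "status";

interface RouteMatch {
  kind: EndpointKind;
  agentId: string;
}

// Agent ids are assumed to be lowercase slugs such as "claude-code".
const AGENT_PATH = /^\/agents\/([a-z0-9-]+)\/(completion|validate|status)$/;

function matchRoute(method: string, url: string): RouteMatch | null {
  // Separate the query string so `stream=true` can be checked on its own.
  const q = url.indexOf("?");
  const path = q === -1 ? url : url.slice(0, q);
  const query = q === -1 ? "" : url.slice(q + 1);
  const m = AGENT_PATH.exec(path);
  if (!m) return null;
  const agentId = m[1];
  const resource = m[2];
  const stream = query.split("&").indexOf("stream=true") !== -1;

  if (resource === "completion" && method === "POST") return { kind: "completion", agentId };
  if (resource === "completion" && method === "GET" && stream) return { kind: "stream", agentId };
  if (resource === "validate" && method === "POST") return { kind: "validate", agentId };
  if (resource === "status" && method === "GET") return { kind: "status", agentId };
  // Wrong method for the resource, or a GET completion without stream=true.
  return null;
}
```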

### Layer 2: Agent Adapters
- **Claude Code Adapter**: Node.js module wrapping `@llm-gateway/client`
- **Codex/Copilot Adapter**: LSP server that forwards requests to gateway HTTP API
- **ChatGPT Adapter**: REST API wrapper that translates OpenAI-format requests into the gateway format
- **Ollama Adapter**: HTTP proxy that handles local fallback (already implemented in the client SDK)
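
One way to keep the adapters interchangeable is to have all four implement a common interface. The gateway would then look them up by agent id. Everything in the sketch below is a hypothetical illustration, including `AgentAdapter`, `AdapterRegistry`, the field names, and the toy `EchoAdapter`. Streaming is modeled as an async iterable of text chunks so that synchronous and streamed completions share one contract.

```typescript
// Hypothetical common contract for Layer 2 adapters, plus a registry.

interface CompletionRequest {
  prompt: string;
  model?: string;
  temperature?: number;
  max_tokens?: number;
}

interface CompletionResult {
  text: string;
  model: string;
}

interface AgentAdapter {
  readonly agentId: string;
  complete(req: CompletionRequest): Promise<CompletionResult>;
  stream(req: CompletionRequest): AsyncIterable<string>; // token/chunk stream
  status(): Promise<{ healthy: boolean }>;
}

class AdapterRegistry {
  private readonly adapters = new Map<string, AgentAdapter>();

  register(adapter: AgentAdapter): void {
    if (this.adapters.has(adapter.agentId)) {
      throw new Error(`adapter already registered: ${adapter.agentId}`);
    }
    this.adapters.set(adapter.agentId, adapter);
  }

  get(agentId: string): AgentAdapter {
    const adapter = this.adapters.get(agentId);
    if (!adapter) throw new Error(`unknown agent: ${agentId}`);
    return adapter;
  }
}

// Toy adapter used only to exercise the interface; it does not call any model.
class EchoAdapter implements AgentAdapter {
  readonly agentId: string;

  constructor(agentId: string) {
    this.agentId = agentId;
  }

  async complete(req: CompletionRequest): Promise<CompletionResult> {
    return { text: req.prompt.toUpperCase(), model: req.model ?? "echo" };
  }

  async *stream(req: CompletionRequest): AsyncIterable<string> {
    for (const word of req.prompt.split(" ")) yield word;
  }

  async status(): Promise<{ healthy: boolean }> {
    return { healthy: true };
  }
}
```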

### Layer 3: Protocol Format
Use JSON-RPC 2.0 over HTTP/WebSocket:
```json
{
  "jsonrpc": "2.0",
  "method": "completion",
  "params": {
    "prompt": "...",
    "model": "claude-3.5-sonnet",
    "temperature": 0.7,
    "max_tokens": 2000,
    "agent_id": "claude-code"
  },
  "id": 1
}
```
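
The JSON-RPC 2.0 envelope has to be validated before any adapter is called. The sketch below assumes a single `completion` method whose params match the example above, with `prompt` and `agent_id` treated as required. It uses the reserved error codes from the JSON-RPC 2.0 specification. Batch requests and notifications (requests without an `id`) are deliberately not handled. `handleRpc` and the injected `complete` function are hypothetical names.

```typescript
// Minimal JSON-RPC 2.0 envelope handling for a single "completion" method.

type JsonRpcId = string | number | null;

interface JsonRpcSuccess { jsonrpc: "2.0"; result: unknown; id: JsonRpcId; }
interface JsonRpcFailure {
  jsonrpc: "2.0";
  error: { code: number; message: string };
  id: JsonRpcId;
}
type JsonRpcResponse = JsonRpcSuccess | JsonRpcFailure;

// Reserved error codes from the JSON-RPC 2.0 specification.
const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

function fail(id: JsonRpcId, code: number, message: string): JsonRpcFailure {
  return { jsonrpc: "2.0", error: { code, message }, id };
}

// `complete` stands in for the actual gateway call (hypothetical).
function handleRpc(raw: string, complete: (prompt: string) => string): JsonRpcResponse {
  let msg: any;
  try {
    msg = JSON.parse(raw);
  } catch {
    return fail(null, PARSE_ERROR, "Parse error");
  }
  // Echo the id only if it has a valid type; otherwise respond with null.
  const id: JsonRpcId =
    typeof msg?.id === "string" || typeof msg?.id === "number" ? msg.id : null;
  if (msg?.jsonrpc !== "2.0" || typeof msg.method !== "string") {
    return fail(id, INVALID_REQUEST, "Invalid Request");
  }
  if (msg.method !== "completion") {
    return fail(id, METHOD_NOT_FOUND, "Method not found");
  }
  const params = msg.params;
  if (typeof params?.prompt !== "string" || typeof params.agent_id !== "string") {
    return fail(id, INVALID_PARAMS, "Invalid params");
  }
  return { jsonrpc: "2.0", result: { text: complete(params.prompt) }, id };
}
```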

## Alternatives Considered

### Alternative 1: Separate Gateway Instance per Agent
- **Pros**: Complete isolation, agent-specific customization
- **Cons**: Operational overhead, duplicate infrastructure, no shared learning
- **Why not**: Contradicts Phase 2F goal of central orchestration

### Alternative 2: Agent-Specific Protocols (No Normalization)
- **Pros**: Native protocol support for each agent
- **Cons**: The gateway must translate between every agent's native protocol, so complexity grows with each new agent
- **Why not**: Gateway becomes a reverse proxy instead of an orchestrator

### Alternative 3: Message Queue (RabbitMQ/Kafka)
- **Pros**: Decouples agents from gateway, supports async workflows
- **Cons**: Added infrastructure, latency for synchronous operations
- **Why not**: Overkill for the initial integration; can be added later if async workflows are needed

## Consequences

### Positive
- **Single integration point**: Agents connect to gateway, not directly to models
- **Shared learning**: All agents benefit from confidence gating and model selection
- **Graceful degradation**: Agents fall back to local Ollama independently
- **Extensible**: New agents can be added by implementing only an adapter
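
The graceful-degradation path reduces to "try the gateway, otherwise use local Ollama". In the sketch below, both completion functions are injected, so the policy can be tested offline. The names `completeWithFallback`, `CompleteFn`, and `onFallback` are hypothetical. In practice the `ollama` function could, for example, POST to Ollama's `/api/generate` endpoint. A real client would also put a timeout on the gateway call.

```typescript
// Hypothetical fallback policy: gateway first, local Ollama on any gateway error.

type CompleteFn = (prompt: string) => Promise<string>;

interface FallbackResult {
  text: string;
  source: "gateway" | "ollama";
}

async function completeWithFallback(
  prompt: string,
  gateway: CompleteFn,
  ollama: CompleteFn,
  onFallback: (err: unknown) => void = () => {},
): Promise<FallbackResult> {
  try {
    return { text: await gateway(prompt), source: "gateway" };
  } catch (err) {
    // Gateway unreachable or returned an error: degrade to local inference.
    // Errors from the Ollama call itself propagate to the caller.
    onFallback(err);
    return { text: await ollama(prompt), source: "ollama" };
  }
}
```

Returning `source` lets callers (and the learning engine) tell degraded answers apart from gateway answers.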

### Negative
- **Latency**: Additional HTTP round-trip for each request (vs. direct model call)
- **Adapter maintenance**: Each agent needs its own adapter, which breaks if that agent's API changes
- **Protocol overhead**: JSON-RPC adds overhead vs. direct integration

### Risks
- **Claude Code integration risk**: Requires subprocess communication with the `claude` CLI
  - **Mitigation**: claude-bridge already demonstrates a working pattern
- **Codex integration risk**: The Microsoft LSP server is not directly compatible with HTTP
  - **Mitigation**: Implement a thin LSP-to-HTTP translation layer

## Implementation Plan

### Phase 2G.1: Claude Code Integration (Week 1)
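```typescript
// Extend @llm-gateway/client with agent metadata
// (assumes the SDK exports createTIPClient and expects a full URL with scheme)
import { createTIPClient } from '@llm-gateway/client';

const client = createTIPClient({
  agentId: 'claude-code',
  fallback: { ollamaUrl: 'http://192.168.178.213:11434' },
});
```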

### Phase 2G.2: Codex/Copilot (Week 2)
```bash
# Implement LSP server wrapper
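# vscode-languageserver is Microsoft's Node.js LSP server library (ships its own typings)
npm install vscode-languageserver vscode-languageserver-textdocument
# Create packages/lsp-adapter/
# - Implements the Language Server Protocol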
# - Translates completion requests to HTTP
```
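
The core of the LSP adapter is a pair of translations. It turns an LSP `textDocument/completion` request into a gateway JSON-RPC call in the Layer 3 format. It then maps the gateway's text back into LSP completion items. In the sketch below, the LSP types are reduced to the fields actually used. The document text is assumed to be tracked by the server from `didOpen`/`didChange` notifications. The agent id `codex-copilot` and the `max_tokens` default are hypothetical.

```typescript
// Hypothetical LSP <-> gateway translation for packages/lsp-adapter/.

interface Position { line: number; character: number; }
interface CompletionParams { textDocument: { uri: string }; position: Position; }
interface CompletionItem { label: string; insertText?: string; }

let nextRpcId = 1;

function toGatewayRequest(params: CompletionParams, documentText: string) {
  const lines = documentText.split(/\r\n|\r|\n/);
  // Everything before the cursor becomes the prompt. LSP character offsets
  // count UTF-16 code units by default, matching JS string indexing.
  const before = lines.slice(0, params.position.line);
  before.push((lines[params.position.line] ?? "").slice(0, params.position.character));
  return {
    jsonrpc: "2.0" as const,
    method: "completion",
    params: {
      prompt: before.join("\n"),
      agent_id: "codex-copilot", // hypothetical agent id
      max_tokens: 256, // hypothetical default
    },
    id: nextRpcId++,
  };
}

function toCompletionItems(gatewayText: string): CompletionItem[] {
  const text = gatewayText.replace(/\s+$/, ""); // drop trailing whitespace
  if (text === "") return [];
  // Label with the first line only; insert the full suggestion.
  return [{ label: text.split("\n")[0], insertText: text }];
}
```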

### Phase 2G.3: ChatGPT Integration (Week 3)
```bash
# OpenAI API compatibility layer
# POST /agents/chatgpt/chat/completions
# → Translate to gateway completion format
```
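
A minimal sketch of that translation follows. It accepts an OpenAI-style Chat Completions body with string `content` only and flattens the messages into the gateway's single `prompt`. It then wraps the gateway's answer in an OpenAI-style `chat.completion` response. The "role: content" flattening and the function names are assumptions. The `usage` block that OpenAI responses normally carry is omitted here.

```typescript
// Hypothetical OpenAI Chat Completions <-> gateway format translation.

interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
}

interface GatewayParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

function toGatewayParams(body: ChatCompletionRequest): GatewayParams {
  return {
    // Flatten the conversation into one prompt (assumed gateway convention).
    prompt: body.messages.map((m) => `${m.role}: ${m.content}`).join("\n"),
    model: body.model,
    temperature: body.temperature,
    max_tokens: body.max_tokens,
    agent_id: "chatgpt",
  };
}

function toChatCompletionResponse(text: string, model: string, id: string) {
  return {
    id,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000), // Unix seconds
    model,
    choices: [
      { index: 0, message: { role: "assistant", content: text }, finish_reason: "stop" },
    ],
  };
}
```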

### Phase 2G.4: Learning Integration (Week 4)
```bash
# Connect agent-specific metrics to learning engine
# - Track per-agent accuracy, token usage, latency
# - Auto-select models per agent based on performance
```
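
The per-agent tracking and auto-selection could start as simple counters keyed by agent and model. The selection rule below is an assumption for illustration and not the learning engine's actual policy. It picks the best accuracy among models with enough samples and breaks ties by lower mean latency. `AgentMetrics` and its methods are hypothetical.

```typescript
// Hypothetical per-agent metrics and model selection.

interface Sample { model: string; correct: boolean; tokens: number; latencyMs: number; }
interface ModelStats { n: number; correct: number; tokens: number; latencyMs: number; }

class AgentMetrics {
  private readonly byAgent = new Map<string, Map<string, ModelStats>>();

  record(agentId: string, s: Sample): void {
    let models = this.byAgent.get(agentId);
    if (!models) {
      models = new Map<string, ModelStats>();
      this.byAgent.set(agentId, models);
    }
    const st = models.get(s.model) ?? { n: 0, correct: 0, tokens: 0, latencyMs: 0 };
    st.n += 1;
    st.correct += s.correct ? 1 : 0;
    st.tokens += s.tokens; // recorded for reporting; not used by selectModel
    st.latencyMs += s.latencyMs;
    models.set(s.model, st);
  }

  // Best accuracy among models with >= minSamples observations, ties broken by
  // lower mean latency; returns defaultModel if no model qualifies.
  selectModel(agentId: string, defaultModel: string, minSamples = 20): string {
    let best = defaultModel;
    let bestAcc = -1;
    let bestLatency = Infinity;
    const models = this.byAgent.get(agentId);
    if (!models) return best;
    models.forEach((st, model) => {
      if (st.n < minSamples) return;
      const acc = st.correct / st.n;
      const latency = st.latencyMs / st.n;
      if (acc > bestAcc || (acc === bestAcc && latency < bestLatency)) {
        best = model;
        bestAcc = acc;
        bestLatency = latency;
      }
    });
    return best;
  }
}
```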

## Open Questions

1. **Authentication**: How do agents authenticate with the gateway?
   - Option A: API keys per agent
   - Option B: OAuth2 with OIDC
   - Option C: mTLS for local agents, keys for remote
   - **Decision pending**: To be settled in Phase 2G.1

2. **Rate Limiting**: Per-agent or global quota?
   - Option A: Per-agent limits (Claude Code = 100 req/min)
   - Option B: Global pool shared across agents
   - **Decision pending**: Depends on learning system usage patterns

3. **Response Format**: Streaming vs. buffered?
   - Option A: Always stream (SSE)
   - Option B: Support both (`?stream=true/false`)
   - **Decision pending**: Codex/Copilot compatibility check needed
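
Option B in question 3 need not double the adapter work. Both `?stream=true` and `?stream=false` can be served from the same chunk source. Streaming frames each chunk as a Server-Sent Events `data:` line, and buffered mode concatenates the chunks into one JSON body. In the sketch below, the `{ "text": ... }` payload shape is an assumption. The `[DONE]` sentinel borrows OpenAI's streaming convention and is not part of the SSE standard.

```typescript
// Hypothetical sketch: one chunk source, two response modes.

function sseFrame(data: string): string {
  // Each payload line gets its own "data:" prefix; a blank line ends the event.
  return data.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}

// stream=true: emit one SSE event per chunk, then a terminating sentinel.
async function* toSse(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  for await (const chunk of chunks) yield sseFrame(JSON.stringify({ text: chunk }));
  yield sseFrame("[DONE]");
}

// stream=false: collect all chunks and return a single JSON body.
async function toBuffered(chunks: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of chunks) text += chunk;
  return JSON.stringify({ text });
}
```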