ADR-0005: Multi-Agent Integration Protocol
Date: 2026-04-19
Status: accepted
Deciders: Rene (Architecture, Phase 2G)
Context
Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:
- Claude Code (Anthropic CLI): native client SDK (@llm-gateway/client)
- Codex/Copilot (OpenAI Codex, GitHub Copilot): Language Server Protocol (LSP)
- ChatGPT (OpenAI): REST API
- Ollama (local inference): HTTP API (fallback)
Each agent has different capabilities and communication patterns. We need a unified protocol that:
- Abstracts gateway complexity from agents
- Supports synchronous and asynchronous operations
- Handles streaming responses (code generation, token-by-token)
- Manages authentication and rate limiting per agent
- Provides graceful fallback when the gateway is unavailable
Decision
Implement a three-layer agent integration stack:
Layer 1: Transport (HTTP/WebSocket)
- Core: Fastify endpoints in the LLM Gateway
- Endpoints:
  - `POST /agents/{agent-id}/completion`: synchronous completion
  - `GET /agents/{agent-id}/completion?stream=true`: SSE stream
  - `POST /agents/{agent-id}/validate`: prompt validation
  - `GET /agents/{agent-id}/status`: health check
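The `?stream=true` endpoint returns Server-Sent Events. Each event is a block of `event:` and `data:` lines, terminated by a blank line. The sketch below shows one way the gateway could frame streamed chunks. The event names `token` and `done`, the chunk shape, and the helper names are all assumptions.

```typescript
// Sketch of Server-Sent Events framing for the streaming completion endpoint.
// Event names ("token", "done") and the chunk shape are hypothetical; only the
// wire format (event:/data: lines, blank-line terminator) follows the SSE spec.

interface TokenChunk {
  index: number; // position of the chunk in the stream
  text: string; // generated text fragment
}

// Encode one SSE event. JSON.stringify without an indent argument never emits
// raw newlines, so the payload always fits on a single "data:" line.
function encodeSseEvent(event: string, payload: object): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Frame a sequence of chunks, followed by a terminating "done" event.
function frameStream(chunks: TokenChunk[]): string {
  const body = chunks.map((c) => encodeSseEvent("token", c)).join("");
  return body + encodeSseEvent("done", { count: chunks.length });
}
```

A browser's `EventSource` API only issues GET requests, which fits exposing the stream as a GET endpoint. Non-browser clients can parse the same framing from any HTTP response body.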
Layer 2: Agent Adapters
- Claude Code Adapter: Node.js module wrapping @llm-gateway/client
- Codex/Copilot Adapter: LSP server that forwards requests to gateway HTTP API
- ChatGPT Adapter: REST API wrapper that translates OpenAI format → Gateway format
- Ollama Adapter: HTTP proxy that handles local fallback (already implemented in client SDK)
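The adapter layer can be expressed as a small contract. Each adapter translates its agent's native request into the gateway's completion params, and a registry routes requests by `agent_id`. The names `AgentAdapter`, `AdapterRegistry`, and `GatewayParams` are hypothetical. The example Claude Code adapter assumes requests already arrive in gateway shape.

```typescript
// Hypothetical sketch of the Layer 2 adapter contract (names are illustrative).

// Gateway-side completion params, mirroring the JSON-RPC "params" object.
interface GatewayParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

interface AgentAdapter {
  readonly agentId: string;
  // Translate the agent's native request into gateway params.
  toGateway(request: unknown): GatewayParams;
}

class AdapterRegistry {
  private readonly adapters = new Map<string, AgentAdapter>();

  register(adapter: AgentAdapter): void {
    if (this.adapters.has(adapter.agentId)) {
      throw new Error(`adapter already registered: ${adapter.agentId}`);
    }
    this.adapters.set(adapter.agentId, adapter);
  }

  // Route a native request to the adapter registered for agentId.
  translate(agentId: string, request: unknown): GatewayParams {
    const adapter = this.adapters.get(agentId);
    if (!adapter) throw new Error(`unknown agent: ${agentId}`);
    return adapter.toGateway(request);
  }
}

// Assumption: Claude Code requests already carry prompt and model, so the
// adapter only stamps the agent_id.
const claudeCodeAdapter: AgentAdapter = {
  agentId: "claude-code",
  toGateway(request: unknown): GatewayParams {
    const r = request as { prompt: string; model: string };
    return { prompt: r.prompt, model: r.model, agent_id: "claude-code" };
  },
};
```

Supporting another agent then means registering one more adapter, without touching the transport or protocol layers.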
Layer 3: Protocol Format
Use JSON-RPC 2.0 over HTTP/WebSocket:
```json
{
  "jsonrpc": "2.0",
  "method": "completion",
  "params": {
    "prompt": "...",
    "model": "claude-3.5-sonnet",
    "temperature": 0.7,
    "max_tokens": 2000,
    "agent_id": "claude-code"
  },
  "id": 1
}
```
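The sketch below shows how the gateway side could build and validate such requests. The error codes are the reserved codes from the JSON-RPC 2.0 specification. The field checks simply mirror the example payload, and the helper names are hypothetical.

```typescript
// Sketch of JSON-RPC 2.0 handling for the "completion" method.
// -32600/-32601/-32602 are reserved error codes in the JSON-RPC 2.0 spec.

interface CompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

interface CompletionRequest {
  jsonrpc: "2.0";
  method: "completion";
  params: CompletionParams;
  id: number | string;
}

interface JsonRpcErrorResponse {
  jsonrpc: "2.0";
  error: { code: number; message: string };
  id: number | string | null;
}

const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;
const INVALID_PARAMS = -32602;

let nextId = 1;

// Build a request envelope with a fresh numeric id.
function makeCompletionRequest(params: CompletionParams): CompletionRequest {
  return { jsonrpc: "2.0", method: "completion", params, id: nextId++ };
}

// Return null for a well-formed completion request, otherwise an error response.
function validateRequest(body: unknown): JsonRpcErrorResponse | null {
  const req = (typeof body === "object" && body !== null ? body : {}) as Record<string, unknown>;
  const rawId = req.id;
  const id = typeof rawId === "number" || typeof rawId === "string" ? rawId : null;
  const fail = (code: number, message: string): JsonRpcErrorResponse => ({
    jsonrpc: "2.0",
    error: { code, message },
    id,
  });

  if (req.jsonrpc !== "2.0" || typeof req.method !== "string") {
    return fail(INVALID_REQUEST, "Invalid Request");
  }
  if (req.method !== "completion") return fail(METHOD_NOT_FOUND, "Method not found");

  const p = (typeof req.params === "object" && req.params !== null ? req.params : {}) as Record<string, unknown>;
  if (typeof p.prompt !== "string" || typeof p.model !== "string" || typeof p.agent_id !== "string") {
    return fail(INVALID_PARAMS, "Invalid params");
  }
  return null;
}
```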
Alternatives Considered
Alternative 1: Separate Gateway Instance per Agent
- Pros: Complete isolation, agent-specific customization
- Cons: Operational overhead, duplicate infrastructure, no shared learning
- Why not: Contradicts Phase 2F goal of central orchestration
Alternative 2: Agent-Specific Protocols (No Normalization)
- Pros: Native protocol support for each agent
- Cons: The gateway must translate every agent's protocol itself, so complexity grows with each new agent
- Why not: Gateway becomes a reverse proxy instead of an orchestrator
Alternative 3: Message Queue (RabbitMQ/Kafka)
- Pros: Decouples agents from gateway, supports async workflows
- Cons: Added infrastructure, latency for synchronous operations
- Why not: Overkill for the initial integration; add later if async workflows are needed
Consequences
Positive
- Single integration point: Agents connect to gateway, not directly to models
- Shared learning: All agents benefit from confidence gating and model selection
- Graceful degradation: Agents fall back to local Ollama independently
- Extensible: New agents added by implementing adapter layer only
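Graceful degradation could work roughly as follows. The client tries the gateway first. On a network error or an unsuccessful response, it calls Ollama's non-streaming `POST /api/generate` endpoint directly.

Several parts of this sketch are assumptions:
- the gateway's JSON-RPC response shape (`result.text`);
- the omission of `model` so the gateway can pick one;
- the helper names.

`fetchImpl` is injected so the logic can be exercised without a network. In Node 18+ the global `fetch` can be passed in.

```typescript
// Sketch: try the gateway first, fall back to a local Ollama instance.
// Assumed gateway reply: a JSON-RPC body with result.text.
// Ollama's POST /api/generate with stream: false returns { response: "..." }.

interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<HttpResponseLike>;

interface FallbackConfig {
  gatewayUrl: string; // hypothetical gateway base URL
  ollamaUrl: string; // e.g. "http://192.168.178.213:11434"
  ollamaModel: string; // hypothetical local model name
}

async function completeWithFallback(
  prompt: string,
  agentId: string,
  cfg: FallbackConfig,
  fetchImpl: FetchLike,
): Promise<{ text: string; source: "gateway" | "ollama" }> {
  const headers = { "content-type": "application/json" };
  try {
    const res = await fetchImpl(`${cfg.gatewayUrl}/agents/${agentId}/completion`, {
      method: "POST",
      headers,
      // model omitted so the gateway can select one (assumption)
      body: JSON.stringify({ jsonrpc: "2.0", method: "completion", params: { prompt, agent_id: agentId }, id: 1 }),
    });
    if (res.ok) {
      const data = (await res.json()) as { result?: { text?: string } };
      const text = data.result?.text;
      if (typeof text === "string") return { text, source: "gateway" };
    }
  } catch {
    // Network error (gateway down or unreachable): fall through to Ollama.
  }

  const res = await fetchImpl(`${cfg.ollamaUrl}/api/generate`, {
    method: "POST",
    headers,
    body: JSON.stringify({ model: cfg.ollamaModel, prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama fallback failed with HTTP ${res.status}`);
  const data = (await res.json()) as { response: string };
  return { text: data.response, source: "ollama" };
}
```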
Negative
- Latency: Additional HTTP round-trip for each request (vs. direct model call)
- Adapter maintenance: Each agent needs its own adapter, which breaks when that agent's API changes
- Protocol overhead: The JSON-RPC envelope adds serialization and payload overhead compared with direct integration
Risks
- Claude Code integration risk: Requires subprocess communication with the `claude` CLI
- Mitigation: claude-bridge already demonstrates a working pattern
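- Codex/Copilot integration risk: LSP servers exchange JSON-RPC messages over stdio or sockets with Content-Length framing, so they are not directly compatible with the gateway's HTTP API
- Mitigation: Implement a thin LSP-to-HTTP translation layer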
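For the Claude Code risk, subprocess communication essentially means spawning the CLI and capturing its output. The sketch below is a minimal synchronous version using Node's `child_process.spawnSync`. It assumes Claude Code's non-interactive print mode (`claude -p <prompt>`) writes its answer to stdout. `runAgentCli` and `claudeArgs` are hypothetical helpers, and claude-bridge may well work differently.

```typescript
import { spawnSync } from "node:child_process";

interface CliResult {
  exitCode: number | null; // null when the process was terminated by a signal
  stdout: string;
  stderr: string;
}

// Run a command to completion without a shell and capture its output as UTF-8.
function runAgentCli(command: string, args: string[], timeoutMs = 120_000): CliResult {
  const res = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  // res.error is set when the process could not be spawned (e.g. ENOENT if the
  // binary is not on PATH) or when it timed out.
  if (res.error) throw res.error;
  return { exitCode: res.status, stdout: res.stdout, stderr: res.stderr };
}

// Hypothetical argv for Claude Code's non-interactive print mode.
function claudeArgs(prompt: string): string[] {
  return ["-p", prompt];
}

// Example (requires Claude Code to be installed):
// const out = runAgentCli("claude", claudeArgs("Summarize src/index.ts"));
```

Passing the prompt as an argv element, with no shell involved, avoids quoting and injection problems. A production adapter would more likely use the asynchronous `spawn`, so the gateway's event loop is not blocked while the CLI runs.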
Implementation Plan
Phase 2G.1: Claude Code Integration (Week 1)
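```typescript
// Extend @llm-gateway/client with agent metadata
import { createTIPClient } from '@llm-gateway/client';

const client = createTIPClient({
  agentId: 'claude-code',
  // Full URL including the scheme; 11434 is Ollama's default port
  fallback: { ollamaUrl: 'http://192.168.178.213:11434' },
});
```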
Phase 2G.2: Codex/Copilot (Week 2)
```sh
# Implement LSP server wrapper
npm install vscode-languageserver vscode-languageserver-textdocument
# Create packages/lsp-adapter/
# - Implements the Language Server Protocol
# - Translates completion requests to HTTP
```
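Whatever LSP library the wrapper uses, it must handle the protocol's base framing. Each JSON-RPC message is preceded by a `Content-Length` header giving the byte length of the UTF-8 body, followed by a blank line (`\r\n\r\n`). The sketch below shows encoding and incremental decoding of that framing. The helper names are hypothetical.

```typescript
import { Buffer } from "node:buffer";

const HEADER_END = "\r\n\r\n";

// Frame one JSON-RPC message: header with the body's byte length, blank line, body.
function encodeLspMessage(message: object): Buffer {
  const body = Buffer.from(JSON.stringify(message), "utf8");
  const header = Buffer.from(`Content-Length: ${body.length}${HEADER_END}`, "ascii");
  return Buffer.concat([header, body]);
}

// Decode all complete messages in `buffer`. Incomplete trailing data is returned
// as `rest` so the caller can prepend it to the next chunk read from stdin.
function decodeLspMessages(buffer: Buffer): { messages: unknown[]; rest: Buffer } {
  const messages: unknown[] = [];
  let rest = buffer;
  for (;;) {
    const headerEnd = rest.indexOf(HEADER_END);
    if (headerEnd === -1) break; // header not fully received yet
    const header = rest.subarray(0, headerEnd).toString("ascii");
    const match = /Content-Length:\s*(\d+)/i.exec(header);
    if (!match) throw new Error("LSP header without Content-Length");
    const bodyStart = headerEnd + HEADER_END.length;
    const bodyEnd = bodyStart + Number(match[1]);
    if (rest.length < bodyEnd) break; // body not fully received yet
    messages.push(JSON.parse(rest.subarray(bodyStart, bodyEnd).toString("utf8")));
    rest = rest.subarray(bodyEnd);
  }
  return { messages, rest };
}
```

The adapter would translate decoded requests such as `textDocument/completion` into gateway HTTP calls, then send the results back through `encodeLspMessage`.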
Phase 2G.3: ChatGPT Integration (Week 3)
```text
# OpenAI API compatibility layer
# POST /agents/chatgpt/chat/completions
# → Translate to gateway completion format
```
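The compatibility layer comes down to two translations. An OpenAI Chat Completions request becomes gateway params, and gateway output becomes a Chat Completions response. This sketch assumes the gateway accepts a single prompt string, so messages are flattened with role prefixes. That flattening is a lossy, hypothetical choice, and the function names are illustrative.

```typescript
// Hypothetical translation between OpenAI Chat Completions shapes and the
// gateway's completion params.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
}

interface GatewayCompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

// Flatten the conversation into one prompt and copy optional sampling params.
function fromOpenAI(req: ChatCompletionRequest): GatewayCompletionParams {
  const prompt = req.messages.map((m) => `${m.role}: ${m.content}`).join("\n\n");
  const params: GatewayCompletionParams = { prompt, model: req.model, agent_id: "chatgpt" };
  if (req.temperature !== undefined) params.temperature = req.temperature;
  if (req.max_tokens !== undefined) params.max_tokens = req.max_tokens;
  return params;
}

// Wrap gateway output in a minimal Chat Completions response so OpenAI clients
// can parse it. `created` is a Unix timestamp in seconds.
function toOpenAI(text: string, model: string, id: string) {
  return {
    id,
    object: "chat.completion" as const,
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      { index: 0, message: { role: "assistant" as const, content: text }, finish_reason: "stop" as const },
    ],
  };
}
```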
Phase 2G.4: Learning Integration (Week 4)
```text
# Connect agent-specific metrics to learning engine
# - Track per-agent accuracy, token usage, latency
# - Auto-select models per agent based on performance
```
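The sketch below shows one possible shape for per-agent metrics and automatic model selection. It records each request's outcome per agent and model. It then picks the most accurate model whose mean latency fits a budget. The `correct` signal, the selection rule, and all names are assumptions for illustration; the learning engine may use entirely different signals.

```typescript
// Hypothetical per-agent metrics store with a simple model-selection rule.

interface Sample {
  agentId: string;
  model: string;
  correct: boolean; // outcome as judged by the learning engine (assumed signal)
  tokens: number;
  latencyMs: number;
}

interface ModelStats {
  agentId: string;
  model: string;
  count: number;
  correct: number;
  tokens: number; // tracked for reporting; not used by the selection rule
  latencyMs: number;
}

class AgentMetrics {
  private readonly stats = new Map<string, ModelStats>();

  record(s: Sample): void {
    const key = JSON.stringify([s.agentId, s.model]); // unambiguous composite key
    const st = this.stats.get(key) ?? {
      agentId: s.agentId, model: s.model, count: 0, correct: 0, tokens: 0, latencyMs: 0,
    };
    st.count += 1;
    if (s.correct) st.correct += 1;
    st.tokens += s.tokens;
    st.latencyMs += s.latencyMs;
    this.stats.set(key, st);
  }

  // Most accurate model for agentId whose mean latency is within the budget,
  // considering only models with at least minSamples observations.
  selectModel(agentId: string, latencyBudgetMs: number, minSamples = 1): string | undefined {
    let best: ModelStats | undefined;
    for (const st of this.stats.values()) {
      if (st.agentId !== agentId || st.count < minSamples) continue;
      if (st.latencyMs / st.count > latencyBudgetMs) continue;
      if (!best || st.correct / st.count > best.correct / best.count) best = st;
    }
    return best?.model;
  }
}
```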
Open Questions
- Authentication: How do agents authenticate with the gateway?
- Option A: API keys per agent
- Option B: OAuth2 with OIDC
- Option C: mTLS for local agents, keys for remote
- Decision pending: TBD in Phase 2G.1
- Rate Limiting: Per-agent or global quota?
- Option A: Per-agent limits (Claude Code = 100 req/min)
- Option B: Global pool shared across agents
- Decision pending: Depends on learning system usage patterns
- Response Format: Streaming vs. buffered?
- Option A: Always stream (SSE)
- Option B: Support both (`?stream=true/false`)
- Decision pending: Codex/Copilot compatibility check needed
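If Option A is chosen for rate limiting, per-agent limits are commonly implemented as one token bucket per agent. The sketch below uses the 100 req/min figure from that option. The class name and the injectable clock are hypothetical.

```typescript
// Sketch of per-agent rate limiting with one token bucket per agent.

interface Bucket {
  tokens: number; // tokens currently available (may be fractional)
  updatedAt: number; // time of the last refill, in ms
}

class PerAgentRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerMs: number, now: () => number = Date.now) {
    this.capacity = capacity; // maximum burst size
    this.refillPerMs = refillPerMs; // tokens restored per millisecond
    this.now = now; // injectable clock for deterministic tests
  }

  // Consume one token for agentId; returns false if the agent is over its limit.
  tryAcquire(agentId: string): boolean {
    const t = this.now();
    const bucket = this.buckets.get(agentId) ?? { tokens: this.capacity, updatedAt: t };
    // Refill in proportion to elapsed time, capped at capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + (t - bucket.updatedAt) * this.refillPerMs);
    bucket.updatedAt = t;
    this.buckets.set(agentId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}

// Option A example: 100 requests per minute per agent, bursts of up to 100.
const perAgentLimiter = new PerAgentRateLimiter(100, 100 / 60_000);
```

The buckets live in process memory. A gateway running as several instances would therefore need a shared store, or sticky routing, to enforce a single limit per agent.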