feat: Add ADR-0005 for Phase 2G agent integration protocol
- Define three-layer integration stack (transport, adapters, protocol)
- JSON-RPC 2.0 over HTTP for unified agent communication
- Support for Claude Code, Codex/Copilot, ChatGPT, Ollama fallback
- Establishes foundation for Phase 2G multi-agent integration
- Decision on authentication, rate limiting, streaming TBD in implementation

parent 8e83e5fa6e
commit 4d7e251322

docs/adr/0005-agent-integration-protocol.md (Normal file, 143 lines changed)
@@ -0,0 +1,143 @@
# ADR-0005: Multi-Agent Integration Protocol

**Date**: 2026-04-19
**Status**: accepted
**Deciders**: Rene (Architecture, Phase 2G)

## Context

Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:
- **Claude Code** (Anthropic CLI): native client SDK (`@llm-gateway/client`)
- **Codex/Copilot** (Microsoft): LSP (Language Server Protocol)
- **ChatGPT** (OpenAI): REST API
- **Ollama** (local inference): HTTP API (fallback)

Each agent has different capabilities and communication patterns. We need a unified protocol that:
1. Abstracts gateway complexity from agents
2. Supports synchronous and asynchronous operations
3. Handles streaming responses (code generation, token-by-token)
4. Manages authentication and rate limiting per agent
5. Provides graceful fallback when the gateway is unavailable

## Decision

Implement a **three-layer agent integration stack**:

### Layer 1: Transport (HTTP/WebSocket)
- **Core**: Fastify endpoints in LLM Gateway
- **Endpoints**:
  - `POST /agents/{agent-id}/completion`: synchronous completion
  - `GET /agents/{agent-id}/completion?stream=true`: SSE stream
  - `POST /agents/{agent-id}/validate`: prompt validation
  - `GET /agents/{agent-id}/status`: health check
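
As a rough illustration of the Layer 1 surface, the sketch below builds the four endpoint paths for one agent and frames a payload as a server-sent event. `agentRoutes` and `formatSseData` are hypothetical helper names, not part of the gateway or `@llm-gateway/client`; only the paths come from the list above, and the `data:` framing follows the standard SSE wire format.

```typescript
// Hypothetical helpers for illustration only; not part of the gateway codebase.
interface AgentRoute {
  method: "GET" | "POST";
  path: string;
}

interface AgentRoutes {
  completion: AgentRoute;
  completionStream: AgentRoute;
  validate: AgentRoute;
  status: AgentRoute;
}

// Build the four Layer 1 routes for a single agent id.
function agentRoutes(agentId: string): AgentRoutes {
  const base = `/agents/${encodeURIComponent(agentId)}`;
  return {
    completion: { method: "POST", path: `${base}/completion` },
    completionStream: { method: "GET", path: `${base}/completion?stream=true` },
    validate: { method: "POST", path: `${base}/validate` },
    status: { method: "GET", path: `${base}/status` },
  };
}

// Frame a payload as one SSE event: every payload line gets a "data: " prefix,
// and a blank line terminates the event.
function formatSseData(payload: string): string {
  return payload.split("\n").map((line) => `data: ${line}`).join("\n") + "\n\n";
}
```

For example, `agentRoutes("claude-code").status.path` evaluates to `/agents/claude-code/status`.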

### Layer 2: Agent Adapters
- **Claude Code Adapter**: Node.js module wrapping `@llm-gateway/client`
- **Codex/Copilot Adapter**: LSP server that forwards requests to gateway HTTP API
- **ChatGPT Adapter**: REST API wrapper that translates OpenAI format → Gateway format
- **Ollama Adapter**: HTTP proxy that handles local fallback (already implemented in client SDK)
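
To make the adapter idea concrete, here is a minimal sketch of the translation step a ChatGPT adapter might perform. The gateway-side field names (`prompt`, `model`, `temperature`, `max_tokens`, `agent_id`) are the `completion` params this ADR uses; the function name `toGatewayParams` and the choice to flatten chat messages into a single prompt string are assumptions, not a confirmed design.

```typescript
// Gateway-side params, using the field names this ADR gives for "completion".
interface GatewayCompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

// Subset of an OpenAI chat completions request body.
interface OpenAIChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  temperature?: number;
  max_tokens?: number;
}

// Assumption: the gateway accepts one prompt string, so the chat messages are
// flattened into "role: content" lines. A real adapter may need a richer mapping.
function toGatewayParams(req: OpenAIChatRequest, agentId = "chatgpt"): GatewayCompletionParams {
  const prompt = req.messages.map((m) => `${m.role}: ${m.content}`).join("\n");
  const params: GatewayCompletionParams = { prompt, model: req.model, agent_id: agentId };
  // Copy optional sampling settings only when the caller supplied them.
  if (req.temperature !== undefined) params.temperature = req.temperature;
  if (req.max_tokens !== undefined) params.max_tokens = req.max_tokens;
  return params;
}
```

Keeping the translation a pure function means each adapter can be unit-tested without a running gateway.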

### Layer 3: Protocol Format
Use JSON-RPC 2.0 over HTTP/WebSocket:
```json
{
  "jsonrpc": "2.0",
  "method": "completion",
  "params": {
    "prompt": "...",
    "model": "claude-3.5-sonnet",
    "temperature": 0.7,
    "max_tokens": 2000,
    "agent_id": "claude-code"
  },
  "id": 1
}
```
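
The envelope above can be typed and handled along these lines. This is a sketch, assuming the standard JSON-RPC 2.0 response shape (either `result` or `error` with `code` and `message`); `buildRequest` and `unwrap` are hypothetical helper names, and the ADR does not specify the gateway's result payload.

```typescript
// JSON-RPC 2.0 envelope types. P and R are the method-specific params and result.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P;
  id: number | string;
}

interface JsonRpcSuccess<R> {
  jsonrpc: "2.0";
  result: R;
  id: number | string | null;
}

interface JsonRpcFailure {
  jsonrpc: "2.0";
  error: { code: number; message: string; data?: unknown };
  id: number | string | null;
}

type JsonRpcResponse<R> = JsonRpcSuccess<R> | JsonRpcFailure;

// Simple in-process id counter so each request can be matched to its response.
let nextId = 1;

function buildRequest<P>(method: string, params: P): JsonRpcRequest<P> {
  return { jsonrpc: "2.0", method, params, id: nextId++ };
}

// Return the result of a successful response, or throw on an error response.
function unwrap<R>(res: JsonRpcResponse<R>): R {
  if ("error" in res) {
    throw new Error(`JSON-RPC error ${res.error.code}: ${res.error.message}`);
  }
  return res.result;
}
```

A caller would send `buildRequest("completion", { prompt: "...", agent_id: "claude-code" })` and pass the parsed response to `unwrap`.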

## Alternatives Considered

### Alternative 1: Separate Gateway Instance per Agent
- **Pros**: Complete isolation, agent-specific customization
- **Cons**: Operational overhead, duplicate infrastructure, no shared learning
- **Why not**: Contradicts the Phase 2F goal of central orchestration

### Alternative 2: Agent-Specific Protocols (No Normalization)
- **Pros**: Native protocol support for each agent
- **Cons**: Gateway must translate every agent protocol; complexity grows with each new agent
- **Why not**: Gateway becomes a reverse proxy instead of an orchestrator

### Alternative 3: Message Queue (RabbitMQ/Kafka)
- **Pros**: Decouples agents from gateway, supports async workflows
- **Cons**: Added infrastructure, latency for synchronous operations
- **Why not**: Overkill for the initial integration; revisit if async workflows are needed

## Consequences

### Positive
- **Single integration point**: Agents connect to gateway, not directly to models
- **Shared learning**: All agents benefit from confidence gating and model selection
- **Graceful degradation**: Agents fall back to local Ollama independently
- **Extensible**: New agents added by implementing adapter layer only
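
The graceful-degradation point can be sketched as a small wrapper that tries the gateway first and uses local Ollama only when that call fails. The ADR states that fallback already exists in the client SDK, so this is not that implementation; `completeWithFallback` and the `Complete` function type are hypothetical, and treating every error as "gateway unavailable" is a simplifying assumption.

```typescript
// A completion call reduced to its essentials: prompt in, text out.
type Complete = (prompt: string) => Promise<string>;

interface FallbackResult {
  text: string;
  source: "gateway" | "ollama";
}

// Try the gateway first; on any failure, retry once against local Ollama.
// Assumption: every thrown error counts as "gateway unavailable". A real client
// would likely separate network failures from request errors such as validation.
async function completeWithFallback(
  prompt: string,
  viaGateway: Complete,
  viaOllama: Complete,
): Promise<FallbackResult> {
  try {
    return { text: await viaGateway(prompt), source: "gateway" };
  } catch {
    return { text: await viaOllama(prompt), source: "ollama" };
  }
}
```

Returning the `source` alongside the text lets callers record when a response came from the fallback path.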

### Negative
- **Latency**: Additional HTTP round-trip for each request (vs. direct model call)
- **Adapter maintenance**: Each agent needs an adapter, and an adapter breaks when its agent's API changes
- **Protocol overhead**: The JSON-RPC envelope adds overhead vs. direct integration

### Risks
- **Claude Code integration risk**: Requires subprocess communication with `claude` CLI
  - **Mitigation**: claude-bridge already demonstrates a working pattern
- **Codex integration risk**: Microsoft LSP server not directly compatible with HTTP
  - **Mitigation**: Implement thin LSP-to-HTTP translation layer

## Implementation Plan

### Phase 2G.1: Claude Code Integration (Week 1)
```typescript
// Extend @llm-gateway/client with agent metadata
// Assumption: createTIPClient is exported from @llm-gateway/client
import { createTIPClient } from '@llm-gateway/client';

const client = createTIPClient({
  agentId: 'claude-code',
  fallback: { ollamaUrl: '192.168.178.213:11434' }
});
```

### Phase 2G.2: Codex/Copilot (Week 2)
```bash
# Implement LSP server wrapper
# Node.js LSP server library and its text document helper
npm install vscode-languageserver vscode-languageserver-textdocument
# Create packages/lsp-adapter/
# - Implements LSP
# - Translates completion requests to HTTP
```
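
The translation step in the LSP adapter might look roughly like this: take the params of an LSP `textDocument/completion` request, cut the document at the cursor, and wrap the result in the gateway's JSON-RPC `completion` envelope. The names `textBeforeCursor` and `lspToGatewayRequest`, the `copilot` agent id, and the choice to send only the text before the cursor as the prompt are all assumptions for illustration.

```typescript
// Subset of the LSP textDocument/completion params. Positions are zero-based.
interface LspPosition {
  line: number;
  character: number;
}

interface LspCompletionParams {
  textDocument: { uri: string };
  position: LspPosition;
}

interface GatewayCompletionRequest {
  jsonrpc: "2.0";
  method: "completion";
  params: { prompt: string; agent_id: string };
  id: number;
}

// Return the document text from the start up to the cursor position.
function textBeforeCursor(documentText: string, position: LspPosition): string {
  const lines = documentText.split("\n");
  const before = lines.slice(0, position.line);
  const current = (lines[position.line] ?? "").slice(0, position.character);
  return before.concat(current).join("\n");
}

// Assumption: the prompt is simply the text before the cursor; "copilot" is a
// hypothetical agent id, not one defined by this ADR.
function lspToGatewayRequest(
  params: LspCompletionParams,
  documentText: string,
  id: number,
  agentId = "copilot",
): GatewayCompletionRequest {
  const prompt = textBeforeCursor(documentText, params.position);
  return { jsonrpc: "2.0", method: "completion", params: { prompt, agent_id: agentId }, id };
}
```

The adapter would then POST this request to the gateway and map the result back into LSP completion items, which is outside the scope of this sketch.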

### Phase 2G.3: ChatGPT Integration (Week 3)
```bash
# OpenAI API compatibility layer
# POST /agents/chatgpt/chat/completions
# → Translate to gateway completion format
```

### Phase 2G.4: Learning Integration (Week 4)
```bash
# Connect agent-specific metrics to learning engine
# - Track per-agent accuracy, token usage, latency
# - Auto-select models per agent based on performance
```
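
One way to picture the Phase 2G.4 bookkeeping is a small in-memory tracker that records per-agent, per-model outcomes and picks the best-performing model. This is only a sketch: the `AgentMetrics` class is hypothetical, "accuracy" is approximated as a success rate, and the selection rule (highest success rate, ties broken by lower mean latency) is an assumption rather than the learning engine's actual policy.

```typescript
interface RequestMetric {
  agentId: string;
  model: string;
  success: boolean; // stand-in for "accuracy" in this sketch
  tokens: number;
  latencyMs: number;
}

interface ModelStats {
  requests: number;
  successes: number;
  totalTokens: number;
  totalLatencyMs: number;
}

class AgentMetrics {
  // agentId -> model -> accumulated stats
  private stats: Record<string, Record<string, ModelStats>> = {};

  record(m: RequestMetric): void {
    const perModel = (this.stats[m.agentId] ??= {});
    const s = (perModel[m.model] ??= { requests: 0, successes: 0, totalTokens: 0, totalLatencyMs: 0 });
    s.requests += 1;
    if (m.success) s.successes += 1;
    s.totalTokens += m.tokens;
    s.totalLatencyMs += m.latencyMs;
  }

  // Pick the model with the highest success rate for this agent,
  // breaking ties by lower mean latency. Models with too few samples are skipped.
  bestModel(agentId: string, minRequests = 1): string | undefined {
    const perModel = this.stats[agentId];
    if (!perModel) return undefined;
    let bestName: string | undefined;
    let bestRate = -1;
    let bestLatency = Infinity;
    for (const model of Object.keys(perModel)) {
      const s = perModel[model];
      if (!s || s.requests < minRequests) continue;
      const rate = s.successes / s.requests;
      const latency = s.totalLatencyMs / s.requests;
      if (rate > bestRate || (rate === bestRate && latency < bestLatency)) {
        bestName = model;
        bestRate = rate;
        bestLatency = latency;
      }
    }
    return bestName;
  }
}
```

The `minRequests` threshold guards against selecting a model on the strength of a single lucky request.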

## Open Questions

1. **Authentication**: How do agents authenticate with gateway?
   - Option A: API keys per agent
   - Option B: OAuth2 with OIDC
   - Option C: mTLS for local agents, keys for remote
   - **Decision pending**: TBD in Phase 2G.1

2. **Rate Limiting**: Per-agent or global quota?
   - Option A: Per-agent limits (Claude Code = 100 req/min)
   - Option B: Global pool shared across agents
   - **Decision pending**: Depends on learning system usage patterns

3. **Response Format**: Streaming vs. buffered?
   - Option A: Always stream (SSE)
   - Option B: Support both (`?stream=true/false`)
   - **Decision pending**: Codex/Copilot compatibility check needed
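
Since the rate-limiting question is still open, the following only illustrates what Option A (per-agent limits) could look like, using a fixed one-minute window per agent. `PerAgentRateLimiter` is a hypothetical class, the fixed-window algorithm is one choice among several, and rejecting agents that have no configured limit is an assumption.

```typescript
class PerAgentRateLimiter {
  // agentId -> start time and request count of the current window
  private windows: Record<string, { start: number; count: number }> = {};

  constructor(
    private limits: Record<string, number>, // requests allowed per window, per agent
    private windowMs = 60000,
    private now: () => number = Date.now, // injectable clock for testing
  ) {}

  // Return true if the request is allowed, false if the agent is over its limit.
  tryAcquire(agentId: string): boolean {
    const limit = this.limits[agentId];
    if (limit === undefined) return false; // assumption: unknown agents are rejected
    const t = this.now();
    const w = this.windows[agentId];
    if (!w || t - w.start >= this.windowMs) {
      // First request, or the previous window has expired: start a new window.
      this.windows[agentId] = { start: t, count: 1 };
      return limit >= 1;
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  }
}
```

With `new PerAgentRateLimiter({ "claude-code": 100 })`, the 101st call to `tryAcquire("claude-code")` within one minute returns `false`. A global pool (Option B) would instead share a single window across all agent ids.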

@@ -6,3 +6,4 @@
| [0002](0002-tier-assignment-strategy.md) | Tier Assignment Strategy for Model Selection | accepted | 2026-04-19 |
| [0003](0003-confidence-gate-thresholds.md) | Confidence Gate Thresholds & Learning Cycle Intervals | accepted | 2026-04-19 |
| [0004](0004-external-fallback-chain.md) | External Provider Fallback Chain Ordering | accepted | 2026-04-19 |
| [0005](0005-agent-integration-protocol.md) | Multi-Agent Integration Protocol & Adapters | accepted | 2026-04-19 |