Rene Fichtmueller 4d7e251322 feat: Add ADR-0005 for Phase 2G agent integration protocol

- Define three-layer integration stack (transport, adapters, protocol)
- JSON-RPC 2.0 over HTTP for unified agent communication
- Support for Claude Code, Codex/Copilot, ChatGPT, Ollama fallback
- Establishes foundation for Phase 2G multi-agent integration
- Decision on authentication, rate limiting, streaming TBD in implementation

2026-04-19 22:01:17 +02:00

5.1 KiB


ADR-0005: Multi-Agent Integration Protocol

Date: 2026-04-19
Status: accepted
Deciders: Rene (Architecture, Phase 2G)

Context

Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:

Claude Code (Anthropic CLI): native client SDK (@llm-gateway/client)
Codex/Copilot (Microsoft): Language Server Protocol (LSP)
ChatGPT (OpenAI): REST API
Ollama (local inference): HTTP API (fallback)

Each agent has different capabilities and communication patterns. We need a unified protocol that:

Abstracts gateway complexity from agents
Supports synchronous and asynchronous operations
Handles streaming responses (code generation, token-by-token)
Manages authentication and rate limiting per agent
Provides graceful fallback when gateway is unavailable

Decision

Implement a three-layer agent integration stack:

Layer 1: Transport (HTTP/WebSocket)

Core: Fastify endpoints in LLM Gateway
Endpoints:
- POST /agents/{agent-id}/completion — synchronous completion
- GET /agents/{agent-id}/completion?stream=true — SSE stream
- POST /agents/{agent-id}/validate — prompt validation
- GET /agents/{agent-id}/status — health check
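A minimal sketch of how a client might derive the four Layer 1 URLs for one agent. Only the path shapes come from the endpoint list above; the helper name `agentEndpoints`, its return type, and the base URL are hypothetical.

```typescript
// Hypothetical helper: builds the Layer 1 endpoint URLs for a single agent.
// Only the path shapes come from the ADR; names and types here are assumptions.
interface AgentEndpoints {
  completion: string; // POST, synchronous completion
  stream: string;     // GET, SSE stream
  validate: string;   // POST, prompt validation
  status: string;     // GET, health check
}

function agentEndpoints(baseUrl: string, agentId: string): AgentEndpoints {
  // Strip trailing slashes so the result never contains "//agents/...".
  const root = baseUrl.replace(/\/+$/, "");
  // Encode the ID so unusual characters cannot alter the path.
  const agent = `${root}/agents/${encodeURIComponent(agentId)}`;
  return {
    completion: `${agent}/completion`,
    stream: `${agent}/completion?stream=true`,
    validate: `${agent}/validate`,
    status: `${agent}/status`,
  };
}

// Example base URL is made up for illustration.
const exampleUrls = agentEndpoints("http://localhost:3000/", "claude-code");
console.log(exampleUrls.stream);
// prints "http://localhost:3000/agents/claude-code/completion?stream=true"
```

Keeping the paths in one place means an adapter only needs the gateway base URL and its own agent ID.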

Layer 2: Agent Adapters

Claude Code Adapter: Node.js module wrapping @llm-gateway/client
Codex/Copilot Adapter: LSP server that forwards requests to gateway HTTP API
ChatGPT Adapter: REST API wrapper that translates OpenAI format → Gateway format
Ollama Adapter: HTTP proxy that handles local fallback (already implemented in client SDK)
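One way the adapter layer could be expressed is a shared contract that every adapter implements, plus a registry keyed by agent ID. This is a sketch under assumptions: `AgentAdapter`, `AdapterRegistry`, and the example agent are hypothetical names, and the gateway parameter shape is copied from the Layer 3 example in this ADR.

```typescript
// Gateway-side request parameters, mirroring the Layer 3 example in this ADR.
interface GatewayCompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

// Hypothetical contract: each adapter converts its agent's native request
// into gateway parameters. Not part of any existing SDK.
interface AgentAdapter<NativeRequest> {
  readonly agentId: string;
  toGateway(request: NativeRequest): GatewayCompletionParams;
}

class AdapterRegistry {
  private readonly adapters = new Map<string, AgentAdapter<unknown>>();

  register<T>(adapter: AgentAdapter<T>): void {
    if (this.adapters.has(adapter.agentId)) {
      throw new Error(`Adapter already registered for "${adapter.agentId}"`);
    }
    this.adapters.set(adapter.agentId, adapter as AgentAdapter<unknown>);
  }

  // Looks up the adapter for an agent and translates one request.
  translate(agentId: string, request: unknown): GatewayCompletionParams {
    const adapter = this.adapters.get(agentId);
    if (!adapter) throw new Error(`No adapter registered for agent "${agentId}"`);
    return adapter.toGateway(request);
  }
}

// Made-up agent whose native request is a plain string.
const echoAdapter: AgentAdapter<string> = {
  agentId: "example-agent",
  toGateway: (text) => ({
    prompt: text,
    model: "claude-3.5-sonnet",
    agent_id: "example-agent",
  }),
};

const registry = new AdapterRegistry();
registry.register(echoAdapter);
console.log(registry.translate("example-agent", "hello"));
```

With this shape, adding an agent means writing one `toGateway` function and registering it; the gateway code does not change.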

Layer 3: Protocol Format

Use JSON-RPC 2.0 over HTTP/WebSocket:

{
  "jsonrpc": "2.0",
  "method": "completion",
  "params": {
    "prompt": "...",
    "model": "claude-3.5-sonnet",
    "temperature": 0.7,
    "max_tokens": 2000,
    "agent_id": "claude-code"
  },
  "id": 1
}
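The envelope above could be typed as follows. The envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`, `error`) follow the JSON-RPC 2.0 specification and the params mirror the example; the helper names `completionRequest` and `unwrap` are hypothetical, and the ADR does not define the result shape, so it is left generic.

```typescript
type JsonRpcId = number | string;

// Parameters copied from the example request in this ADR.
interface CompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P;
  id: JsonRpcId;
}

interface JsonRpcError {
  code: number;
  message: string;
  data?: unknown;
}

// A response carries either a result or an error, never both.
type JsonRpcResponse<R> =
  | { jsonrpc: "2.0"; result: R; id: JsonRpcId }
  | { jsonrpc: "2.0"; error: JsonRpcError; id: JsonRpcId | null };

let nextId = 1;

// Hypothetical helper: wraps completion params in a JSON-RPC request.
function completionRequest(params: CompletionParams): JsonRpcRequest<CompletionParams> {
  return { jsonrpc: "2.0", method: "completion", params, id: nextId++ };
}

// Hypothetical helper: returns the result or throws on a JSON-RPC error.
function unwrap<R>(response: JsonRpcResponse<R>): R {
  if ("error" in response) {
    throw new Error(`JSON-RPC error ${response.error.code}: ${response.error.message}`);
  }
  return response.result;
}

const exampleRequest = completionRequest({
  prompt: "...",
  model: "claude-3.5-sonnet",
  temperature: 0.7,
  max_tokens: 2000,
  agent_id: "claude-code",
});
console.log(JSON.stringify(exampleRequest));
```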

Alternatives Considered

Alternative 1: Separate Gateway Instance per Agent

Pros: Complete isolation, agent-specific customization
Cons: Operational overhead, duplicate infrastructure, no shared learning
Why not: Contradicts Phase 2F goal of central orchestration

Alternative 2: Agent-Specific Protocols (No Normalization)

Pros: Native protocol support for each agent
Cons: Gateway becomes protocol translator, complexity explosion
Why not: Gateway becomes a reverse proxy instead of an orchestrator

Alternative 3: Message Queue (RabbitMQ/Kafka)

Pros: Decouples agents from gateway, supports async workflows
Cons: Added infrastructure, latency for synchronous operations
Why not: Overkill for initial integration; add later if async workflows needed

Consequences

Positive

Single integration point: Agents connect to gateway, not directly to models
Shared learning: All agents benefit from confidence gating and model selection
Graceful degradation: Agents fall back to local Ollama independently
Extensible: New agents added by implementing adapter layer only
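The graceful-degradation behavior can be illustrated with a small wrapper that tries the gateway first and falls back to Ollama on failure or timeout. This is not the client SDK's implementation, which this ADR says already exists; `withFallback`, the `Complete` type, and the timeout default are assumptions made for the sketch.

```typescript
// Hypothetical completion function: takes a prompt, resolves to generated text.
type Complete = (prompt: string) => Promise<string>;

interface FallbackResult {
  text: string;
  source: "gateway" | "ollama";
}

// Tries the gateway first; on any error or timeout, calls local Ollama instead.
function withFallback(gateway: Complete, ollama: Complete, timeoutMs = 5000) {
  return async (prompt: string): Promise<FallbackResult> => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("gateway timeout")), timeoutMs);
    });
    try {
      const text = await Promise.race([gateway(prompt), timeout]);
      return { text, source: "gateway" };
    } catch {
      // Gateway failed or timed out: fall through to the local model.
    } finally {
      if (timer !== undefined) clearTimeout(timer);
    }
    return { text: await ollama(prompt), source: "ollama" };
  };
}

// Usage with stand-in functions (no network access needed).
const complete = withFallback(
  async () => { throw new Error("gateway unavailable"); },
  async (prompt) => `local answer to: ${prompt}`,
);
complete("ping").then((r) => console.log(r.source)); // prints "ollama"
```

Returning the `source` alongside the text lets callers record when a response came from the fallback path.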

Negative

Latency: Additional HTTP round-trip for each request (vs. direct model call)
Adapter maintenance: Each agent needs an adapter; breaks if agent API changes
Protocol overhead: JSON-RPC adds overhead vs. direct integration

Risks

Claude Code integration risk: Requires subprocess communication with claude CLI
Mitigation: claude-bridge already demonstrates working pattern
Codex integration risk: LSP clients and servers exchange JSON-RPC messages over stdio or sockets with Content-Length framing, so they cannot call the gateway HTTP API directly
Mitigation: Implement thin LSP-to-HTTP translation layer

Implementation Plan

Phase 2G.1: Claude Code Integration (Week 1)

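// Extend @llm-gateway/client with agent metadata
import { createTIPClient } from '@llm-gateway/client'

const client = createTIPClient({
  agentId: 'claude-code',
  // Ollama serves plain HTTP; include the scheme so the URL parses as absolute
  fallback: { ollamaUrl: 'http://192.168.178.213:11434' }
})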

Phase 2G.2: Codex/Copilot (Week 2)

# Implement LSP server wrapper
npm install vscode-languageserver
# Create packages/lsp-adapter/
# - Implements LSP protocol
# - Translates completion requests to HTTP

Phase 2G.3: ChatGPT Integration (Week 3)

# OpenAI API compatibility layer
# POST /agents/chatgpt/chat/completions
# → Translate to gateway completion format

Phase 2G.4: Learning Integration (Week 4)

# Connect agent-specific metrics to learning engine
# - Track per-agent accuracy, token usage, latency
# - Auto-select models per agent based on performance

Open Questions

Authentication: How do agents authenticate with gateway?
- Option A: API keys per agent
- Option B: OAuth2 with OIDC
- Option C: mTLS for local agents, keys for remote
- Decision pending: TBD in Phase 2G.1
Rate Limiting: Per-agent or global quota?
- Option A: Per-agent limits (Claude Code = 100 req/min)
- Option B: Global pool shared across agents
- Decision pending: Depends on learning system usage patterns
Response Format: Streaming vs. buffered?
- Option A: Always stream (SSE)
- Option B: Support both (?stream=true/false)
- Decision pending: Codex/Copilot compatibility check needed
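To make the rate-limiting options concrete, here is a small fixed-window counter that can serve either one: key it by agent ID for Option A, or by a single shared key for Option B. It is a sketch only; the class name, the fixed-window algorithm, and the injectable clock are assumptions, and the ADR leaves the actual decision open.

```typescript
// Fixed-window request counter keyed by an arbitrary string.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  // `now` is injectable so the limiter can be tested without real time passing.
  constructor(limit: number, windowMs: number, now: () => number = Date.now) {
    if (limit < 1 || windowMs <= 0) {
      throw new Error("limit and windowMs must be positive");
    }
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  tryAcquire(key: string): boolean {
    const t = this.now();
    const w = this.windows.get(key);
    if (!w || t - w.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: t, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}

// Option A: per-agent limit, using the 100 req/min figure from the list above.
const perAgent = new FixedWindowLimiter(100, 60_000);
console.log(perAgent.tryAcquire("claude-code")); // prints true

// Option B: one global pool; every agent shares the same key.
// The 500 req/min figure is made up for illustration.
const globalPool = new FixedWindowLimiter(500, 60_000);
console.log(globalPool.tryAcquire("global")); // prints true
```

A fixed window admits short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.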
