From 4d7e25132211313582f6241e89262f5320c7622b Mon Sep 17 00:00:00 2001
From: Rene Fichtmueller
Date: Sun, 19 Apr 2026 22:01:17 +0200
Subject: [PATCH] feat: Add ADR-0005 for Phase 2G agent integration protocol

- Define three-layer integration stack (transport, adapters, protocol)
- Use JSON-RPC 2.0 over HTTP for unified agent communication
- Support Claude Code, Codex/Copilot, ChatGPT, and Ollama fallback
- Establish foundation for Phase 2G multi-agent integration
- Defer decisions on authentication, rate limiting, and streaming to implementation
---
 docs/adr/0005-agent-integration-protocol.md | 143 ++++++++++++++++++++
 docs/adr/README.md                          |   1 +
 2 files changed, 144 insertions(+)
 create mode 100644 docs/adr/0005-agent-integration-protocol.md

diff --git a/docs/adr/0005-agent-integration-protocol.md b/docs/adr/0005-agent-integration-protocol.md
new file mode 100644
index 0000000..8333703
--- /dev/null
+++ b/docs/adr/0005-agent-integration-protocol.md
@@ -0,0 +1,143 @@
+# ADR-0005: Multi-Agent Integration Protocol
+
+**Date**: 2026-04-19
+**Status**: accepted
+**Deciders**: Rene (Architecture, Phase 2G)
+
+## Context
+
+Phase 2F established the LLM Gateway as a central orchestrator with a TypeScript client SDK. Phase 2G must integrate multiple AI agents:
+- **Claude Code** (Anthropic CLI): native client SDK (`@llm-gateway/client`)
+- **Codex/Copilot** (Microsoft): Language Server Protocol (LSP)
+- **ChatGPT** (OpenAI): REST API
+- **Ollama** (local inference): HTTP API (fallback)
+
+Each agent has different capabilities and communication patterns. We need a unified protocol that:
+1. Abstracts gateway complexity from agents
+2. Supports synchronous and asynchronous operations
+3. Handles streaming responses (code generation, token-by-token)
+4. Manages authentication and rate limiting per agent
+5. Provides graceful fallback when the gateway is unavailable
+
+## Decision
+
+Implement a **three-layer agent integration stack**:
+
+### Layer 1: Transport (HTTP/WebSocket)
+- **Core**: Fastify endpoints in LLM Gateway
+- **Endpoints**:
+  - `POST /agents/{agent-id}/completion`: synchronous completion
+  - `GET /agents/{agent-id}/completion?stream=true`: SSE stream
+  - `POST /agents/{agent-id}/validate`: prompt validation
+  - `GET /agents/{agent-id}/status`: health check
+
+### Layer 2: Agent Adapters
+- **Claude Code Adapter**: Node.js module wrapping `@llm-gateway/client`
+- **Codex/Copilot Adapter**: LSP server that forwards requests to the gateway HTTP API
+- **ChatGPT Adapter**: REST API wrapper that translates OpenAI format → Gateway format
+- **Ollama Adapter**: HTTP proxy that handles local fallback (already implemented in the client SDK)
+
+### Layer 3: Protocol Format
+Use JSON-RPC 2.0 over HTTP/WebSocket:
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "completion",
+  "params": {
+    "prompt": "...",
+    "model": "claude-3.5-sonnet",
+    "temperature": 0.7,
+    "max_tokens": 2000,
+    "agent_id": "claude-code"
+  },
+  "id": 1
+}
+```
+
+## Alternatives Considered
+
+### Alternative 1: Separate Gateway Instance per Agent
+- **Pros**: Complete isolation, agent-specific customization
+- **Cons**: Operational overhead, duplicate infrastructure, no shared learning
+- **Why not**: Contradicts the Phase 2F goal of central orchestration
+
+### Alternative 2: Agent-Specific Protocols (No Normalization)
+- **Pros**: Native protocol support for each agent
+- **Cons**: Gateway becomes a protocol translator, complexity explosion
+- **Why not**: Gateway becomes a reverse proxy instead of an orchestrator
+
+### Alternative 3: Message Queue (RabbitMQ/Kafka)
+- **Pros**: Decouples agents from gateway, supports async workflows
+- **Cons**: Added infrastructure, latency for synchronous operations
+- **Why not**: Overkill for initial integration; add later if async workflows are needed
+
+## Consequences
+
+### Positive
+- **Single integration point**: Agents connect to the gateway, not directly to models
+- **Shared learning**: All agents benefit from confidence gating and model selection
+- **Graceful degradation**: Agents fall back to local Ollama independently
+- **Extensible**: New agents are added by implementing the adapter layer only
+
+### Negative
+- **Latency**: Additional HTTP round-trip for each request (vs. direct model call)
+- **Adapter maintenance**: Each agent needs an adapter, which breaks if the agent's API changes
+- **Protocol overhead**: JSON-RPC adds overhead vs. direct integration
+
+### Risks
+- **Claude Code integration risk**: Requires subprocess communication with the `claude` CLI
+- **Mitigation**: claude-bridge already demonstrates a working pattern
+- **Codex integration risk**: Microsoft LSP server not directly compatible with HTTP
+- **Mitigation**: Implement a thin LSP-to-HTTP translation layer
+
+## Implementation Plan
+
+### Phase 2G.1: Claude Code Integration (Week 1)
+```typescript
+// Extend @llm-gateway/client with agent metadata
+createTIPClient({
+  agentId: 'claude-code',
+  fallback: { ollamaUrl: '192.168.178.213:11434' }
+})
+```
+
+### Phase 2G.2: Codex/Copilot (Week 2)
+```bash
+# Implement LSP server wrapper
+npm install -D @types/node-lsp-server
+# Create packages/lsp-adapter/
+# - Implements LSP protocol
+# - Translates completion requests to HTTP
+```
+
+### Phase 2G.3: ChatGPT Integration (Week 3)
+```bash
+# OpenAI API compatibility layer
+# POST /agents/chatgpt/chat/completions
+# → Translate to gateway completion format
+```
+
+### Phase 2G.4: Learning Integration (Week 4)
+```bash
+# Connect agent-specific metrics to learning engine
+# - Track per-agent accuracy, token usage, latency
+# - Auto-select models per agent based on performance
+```
+
+## Open Questions
+
+1. **Authentication**: How do agents authenticate with the gateway?
+   - Option A: API keys per agent
+   - Option B: OAuth2 with OIDC
+   - Option C: mTLS for local agents, keys for remote
+   - **Decision pending**: TBD in Phase 2G.1
+
+2. **Rate Limiting**: Per-agent or global quota?
+   - Option A: Per-agent limits (Claude Code = 100 req/min)
+   - Option B: Global pool shared across agents
+   - **Decision pending**: Depends on learning system usage patterns
+
+3. **Response Format**: Streaming vs. buffered?
+   - Option A: Always stream (SSE)
+   - Option B: Support both (`?stream=true/false`)
+   - **Decision pending**: Codex/Copilot compatibility check needed
diff --git a/docs/adr/README.md b/docs/adr/README.md
index 0a72783..20e9d1e 100644
--- a/docs/adr/README.md
+++ b/docs/adr/README.md
@@ -6,3 +6,4 @@
 | [0002](0002-tier-assignment-strategy.md) | Tier Assignment Strategy for Model Selection | accepted | 2026-04-19 |
 | [0003](0003-confidence-gate-thresholds.md) | Confidence Gate Thresholds & Learning Cycle Intervals | accepted | 2026-04-19 |
 | [0004](0004-external-fallback-chain.md) | External Provider Fallback Chain Ordering | accepted | 2026-04-19 |
+| [0005](0005-agent-integration-protocol.md) | Multi-Agent Integration Protocol & Adapters | accepted | 2026-04-19 |
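
Note on the Layer 3 protocol format: the JSON-RPC 2.0 envelope in the ADR could be built and unwrapped on the adapter side roughly as follows. This is a minimal sketch, not part of the patch. The `params` field names come from the ADR's example payload; `buildCompletionRequest`, `unwrapResponse`, and the shape of the result are hypothetical names chosen for illustration, and the gateway's actual response schema is not specified in the ADR.

```typescript
// Field names mirror the ADR's example request; optional fields are an assumption.
interface CompletionParams {
  prompt: string;
  model: string;
  temperature?: number;
  max_tokens?: number;
  agent_id: string;
}

// JSON-RPC 2.0 request envelope.
interface JsonRpcRequest<P> {
  jsonrpc: "2.0";
  method: string;
  params: P;
  id: number | string;
}

// JSON-RPC 2.0 error object (code and message are required by the spec).
interface JsonRpcError {
  code: number;
  message: string;
  data?: unknown;
}

let nextId = 1;

// Hypothetical helper: wraps completion params in a JSON-RPC 2.0 envelope
// with a fresh numeric id.
function buildCompletionRequest(
  params: CompletionParams
): JsonRpcRequest<CompletionParams> {
  return { jsonrpc: "2.0", method: "completion", params, id: nextId++ };
}

// Hypothetical helper: parses a raw response body, surfaces JSON-RPC errors
// as exceptions, and checks that the response id matches the request id.
function unwrapResponse<R>(raw: string, expectedId: number | string): R {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("JSON-RPC response must be an object");
  }
  const msg = parsed as Record<string, unknown>;
  if (msg["jsonrpc"] !== "2.0") {
    throw new Error("response is missing jsonrpc version 2.0");
  }
  // Errors are checked first: the spec allows a null id on parse errors.
  if (msg["error"] !== undefined) {
    const err = msg["error"] as JsonRpcError;
    throw new Error(`JSON-RPC error ${err.code}: ${err.message}`);
  }
  if (msg["id"] !== expectedId) {
    throw new Error(`unexpected response id: ${String(msg["id"])}`);
  }
  if (!("result" in msg)) {
    throw new Error("response has neither result nor error");
  }
  return msg["result"] as R;
}
```

An adapter would serialize the request with `JSON.stringify(buildCompletionRequest({...}))`, send it to the completion endpoint, and pass the response body plus the request `id` to `unwrapResponse`. Matching on `id` matters mainly for the WebSocket transport, where several requests may be in flight on one connection.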