Rene Fichtmueller 2ca77d0aee feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)

- ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
- ADR-0002: Tier Assignment Strategy for Model Selection (cost-first escalation)
- ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals (6h/12h/24h cycles)
- ADR-0004: External Provider Fallback Chain Ordering (Cerebras → Groq → Mistral)
- Enhanced client SDK: Offline Ollama fallback, health checks, exponential backoff retry
- Integration tests: claude-code-integration.test.ts (14 test cases)
- PHASE_2F_DEPLOYMENT.md: Pre-deployment checklist, automated deploy, rollback plan
- Post-deployment verification procedures for health, client fallback, metrics

2026-04-19 21:39:44 +02:00

Phase 2F Deployment Guide

Date: 2026-04-19
Phase: 2F — Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
Status: Ready for deployment

What's New in Phase 2F

1. Architecture Decision Records (ADRs)

Located in docs/adr/:

ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
- Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
- Agents are clients, not workers
- HTTP API for agent communication
- Learning system improves routing for all agents collectively
ADR-0002: Tier Assignment Strategy for Model Selection
- Cost-first approach: fast → medium → large → external
- Confidence-based escalation: if confidence is <5 on the fast tier, escalate to medium
- Learning cycles (6h, 12h, 24h) auto-tune tier assignment
ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals
- Three-tier gating: 0–4 (pending_review), 4–7 (warning), 7–10 (approved)
- Autonomous learning cycles adjust thresholds based on human review feedback
- 6h (validators), 12h (thresholds), 24h (model assignments)
ADR-0004: External Provider Fallback Chain Ordering
- Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
- Cost-first: prefer free tiers; paid APIs only as fallback
- Rate-limit aware: automatic backoff and provider rotation
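The following minimal TypeScript sketch shows how ADR-0002's cost-first escalation and ADR-0003's confidence gate could fit together. It is not the Gateway's routing code. `scoreFor` is a hypothetical stand-in for running a tier and scoring its answer. The sketch also assumes that each band's lower bound is inclusive and that the "<5 escalates" rule applies at every tier, not only on fast.

```typescript
// Sketch of the ADR-0002 / ADR-0003 policy, not the Gateway's implementation.
// Assumptions: scores are 0-10, band lower bounds are inclusive, and the
// "<5 escalates" rule from ADR-0002 is applied at every tier.

type Tier = "fast" | "medium" | "large" | "external";
type GateStatus = "pending_review" | "warning" | "approved";

// Cost-first order from ADR-0002.
const TIER_ORDER: Tier[] = ["fast", "medium", "large", "external"];
const ESCALATION_THRESHOLD = 5;

// ADR-0003 bands: 0-4 pending_review, 4-7 warning, 7-10 approved.
function gate(confidence: number): GateStatus {
  if (confidence >= 7) return "approved";
  if (confidence >= 4) return "warning";
  return "pending_review";
}

interface Attempt {
  tier: Tier;
  confidence: number;
}

// Start at the cheapest tier and move up only while confidence stays below
// the escalation threshold; the final answer's confidence is gated.
function route(scoreFor: (tier: Tier) => number): { attempts: Attempt[]; status: GateStatus } {
  const attempts: Attempt[] = [];
  let status: GateStatus = "pending_review";
  for (const tier of TIER_ORDER) {
    const confidence = scoreFor(tier); // hypothetical: run the tier, score its output
    attempts.push({ tier, confidence });
    status = gate(confidence);
    if (confidence >= ESCALATION_THRESHOLD) break;
  }
  return { attempts, status };
}

// Example: fast scores 3.5, so the request escalates once to medium.
const exampleScores: Record<Tier, number> = { fast: 3.5, medium: 7.5, large: 9, external: 9 };
const exampleRoute = route((tier) => exampleScores[tier]);
console.log(exampleRoute.attempts.map((a) => a.tier)); // [ 'fast', 'medium' ]
console.log(exampleRoute.status); // approved
```

The learning cycles described in the ADRs would tune values like `ESCALATION_THRESHOLD` and the band edges over time. Here they are fixed constants.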
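The ADR-0004 chain can be sketched the same way. `ProviderCall`, `RateLimitError`, and the per-provider cooldown map are hypothetical. The sketch only illustrates two things: the ordering rule (try providers in cost-first order) and rate-limit-aware rotation (a provider that signals a rate limit is skipped until its retry-after time has passed).

```typescript
// Sketch of ADR-0004's ordered fallback chain with rate-limit-aware rotation.
// Provider order is from the ADR; everything else is a hypothetical stand-in.

type ProviderName = "cerebras" | "groq" | "mistral" | "nvidia-nim" | "cloudflare-workers-ai";

// Cost-first: free tiers first, paid APIs only as fallback.
const CHAIN: ProviderName[] = ["cerebras", "groq", "mistral", "nvidia-nim", "cloudflare-workers-ai"];

class RateLimitError extends Error {
  readonly retryAfterMs: number;
  constructor(retryAfterMs: number) {
    super(`rate limited, retry after ${retryAfterMs} ms`);
    this.name = "RateLimitError";
    this.retryAfterMs = retryAfterMs;
    Object.setPrototypeOf(this, RateLimitError.prototype); // keeps instanceof working on ES5 targets
  }
}

type ProviderCall = (provider: ProviderName, prompt: string) => Promise<string>;

// Earliest timestamp (ms) at which each rate-limited provider may be tried again.
const cooldownUntil = new Map<ProviderName, number>();

async function completeWithFallback(
  call: ProviderCall,
  prompt: string,
  now: () => number = Date.now,
): Promise<{ provider: ProviderName; output: string }> {
  const errors: string[] = [];
  for (const provider of CHAIN) {
    if ((cooldownUntil.get(provider) ?? 0) > now()) {
      errors.push(`${provider}: cooling down`);
      continue; // rotate past providers that recently returned a rate limit
    }
    try {
      const output = await call(provider, prompt);
      return { provider, output };
    } catch (err) {
      if (err instanceof RateLimitError) {
        // Back off from this provider until its retry-after window expires.
        cooldownUntil.set(provider, now() + err.retryAfterMs);
      }
      errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}
```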

2. Enhanced Client SDK (`@llm-gateway/client`)

New features:

Offline Ollama fallback: If the Gateway is down, the client automatically falls back to local Ollama
- Attempts completion via Gateway first (via HTTP)
- On failure, switches to local Ollama (http://192.168.178.213:11434)
- Automatic retry with exponential backoff (max 3 attempts)
- Health check caching: does not flood the Gateway with health checks after a failure
Health status API:
- client.getStatus() returns: { gateway: boolean, ollama: string, mode: 'gateway' | 'fallback' }
- Useful for agent dashboards to show current routing mode
Graceful degradation:
- Client catches timeouts, network errors, and API errors
- Returns meaningful error messages for debugging
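The Gateway-first, Ollama-second behavior described above can be sketched as follows. This is not the `@llm-gateway/client` implementation. `FallbackClient`, the injected `gateway`/`ollama` transport functions, and the defaults (500 ms initial backoff, 30 s failure cache) are assumptions. The sketch also assumes that `ollama` in `getStatus()` holds the configured Ollama URL.

```typescript
// Hypothetical sketch of Gateway-first completion with Ollama fallback.
// Names and defaults are illustrative, not the SDK's real API.

type Mode = "gateway" | "fallback";
type Transport = (input: string) => Promise<string>;

interface FallbackOptions {
  gateway: Transport;      // e.g. HTTP call to the Gateway
  ollama: Transport;       // e.g. HTTP call to local Ollama
  ollamaUrl: string;
  maxAttempts?: number;    // Gateway attempts before falling back (doc: max 3)
  baseDelayMs?: number;    // first backoff delay; doubles on each retry
  healthCacheMs?: number;  // how long a failed Gateway is skipped
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

class FallbackClient {
  private gatewayDownUntil = 0;
  private mode: Mode = "gateway";
  private readonly opts: Required<FallbackOptions>;

  constructor(opts: FallbackOptions) {
    this.opts = {
      maxAttempts: 3,
      baseDelayMs: 500,
      healthCacheMs: 30_000,
      sleep: (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
      now: Date.now,
      ...opts,
    };
  }

  async completion(input: string): Promise<string> {
    const { gateway, ollama, maxAttempts, baseDelayMs, healthCacheMs, sleep, now } = this.opts;
    // Skip the Gateway entirely while a recent failure is cached.
    if (now() >= this.gatewayDownUntil) {
      for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          const output = await gateway(input);
          this.mode = "gateway";
          return output;
        } catch {
          // Exponential backoff between attempts: base, 2x base, 4x base, ...
          if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
        }
      }
      // Remember the failure so the next calls go straight to Ollama.
      this.gatewayDownUntil = now() + healthCacheMs;
    }
    this.mode = "fallback";
    return ollama(input);
  }

  getStatus(): { gateway: boolean; ollama: string; mode: Mode } {
    return {
      gateway: this.opts.now() >= this.gatewayDownUntil,
      ollama: this.opts.ollamaUrl,
      mode: this.mode,
    };
  }
}
```

Injecting the transports lets the retry and caching logic be exercised without a network. In practice the transports would wrap the HTTP calls to the Gateway and to http://192.168.178.213:11434.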

Code changes:

// Old: no fallback capability
const client = createTIPClient();
const result = await client.completion({ /* ... */ });
// Throws if the Gateway is down

// New: automatic fallback
const client = createTIPClient();
const result = await client.completion({ /* ... */ });
// Tries the Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'

3. Integration Tests

Added tests/integration/claude-code-integration.test.ts:

Health checks
Completion requests (code explanation, analysis, summarization)
Offline fallback verification
Rate limiting and SLA validation
Error handling

Run tests:

npm run test:integration
# or filter:
npm run test -- claude-code-integration

4. Deployment Changes

No breaking changes to:

Gateway API (POST /v1/completion, /v1/classify, etc.)
Database schema
Routing rules
Configuration

Additive changes:

docs/adr/ directory with 4 ADRs
Enhanced client SDK with fallback capability
Integration test suite

Pre-Deployment Checklist

All TypeScript compiles cleanly (npm run build)
Integration tests pass (npm run test:integration)
Client SDK builds without errors
Git branch is main with no uncommitted changes
Gitea remote is configured (git remote -v shows origin → gitea.context-x.org)
SSH key for Erik (root@82.165.222.127) is working
Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)

Deployment Steps

Option 1: Automated Deploy (Recommended)

cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh

This will:

Check prerequisites (npm, git, ssh, curl)
Run local build (npm run build)
Push main branch to Gitea
SSH into Erik and:
- Pull latest code
- Run npm install
- Run npm run build
- Reload PM2 processes (llm-gateway, llm-learning)
Wait for service restart and verify health
Return summary with status
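deploy/deploy.sh is not reproduced here. As a hedged illustration of the "wait for service restart and verify health" step, this TypeScript helper polls a health probe until it reports `status: "ok"`, the field shown in the health check under Post-Deployment Verification. The attempt count, interval, and helper names are assumptions, and `fetch` assumes Node 18 or newer.

```typescript
// Hypothetical poll-until-healthy helper; not the logic inside deploy/deploy.sh.
type HealthProbe = () => Promise<boolean>;

interface WaitOptions {
  attempts?: number;   // placeholder: 10 probes
  intervalMs?: number; // placeholder: 3 s between probes
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Treats HTTP 200 with {"status": "ok"} as healthy (Node 18+ global fetch).
function httpProbe(url: string): HealthProbe {
  return async () => {
    try {
      const res = await fetch(url);
      if (!res.ok) return false;
      const body = (await res.json()) as { status?: string };
      return body.status === "ok";
    } catch {
      return false; // connection refused or bad JSON while PM2 reloads the process
    }
  };
}

// Resolves with the number of probes needed, or rejects once attempts run out.
async function waitForHealthy(probe: HealthProbe, options: WaitOptions = {}): Promise<number> {
  const { attempts = 10, intervalMs = 3_000, sleep = defaultSleep } = options;
  for (let i = 1; i <= attempts; i++) {
    if (await probe()) return i;
    if (i < attempts) await sleep(intervalMs);
  }
  throw new Error(`service not healthy after ${attempts} attempts`);
}
```

For example, `await waitForHealthy(httpProbe("http://82.165.222.127:3100/health"))` would poll the direct health endpoint listed in the expected output.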

Expected output:

Deploy successful!

  Commit:         a1b2c3d
  Direct:         http://82.165.222.127:3100/health
  Cloudflare:     https://llm-gateway.context-x.org/health
  PM2 status:     ssh root@82.165.222.127 'pm2 status'
  Logs:           ssh root@82.165.222.127 'pm2 logs llm-gateway'

Option 2: Manual Deploy

If the automated deploy fails, deploy manually on Erik:

ssh root@82.165.222.127

# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20

Post-Deployment Verification

1. Health Check

curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }

2. Test Completion Endpoint

curl -X POST https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "phase-2f-test",
    "task_type": "analysis",
    "input": "Test request for Phase 2F verification",
    "language": "en"
  }' | jq .

Expected response:

{
  "id": "call-XXXX-YYYY",
  "status": "approved",
  "confidence": 7.5,
  "model": "qwen2.5:14b",
  "task_type": "analysis",
  "latency_ms": 1234,
  "tokens": { "in": 42, "out": 156 },
  "output": "Test request processed successfully..."
}
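For agents calling the API from TypeScript, the request and response above can be typed as follows. The field names are copied from the curl example and the expected response. The `status` union (reusing the ADR-0003 gate names) and the runtime check `isCompletionResponse` are assumptions, not part of a published schema.

```typescript
// Shapes inferred from the curl example and expected response above.
interface CompletionRequest {
  caller: string;
  task_type: string;
  input: string;
  language?: string;
}

interface CompletionResponse {
  id: string;
  status: "approved" | "warning" | "pending_review"; // assumed to match ADR-0003
  confidence: number;
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

const STATUSES = new Set(["approved", "warning", "pending_review"]);

// Runtime guard for parsed JSON, since fetch/JSON.parse give untyped data.
function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const tokens = v.tokens as Record<string, unknown> | null | undefined;
  return (
    typeof v.id === "string" &&
    typeof v.status === "string" && STATUSES.has(v.status) &&
    typeof v.confidence === "number" &&
    typeof v.model === "string" &&
    typeof v.task_type === "string" &&
    typeof v.latency_ms === "number" &&
    typeof tokens === "object" && tokens !== null &&
    typeof tokens.in === "number" && typeof tokens.out === "number" &&
    typeof v.output === "string"
  );
}

// Request body matching the curl example above.
const exampleRequest: CompletionRequest = {
  caller: "phase-2f-test",
  task_type: "analysis",
  input: "Test request for Phase 2F verification",
  language: "en",
};
```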

3. Verify Client Fallback

Test that the client can fall back to Ollama when the Gateway is unavailable:

import { LLMGatewayClient } from '@llm-gateway/client';

const client = new LLMGatewayClient({
  caller: 'test',
  baseUrl: 'http://localhost:9999', // non-existent
  ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
  timeout: 30_000,
});

const result = await client.completion({
  task_type: 'fallback-test',
  input: 'Test fallback to Ollama',
});

console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'

4. Check PM2 Processes

ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running

5. Monitor Logs

ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs

Rollback Plan

If Phase 2F deployment fails:

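# On Erik:
cd /opt/llm-gateway
git reset --hard HEAD~1  # Go back one commit; if the deploy pulled several commits, use the previously deployed hash (see git reflog)
npm install              # Restore the previous version's dependencies
npm run build
pm2 reload llm-gateway llm-learning --update-env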

Or automated rollback (revert instead of resetting, so the push to Gitea stays a fast-forward):

# Locally:
git revert --no-edit HEAD
bash deploy/deploy.sh

Monitoring After Deployment

Key metrics to watch (via Shield Dashboard or logs):

Request latency: Should be <1s for fast tier, 1-3s for medium
Confidence score distribution: Should stay relatively stable
Review queue depth: Should be <100 items
Provider fallback rate: Should be <5% (most requests via Gateway)
Learning cycle execution: Should see 6h/12h/24h cycle logs

Alerts to configure:

If any request latency >10s
If review queue >500 items
If provider fallback rate >20%
If confidence threshold drifts >0.5 in 24h
If Gateway health check fails 3 times in a row
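The alert rules above can be expressed as data and checked against a metrics snapshot. This is a hypothetical sketch. The `MetricsSnapshot` field names are invented and would need mapping onto whatever Shield Dashboard or `/v1/metrics` actually exposes. Threshold drift is also compared by absolute value, which is an assumption.

```typescript
// Hypothetical evaluation of the alert rules against one metrics snapshot.
interface MetricsSnapshot {
  maxRequestLatencyMs: number;
  reviewQueueDepth: number;
  providerFallbackRate: number;        // fraction of requests, 0..1
  confidenceThresholdDrift24h: number; // change over the last 24h (sign ignored)
  consecutiveHealthFailures: number;
}

interface AlertRule {
  name: string;
  firing: (m: MetricsSnapshot) => boolean;
}

// One entry per alert in the list above.
const ALERT_RULES: AlertRule[] = [
  { name: "request latency > 10s", firing: (m) => m.maxRequestLatencyMs > 10_000 },
  { name: "review queue > 500 items", firing: (m) => m.reviewQueueDepth > 500 },
  { name: "provider fallback rate > 20%", firing: (m) => m.providerFallbackRate > 0.2 },
  { name: "confidence threshold drift > 0.5 in 24h", firing: (m) => Math.abs(m.confidenceThresholdDrift24h) > 0.5 },
  { name: "health check failed 3 times in a row", firing: (m) => m.consecutiveHealthFailures >= 3 },
];

// Returns the names of all rules that fire for this snapshot.
function firingAlerts(m: MetricsSnapshot): string[] {
  return ALERT_RULES.filter((rule) => rule.firing(m)).map((rule) => rule.name);
}
```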

Agent Integration Next Steps (Phase 2G)

Once Phase 2F is live:

Claude Code integration: Embed @llm-gateway/client into Claude Code
Codex integration: HTTP client in VS Code extension
Copilot integration: Plugin for GitHub Copilot
ChatGPT integration: Custom GPT with Gateway endpoint

Support & Troubleshooting

If Gateway is slow:

Check Ollama on Mac Studio: curl http://192.168.178.213:11434/api/tags
Check learning engine logs: pm2 logs llm-learning
Check CPU/memory on Erik: ssh -t root@82.165.222.127 top

If client can't reach Gateway:

Check Cloudflare tunnel status: ssh root@82.165.222.127 'ps aux | grep cloudflared'
Check DNS: dig llm-gateway.context-x.org
Fall back to the direct IP (http://82.165.222.127:3100); the client will auto-retry with a 30s cooldown

If confidence scores are low:

Check which validators are failing: curl -s https://llm-gateway.context-x.org/v1/metrics | grep validation_failures
Review recent audit logs: SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;

Questions? Check docs/adr/ for design rationale or contact René.