Phase 2F Deployment Guide
Date: 2026-04-19
Phase: 2F — Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
Status: Ready for deployment
What's New in Phase 2F
1. Architecture Decision Records (ADRs)
Located in docs/adr/:
- ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
  - Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
  - Agents are clients, not workers
  - HTTP API for agent communication
  - Learning system improves routing for all agents collectively
- ADR-0002: Tier Assignment Strategy for Model Selection
  - Cost-first approach: fast → medium → large → external
  - Confidence-based escalation: if confidence < 5 on fast, escalate to medium
  - Learning cycles (6h, 12h, 24h) auto-tune tier assignment
- ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals
  - Three-tier gating: 0–4 (pending_review), 4–7 (warning), 7–10 (approved)
  - Autonomous learning cycles adjust thresholds based on human review feedback
  - 6h (validators), 12h (thresholds), 24h (model assignments)
- ADR-0004: External Provider Fallback Chain Ordering
  - Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
  - Cost-first: prefer free tiers; paid APIs only as fallback
  - Rate-limit aware: automatic backoff and provider rotation
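The escalation rule from ADR-0002 and the gate bands from ADR-0003 can be sketched as follows. This is an illustration under stated assumptions, not the Gateway's implementation: the names `Tier`, `nextTier`, `gateFor`, and `runWithEscalation` are hypothetical, the threshold of 5 is applied at every tier (the summary above states it only for fast → medium), and the boundary scores 4 and 7 are assigned to the higher band because the listed ranges overlap.

```typescript
// Hypothetical names throughout; numbers follow the ADR summaries above.
type Tier = "fast" | "medium" | "large" | "external";
type Gate = "pending_review" | "warning" | "approved";

const TIER_ORDER: Tier[] = ["fast", "medium", "large", "external"];

// Assumption: the same escalation threshold applies at every tier.
const ESCALATION_THRESHOLD = 5;

/** Returns the next, more expensive tier, or null when already at the last one. */
function nextTier(tier: Tier): Tier | null {
  return TIER_ORDER[TIER_ORDER.indexOf(tier) + 1] ?? null;
}

/** Maps a 0-10 confidence score to a gate (assumption: 4 and 7 fall in the higher band). */
function gateFor(confidence: number): Gate {
  if (confidence < 4) return "pending_review";
  if (confidence < 7) return "warning";
  return "approved";
}

/**
 * Starts at the cheapest tier and escalates while confidence stays below the threshold.
 * `score` stands in for running a model on a tier and scoring its answer.
 */
function runWithEscalation(
  score: (tier: Tier) => number,
): { tier: Tier; confidence: number; gate: Gate } {
  let tier: Tier = "fast";
  let confidence = score(tier);
  while (confidence < ESCALATION_THRESHOLD) {
    const next = nextTier(tier);
    if (next === null) break; // nothing left to escalate to
    tier = next;
    confidence = score(tier);
  }
  return { tier, confidence, gate: gateFor(confidence) };
}
```

A request that scores 3 on fast and 8 on medium would stop at medium with gate approved; a request that never reaches 5 ends at external and keeps whatever gate its final score maps to.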
2. Enhanced Client SDK (@llm-gateway/client)
New features:
- Offline Ollama fallback: If Gateway is down, client automatically falls back to local Ollama
  - Attempts completion via Gateway first (via HTTP)
  - On failure, switches to local Ollama (http://192.168.178.213:11434)
  - Automatic retry with exponential backoff (max 3 attempts)
  - Health check caching: after a failure, the client reuses the cached health status instead of re-probing on every request
- Health status API:
  - client.getStatus() returns: { gateway: boolean, ollama: string, mode: 'gateway'|'fallback' }
  - Useful for agent dashboards to show current routing mode
- Graceful degradation:
  - Client catches timeouts, network errors, and API errors
  - Returns meaningful error messages for debugging
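The retry-then-fallback behavior described above could look roughly like the sketch below. The SDK internals are not shown in this guide, so everything here is an assumption for illustration: `completeWithFallback`, `viaGateway`, and `viaOllama` are hypothetical names, and the 500 ms base delay is invented (only the limit of 3 attempts comes from the feature list).

```typescript
// Hypothetical sketch; not the actual @llm-gateway/client implementation.
type Mode = "gateway" | "fallback";

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Tries the Gateway up to `maxAttempts` times, doubling the wait between
 * attempts, then falls back to local Ollama if every attempt failed.
 */
async function completeWithFallback<T>(
  viaGateway: () => Promise<T>,
  viaOllama: () => Promise<T>,
  maxAttempts = 3,   // from the feature list above
  baseDelayMs = 500, // assumption: the guide does not state the base delay
): Promise<{ result: T; mode: Mode }> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return { result: await viaGateway(), mode: "gateway" };
    } catch {
      // Wait 1x, 2x, 4x... the base delay, but not after the final attempt.
      if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // Errors from Ollama are not caught here, so the caller sees them.
  return { result: await viaOllama(), mode: "fallback" };
}
```

Passing the two transports as functions keeps the retry policy separate from HTTP details, which also makes it easy to exercise with stubs.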
Code changes:
// Old: No fallback capability
const client = createTIPClient();
const result = await client.completion({...});
// Would throw if Gateway down
// New: Automatic fallback
const client = createTIPClient();
const result = await client.completion({...});
// Tries Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'
3. Integration Tests
Added tests/integration/claude-code-integration.test.ts:
- Health checks
- Completion requests (code explanation, analysis, summarization)
- Offline fallback verification
- Rate limiting and SLA validation
- Error handling
Run tests:
npm run test:integration
# or filter:
npm run test -- claude-code-integration
4. Deployment Changes
No breaking changes to:
- Gateway API (POST /v1/completion, /v1/classify, etc.)
- Database schema
- Routing rules
- Configuration
Additive changes:
- docs/adr/ directory with 4 ADRs
- Enhanced client SDK with fallback capability
- Integration test suite
Pre-Deployment Checklist
- All TypeScript compiles cleanly (npm run build)
- Integration tests pass (npm run test:integration)
- Client SDK builds without errors
- Git branch is main with no uncommitted changes
- Gitea remote is configured (git remote -v shows origin → gitea.context-x.org)
- SSH key for Erik (root@82.165.222.127) is working
- Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)
Deployment Steps
Option 1: Automated Deploy (Recommended)
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh
This will:
- Check prerequisites (npm, git, ssh, curl)
- Run local build (npm run build)
- Push main branch to Gitea
- SSH into Erik and:
  - Pull latest code
  - Run npm install
  - Run npm run build
  - Reload PM2 processes (llm-gateway, llm-learning)
- Wait for service restart and verify health
- Return summary with status
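The "wait for service restart and verify health" step amounts to polling until the service answers. deploy/deploy.sh is a shell script and may do this differently; the TypeScript sketch below only illustrates the idea, and `waitForHealthy`, its parameters, and the default of 10 tries at 3 s intervals are all assumptions.

```typescript
// Hypothetical helper; not taken from deploy/deploy.sh.
async function waitForHealthy(
  probe: () => Promise<boolean>, // e.g. a GET on /health that resolves to true on success
  maxTries = 10,                 // assumption
  intervalMs = 3000,             // assumption
): Promise<boolean> {
  for (let i = 0; i < maxTries; i++) {
    // A probe that rejects (connection refused during restart) counts as "not yet healthy".
    const healthy = await probe().catch(() => false);
    if (healthy) return true;
    if (i < maxTries - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return false; // the service never became healthy within the allotted tries
}
```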
Expected output:
Deploy successful!
Commit: a1b2c3d
Direct: http://82.165.222.127:3100/health
Cloudflare: https://llm-gateway.context-x.org/health
PM2 status: ssh root@82.165.222.127 'pm2 status'
Logs: ssh root@82.165.222.127 'pm2 logs llm-gateway'
Option 2: Manual Deploy
If automated deploy fails, deploy manually on Erik:
ssh root@82.165.222.127
# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20
Post-Deployment Verification
1. Health Check
curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
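For scripted verification, the same check can be expressed as a small predicate over the parsed JSON body. This is a sketch: `isHealthy` is a hypothetical name, and because the contents of `ollama` and `queue` are elided in the expected output, they are only checked for presence.

```typescript
// Shape taken from the expected output above; nested fields are not inspected
// because this guide does not list them.
function isHealthy(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  const isObject = (v: unknown): boolean => typeof v === "object" && v !== null;
  return b["status"] === "ok" && isObject(b["ollama"]) && isObject(b["queue"]);
}
```

In a runtime with a global `fetch`, the predicate could be applied to the parsed response of the /health endpoint.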
2. Test Completion Endpoint
curl -X POST https://llm-gateway.context-x.org/v1/completion \
-H "Content-Type: application/json" \
-d '{
"caller": "phase-2f-test",
"task_type": "analysis",
"input": "Test request for Phase 2F verification",
"language": "en"
}' | jq .
Expected response:
{
"id": "call-XXXX-YYYY",
"status": "approved",
"confidence": 7.5,
"model": "qwen2.5:14b",
"task_type": "analysis",
"latency_ms": 1234,
"tokens": { "in": 42, "out": 156 },
"output": "Test request processed successfully..."
}
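A typed view of this response makes the verification repeatable. The field names below mirror the expected response; the set of `status` values is an assumption based on the gate names in ADR-0003, and `checkCompletion` is a hypothetical helper rather than part of the SDK.

```typescript
// Field names mirror the expected response above. The status union is an
// assumption derived from the ADR-0003 gate names.
interface CompletionResponse {
  id: string;
  status: "pending_review" | "warning" | "approved";
  confidence: number;
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

/** Returns a list of problems found in a response; an empty list means it looks sane. */
function checkCompletion(r: CompletionResponse, expectedTaskType: string): string[] {
  const problems: string[] = [];
  if (r.task_type !== expectedTaskType) problems.push(`task_type is ${r.task_type}`);
  if (r.confidence < 0 || r.confidence > 10) problems.push(`confidence ${r.confidence} is outside 0-10`);
  if (r.status !== "approved") problems.push(`status is ${r.status}`);
  if (r.output.trim() === "") problems.push("output is empty");
  return problems;
}
```

A non-approved status is reported as a problem here only because the verification request is expected to pass the gate; in normal traffic the other statuses are legitimate outcomes.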
3. Verify Client Fallback
Test that the client falls back to Ollama when the Gateway is unavailable:
import { LLMGatewayClient } from '@llm-gateway/client';
const client = new LLMGatewayClient({
caller: 'test',
baseUrl: 'http://localhost:9999', // non-existent
ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
timeout: 30_000,
});
const result = await client.completion({
task_type: 'fallback-test',
input: 'Test fallback to Ollama',
});
console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'
4. Check PM2 Processes
ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running
5. Monitor Logs
ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs
Rollback Plan
If Phase 2F deployment fails:
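# On Erik:
cd /opt/llm-gateway
git reset --hard HEAD~1 # Go back to previous commit (assumes the deploy added exactly one commit)
npm install # Restore dependencies in case package.json changed
npm run build
pm2 reload llm-gateway llm-learning --update-env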
Or automated rollback:
# Locally:
git revert --no-edit HEAD # New commit that undoes the last one, so the push to Gitea stays fast-forward
bash deploy/deploy.sh
Monitoring After Deployment
Key metrics to watch (via Shield Dashboard or logs):
- Request latency: Should be <1s for fast tier, 1-3s for medium
- Confidence score distribution: Should stay relatively stable
- Review queue depth: Should be <100 items
- Provider fallback rate: Should be <5% (most requests via Gateway)
- Learning cycle execution: Should see 6h/12h/24h cycle logs
Alerts to configure:
- If any request latency >10s
- If review queue >500 items
- If provider fallback rate >20%
- If confidence threshold drifts >0.5 in 24h
- If Gateway health check fails 3 times in a row
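The alert conditions above can be written down as a single evaluation function, which is handy for testing thresholds before wiring them into a monitoring tool. The thresholds come from the list above; the `Snapshot` shape and the function name `firingAlerts` are hypothetical and would need to be mapped onto whatever the metrics source actually exposes.

```typescript
// Thresholds come from the alert list above; the Snapshot shape is hypothetical.
interface Snapshot {
  maxLatencyMs: number;
  reviewQueueDepth: number;
  providerFallbackRate: number;      // fraction between 0 and 1
  thresholdDrift24h: number;         // signed change over the last 24h
  consecutiveHealthFailures: number;
}

/** Returns a description of every alert condition that the snapshot violates. */
function firingAlerts(s: Snapshot): string[] {
  const alerts: string[] = [];
  if (s.maxLatencyMs > 10_000) alerts.push("request latency >10s");
  if (s.reviewQueueDepth > 500) alerts.push("review queue >500 items");
  if (s.providerFallbackRate > 0.2) alerts.push("provider fallback rate >20%");
  if (Math.abs(s.thresholdDrift24h) > 0.5) alerts.push("confidence threshold drift >0.5 in 24h");
  if (s.consecutiveHealthFailures >= 3) alerts.push("Gateway health check failed 3 times in a row");
  return alerts;
}
```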
Agent Integration Next Steps (Phase 2G)
Once Phase 2F is live:
- Claude Code integration: Embed @llm-gateway/client into Claude Code
- Codex integration: HTTP client in VS Code extension
- Copilot integration: Plugin for GitHub Copilot
- ChatGPT integration: Custom GPT with Gateway endpoint
Support & Troubleshooting
If Gateway is slow:
- Check Ollama on Mac Studio: curl http://192.168.178.213:11434/api/tags
- Check learning engine logs: pm2 logs llm-learning
- Check CPU/memory on Erik: ssh -t root@82.165.222.127 top (the -t flag allocates the terminal that top needs)
If client can't reach Gateway:
- Check Cloudflare tunnel status: ssh root@82.165.222.127 'ps aux | grep cloudflared'
- Check DNS: dig llm-gateway.context-x.org
- Fall back to direct IP: client will auto-retry with 30s cooldown
If confidence scores are low:
- Check which validators are failing: curl -s https://llm-gateway.context-x.org/v1/metrics | grep validation_failures
- Review recent audit logs: SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;
Questions? Check docs/adr/ for design rationale or contact René.