# Phase 2F Deployment Guide

**Date**: 2026-04-19
**Phase**: 2F, Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
**Status**: Ready for deployment

## What's New in Phase 2F

### 1. Architecture Decision Records (ADRs)

Located in `docs/adr/`:

- **ADR-0001**: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
  - Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
  - Agents are clients, not workers
  - HTTP API for agent communication
  - Learning system improves routing for all agents collectively

- **ADR-0002**: Tier Assignment Strategy for Model Selection
  - Cost-first approach: fast → medium → large → external
  - Confidence-based escalation: if confidence <5 on fast, escalate to medium
  - Learning cycles (6h, 12h, 24h) auto-tune tier assignment

- **ADR-0003**: Confidence Gate Thresholds & Learning Cycle Intervals
  - Three-tier gating: 0–4 (pending_review), 4–7 (warning), 7–10 (approved)
  - Autonomous learning cycles adjust thresholds based on human review feedback
  - 6h (validators), 12h (thresholds), 24h (model assignments)

- **ADR-0004**: External Provider Fallback Chain Ordering
  - Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
  - Cost-first: prefer free tiers; paid APIs only as fallback
  - Rate-limit aware: automatic backoff and provider rotation
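
The gating bands from ADR-0003 and the escalation rule from ADR-0002 can be sketched as two small pure functions. This is only an illustration: the function names, the handling of scores that sit exactly on a band boundary (4 or 7), and the reuse of the "<5" rule on tiers other than fast are assumptions, not the Gateway's actual code.

```typescript
type GateStatus = 'pending_review' | 'warning' | 'approved';
type Tier = 'fast' | 'medium' | 'large' | 'external';

// Cost-first order from ADR-0002.
const TIER_ORDER: readonly Tier[] = ['fast', 'medium', 'large', 'external'];

// Maps a 0-10 confidence score to a gate status (ADR-0003 bands).
// Assumption: a score exactly on a boundary (4 or 7) belongs to the higher band.
function gateStatus(confidence: number, warnAt = 4, approveAt = 7): GateStatus {
  if (confidence < warnAt) return 'pending_review';
  if (confidence < approveAt) return 'warning';
  return 'approved';
}

// Moves one step up the tier order when confidence is below the escalation threshold.
// Assumption: the same threshold (5) applies on every tier; the guide states it only
// for the fast -> medium step. The last tier has nowhere left to escalate.
function nextTier(current: Tier, confidence: number, escalateBelow = 5): Tier {
  if (confidence >= escalateBelow) return current;
  const i = TIER_ORDER.indexOf(current);
  return TIER_ORDER[Math.min(i + 1, TIER_ORDER.length - 1)] ?? current;
}
```

Keeping the thresholds as parameters reflects that the learning cycles are described as adjusting them over time.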
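
One way to picture the rate-limit-aware rotation over the ADR-0004 chain is a cooldown map consulted in chain order. The provider names come from the list above; the class, its method names, and the 60-second default cooldown are hypothetical and chosen only for illustration.

```typescript
// Chain order from ADR-0004.
const PROVIDER_CHAIN = [
  'Cerebras',
  'Groq',
  'Mistral AI',
  'NVIDIA NIM',
  'Cloudflare Workers AI',
] as const;
type Provider = (typeof PROVIDER_CHAIN)[number];

class ProviderRotation {
  // Timestamp (ms) until which each rate-limited provider is skipped.
  private coolingUntil = new Map<Provider, number>();

  // Call when a provider signals a rate limit (for example an HTTP 429).
  // The 60s default cooldown is an assumption, not a documented value.
  markRateLimited(provider: Provider, nowMs: number, cooldownMs = 60_000): void {
    this.coolingUntil.set(provider, nowMs + cooldownMs);
  }

  // First provider in chain order that is not cooling down, or undefined if all are.
  pick(nowMs: number): Provider | undefined {
    return PROVIDER_CHAIN.find((p) => (this.coolingUntil.get(p) ?? 0) <= nowMs);
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic and easy to test.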
### 2. Enhanced Client SDK (`@llm-gateway/client`)

**New features**:

- **Offline Ollama fallback**: If the Gateway is down, the client automatically falls back to local Ollama
  - Attempts completion via the Gateway first (over HTTP)
  - On failure, switches to local Ollama (http://192.168.178.213:11434)
  - Automatic retry with exponential backoff (max 3 attempts)
  - Health check caching: after a failure, the cached result is reused instead of re-checking on every request

- **Health status API**:
  - `client.getStatus()` returns: `{ gateway: boolean, ollama: string, mode: 'gateway'|'fallback' }`
  - Useful for agent dashboards to show the current routing mode

- **Graceful degradation**:
  - Client catches timeouts, network errors, and API errors
  - Returns meaningful error messages for debugging
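
The retry and caching behaviour listed above could look roughly like the following. It is a sketch under stated assumptions rather than the SDK's internals: `backoffDelays`, `completeWithFallback`, and `HealthCache` are hypothetical names, the 500 ms base delay is invented, and the 30-second cache lifetime is borrowed from the cooldown mentioned in the troubleshooting section.

```typescript
// Delays before each retry: base, 2*base, 4*base, ... (no delay before the first attempt).
function backoffDelays(maxAttempts: number, baseMs: number): number[] {
  return Array.from({ length: Math.max(0, maxAttempts - 1) }, (_, i) => baseMs * 2 ** i);
}

type Completion<T> = () => Promise<T>;

// Tries the Gateway up to maxAttempts times, then falls back to Ollama once.
async function completeWithFallback<T>(
  viaGateway: Completion<T>,
  viaOllama: Completion<T>,
  maxAttempts = 3,
  baseMs = 500, // assumed base delay
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<{ result: T; mode: 'gateway' | 'fallback' }> {
  const delays = backoffDelays(maxAttempts, baseMs);
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return { result: await viaGateway(), mode: 'gateway' };
    } catch {
      // Wait before the next Gateway attempt; after the last attempt, fall through.
      if (attempt < delays.length) await sleep(delays[attempt] ?? 0);
    }
  }
  return { result: await viaOllama(), mode: 'fallback' };
}

// Remembers the last health probe so a failing Gateway is not re-probed on every request.
class HealthCache {
  private last: { healthy: boolean; at: number } | undefined;
  private readonly ttlMs: number;

  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
  }

  set(healthy: boolean, nowMs: number): void {
    this.last = { healthy, at: nowMs };
  }

  // Returns the cached result, or undefined when there is none or it has expired.
  get(nowMs: number): boolean | undefined {
    const entry = this.last;
    if (entry === undefined || nowMs - entry.at >= this.ttlMs) return undefined;
    return entry.healthy;
  }
}
```

Injecting `sleep` and the two completion functions keeps the sketch independent of any HTTP library and lets tests run without real delays.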

**Code changes**:

```typescript
// Old: no fallback capability; completion() would throw if the Gateway was down
// const client = createTIPClient();
// const result = await client.completion({...});

// New: automatic fallback
const client = createTIPClient();
const result = await client.completion({...});
// Tries the Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'
```
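
For dashboards, the status object can be given an explicit type and turned into a short label. The interface below follows the `getStatus()` shape described earlier; `statusLabel` is a hypothetical helper, and the possible values of the `ollama` string are not documented here, so it is printed as-is.

```typescript
// Shape taken from the getStatus() description; not imported from the SDK.
interface ClientStatus {
  gateway: boolean;
  ollama: string;
  mode: 'gateway' | 'fallback';
}

// Hypothetical helper: renders a status object as a one-line dashboard label.
function statusLabel(s: ClientStatus): string {
  const route = s.mode === 'gateway' ? 'via Gateway' : 'via local Ollama (fallback)';
  return `${route} | gateway ${s.gateway ? 'up' : 'down'} | ollama: ${s.ollama}`;
}
```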

### 3. Integration Tests

Added `tests/integration/claude-code-integration.test.ts` (14 test cases):

- Health checks
- Completion requests (code explanation, analysis, summarization)
- Offline fallback verification
- Rate limiting and SLA validation
- Error handling

Run tests:
```bash
npm run test:integration
# or filter:
npm run test -- claude-code-integration
```

### 4. Deployment Changes

No breaking changes to:
- Gateway API (`POST /v1/completion`, `/v1/classify`, etc.)
- Database schema
- Routing rules
- Configuration

**Additive changes**:
- `docs/adr/` directory with 4 ADRs
- Enhanced client SDK with fallback capability
- Integration test suite

## Pre-Deployment Checklist

- [ ] All TypeScript compiles cleanly (`npm run build`)
- [ ] Integration tests pass (`npm run test:integration`)
- [ ] Client SDK builds without errors
- [ ] Git branch is `main` with no uncommitted changes
- [ ] Gitea remote is configured (`git remote -v` shows origin → gitea.context-x.org)
- [ ] SSH key for Erik (root@82.165.222.127) is working
- [ ] Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)

## Deployment Steps

### Option 1: Automated Deploy (Recommended)

```bash
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh
```

This will:
1. Check prerequisites (npm, git, ssh, curl)
2. Run local build (`npm run build`)
3. Push `main` branch to Gitea
4. SSH into Erik and:
   - Pull latest code
   - Run `npm install`
   - Run `npm run build`
   - Reload PM2 processes (llm-gateway, llm-learning)
5. Wait for service restart and verify health
6. Return summary with status
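
Step 5 (wait for the restart, then verify health) amounts to polling a health probe until it succeeds or a retry budget runs out. The sketch below shows that pattern; `deploy/deploy.sh` may well implement it differently, and the function name, attempt count, and interval are assumptions.

```typescript
// Polls `check` until it reports healthy or the attempts are used up.
async function waitForHealthy(
  check: () => Promise<boolean>,
  attempts = 10, // assumed retry budget
  intervalMs = 3_000, // assumed pause between probes
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    // A rejected probe (e.g. connection refused mid-restart) counts as "not healthy yet".
    const ok = await check().catch(() => false);
    if (ok) return true;
    if (i < attempts - 1) await sleep(intervalMs);
  }
  return false;
}
```

In practice `check` would wrap a request to the `/health` endpoint and return true when the reported status is `ok`.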

**Expected output**:
```
Deploy successful!

Commit: a1b2c3d
Direct: http://82.165.222.127:3100/health
Cloudflare: https://llm-gateway.context-x.org/health
PM2 status: ssh root@82.165.222.127 'pm2 status'
Logs: ssh root@82.165.222.127 'pm2 logs llm-gateway'
```

### Option 2: Manual Deploy

If the automated deploy fails, deploy manually on Erik:

```bash
ssh root@82.165.222.127

# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20
```

## Post-Deployment Verification

### 1. Health Check

```bash
curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
```
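
If the health check is scripted rather than read by eye, the payload can be validated with a small guard. Only the three top-level keys shown in the expected output are checked; the contents of `ollama` and `queue` are elided in this guide, so the sketch treats them as opaque objects. `isHealthyPayload` is a hypothetical name.

```typescript
// True when the payload has status "ok" and object-valued `ollama` and `queue` keys.
function isHealthyPayload(payload: unknown): boolean {
  if (typeof payload !== 'object' || payload === null) return false;
  const p = payload as Record<string, unknown>;
  const isObject = (v: unknown): boolean => typeof v === 'object' && v !== null;
  return p['status'] === 'ok' && isObject(p['ollama']) && isObject(p['queue']);
}
```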

### 2. Test Completion Endpoint

```bash
curl -X POST https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "phase-2f-test",
    "task_type": "analysis",
    "input": "Test request for Phase 2F verification",
    "language": "en"
  }' | jq .
```

**Expected response**:
```json
{
  "id": "call-XXXX-YYYY",
  "status": "approved",
  "confidence": 7.5,
  "model": "qwen2.5:14b",
  "task_type": "analysis",
  "latency_ms": 1234,
  "tokens": { "in": 42, "out": 156 },
  "output": "Test request processed successfully..."
}
```
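
A verification script could type the response and run a few plausibility checks on it. The field names below mirror the expected response; which fields are always present, the set of `status` values (taken from the ADR-0003 bands), and the fixed 7.0 approval gate are assumptions, since the learning cycles are described as adjusting thresholds.

```typescript
// Field names mirror the expected response; not imported from the SDK.
interface CompletionResponse {
  id: string;
  status: 'pending_review' | 'warning' | 'approved';
  confidence: number;
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

// Returns a list of problems; an empty list means the response looks plausible.
function verifyCompletion(r: CompletionResponse, expectedTaskType: string): string[] {
  const problems: string[] = [];
  if (r.task_type !== expectedTaskType) problems.push(`task_type is ${r.task_type}`);
  if (r.confidence < 0 || r.confidence > 10) problems.push('confidence outside 0-10');
  // Assumes the default approval gate of 7; a tuned threshold would need to be passed in.
  if (r.status === 'approved' && r.confidence < 7) problems.push('approved below the 7.0 gate');
  if (r.output.trim() === '') problems.push('empty output');
  return problems;
}
```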

### 3. Verify Client Fallback

Test that the client can fall back to Ollama when the Gateway is unavailable:

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const client = new LLMGatewayClient({
  caller: 'test',
  baseUrl: 'http://localhost:9999', // non-existent
  ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
  timeout: 30_000,
});

const result = await client.completion({
  task_type: 'fallback-test',
  input: 'Test fallback to Ollama',
});

console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'
```

### 4. Check PM2 Processes

```bash
ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running
```
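
For a scripted version of this check, `pm2 jlist` prints the process list as JSON, which is easier to parse than the `pm2 status` table. The sketch below takes that JSON as a string and reports which required processes are not online; the interface covers only the fields it reads, and `notOnline` is a hypothetical helper.

```typescript
// Minimal subset of the fields `pm2 jlist` prints for each process.
interface Pm2Process {
  name: string;
  pm2_env: { status: string };
}

// Returns the names from `required` that are missing or not in the "online" state.
function notOnline(jlistJson: string, required: string[]): string[] {
  const processes = JSON.parse(jlistJson) as Pm2Process[];
  return required.filter(
    (name) => !processes.some((p) => p.name === name && p.pm2_env.status === 'online'),
  );
}
```

The input would come from something like `ssh root@82.165.222.127 pm2 jlist`, with `['llm-gateway', 'llm-learning']` as the required names.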

### 5. Monitor Logs

```bash
ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs
```

## Rollback Plan

If Phase 2F deployment fails:

```bash
# On Erik:
cd /opt/llm-gateway
git reset --hard HEAD~1 # Go back to previous commit
npm install # Restore the previous commit's dependencies
npm run build
pm2 reload llm-gateway llm-learning --update-env
```

Or automated rollback:

```bash
# Locally:
# Revert instead of reset: deploy.sh pushes main, and a rewound branch would be
# rejected as a non-fast-forward push unless it is forced.
git revert --no-edit HEAD
bash deploy/deploy.sh
```

## Monitoring After Deployment

**Key metrics to watch** (via Shield Dashboard or logs):

1. **Request latency**: Should be <1s for fast tier, 1-3s for medium
2. **Confidence score distribution**: Should stay relatively stable
3. **Review queue depth**: Should be <100 items
4. **Provider fallback rate**: Should be <5% (most requests via Gateway)
5. **Learning cycle execution**: Should see 6h/12h/24h cycle logs

**Alerts to configure**:

- If any request latency >10s
- If review queue >500 items
- If provider fallback rate >20%
- If confidence threshold drifts >0.5 in 24h
- If Gateway health check fails 3 times in a row
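
The alert conditions above translate directly into threshold checks over a metrics snapshot. The snapshot shape and alert names below are hypothetical, since this guide does not specify the format of the Gateway's metrics; only the threshold values come from the list.

```typescript
// Hypothetical snapshot shape; field names are assumptions.
interface MetricsSnapshot {
  maxLatencyMs: number;
  reviewQueueDepth: number;
  totalRequests: number;
  fallbackRequests: number;
  thresholdDrift24h: number; // signed change of the confidence threshold over 24h
  consecutiveHealthFailures: number;
}

// Returns the names of all alerts whose condition is met.
function firingAlerts(m: MetricsSnapshot): string[] {
  const fallbackRate = m.totalRequests === 0 ? 0 : m.fallbackRequests / m.totalRequests;
  const alerts: string[] = [];
  if (m.maxLatencyMs > 10_000) alerts.push('latency');
  if (m.reviewQueueDepth > 500) alerts.push('review_queue');
  if (fallbackRate > 0.2) alerts.push('fallback_rate');
  if (Math.abs(m.thresholdDrift24h) > 0.5) alerts.push('threshold_drift');
  if (m.consecutiveHealthFailures >= 3) alerts.push('health');
  return alerts;
}
```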

## Agent Integration Next Steps (Phase 2G)

Once Phase 2F is live:

1. **Claude Code integration**: Embed `@llm-gateway/client` into Claude Code
2. **Codex integration**: HTTP client in VS Code extension
3. **Copilot integration**: Plugin for GitHub Copilot
4. **ChatGPT integration**: Custom GPT with Gateway endpoint

## Support & Troubleshooting

**If Gateway is slow**:
- Check Ollama on Mac Studio: `curl http://192.168.178.213:11434/api/tags`
- Check learning engine logs: `pm2 logs llm-learning`
- Check CPU/memory on Erik: `ssh -t root@82.165.222.127 top` (`-t` allocates the terminal that `top` needs)

**If client can't reach Gateway**:
- Check Cloudflare tunnel status: `ssh root@82.165.222.127 'ps aux | grep cloudflared'`
- Check DNS: `dig llm-gateway.context-x.org`
- Fall back to direct IP: the client will auto-retry with a 30s cooldown

**If confidence scores are low**:
- Check which validators are failing: `curl -s https://llm-gateway.context-x.org/v1/metrics | grep validation_failures`
- Review recent audit logs: `SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;`

---

**Questions?** Check `docs/adr/` for design rationale or contact René.