Rene Fichtmueller 2ca77d0aee feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)
- ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
- ADR-0002: Tier Assignment Strategy for Model Selection (cost-first escalation)
- ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals (6h/12h/24h cycles)
- ADR-0004: External Provider Fallback Chain Ordering (Cerebras → Groq → Mistral)
- Enhanced client SDK: Offline Ollama fallback, health checks, exponential backoff retry
- Integration tests: claude-code-integration.test.ts (14 test cases)
- PHASE_2F_DEPLOYMENT.md: Pre-deployment checklist, automated deploy, rollback plan
- Post-deployment verification procedures for health, client fallback, metrics
2026-04-19 21:39:44 +02:00

# Phase 2F Deployment Guide
**Date**: 2026-04-19
**Phase**: 2F — Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
**Status**: Ready for deployment
## What's New in Phase 2F
### 1. Architecture Decision Records (ADRs)
Located in `docs/adr/`:
- **ADR-0001**: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
  - Gateway acts as the central hub for Claude Code, Codex, Copilot, and ChatGPT
  - Agents are clients, not workers
  - HTTP API for agent communication
  - Learning system improves routing for all agents collectively
- **ADR-0002**: Tier Assignment Strategy for Model Selection
  - Cost-first approach: fast → medium → large → external
  - Confidence-based escalation: if confidence <5 on fast, escalate to medium
  - Learning cycles (6h, 12h, 24h) auto-tune tier assignment
- **ADR-0003**: Confidence Gate Thresholds & Learning Cycle Intervals
  - Three-tier gating: 0-4 (pending_review), 4-7 (warning), 7-10 (approved)
  - Autonomous learning cycles adjust thresholds based on human review feedback
  - Cycle intervals: 6h (validators), 12h (thresholds), 24h (model assignments)
- **ADR-0004**: External Provider Fallback Chain Ordering
  - Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
  - Cost-first: prefer free tiers; paid APIs only as fallback
  - Rate-limit aware: automatic backoff and provider rotation
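The tiering, gating, and provider-ordering rules in ADR-0002 through ADR-0004 can be summarized as a few pure functions. This is a minimal TypeScript sketch, not the Gateway's implementation: the function names, the provider identifiers, the 1 s base cooldown, and the handling of band edges (a score of exactly 4 or 7 falls into the higher band) are all assumptions.

```typescript
// Hypothetical sketch of the ADR-0002/0003/0004 decisions; names and exact
// band edges are assumptions, not the Gateway's real code.

type GateStatus = 'pending_review' | 'warning' | 'approved';
type Tier = 'fast' | 'medium' | 'large' | 'external';

// ADR-0002: cost-first tier order.
const TIER_ORDER: readonly Tier[] = ['fast', 'medium', 'large', 'external'];
// ADR-0002: escalate when confidence is below 5.
const ESCALATE_BELOW = 5;

// ADR-0003 gate, assuming half-open bands [0,4), [4,7), [7,10].
function gate(confidence: number): GateStatus {
  if (confidence < 4) return 'pending_review';
  if (confidence < 7) return 'warning';
  return 'approved';
}

// Returns the next tier to try, or null if confidence is high enough
// or there is no higher tier left.
function escalate(current: Tier, confidence: number): Tier | null {
  if (confidence >= ESCALATE_BELOW) return null;
  const i = TIER_ORDER.indexOf(current);
  return TIER_ORDER[i + 1] ?? null;
}

// ADR-0004 provider order; the identifiers are illustrative labels.
const PROVIDER_CHAIN = ['cerebras', 'groq', 'mistral', 'nvidia-nim', 'cloudflare-workers-ai'] as const;
type Provider = (typeof PROVIDER_CHAIN)[number];

// Picks the first provider in chain order that is not in a rate-limit
// cooldown; `cooldownUntil` maps provider -> epoch ms when it may be used again.
function pickProvider(cooldownUntil: Partial<Record<Provider, number>>, now: number): Provider | null {
  for (const p of PROVIDER_CHAIN) {
    if ((cooldownUntil[p] ?? 0) <= now) return p;
  }
  return null;
}

// On a rate-limit response, park the provider with exponential backoff.
function markRateLimited(
  cooldownUntil: Partial<Record<Provider, number>>,
  provider: Provider,
  now: number,
  strikes: number, // consecutive rate-limit responses, including this one (>= 1)
  baseMs = 1_000,
): void {
  cooldownUntil[provider] = now + baseMs * 2 ** (strikes - 1);
}
```

Keeping these as pure functions of (score, tier, clock) makes the thresholds easy to unit-test and easy for the learning cycles to retune without touching request handling.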
### 2. Enhanced Client SDK (`@llm-gateway/client`)
**New features**:
- **Offline Ollama fallback**: If the Gateway is down, the client automatically falls back to local Ollama
  - Attempts completion via the Gateway first (over HTTP)
  - On failure, switches to local Ollama (http://192.168.178.213:11434)
  - Automatic retry with exponential backoff (max 3 attempts)
  - Health check caching: after a failure, the client caches the Gateway's down state instead of re-checking health on every request
- **Health status API**:
  - `client.getStatus()` returns `{ gateway: boolean, ollama: string, mode: 'gateway' | 'fallback' }`
  - Useful for agent dashboards to show the current routing mode
- **Graceful degradation**:
  - Client catches timeouts, network errors, and API errors
  - Returns meaningful error messages for debugging
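One way the fallback flow above could fit together is sketched below. `FallbackRouter`, `withRetry`, and `RouterOptions` are hypothetical names, not the `@llm-gateway/client` API. The sketch assumes the 3 attempts go to the Gateway before falling back, reuses the 30s cooldown mentioned under troubleshooting for the cached down state, and picks a 200 ms base backoff arbitrarily. The Gateway and Ollama calls are injected as functions so the logic runs without a network.

```typescript
// Hypothetical sketch of gateway-first completion with retry, a cached
// "Gateway is down" state, and fallback to Ollama. Not the SDK's real API.

type Mode = 'gateway' | 'fallback';
type Call<Req, Res> = (req: Req) => Promise<Res>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `fn` up to `maxAttempts` times, waiting baseDelayMs, 2x, 4x, ... between tries.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts: number, baseDelayMs: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

interface RouterOptions {
  maxAttempts?: number; // Gateway attempts before falling back (default 3)
  baseDelayMs?: number; // first backoff delay (default 200 ms, assumed)
  cooldownMs?: number;  // skip the Gateway this long after it fails (default 30 s)
  now?: () => number;   // injectable clock, e.g. for tests
}

class FallbackRouter<Req, Res> {
  mode: Mode = 'gateway';
  private gatewayDownUntil = 0; // cached "Gateway is down" state (epoch ms)
  private readonly gateway: Call<Req, Res>;
  private readonly ollama: Call<Req, Res>;
  private readonly opts: Required<RouterOptions>;

  constructor(gateway: Call<Req, Res>, ollama: Call<Req, Res>, opts: RouterOptions = {}) {
    this.gateway = gateway;
    this.ollama = ollama;
    this.opts = {
      maxAttempts: opts.maxAttempts ?? 3,
      baseDelayMs: opts.baseDelayMs ?? 200,
      cooldownMs: opts.cooldownMs ?? 30_000,
      now: opts.now ?? Date.now,
    };
  }

  async complete(req: Req): Promise<Res> {
    const { maxAttempts, baseDelayMs, cooldownMs, now } = this.opts;
    if (now() >= this.gatewayDownUntil) {
      try {
        const res = await withRetry(() => this.gateway(req), maxAttempts, baseDelayMs);
        this.mode = 'gateway';
        return res;
      } catch {
        // Remember the failure so the next calls go straight to Ollama.
        this.gatewayDownUntil = now() + cooldownMs;
      }
    }
    this.mode = 'fallback';
    return this.ollama(req);
  }
}
```

Here `mode` plays the role that `getStatus().mode` plays in the real client. Caching the down state is what keeps a dead Gateway from adding three retries' worth of latency to every request during an outage.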
**Code changes**:
```typescript
// Old: No fallback capability
const client = createTIPClient();
const result = await client.completion({...});
// Would throw if Gateway down
// New: Automatic fallback
const client = createTIPClient();
const result = await client.completion({...});
// Tries Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'
```
### 3. Integration Tests
Added `tests/integration/claude-code-integration.test.ts`:
- Health checks
- Completion requests (code explanation, analysis, summarization)
- Offline fallback verification
- Rate limiting and SLA validation
- Error handling
Run tests:
```bash
npm run test:integration
# or filter:
npm run test -- claude-code-integration
```
### 4. Deployment Changes
No breaking changes to:
- Gateway API (`POST /v1/completion`, `/v1/classify`, etc.)
- Database schema
- Routing rules
- Configuration
**Additive changes**:
- `docs/adr/` directory with 4 ADRs
- Enhanced client SDK with fallback capability
- Integration test suite
## Pre-Deployment Checklist
- [ ] All TypeScript compiles cleanly (`npm run build`)
- [ ] Integration tests pass (`npm run test:integration`)
- [ ] Client SDK builds without errors
- [ ] Git branch is `main` with no uncommitted changes
- [ ] Gitea remote is configured (`git remote -v` shows origin gitea.context-x.org)
- [ ] SSH key for Erik (root@82.165.222.127) is working
- [ ] Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)
## Deployment Steps
### Option 1: Automated Deploy (Recommended)
```bash
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh
```
This will:
1. Check prerequisites (npm, git, ssh, curl)
2. Run local build (`npm run build`)
3. Push `main` branch to Gitea
4. SSH into Erik and:
   - Pull latest code
   - Run `npm install`
   - Run `npm run build`
   - Reload PM2 processes (llm-gateway, llm-learning)
5. Wait for service restart and verify health
6. Return summary with status
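Step 5 boils down to polling the health endpoint until it reports `"status": "ok"`. A minimal TypeScript sketch of that loop, assuming a 60 s timeout and a 2 s interval (the values `deploy/deploy.sh` actually uses are not documented here); `waitForHealthy` and `probe` are hypothetical names:

```typescript
// Hypothetical sketch of "wait for service restart and verify health";
// `probe` stands in for a GET of /health that resolves to the parsed JSON body.

interface HealthBody {
  status: string;
  [key: string]: unknown;
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until /health reports status "ok" or the deadline passes.
async function waitForHealthy(
  probe: () => Promise<HealthBody>,
  timeoutMs = 60_000,
  intervalMs = 2_000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const body = await probe();
      if (body.status === 'ok') return true;
    } catch {
      // Refused or reset connections are expected while PM2 reloads the process.
    }
    await pause(intervalMs);
  }
  return false;
}
```

With Node 18+ the probe can be `async () => (await fetch('http://82.165.222.127:3100/health')).json()`. An HTML error page makes `.json()` reject, which the loop treats the same as a refused connection.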
**Expected output**:
```
Deploy successful!
Commit: a1b2c3d
Direct: http://82.165.222.127:3100/health
Cloudflare: https://llm-gateway.context-x.org/health
PM2 status: ssh root@82.165.222.127 'pm2 status'
Logs: ssh root@82.165.222.127 'pm2 logs llm-gateway'
```
### Option 2: Manual Deploy
If automated deploy fails, deploy manually on Erik:
```bash
ssh root@82.165.222.127
# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20
```
## Post-Deployment Verification
### 1. Health Check
```bash
curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
```
### 2. Test Completion Endpoint
```bash
curl -X POST https://llm-gateway.context-x.org/v1/completion \
-H "Content-Type: application/json" \
-d '{
"caller": "phase-2f-test",
"task_type": "analysis",
"input": "Test request for Phase 2F verification",
"language": "en"
}' | jq .
```
**Expected response**:
```json
{
"id": "call-XXXX-YYYY",
"status": "approved",
"confidence": 7.5,
"model": "qwen2.5:14b",
"task_type": "analysis",
"latency_ms": 1234,
"tokens": { "in": 42, "out": 156 },
"output": "Test request processed successfully..."
}
```
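For scripted smoke tests, the response above can be checked structurally instead of by eye. The interface below is inferred from that single example, so treat it as an assumption: the real Gateway may return extra fields, and the allowed `status` values are taken from the ADR-0003 gate rather than from an API schema. `isCompletionResponse` is a hypothetical helper.

```typescript
// Hypothetical response shape inferred from the example above.
interface CompletionResponse {
  id: string;
  status: 'pending_review' | 'warning' | 'approved';
  confidence: number; // 0-10 scale per ADR-0003
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

const STATUSES = new Set(['pending_review', 'warning', 'approved']);

// Type guard: true only if every expected field is present with the right type.
function isCompletionResponse(x: unknown): x is CompletionResponse {
  if (typeof x !== 'object' || x === null) return false;
  const r = x as Record<string, unknown>;
  const tokens = r.tokens as Record<string, unknown> | null | undefined;
  return (
    typeof r.id === 'string' &&
    typeof r.status === 'string' && STATUSES.has(r.status) &&
    typeof r.confidence === 'number' && r.confidence >= 0 && r.confidence <= 10 &&
    typeof r.model === 'string' &&
    typeof r.task_type === 'string' &&
    typeof r.latency_ms === 'number' &&
    typeof tokens === 'object' && tokens !== null &&
    typeof tokens.in === 'number' && typeof tokens.out === 'number' &&
    typeof r.output === 'string'
  );
}
```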
### 3. Verify Client Fallback
Test that the client falls back to Ollama when the Gateway is unavailable:
```typescript
import { LLMGatewayClient } from '@llm-gateway/client';
const client = new LLMGatewayClient({
caller: 'test',
baseUrl: 'http://localhost:9999', // non-existent
ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
timeout: 30_000,
});
const result = await client.completion({
task_type: 'fallback-test',
input: 'Test fallback to Ollama',
});
console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'
```
### 4. Check PM2 Processes
```bash
ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running
```
### 5. Monitor Logs
```bash
ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs
```
## Rollback Plan
If Phase 2F deployment fails:
```bash
# On Erik:
cd /opt/llm-gateway
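git reset --hard HEAD~1   # Go back one commit (assumes the deploy brought in a single commit)
npm install               # Restore the previous dependencies in case package.json changed
npm run build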
pm2 reload llm-gateway llm-learning --update-env
```
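Or automated rollback, reverting rather than resetting so that pushing `main` to Gitea stays a fast-forward:
```bash
# Locally:
git revert --no-edit HEAD   # Adds a new commit that undoes the last one
bash deploy/deploy.sh
```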
## Monitoring After Deployment
**Key metrics to watch** (via Shield Dashboard or logs):
1. **Request latency**: Should be <1s for fast tier, 1-3s for medium
2. **Confidence score distribution**: Should stay relatively stable
3. **Review queue depth**: Should be <100 items
4. **Provider fallback rate**: Should be <5% (most requests via Gateway)
5. **Learning cycle execution**: Should see 6h/12h/24h cycle logs
**Alerts to configure**:
- If any request latency >10s
- If review queue >500 items
- If provider fallback rate >20%
- If confidence threshold drifts >0.5 in 24h
- If Gateway health check fails 3 times in a row
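The alert conditions above are simple threshold checks over a metrics snapshot. A sketch in TypeScript, where the `MetricsSnapshot` field names and how each metric is aggregated (for example, worst latency in the window) are assumptions rather than the Shield Dashboard's actual metric names:

```typescript
// Hypothetical snapshot of the metrics listed above.
interface MetricsSnapshot {
  maxLatencyMs: number;              // worst request latency in the window
  reviewQueueDepth: number;          // items awaiting human review
  providerFallbackRate: number;      // fraction 0..1
  thresholdDrift24h: number;         // signed change in confidence threshold over 24h
  consecutiveHealthFailures: number; // failed health checks in a row
}

// Returns one message per alert condition that is currently met.
function evaluateAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (m.maxLatencyMs > 10_000) alerts.push('request latency > 10s');
  if (m.reviewQueueDepth > 500) alerts.push('review queue > 500 items');
  if (m.providerFallbackRate > 0.2) alerts.push('provider fallback rate > 20%');
  if (Math.abs(m.thresholdDrift24h) > 0.5) alerts.push('confidence threshold drift > 0.5 in 24h');
  if (m.consecutiveHealthFailures >= 3) alerts.push('health check failed 3 times in a row');
  return alerts;
}
```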
## Agent Integration Next Steps (Phase 2G)
Once Phase 2F is live:
1. **Claude Code integration**: Embed `@llm-gateway/client` into Claude Code
2. **Codex integration**: HTTP client in VS Code extension
3. **Copilot integration**: Plugin for GitHub Copilot
4. **ChatGPT integration**: Custom GPT with Gateway endpoint
## Support & Troubleshooting
**If Gateway is slow**:
- Check Ollama on Mac Studio: `curl http://192.168.178.213:11434/api/tags`
- Check learning engine logs: `pm2 logs llm-learning`
- Check CPU/memory on Erik: `ssh root@82.165.222.127 top`
**If client can't reach Gateway**:
- Check Cloudflare tunnel status: `ssh root@82.165.222.127 'ps aux | grep cloudflared'`
- Check DNS: `dig llm-gateway.context-x.org`
- Fall back to the direct IP (`http://82.165.222.127:3100`); the client auto-retries the Gateway after a 30s cooldown
**If confidence scores are low**:
- Check which validators are failing: `curl https://llm-gateway.context-x.org/v1/metrics | grep validation_failures`
- Review recent audit logs: `SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;`
---
**Questions?** Check `docs/adr/` for design rationale or contact René.