# Phase 2F Deployment Guide

**Date**: 2026-04-19
**Phase**: 2F - Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
**Status**: Ready for deployment

## What's New in Phase 2F

### 1. Architecture Decision Records (ADRs)

Located in `docs/adr/`:

- **ADR-0001**: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
  - Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
  - Agents are clients, not workers
  - HTTP API for agent communication
  - Learning system improves routing for all agents collectively

- **ADR-0002**: Tier Assignment Strategy for Model Selection
  - Cost-first approach: fast → medium → large → external
  - Confidence-based escalation: if confidence <5 on fast, escalate to medium
  - Learning cycles (6h, 12h, 24h) auto-tune tier assignment

- **ADR-0003**: Confidence Gate Thresholds & Learning Cycle Intervals
  - Three-tier gating: 0–4 (pending_review), 4–7 (warning), 7–10 (approved)
  - Autonomous learning cycles adjust thresholds based on human review feedback
  - 6h (validators), 12h (thresholds), 24h (model assignments)

- **ADR-0004**: External Provider Fallback Chain Ordering
  - Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
  - Cost-first: prefer free tiers; paid APIs only as fallback
  - Rate-limit aware: automatic backoff and provider rotation

### 2. Enhanced Client SDK (`@llm-gateway/client`)

**New features**:

- **Offline Ollama fallback**: If the Gateway is down, the client automatically falls back to local Ollama
  - Attempts completion via the Gateway first (over HTTP)
  - On failure, switches to local Ollama (http://192.168.178.213:11434)
  - Automatic retry with exponential backoff (max 3 attempts)
  - Health check caching: does not repeat health checks in quick succession after a failure

- **Health status API**:
  - `client.getStatus()` returns: `{ gateway: boolean, ollama: string, mode: 'gateway'|'fallback' }`
  - Useful for agent dashboards to show the current routing mode

- **Graceful degradation**:
  - Client catches timeouts, network errors, and API errors
  - Returns meaningful error messages for debugging

**Code changes**:

```typescript
// Old: no fallback capability
// const client = createTIPClient();
// const result = await client.completion({...}); // Would throw if Gateway down

// New: automatic fallback
const client = createTIPClient();
const result = await client.completion({...}); // Tries Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'
```

### 3. Integration Tests

Added `tests/integration/claude-code-integration.test.ts`:

- Health checks
- Completion requests (code explanation, analysis, summarization)
- Offline fallback verification
- Rate limiting and SLA validation
- Error handling

Run tests:

```bash
npm run test:integration
# or filter:
npm run test -- claude-code-integration
```

### 4. Deployment Changes

No breaking changes to:

- Gateway API (`POST /v1/completion`, `/v1/classify`, etc.)
- Database schema
- Routing rules
- Configuration

**Additive changes**:

- `docs/adr/` directory with 4 ADRs
- Enhanced client SDK with fallback capability
- Integration test suite

## Pre-Deployment Checklist

- [ ] All TypeScript compiles cleanly (`npm run build`)
- [ ] Integration tests pass (`npm run test:integration`)
- [ ] Client SDK builds without errors
- [ ] Git branch is `main` with no uncommitted changes
- [ ] Gitea remote is configured (`git remote -v` shows origin → gitea.context-x.org)
- [ ] SSH key for Erik (root@82.165.222.127) is working
- [ ] Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)

## Deployment Steps

### Option 1: Automated Deploy (Recommended)

```bash
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh
```

This will:

1. Check prerequisites (npm, git, ssh, curl)
2. Run local build (`npm run build`)
3. Push `main` branch to Gitea
4. SSH into Erik and:
   - Pull latest code
   - Run `npm install`
   - Run `npm run build`
   - Reload PM2 processes (llm-gateway, llm-learning)
5. Wait for service restart and verify health
6. Return summary with status

**Expected output**:

```
Deploy successful!
Commit: a1b2c3d
Direct: http://82.165.222.127:3100/health
Cloudflare: https://llm-gateway.context-x.org/health
PM2 status: ssh root@82.165.222.127 'pm2 status'
Logs: ssh root@82.165.222.127 'pm2 logs llm-gateway'
```

### Option 2: Manual Deploy

If the automated deploy fails, deploy manually on Erik:

```bash
ssh root@82.165.222.127

# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20
```

## Post-Deployment Verification

### 1. Health Check

```bash
curl -s https://llm-gateway.context-x.org/health | jq .

# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
```

### 2. Test Completion Endpoint

```bash
curl -X POST https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "phase-2f-test",
    "task_type": "analysis",
    "input": "Test request for Phase 2F verification",
    "language": "en"
  }' | jq .
```

**Expected response**:

```json
{
  "id": "call-XXXX-YYYY",
  "status": "approved",
  "confidence": 7.5,
  "model": "qwen2.5:14b",
  "task_type": "analysis",
  "latency_ms": 1234,
  "tokens": { "in": 42, "out": 156 },
  "output": "Test request processed successfully..."
}
```

### 3. Verify Client Fallback

Test that the client falls back to Ollama when the Gateway is unavailable:

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const client = new LLMGatewayClient({
  caller: 'test',
  baseUrl: 'http://localhost:9999', // non-existent
  ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
  timeout: 30_000,
});

const result = await client.completion({
  task_type: 'fallback-test',
  input: 'Test fallback to Ollama',
});

console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'
```

### 4. Check PM2 Processes

```bash
ssh root@82.165.222.127 'pm2 status'
# Expected: both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running
```

### 5. Monitor Logs

```bash
ssh root@82.165.222.127 'pm2 logs llm-gateway --lines 50'
# Check for errors, warnings, and learning cycle logs
```

## Rollback Plan

If the Phase 2F deployment fails:

```bash
# On Erik:
cd /opt/llm-gateway
git reset --hard HEAD~1 # Go back to previous commit
npm run build
pm2 reload llm-gateway llm-learning --update-env
```

Or automated rollback:

```bash
# Locally:
git reset --hard HEAD~1
bash deploy/deploy.sh
```

Note that the automated rollback rewinds the local `main`. Pushing a rewound branch is a non-fast-forward update, which Git rejects by default, so this path only works if `deploy/deploy.sh` or the Gitea branch settings allow a forced push.

## Monitoring After Deployment

**Key metrics to watch** (via Shield Dashboard or logs):

1. **Request latency**: Should be <1s for fast tier, 1-3s for medium
2. **Confidence score distribution**: Should stay relatively stable
3. **Review queue depth**: Should be <100 items
4. **Provider fallback rate**: Should be <5% (most requests via Gateway)
5. **Learning cycle execution**: Should see 6h/12h/24h cycle logs

**Alerts to configure**:

- If any request latency >10s
- If review queue >500 items
- If provider fallback rate >20%
- If confidence threshold drifts >0.5 in 24h
- If Gateway health check fails 3 times in a row

## Agent Integration Next Steps (Phase 2G)

Once Phase 2F is live:

1. **Claude Code integration**: Embed `@llm-gateway/client` into Claude Code
2. **Codex integration**: HTTP client in VS Code extension
3. **Copilot integration**: Plugin for GitHub Copilot
4. **ChatGPT integration**: Custom GPT with Gateway endpoint

## Support & Troubleshooting

**If Gateway is slow**:

- Check Ollama on Mac Studio: `curl http://192.168.178.213:11434/api/tags`
- Check learning engine logs: `pm2 logs llm-learning`
- Check CPU/memory on Erik: `ssh -t root@82.165.222.127 top` (`-t` allocates the terminal that `top` needs)

**If client can't reach Gateway**:

- Check Cloudflare tunnel status: `ssh root@82.165.222.127 'ps aux | grep cloudflared'`
- Check DNS: `dig llm-gateway.context-x.org`
- Fall back to direct IP: client will auto-retry with 30s cooldown

**If confidence scores are low**:

- Check which validators are failing: `curl https://llm-gateway.context-x.org/v1/metrics | grep validation_failures`
- Review recent audit logs: `SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;`

---

**Questions?** Check `docs/adr/` for design rationale or contact René.
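
### Appendix: Alert Rules Sketch

The conditions listed under "Alerts to configure" can be written as a small check, which may help when wiring them into a dashboard or cron job. This is a minimal sketch only: the `MetricsSnapshot` shape, its field names, and `evaluateAlerts` are hypothetical and do not reflect the Gateway's actual metrics schema or any existing API. Only the numeric thresholds come from this guide.

```typescript
// Hypothetical snapshot of the monitored values; field names are
// illustrative assumptions, not the Gateway's real metrics schema.
interface MetricsSnapshot {
  maxLatencyMs: number;              // slowest request in the window
  reviewQueueDepth: number;          // items waiting for human review
  providerFallbackRate: number;      // fraction of requests on fallback, 0..1
  thresholdDrift24h: number;         // signed change of a confidence threshold over 24h
  consecutiveHealthFailures: number; // failed Gateway health checks in a row
}

// Returns one message per alert condition from this guide that is violated.
function evaluateAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (m.maxLatencyMs > 10_000) alerts.push("request latency >10s");
  if (m.reviewQueueDepth > 500) alerts.push("review queue >500 items");
  if (m.providerFallbackRate > 0.2) alerts.push("provider fallback rate >20%");
  if (Math.abs(m.thresholdDrift24h) > 0.5) {
    alerts.push("confidence threshold drift >0.5 in 24h");
  }
  if (m.consecutiveHealthFailures >= 3) {
    alerts.push("Gateway health check failed 3 times in a row");
  }
  return alerts;
}

// Example: a snapshot that only breaches the review queue limit.
const snapshot: MetricsSnapshot = {
  maxLatencyMs: 2_400,
  reviewQueueDepth: 620,
  providerFallbackRate: 0.04,
  thresholdDrift24h: 0.1,
  consecutiveHealthFailures: 0,
};
console.log(evaluateAlerts(snapshot));
```

Keeping the thresholds in one pure function makes them easy to unit-test and to adjust if the learning cycles shift what counts as normal.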