llm-gateway/PHASE_2F_DEPLOYMENT.md
Rene Fichtmueller 2ca77d0aee feat: Phase 2F — Multi-Agent Integration (ADRs + Client Fallback + Tests)
- ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
- ADR-0002: Tier Assignment Strategy for Model Selection (cost-first escalation)
- ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals (6h/12h/24h cycles)
- ADR-0004: External Provider Fallback Chain Ordering (Cerebras → Groq → Mistral)
- Enhanced client SDK: Offline Ollama fallback, health checks, exponential backoff retry
- Integration tests: claude-code-integration.test.ts (14 test cases)
- PHASE_2F_DEPLOYMENT.md: Pre-deployment checklist, automated deploy, rollback plan
- Post-deployment verification procedures for health, client fallback, metrics
2026-04-19 21:39:44 +02:00

Phase 2F Deployment Guide

Date: 2026-04-19
Phase: 2F — Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
Status: Ready for deployment

What's New in Phase 2F

1. Architecture Decision Records (ADRs)

Located in docs/adr/:

  • ADR-0001: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator

    • Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
    • Agents are clients, not workers
    • HTTP API for agent communication
    • Learning system improves routing for all agents collectively
  • ADR-0002: Tier Assignment Strategy for Model Selection

    • Cost-first approach: fast → medium → large → external
    • Confidence-based escalation: if confidence <5 on fast, escalate to medium
    • Learning cycles (6h, 12h, 24h) auto-tune tier assignment
  • ADR-0003: Confidence Gate Thresholds & Learning Cycle Intervals

    • Three-tier gating: 0-4 (pending_review), 4-7 (warning), 7-10 (approved)
    • Autonomous learning cycles adjust thresholds based on human review feedback
    • 6h (validators), 12h (thresholds), 24h (model assignments)
  • ADR-0004: External Provider Fallback Chain Ordering

    • Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
    • Cost-first: prefer free tiers; paid APIs only as fallback
    • Rate-limit aware: automatic backoff and provider rotation
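The escalation and gating rules from ADR-0002 and ADR-0003 can be sketched roughly as follows. This is an illustration, not the Gateway's actual code: the names gate, escalate, TIER_ORDER and ESCALATION_THRESHOLD are hypothetical, the bands are read as 0-4, 4-7 and 7-10 with half-open boundaries (an assumption, since the ADR summary does not say which band owns 4 or 7), and the "< 5" escalation rule stated for the fast tier is assumed to apply to every tier.

```typescript
// Hypothetical names; only the tier order and the numeric bands come from the ADR summaries.
type Tier = 'fast' | 'medium' | 'large' | 'external';
type GateStatus = 'pending_review' | 'warning' | 'approved';

// Cost-first order from ADR-0002.
const TIER_ORDER: readonly Tier[] = ['fast', 'medium', 'large', 'external'];

// Assumption: bands are half-open, so 4 counts as "warning" and 7 as "approved".
function gate(confidence: number): GateStatus {
  if (confidence < 4) return 'pending_review';
  if (confidence < 7) return 'warning';
  return 'approved';
}

// Assumption: the "< 5" rule given for the fast tier is reused for all tiers.
const ESCALATION_THRESHOLD = 5;

interface TierResult {
  tier: Tier;
  confidence: number;
  status: GateStatus;
}

// Tries tiers cheapest-first and stops at the first one that is confident
// enough, or at the last tier if none is.
function escalate(score: (tier: Tier) => number): TierResult {
  let result: TierResult | undefined;
  for (const tier of TIER_ORDER) {
    const confidence = score(tier);
    result = { tier, confidence, status: gate(confidence) };
    if (confidence >= ESCALATION_THRESHOLD) break;
  }
  return result as TierResult; // TIER_ORDER is non-empty, so result is always set
}
```

Here score stands in for "run the request on this tier and return its confidence"; in the real Gateway that step is an LLM call, and the learning cycles would adjust the thresholds rather than leave them as constants.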

2. Enhanced Client SDK (@llm-gateway/client)

New features:

  • Offline Ollama fallback: If Gateway is down, client automatically falls back to local Ollama

    • Attempts completion via Gateway first (via HTTP)
    • On failure, switches to local Ollama (http://192.168.178.213:11434)
    • Automatic retry with exponential backoff (max 3 attempts)
    • Health check caching: doesn't spam health checks after failure
  • Health status API:

    • client.getStatus() returns: { gateway: bool, ollama: string, mode: 'gateway'|'fallback' }
    • Useful for agent dashboards to show current routing mode
  • Graceful degradation:

    • Client catches timeouts, network errors, and API errors
    • Returns meaningful error messages for debugging
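The retry-then-fallback behaviour described above can be sketched as below. This is a minimal sketch under stated assumptions, not the SDK's implementation: withRetry, completeWithFallback and RetryOptions are hypothetical names, the doubling delay schedule is one common reading of "exponential backoff", and the Gateway and Ollama calls are injected as plain functions so the sketch runs without a network. Health check caching is not modelled here.

```typescript
type Mode = 'gateway' | 'fallback';

interface RetryOptions {
  maxAttempts: number; // the guide states a maximum of 3 attempts
  baseDelayMs: number; // assumption: first delay; doubles on each retry
  sleep: (ms: number) => Promise<void>; // injected so tests need no timers
}

// Calls `call` up to maxAttempts times, sleeping base * 2^attempt between failures.
async function withRetry<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown = new Error('maxAttempts must be at least 1');
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      const isLast = attempt === opts.maxAttempts - 1;
      if (!isLast) await opts.sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Tries the Gateway with retries; if every attempt fails, uses local Ollama once.
async function completeWithFallback<T>(
  gateway: () => Promise<T>,
  ollama: () => Promise<T>,
  opts: RetryOptions,
): Promise<{ result: T; mode: Mode }> {
  try {
    const result = await withRetry(gateway, opts);
    return { result, mode: 'gateway' };
  } catch {
    const result = await ollama();
    return { result, mode: 'fallback' };
  }
}
```

In real use, sleep would be something like (ms) => new Promise((resolve) => setTimeout(resolve, ms)), and the returned mode corresponds to what client.getStatus().mode reports.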

Code changes:

// Old: No fallback capability
const client = createTIPClient();
const result = await client.completion({...});
// Would throw if Gateway down

// New: Automatic fallback
const client = createTIPClient();
const result = await client.completion({...});
// Tries Gateway first, falls back to Ollama automatically
const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'

3. Integration Tests

Added tests/integration/claude-code-integration.test.ts:

  • Health checks
  • Completion requests (code explanation, analysis, summarization)
  • Offline fallback verification
  • Rate limiting and SLA validation
  • Error handling

Run tests:

npm run test:integration
# or filter:
npm run test -- claude-code-integration

4. Deployment Changes

No breaking changes to:

  • Gateway API (POST /v1/completion, /v1/classify, etc.)
  • Database schema
  • Routing rules
  • Configuration

Additive changes:

  • docs/adr/ directory with 4 ADRs
  • Enhanced client SDK with fallback capability
  • Integration test suite

Pre-Deployment Checklist

  • All TypeScript compiles cleanly (npm run build)
  • Integration tests pass (npm run test:integration)
  • Client SDK builds without errors
  • Git branch is main with no uncommitted changes
  • Gitea remote is configured (git remote -v shows origin → gitea.context-x.org)
  • SSH key for Erik (root@82.165.222.127) is working
  • Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)

Deployment Steps

Option 1: Automated Deploy

cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh

This will:

  1. Check prerequisites (npm, git, ssh, curl)
  2. Run local build (npm run build)
  3. Push main branch to Gitea
  4. SSH into Erik and:
    • Pull latest code
    • Run npm install
    • Run npm run build
    • Reload PM2 processes (llm-gateway, llm-learning)
  5. Wait for service restart and verify health
  6. Return summary with status

Expected output:

Deploy successful!

  Commit:         a1b2c3d
  Direct:         http://82.165.222.127:3100/health
  Cloudflare:     https://llm-gateway.context-x.org/health
  PM2 status:     ssh root@82.165.222.127 'pm2 status'
  Logs:           ssh root@82.165.222.127 'pm2 logs llm-gateway'

Option 2: Manual Deploy

If the automated deploy fails, deploy manually on Erik:

ssh root@82.165.222.127

# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20

Post-Deployment Verification

1. Health Check

curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
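Because PM2 needs a moment to bring the service back, a single curl right after a reload can fail spuriously. A polling check along these lines may be more robust. It is a sketch only: isHealthy and waitForHealthy are hypothetical helpers, the attempt count and interval are arbitrary, and the only field inspected is status === "ok" from the expected output above. The fetch step is injected so the sketch runs offline.

```typescript
// Shape taken from the expected output above; ollama and queue are left opaque.
interface HealthBody {
  status: string;
  ollama?: unknown;
  queue?: unknown;
}

function isHealthy(body: unknown): body is HealthBody {
  return (
    typeof body === 'object' &&
    body !== null &&
    (body as { status?: unknown }).status === 'ok'
  );
}

// Polls until the health body reports "ok" or attempts run out.
// fetchHealth and sleep are injected; in real use sleep could be
// (ms) => new Promise((resolve) => setTimeout(resolve, ms)).
async function waitForHealthy(
  fetchHealth: () => Promise<unknown>,
  maxAttempts: number,
  intervalMs: number,
  sleep: (ms: number) => Promise<void>,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (isHealthy(await fetchHealth())) return true;
    } catch {
      // A refused connection while the process restarts is expected; keep polling.
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  return false;
}
```

With Node 18 or later, fetchHealth could be () => fetch('https://llm-gateway.context-x.org/health').then((res) => res.json()).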

2. Test Completion Endpoint

curl -X POST https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "phase-2f-test",
    "task_type": "analysis",
    "input": "Test request for Phase 2F verification",
    "language": "en"
  }' | jq .

Expected response:

{
  "id": "call-XXXX-YYYY",
  "status": "approved",
  "confidence": 7.5,
  "model": "qwen2.5:14b",
  "task_type": "analysis",
  "latency_ms": 1234,
  "tokens": { "in": 42, "out": 156 },
  "output": "Test request processed successfully..."
}
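For scripted verification, the sample response can be turned into a type and a runtime check. This is a sketch inferred from the example JSON above, not the Gateway's published schema: CompletionResponse and isCompletionResponse are hypothetical names, every field is assumed to be required, and the 0-10 range check on confidence is an assumption based on the ADR-0003 bands.

```typescript
// Field names and types are inferred from the sample response; treat as assumptions.
interface CompletionResponse {
  id: string;
  status: string;
  confidence: number;
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

function isCompletionResponse(value: unknown): value is CompletionResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;

  const tokens = v['tokens'];
  if (typeof tokens !== 'object' || tokens === null) return false;
  const t = tokens as Record<string, unknown>;

  const confidence = v['confidence'];
  return (
    typeof v['id'] === 'string' &&
    typeof v['status'] === 'string' &&
    typeof confidence === 'number' &&
    confidence >= 0 &&
    confidence <= 10 && // assumption: confidence lives on the 0-10 scale
    typeof v['model'] === 'string' &&
    typeof v['task_type'] === 'string' &&
    typeof v['latency_ms'] === 'number' &&
    typeof t['in'] === 'number' &&
    typeof t['out'] === 'number' &&
    typeof v['output'] === 'string'
  );
}
```

A verification script could parse the curl output with JSON.parse and fail the deployment check when isCompletionResponse returns false.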

3. Verify Client Fallback

Verify that the client falls back to Ollama when the Gateway is unavailable:

import { LLMGatewayClient } from '@llm-gateway/client';

const client = new LLMGatewayClient({
  caller: 'test',
  baseUrl: 'http://localhost:9999', // non-existent
  ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
  timeout: 30_000,
});

const result = await client.completion({
  task_type: 'fallback-test',
  input: 'Test fallback to Ollama',
});

console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'

4. Check PM2 Processes

ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running

5. Monitor Logs

ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs

Rollback Plan

If Phase 2F deployment fails:

# On Erik:
cd /opt/llm-gateway
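git reset --hard HEAD~1  # Go back one commit (assumes the deploy added exactly one commit)
npm install              # Restore dependencies in case package.json changed
npm run build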
pm2 reload llm-gateway llm-learning --update-env

Or automated rollback:

# Locally:
git revert --no-edit HEAD  # Undo the last commit with a new commit, so a regular (non-force) push to Gitea still succeeds
bash deploy/deploy.sh

Monitoring After Deployment

Key metrics to watch (via Shield Dashboard or logs):

  1. Request latency: Should be <1s for fast tier, 1-3s for medium
  2. Confidence score distribution: Should stay relatively stable
  3. Review queue depth: Should be <100 items
  4. Provider fallback rate: Should be <5% (most requests via Gateway)
  5. Learning cycle execution: Should see 6h/12h/24h cycle logs

Alerts to configure:

  • If any request latency >10s
  • If review queue >500 items
  • If provider fallback rate >20%
  • If confidence threshold drifts >0.5 in 24h
  • If Gateway health check fails 3 times in a row
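The alert conditions above can be expressed as a small rule check. This is an illustrative sketch: MetricsSnapshot and firingAlerts are hypothetical names, and how the numbers are collected (Shield Dashboard, /v1/metrics, or logs) is left open. The fallback rate is assumed to be a fraction between 0 and 1, and drift is compared by absolute value.

```typescript
// Hypothetical snapshot of the metrics named in the alert list.
interface MetricsSnapshot {
  maxLatencyMs: number;              // slowest request in the window
  reviewQueueDepth: number;          // items awaiting human review
  providerFallbackRate: number;      // assumption: fraction, 0.2 means 20%
  thresholdDrift24h: number;         // change in confidence threshold over 24h
  consecutiveHealthFailures: number; // failed health checks in a row
}

// Returns the names of all alerts whose condition currently holds.
function firingAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (m.maxLatencyMs > 10_000) alerts.push('latency');
  if (m.reviewQueueDepth > 500) alerts.push('review_queue');
  if (m.providerFallbackRate > 0.2) alerts.push('fallback_rate');
  if (Math.abs(m.thresholdDrift24h) > 0.5) alerts.push('threshold_drift');
  if (m.consecutiveHealthFailures >= 3) alerts.push('health');
  return alerts;
}
```

Keeping the thresholds in one function makes it easy to compare them against the "should be" targets in the metrics list, which are deliberately stricter than the alert limits.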

Agent Integration Next Steps (Phase 2G)

Once Phase 2F is live:

  1. Claude Code integration: Embed @llm-gateway/client into Claude Code
  2. Codex integration: HTTP client in VS Code extension
  3. Copilot integration: Plugin for GitHub Copilot
  4. ChatGPT integration: Custom GPT with Gateway endpoint

Support & Troubleshooting

If Gateway is slow:

  • Check Ollama on Mac Studio: curl http://192.168.178.213:11434/api/tags
  • Check learning engine logs: pm2 logs llm-learning
  • Check CPU/memory on Erik: ssh -t root@82.165.222.127 top (the -t allocates the terminal that top needs)

If client can't reach Gateway:

  • Check Cloudflare tunnel status: ssh root@82.165.222.127 'ps aux | grep cloudflared'
  • Check DNS: dig llm-gateway.context-x.org
  • Fall back to the direct IP (http://82.165.222.127:3100/health) to bypass Cloudflare; the client auto-retries with a 30s cooldown

If confidence scores are low:

  • Check which validators are failing: curl https://llm-gateway.context-x.org/v1/metrics | grep validation_failures
  • Review recent audit logs: SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;

Questions? Check docs/adr/ for design rationale or contact René.