# Phase 2F Deployment Guide

**Date**: 2026-04-19
**Phase**: 2F — Multi-Agent Integration (Phase 2E + ADRs + Client Enhancements)
**Status**: Ready for deployment

## What's New in Phase 2F

### 1. Architecture Decision Records (ADRs)

Located in `docs/adr/`:

- **ADR-0001**: Multi-Agent Coworking Architecture with LLM Gateway Orchestrator
  - Gateway acts as central hub for Claude Code, Codex, Copilot, ChatGPT
  - Agents are clients, not workers
  - HTTP API for agent communication
  - Learning system improves routing for all agents collectively

- **ADR-0002**: Tier Assignment Strategy for Model Selection
  - Cost-first approach: fast → medium → large → external
  - Confidence-based escalation: if confidence <5 on fast, escalate to medium
  - Learning cycles (6h, 12h, 24h) auto-tune tier assignment

- **ADR-0003**: Confidence Gate Thresholds & Learning Cycle Intervals
  - Three-tier gating: 0–4 (pending_review), 4–7 (warning), 7–10 (approved)
  - Autonomous learning cycles adjust thresholds based on human review feedback
  - 6h (validators), 12h (thresholds), 24h (model assignments)

- **ADR-0004**: External Provider Fallback Chain Ordering
  - Cerebras → Groq → Mistral AI → NVIDIA NIM → Cloudflare Workers AI
  - Cost-first: prefer free tiers; paid APIs only as fallback
  - Rate-limit aware: automatic backoff and provider rotation
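ADR-0002 and ADR-0003 together decide which tier answers and how the answer is gated. The sketch below shows one way they could compose. It is an illustration, not the Gateway's routing code. The tier names and the 5 / 4 / 7 cut-offs come from the ADRs. Everything else is an assumption: the `TierRunner` type, the synchronous call style (used for brevity), applying the "<5 → escalate" rule to every tier rather than only `fast`, and treating each gate band's lower bound as inclusive. The ADR's "0–4, 4–7" bands leave the boundary values unspecified.

```typescript
// Hypothetical sketch: ADR-0002 escalation feeding the ADR-0003 gate.
// Tier names and the 5 / 4 / 7 cut-offs are from the ADRs; the runner type,
// sync calls and inclusive lower bounds are assumptions.
type Tier = 'fast' | 'medium' | 'large' | 'external';
type GateStatus = 'pending_review' | 'warning' | 'approved';

const TIER_ORDER: readonly Tier[] = ['fast', 'medium', 'large', 'external'];
const ESCALATION_THRESHOLD = 5; // ADR-0002: below 5, try the next tier

interface GateThresholds {
  warning: number; // at or above: leaves pending_review
  approved: number; // at or above: approved
}
// Starting values; the 12h learning cycle would tune these.
const DEFAULT_THRESHOLDS: GateThresholds = { warning: 4, approved: 7 };

interface TierResult {
  tier: Tier;
  output: string;
  confidence: number; // 0 to 10
}
type TierRunner = (tier: Tier, input: string) => TierResult;

function gate(confidence: number, t: GateThresholds = DEFAULT_THRESHOLDS): GateStatus {
  const score = Math.min(10, Math.max(0, confidence)); // clamp to 0..10
  if (score >= t.approved) return 'approved';
  if (score >= t.warning) return 'warning';
  return 'pending_review';
}

// Cheapest tier first; stop at the first result that clears the escalation
// threshold. If none does, gate the last (most capable) tier's result.
function route(input: string, run: TierRunner): TierResult & { status: GateStatus } {
  let last: TierResult | undefined;
  for (const tier of TIER_ORDER) {
    last = run(tier, input);
    if (last.confidence >= ESCALATION_THRESHOLD) break;
  }
  const result = last as TierResult; // TIER_ORDER is non-empty, so last is set
  return { ...result, status: gate(result.confidence) };
}
```

In this sketch, a `fast` answer scoring 4.5 still escalates, even though the gate alone would only mark it `warning`. That matches the cost-first reading of ADR-0002, where escalation is decided before gating.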
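ADR-0004's ordered chain with rate-limit rotation could look roughly like the following. This is a minimal sketch under stated assumptions, not the Gateway's provider module. The provider order is from the ADR. The provider identifiers, the `ProviderCall` type, the `RateLimitError` class (standing in for an HTTP 429) and the doubling cooldown are all hypothetical.

```typescript
// Hypothetical sketch of ADR-0004's fallback chain. Order is from the ADR;
// identifiers, RateLimitError and the doubling cooldown are assumptions.
const PROVIDER_CHAIN = [
  'cerebras',
  'groq',
  'mistral',
  'nvidia-nim',
  'cloudflare-workers-ai',
] as const;
type Provider = (typeof PROVIDER_CHAIN)[number];

// Thrown by a provider call when the upstream API answers HTTP 429.
class RateLimitError extends Error {}

type ProviderCall = (provider: Provider, input: string) => Promise<string>;

class FallbackChain {
  private readonly call: ProviderCall;
  private readonly baseCooldownMs: number;
  private readonly now: () => number;
  // Earliest time (ms) each provider may be tried again.
  private readonly blockedUntil = new Map<Provider, number>();
  // Current cooldown length per provider; doubles on repeated rate limits.
  private readonly cooldownMs = new Map<Provider, number>();

  constructor(call: ProviderCall, baseCooldownMs = 1_000, now: () => number = Date.now) {
    this.call = call;
    this.baseCooldownMs = baseCooldownMs;
    this.now = now;
  }

  async complete(input: string): Promise<{ provider: Provider; output: string }> {
    const errors: string[] = [];
    for (const provider of PROVIDER_CHAIN) {
      if ((this.blockedUntil.get(provider) ?? 0) > this.now()) {
        errors.push(`${provider}: cooling down`);
        continue; // rotate past a provider that recently rate-limited us
      }
      try {
        const output = await this.call(provider, input);
        this.cooldownMs.delete(provider); // success resets the backoff
        return { provider, output };
      } catch (err) {
        if (err instanceof RateLimitError) {
          const prev = this.cooldownMs.get(provider);
          const next = prev === undefined ? this.baseCooldownMs : prev * 2;
          this.cooldownMs.set(provider, next);
          this.blockedUntil.set(provider, this.now() + next);
        }
        errors.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }
    throw new Error(`All providers failed: ${errors.join('; ')}`);
  }
}
```

The clock is injectable so the cooldown logic can be tested without waiting in real time.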

### 2. Enhanced Client SDK (`@llm-gateway/client`)

**New features**:

- **Offline Ollama fallback**: If the Gateway is down, the client automatically falls back to local Ollama
  - Attempts the completion via the Gateway's HTTP API first
  - On failure, switches to local Ollama (http://192.168.178.213:11434)
  - Automatic retry with exponential backoff (max 3 attempts)
  - Health-check caching: after a failure, the result is cached so the client does not re-check the Gateway on every request

- **Health status API**:
  - `client.getStatus()` returns `{ gateway: boolean, ollama: string, mode: 'gateway' | 'fallback' }`
  - Useful for agent dashboards to show current routing mode

- **Graceful degradation**:
  - Client catches timeouts, network errors, and API errors
  - Returns meaningful error messages for debugging
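The graceful degradation described above amounts to classifying a failure before reporting it. A minimal sketch follows, assuming the client uses `fetch`. The `ErrorKind` names, the `ApiError` class and `describeError` are hypothetical, not SDK exports. Two behaviors it relies on are standard: `fetch` rejects with a `TimeoutError`/`AbortError` when an `AbortSignal` fires, and it rejects with a `TypeError` on network failures.

```typescript
// Hypothetical error classification; names are illustrative, not the SDK API.
type ErrorKind = 'timeout' | 'network' | 'api' | 'unknown';

// Raised when the Gateway answers with a non-2xx status.
class ApiError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
  }
}

// Reads `name` without assuming the value is an Error (DOMException, strings...).
function errorName(err: unknown): string {
  return typeof err === 'object' && err !== null && 'name' in err
    ? String((err as { name: unknown }).name)
    : '';
}

function describeError(err: unknown, target: string): { kind: ErrorKind; message: string } {
  const detail = err instanceof Error ? err.message : String(err);
  if (err instanceof ApiError) {
    return { kind: 'api', message: `${target} returned HTTP ${err.status}: ${detail}` };
  }
  const name = errorName(err);
  if (name === 'TimeoutError' || name === 'AbortError') {
    return { kind: 'timeout', message: `${target} did not respond before the timeout` };
  }
  if (err instanceof TypeError) {
    // fetch() rejects with a TypeError for DNS, connection-refused and TLS failures.
    return { kind: 'network', message: `Could not reach ${target}: ${detail}` };
  }
  return { kind: 'unknown', message: `${target} request failed: ${detail}` };
}
```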

**Code changes**:

```typescript
import { createTIPClient } from '@llm-gateway/client';

// The calling code is unchanged; only the client's behavior differs.
const client = createTIPClient();
const result = await client.completion({
  task_type: 'analysis',
  input: 'Explain this function',
});
// Old: threw if the Gateway was down.
// New: tries the Gateway first, then falls back to local Ollama automatically.

const status = client.getStatus();
console.log(status.mode); // 'gateway' or 'fallback'
```
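Internally, the Gateway-then-Ollama behavior could be structured roughly as below. This is a minimal sketch, not the `@llm-gateway/client` source. The `Transport` type, `createFallbackCompleter`, the option names, the 500 ms base delay and the 30 s failure cache are all assumptions. The retry limit of 3 follows the feature list.

```typescript
// Hypothetical internals for the Gateway-then-Ollama fallback; not the SDK source.
type Transport = (input: string) => Promise<string>;
type Mode = 'gateway' | 'fallback';

interface FallbackOptions {
  gateway: Transport; // HTTP call to the Gateway
  ollama: Transport; // HTTP call to local Ollama
  maxAttempts?: number; // Gateway attempts before falling back
  baseDelayMs?: number; // first backoff delay; doubles per retry
  healthCacheMs?: number; // how long a Gateway failure is remembered
  now?: () => number; // injectable clock (for tests)
  sleep?: (ms: number) => Promise<void>;
}

function createFallbackCompleter(opts: FallbackOptions) {
  const maxAttempts = opts.maxAttempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 500;
  const healthCacheMs = opts.healthCacheMs ?? 30_000;
  const now = opts.now ?? Date.now;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let gatewayDownUntil = 0; // cached "Gateway unhealthy" verdict
  let mode: Mode = 'gateway';

  return {
    get mode(): Mode {
      return mode;
    },
    async complete(input: string): Promise<string> {
      // Skip the Gateway entirely while a recent failure is still cached.
      if (now() >= gatewayDownUntil) {
        for (let attempt = 0; attempt < maxAttempts; attempt++) {
          try {
            const output = await opts.gateway(input);
            mode = 'gateway';
            return output;
          } catch {
            // Wait base, 2x base, ... between attempts (no wait after the last one).
            if (attempt < maxAttempts - 1) await sleep(baseDelayMs * 2 ** attempt);
          }
        }
        gatewayDownUntil = now() + healthCacheMs;
      }
      mode = 'fallback';
      return opts.ollama(input);
    },
  };
}
```

Caching the failure means a burst of requests during an outage pays the retry cost once rather than on every call.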

### 3. Integration Tests

Added `tests/integration/claude-code-integration.test.ts`:

- Health checks
- Completion requests (code explanation, analysis, summarization)
- Offline fallback verification
- Rate limiting and SLA validation
- Error handling

Run tests:
```bash
npm run test:integration
# or filter:
npm run test -- claude-code-integration
```

### 4. Deployment Changes

No breaking changes to:
- Gateway API (`POST /v1/completion`, `/v1/classify`, etc.)
- Database schema
- Routing rules
- Configuration

**Additive changes**:
- `docs/adr/` directory with 4 ADRs
- Enhanced client SDK with fallback capability
- Integration test suite

## Pre-Deployment Checklist

- [ ] All TypeScript compiles cleanly (`npm run build`)
- [ ] Integration tests pass (`npm run test:integration`)
- [ ] Client SDK builds without errors
- [ ] Git branch is `main` with no uncommitted changes
- [ ] Gitea remote is configured (`git remote -v` shows origin → gitea.context-x.org)
- [ ] SSH key for Erik (root@82.165.222.127) is working
- [ ] Ollama is running on Mac Studio (http://192.168.178.213:11434 reachable)

## Deployment Steps

### Option 1: Automated Deploy (Recommended)

```bash
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway
bash deploy/deploy.sh
```

This will:
1. Check prerequisites (npm, git, ssh, curl)
2. Run local build (`npm run build`)
3. Push `main` branch to Gitea
4. SSH into Erik and:
   - Pull latest code
   - Run `npm install`
   - Run `npm run build`
   - Reload PM2 processes (llm-gateway, llm-learning)
5. Wait for service restart and verify health
6. Return summary with status

**Expected output**:
```
Deploy successful!

  Commit:         a1b2c3d
  Direct:         http://82.165.222.127:3100/health
  Cloudflare:     https://llm-gateway.context-x.org/health
  PM2 status:     ssh root@82.165.222.127 'pm2 status'
  Logs:           ssh root@82.165.222.127 'pm2 logs llm-gateway'
```

### Option 2: Manual Deploy

If the automated deploy fails, deploy manually on Erik:

```bash
ssh root@82.165.222.127

# On Erik:
cd /opt/llm-gateway
git fetch origin
git reset --hard origin/main
npm install
npm run build
pm2 reload llm-gateway llm-learning --update-env
pm2 status
pm2 logs llm-gateway --lines 20
```

## Post-Deployment Verification

### 1. Health Check

```bash
curl -s https://llm-gateway.context-x.org/health | jq .
# Expected output:
# { "status": "ok", "ollama": {...}, "queue": {...} }
```
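Step 5 of the automated deploy waits for the restart and then checks health. The same check can be scripted in TypeScript by polling until `/health` reports `"status": "ok"`. The URL is the one used above. The attempt count, delays and the injectable `fetchImpl` parameter are assumptions. The sketch needs a runtime with global `fetch` and `AbortSignal.timeout` (Node 18+).

```typescript
// Hypothetical post-deploy check: poll /health until it reports "ok".
interface HealthBody {
  status?: string;
}

async function waitForHealthy(
  url = 'https://llm-gateway.context-x.org/health',
  attempts = 10,
  delayMs = 3_000,
  fetchImpl: typeof fetch = fetch,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      // Cap each probe at 5s so a hung connection does not stall the loop.
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(5_000) });
      if (res.ok) {
        const body = (await res.json()) as HealthBody;
        if (body.status === 'ok') return true;
      }
    } catch {
      // Connection refused or timed out while PM2 restarts; try again.
    }
    if (i < attempts - 1) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false;
}
```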

### 2. Test Completion Endpoint

```bash
curl -X POST https://llm-gateway.context-x.org/v1/completion \
  -H "Content-Type: application/json" \
  -d '{
    "caller": "phase-2f-test",
    "task_type": "analysis",
    "input": "Test request for Phase 2F verification",
    "language": "en"
  }' | jq .
```

**Expected response**:
```json
{
  "id": "call-XXXX-YYYY",
  "status": "approved",
  "confidence": 7.5,
  "model": "qwen2.5:14b",
  "task_type": "analysis",
  "latency_ms": 1234,
  "tokens": { "in": 42, "out": 156 },
  "output": "Test request processed successfully..."
}
```
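Agents that do not use the SDK can make the same call directly over HTTP, as described in ADR-0001. A minimal typed wrapper might look like the one below. Its field names mirror the curl request and expected response above. `postCompletion` and the injectable `fetchImpl` are hypothetical, and any response fields beyond those shown above are not modeled.

```typescript
// Hypothetical typed wrapper around POST /v1/completion (Node 18+ global fetch).
interface CompletionRequest {
  caller: string;
  task_type: string;
  input: string;
  language?: string;
}

interface CompletionResponse {
  id: string;
  status: string; // e.g. 'approved' (ADR-0003 gate status)
  confidence: number;
  model: string;
  task_type: string;
  latency_ms: number;
  tokens: { in: number; out: number };
  output: string;
}

async function postCompletion(
  req: CompletionRequest,
  baseUrl = 'https://llm-gateway.context-x.org',
  fetchImpl: typeof fetch = fetch,
): Promise<CompletionResponse> {
  const res = await fetchImpl(`${baseUrl}/v1/completion`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!res.ok) {
    // Surface the status and body so callers see why the Gateway refused.
    throw new Error(`Gateway returned HTTP ${res.status}: ${await res.text()}`);
  }
  return (await res.json()) as CompletionResponse;
}
```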

### 3. Verify Client Fallback

Test that the client falls back to Ollama when the Gateway is unavailable:

```typescript
import { LLMGatewayClient } from '@llm-gateway/client';

const client = new LLMGatewayClient({
  caller: 'test',
  baseUrl: 'http://localhost:9999', // non-existent
  ollamaUrl: 'http://192.168.178.213:11434', // falls back to Mac Studio
  timeout: 30_000,
});

const result = await client.completion({
  task_type: 'fallback-test',
  input: 'Test fallback to Ollama',
});

console.log(result.status); // 'approved'
console.log(client.getStatus().mode); // 'fallback'
```

### 4. Check PM2 Processes

```bash
ssh root@82.165.222.127 pm2 status
# Expected: Both llm-gateway (PM2 id 19) and llm-learning (PM2 id 20) running
```
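For a scripted check, `pm2 jlist` prints the process list as a JSON array. Each entry includes `name`, `pm_id` and `pm2_env.status`. The sketch below reads only those fields and reports which required processes are not `online`. `missingProcesses` is a hypothetical helper. Feed it the output of `ssh root@82.165.222.127 pm2 jlist`.

```typescript
// Hypothetical check over `pm2 jlist` output; only three fields are read.
interface Pm2Process {
  name: string;
  pm_id: number;
  pm2_env?: { status?: string };
}

function missingProcesses(
  jlistJson: string,
  required: readonly string[] = ['llm-gateway', 'llm-learning'],
): string[] {
  const procs = JSON.parse(jlistJson) as Pm2Process[];
  const online = new Set(
    procs.filter((p) => p.pm2_env?.status === 'online').map((p) => p.name),
  );
  // Anything required but not online is returned to the caller.
  return required.filter((name) => !online.has(name));
}
```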

### 5. Monitor Logs

```bash
ssh root@82.165.222.127 pm2 logs llm-gateway --lines 50
# Check for errors, warnings, and learning cycle logs
```

## Rollback Plan

If Phase 2F deployment fails:

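```bash
# On Erik (assumes Phase 2F landed as a single commit):
cd /opt/llm-gateway
git reset --hard HEAD~1  # Go back to the previous commit
npm install              # Restore the previous commit's dependencies
npm run build
pm2 reload llm-gateway llm-learning --update-env
```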

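Or roll back through the automated deploy. Use `git revert` locally instead of `git reset`, so the push to Gitea stays a fast-forward:

```bash
# Locally:
git revert --no-edit HEAD  # New commit that undoes the last one
bash deploy/deploy.sh
```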

## Monitoring After Deployment

**Key metrics to watch** (via Shield Dashboard or logs):

1. **Request latency**: Should be <1s for fast tier, 1-3s for medium
2. **Confidence score distribution**: Should stay relatively stable
3. **Review queue depth**: Should be <100 items
4. **Provider fallback rate**: Should be <5% (most requests via Gateway)
5. **Learning cycle execution**: Should see 6h/12h/24h cycle logs

**Alerts to configure**:

- If any request latency >10s
- If review queue >500 items
- If provider fallback rate >20%
- If confidence threshold drifts >0.5 in 24h
- If Gateway health check fails 3 times in a row
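If these rules are evaluated in code rather than in a dashboard, they reduce to a handful of comparisons. The thresholds below are the ones from the list above. The `MetricsSnapshot` shape and `firingAlerts` are assumptions, because this guide does not define how metrics are exported.

```typescript
// Hypothetical alert evaluation; thresholds from the list, data shape assumed.
interface MetricsSnapshot {
  maxLatencyMs: number; // slowest request in the window
  reviewQueueDepth: number;
  fallbackRate: number; // fraction, 0..1
  thresholdDrift24h: number; // change in confidence threshold over 24h
  consecutiveHealthFailures: number;
}

function firingAlerts(m: MetricsSnapshot): string[] {
  const alerts: string[] = [];
  if (m.maxLatencyMs > 10_000) alerts.push(`request latency ${m.maxLatencyMs}ms > 10s`);
  if (m.reviewQueueDepth > 500) alerts.push(`review queue ${m.reviewQueueDepth} > 500`);
  if (m.fallbackRate > 0.2) {
    alerts.push(`fallback rate ${(m.fallbackRate * 100).toFixed(1)}% > 20%`);
  }
  // Drift in either direction counts.
  if (Math.abs(m.thresholdDrift24h) > 0.5) {
    alerts.push(`threshold drift ${m.thresholdDrift24h} > 0.5 in 24h`);
  }
  if (m.consecutiveHealthFailures >= 3) {
    alerts.push(`health check failed ${m.consecutiveHealthFailures} times in a row`);
  }
  return alerts;
}
```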

## Agent Integration Next Steps (Phase 2G)

Once Phase 2F is live:

1. **Claude Code integration**: Embed `@llm-gateway/client` into Claude Code
2. **Codex integration**: HTTP client in VS Code extension
3. **Copilot integration**: Plugin for GitHub Copilot
4. **ChatGPT integration**: Custom GPT with Gateway endpoint

## Support & Troubleshooting

**If Gateway is slow**:
- Check Ollama on Mac Studio: `curl http://192.168.178.213:11434/api/tags`
- Check learning engine logs: `ssh root@82.165.222.127 'pm2 logs llm-learning'`
- Check CPU/memory on Erik: `ssh -t root@82.165.222.127 top` (`-t` allocates the terminal that `top` needs)

**If client can't reach Gateway**:
- Check Cloudflare tunnel status: `ssh root@82.165.222.127 'pgrep -a cloudflared'`
- Check DNS: `dig llm-gateway.context-x.org`
- Fall back to the direct IP endpoint (`http://82.165.222.127:3100`); the client auto-retries the Gateway after a 30s cooldown

**If confidence scores are low**:
- Check which validators are failing: `curl -s https://llm-gateway.context-x.org/v1/metrics | grep validation_failures`
- Review recent audit logs: `SELECT * FROM audit_log ORDER BY created_at DESC LIMIT 20;`

---

**Questions?** Check `docs/adr/` for design rationale or contact René.