# LightRAG Sidecar Deployment Checklist

## Pre-Deployment Verification

### Local Development (Mac Studio)

- [ ] Python 3.10+ installed
- [ ] PostgreSQL running locally (`pg_isready -h localhost`)
- [ ] Qdrant running locally (`curl http://localhost:6333/healthz`)
- [ ] Ollama running with `qwen2.5:14b` model (`curl http://localhost:11434/api/tags`)
- [ ] Clone llm-gateway repo locally
- [ ] Create `.env` file from `.env.example`
- [ ] Install Python dependencies: `pip install -r requirements.txt`
- [ ] Run local database init: `python scripts/init_db.py`
- [ ] Start sidecar: `uvicorn app.main:app --reload --port 3140`
- [ ] Test health endpoint: `curl http://localhost:3140/api/kg/health`
- [ ] Test query endpoint with test document
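
The reachability checks in the list above can be scripted. The following is a minimal sketch using only the standard library; the PostgreSQL port 5432 is an assumption (the default), and the helper names `port_open` and `preflight` are illustrative, not part of the sidecar.

```python
import socket

# Ports 6333 and 11434 come from the checklist above;
# 5432 is an assumption (PostgreSQL's default port).
SERVICES = {"postgresql": 5432, "qdrant": 6333, "ollama": 11434}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def preflight(host: str = "127.0.0.1") -> dict[str, bool]:
    """Probe each service port and report which ones accept connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}
```

Calling `preflight()` returns a mapping such as service name to `True`/`False`. An open port only shows that something is listening, so the `curl` checks above are still worth running.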

### Erik Server Deployment

#### Step 1: SSH Access
```bash
ssh erik@82.165.222.127
# or from local network: ssh erik@192.168.178.82
```

#### Step 2: Copy Files
```bash
# On local machine
scp -r packages/lightrag-sidecar/ erik@192.168.178.82:/opt/llm-gateway/packages/

# Or via rsync for large directories
rsync -avz packages/lightrag-sidecar/ erik@192.168.178.82:/opt/llm-gateway/packages/lightrag-sidecar/
```

#### Step 3: Setup Python Environment on Erik
```bash
cd /opt/llm-gateway/packages/lightrag-sidecar

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Verify installations
python -c "import fastapi, sqlalchemy, sentence_transformers; print('OK')"
```

#### Step 4: Setup PostgreSQL on Erik
```bash
# Create database and user
# Replace the example password; it must match DATABASE_URL in ecosystem.config.cjs
sudo -u postgres psql << 'EOF'
CREATE USER tip_kg WITH PASSWORD 'tip_secure_2026';
CREATE DATABASE tip_lightrag OWNER tip_kg;
GRANT ALL PRIVILEGES ON DATABASE tip_lightrag TO tip_kg;
EOF

# Initialize schema
python scripts/init_db.py

# Verify tables created
sudo -u postgres psql -d tip_lightrag -c "\dt"
```
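
The credentials created above have to match the connection string the sidecar reads (the troubleshooting section greps for `DATABASE_URL`). A minimal sketch for composing such a URL with special characters in the password percent-encoded follows; the `postgresql://` scheme, port 5432, and the helper name `build_database_url` are assumptions, and the sidecar may expect a driver-specific scheme instead.

```python
from urllib.parse import quote


def build_database_url(user: str, password: str, database: str,
                       host: str = "localhost", port: int = 5432) -> str:
    """Compose a PostgreSQL URL, percent-encoding user, password, and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

Encoding matters once the password contains characters such as `@`, `:`, or `/`, which would otherwise be parsed as URL delimiters.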

#### Step 5: Setup Qdrant on Erik
```bash
# Qdrant should already be running on localhost:6333
# Verify connection
curl http://localhost:6333/healthz

# Collections are auto-created on first ingest; no manual action required
```

#### Step 6: Setup Log Directories
```bash
# Create the log directory before starting PM2, in case the ecosystem config writes logs there
sudo mkdir -p /var/log/lightrag-sidecar
sudo chown "$(id -un)":"$(id -gn)" /var/log/lightrag-sidecar
```

#### Step 7: Configure PM2
```bash
# Start sidecar with PM2 using the package's ecosystem config
cd /opt/llm-gateway
pm2 start packages/lightrag-sidecar/ecosystem.config.cjs

# Verify running
pm2 status
pm2 logs lightrag-sidecar --lines 50 --nostream
```

#### Step 8: Configure Firewall (if needed)
```bash
# Allow port 3140 from local network
sudo ufw allow from 192.168.178.0/24 to any port 3140
# Or specific IP
sudo ufw allow from 192.168.178.213 to any port 3140
```

#### Step 9: Health Check on Erik
```bash
# On Erik (via SSH)
curl http://localhost:3140/api/kg/health

# From local machine
curl http://192.168.178.82:3140/api/kg/health
```
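
Right after `pm2 start` the sidecar may need a few seconds before it answers, so a single `curl` can fail spuriously. The sketch below polls the health endpoint with retries; it only assumes the endpoint returns HTTP 200 when healthy (the response body is not inspected), and the function names are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 2.0,
                       probe: Callable[[str], bool] = http_ok) -> bool:
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return True
        if attempt < attempts:
            time.sleep(delay)  # back off before the next probe
    return False
```

For example, `wait_until_healthy("http://localhost:3140/api/kg/health")` returns `True` as soon as the sidecar responds, or `False` after ten failed probes.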

#### Step 10: Bootstrap with TIP Data
```bash
# Set sidecar URL
export LIGHTRAG_SIDECAR_URL=http://localhost:3140

# Run bootstrap
python scripts/bootstrap_tip_data.py

# Monitor ingestion
pm2 logs lightrag-sidecar | grep "Job"
```

## Post-Deployment Verification

### Test Endpoints

```bash
# Health check
curl http://192.168.178.82:3140/api/kg/health

# Status
curl http://192.168.178.82:3140/api/kg/status

# Example query
curl -X POST http://192.168.178.82:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What 400G transceivers work with Cisco?",
    "domain": "transceiver",
    "top_k": 5
  }'

# List evaluation datasets
curl http://192.168.178.82:3140/api/kg/eval/datasets
```
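
The example query above can also be issued from Python, which makes it easy to measure latency against the 500 ms target in the success criteria. This is a sketch under the assumption that the endpoint returns a JSON body; the response schema is not documented here, so the payload is returned unparsed beyond JSON decoding. The function names are illustrative.

```python
import json
import time
import urllib.request

BASE_URL = "http://192.168.178.82:3140"  # Erik's LAN address from this checklist


def build_query_request(query: str, domain: str, top_k: int = 5,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/kg/query with a JSON body."""
    body = json.dumps({"query": query, "domain": domain, "top_k": top_k})
    return urllib.request.Request(
        f"{base_url}/api/kg/query",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_query(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request; return (elapsed milliseconds, decoded JSON response)."""
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)  # assumes a JSON response body
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, payload
```

The measured time includes network round trip, so running it on Erik against `http://localhost:3140` gives a closer view of the sidecar's own latency.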

### Verify Database

```bash
# Check tables in PostgreSQL on Erik
psql -h localhost -U tip_kg -d tip_lightrag -c "\dt"

# Check document count
psql -h localhost -U tip_kg -d tip_lightrag -c "SELECT COUNT(*) FROM documents;"

# Check entity count
psql -h localhost -U tip_kg -d tip_lightrag -c "SELECT COUNT(*) FROM entities;"

# Check collections in Qdrant
curl http://localhost:6333/collections
```

### Monitoring

```bash
# Watch logs in real-time (streams until Ctrl+C)
pm2 logs lightrag-sidecar --lines 100

# Check PM2 process
pm2 show lightrag-sidecar

# Memory usage
pm2 monit
```

## Troubleshooting

### Connection Issues

**Problem**: Cannot reach sidecar from local machine
```bash
# Check if service is running
pm2 status

# Check if port is listening
ss -tulpn | grep 3140

# Check firewall
sudo ufw status
```

**Solution**:
```bash
# Restart service
pm2 restart lightrag-sidecar

# Check logs
pm2 logs lightrag-sidecar
```

### Database Issues

**Problem**: Database connection error
```bash
# Verify PostgreSQL is running
sudo systemctl status postgresql

# Check connection string
grep DATABASE_URL ecosystem.config.cjs

# Test connection
psql -h localhost -U tip_kg -d tip_lightrag -c "SELECT 1"
```

### Ollama Issues

**Problem**: Entity extraction timeouts
```bash
# Check that Ollama is reachable from Erik
curl http://192.168.178.213:11434/api/tags

# On the Ollama host: check that the model is downloaded
ollama list

# Pull the model if missing
ollama pull qwen2.5:14b
```
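
The `/api/tags` response lists installed models under a `models` array, each entry carrying a `name`. A small sketch for checking remotely that `qwen2.5:14b` is present, without logging in to the Ollama host, could look like this; the helper names are illustrative.

```python
import json
import urllib.request


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """GET /api/tags from an Ollama server and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def installed_models(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def has_model(tags_response: dict, wanted: str) -> bool:
    """Return True if the exact model tag is installed."""
    return wanted in installed_models(tags_response)
```

For example, `has_model(fetch_tags("http://192.168.178.213:11434"), "qwen2.5:14b")`. A present model does not rule out timeouts caused by slow loading or generation, so the sidecar logs remain the primary source.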

### Qdrant Issues

**Problem**: Vector search not working
```bash
# Check Qdrant health
curl http://localhost:6333/healthz

# List collections
curl http://localhost:6333/collections

# Delete the collection if corrupted (its vectors must be re-ingested afterwards)
curl -X DELETE http://localhost:6333/collections/documents_transceiver
```
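
Before deleting anything, it helps to confirm which collections actually exist. Qdrant's `GET /collections` returns the names under `result.collections`. The sketch below compares that list against the collections you expect; `documents_transceiver` is the only name taken from this checklist, and the function names are illustrative.

```python
import json
import urllib.request


def fetch_collections(base_url: str = "http://localhost:6333",
                      timeout: float = 5.0) -> dict:
    """GET /collections from Qdrant and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
        return json.load(resp)


def collection_names(response: dict) -> set[str]:
    """Extract collection names from a Qdrant /collections response."""
    return {c["name"] for c in response.get("result", {}).get("collections", [])}


def missing_collections(response: dict, expected: set[str]) -> set[str]:
    """Return the expected collections that Qdrant does not report."""
    return expected - collection_names(response)
```

An empty result from `missing_collections(fetch_collections(), {"documents_transceiver"})` means the collection exists, so a failing search is more likely an ingestion or embedding problem than a missing collection.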

## Rollback

If deployment fails:

```bash
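# Stop service
pm2 stop lightrag-sidecar

# Revert code (only works if the deployment directory is a git checkout;
# otherwise re-copy the previous version as in Step 2)
cd /opt/llm-gateway/packages/lightrag-sidecar
git checkout HEAD~1

# Clear problematic data (destructive: removes all ingested documents, entities, and relations)
psql -h localhost -U tip_kg -d tip_lightrag -c "TRUNCATE documents, entities, relations CASCADE;"

# Restart
pm2 restart lightrag-sidecar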
```

## Performance Tuning

### Database Connection Pool
```env
# Increase for higher concurrency
DB_POOL_SIZE=10
```

### Worker Processes
```js
// In ecosystem.config.cjs (uvicorn workers are separate processes, not threads)
args: 'app.main:app --host 0.0.0.0 --port 3140 --workers 4'  // Increase from 2
```
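
Pool size and worker count interact: each uvicorn worker is a separate process, so each one holds its own connection pool. The sketch below estimates the worst-case number of PostgreSQL connections. It assumes the sidecar uses a SQLAlchemy pool sized by `DB_POOL_SIZE` with SQLAlchemy's default `max_overflow` of 10, and compares against PostgreSQL's default `max_connections` of 100; the `reserved` headroom is a hypothetical value for other clients.

```python
def total_db_connections(workers: int, pool_size: int, max_overflow: int = 10) -> int:
    """Worst-case connections opened by all worker processes combined."""
    return workers * (pool_size + max_overflow)


def fits_budget(workers: int, pool_size: int, max_connections: int = 100,
                reserved: int = 10, max_overflow: int = 10) -> bool:
    """Check the worst case against max_connections minus reserved headroom."""
    needed = total_db_connections(workers, pool_size, max_overflow)
    return needed <= max_connections - reserved
```

With 4 workers and `DB_POOL_SIZE=10` the worst case is 4 × (10 + 10) = 80 connections, which fits under 90. Raising the pool size to 20 would give 120 and exceed the default limit.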

### Batch Size
```env
# Larger batches = faster ingestion but more memory
INGEST_BATCH_SIZE=20
```

### Embedding Cache
Consider caching bge-m3 embeddings to reduce recomputation.
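
One way to do this is to memoize embeddings by a hash of the input text. The following is a minimal in-memory sketch, not the sidecar's implementation: `EmbeddingCache` is a hypothetical class, and `embed` stands in for whatever function the sidecar uses to compute a bge-m3 embedding for a single string.

```python
import hashlib
from typing import Callable, Sequence

Vector = Sequence[float]


class EmbeddingCache:
    """Memoize embeddings by a SHA-256 hash of model name plus text."""

    def __init__(self, embed: Callable[[str], Vector], model_name: str = "bge-m3"):
        self._embed = embed            # hypothetical embedding function
        self._model_name = model_name  # part of the key, so a model change invalidates entries
        self._store: dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, text: str) -> str:
        raw = f"{self._model_name}\x00{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get(self, text: str) -> Vector:
        """Return the cached vector for text, computing it on first use."""
        key = self._key(text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        vector = self._embed(text)
        self._store[key] = vector
        return vector
```

The dictionary grows without bound and is private to each worker process, so a production version would likely need an eviction policy or a shared store. The `hits` and `misses` counters give a quick way to judge whether the cache pays off for your workload.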

## Success Criteria

- [ ] Service starts without errors (`pm2 status` shows "online")
- [ ] Health check passes all dependencies (postgresql, qdrant, ollama)
- [ ] Sample query returns results in <500ms
- [ ] Can ingest documents and see entities extracted
- [ ] Evaluation metrics calculate correctly
- [ ] Logs show no ERROR level messages
- [ ] Memory usage stays under 1GB
- [ ] Database contains ≥100 documents after bootstrap