# LightRAG Sidecar Testing Guide

## Prerequisites

Ensure all services are running locally:

```bash
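# PostgreSQL (verify the server is accepting connections)
pg_isready -h localhost
psql -l | grep tip_lightrag  # listed only after scripts/init_db.py has run

# Qdrant (verify running)
curl -s http://localhost:6333/healthz

# Ollama (verify running and that a qwen2.5 model is available)
curl -s http://localhost:11434/api/tags | grep qwen2.5

# Sidecar (if not starting fresh); the bracket keeps grep from matching itself
ps aux | grep '[u]vicorn'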
```

## Local Setup

### 1. Initialize Database

```bash
cd llm-gateway/packages/lightrag-sidecar  # adjust to where the repository is checked out

# Create virtual environment (if needed)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Initialize database and schema
python scripts/init_db.py
```

**Expected output:**
```
Creating database 'tip_lightrag'...
✓ Database created (or already exists)
Initializing schema...
✓ Tables created: entities, relations, documents, query_logs, evaluation_results
```

### 2. Start Sidecar

```bash
# Start with auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload
```

**Expected output:**
```
INFO:     Uvicorn running on http://0.0.0.0:3140
INFO:     Application startup complete
```

## Testing Workflow

### Phase 1: Health & Dependency Check

Verify all dependencies are working:

```bash
curl http://localhost:3140/api/kg/health
```

**Expected response:**
```json
{
  "status": "healthy",
  "dependencies": {
    "postgresql": "healthy",
    "qdrant": "healthy",
    "ollama": "healthy"
  },
  "latencies_ms": {
    "postgresql": 5,
    "qdrant": 8,
    "ollama": 45
  }
}
```
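
For scripted runs, the health response can be checked programmatically rather than by eye. The following is a minimal sketch that assumes the response shape shown above; `unhealthy_dependencies` is a hypothetical helper, not part of the sidecar, and it treats any value other than `"healthy"` as a failure.

```python
def unhealthy_dependencies(health: dict) -> list[str]:
    """Return the names of dependencies that do not report "healthy"."""
    deps = health.get("dependencies", {})
    return sorted(name for name, state in deps.items() if state != "healthy")


# Sample payload copied from the expected response in this guide.
sample = {
    "status": "healthy",
    "dependencies": {"postgresql": "healthy", "qdrant": "healthy", "ollama": "healthy"},
    "latencies_ms": {"postgresql": 5, "qdrant": 8, "ollama": 45},
}

print(unhealthy_dependencies(sample))  # prints []
```

An empty list means every dependency is ready and the remaining phases can proceed.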

### Phase 2: Document Ingestion

Test the ingestion pipeline with sample documents:

```bash
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "documents": [
      {
        "title": "400G Transceiver Overview",
        "content": "400 gigabit per second transceivers are optical modules that transmit and receive data at 400 Gbps. Common form factors include QSFP-DD and OSFP. 400G transceivers use PAM4 modulation to achieve high speeds. Standard transmission distances range from 300m (DR4) to 10km (LR4) to 40km (ER4).",
        "source": "blog",
        "metadata": {}
      },
      {
        "title": "QSFP-DD vs OSFP",
        "content": "QSFP-DD (Quad Small Form-factor Pluggable Double Density) supports up to 400G over 8 lanes. OSFP (Octal Small Form-factor Pluggable) supports up to 800G over 8 lanes. Both are hot-swappable. Cisco and Arista prefer QSFP-DD, while Juniper and Infinera prefer OSFP. Compatibility between them is not guaranteed.",
        "source": "blog",
        "metadata": {}
      },
      {
        "title": "Transceiver Power Consumption",
        "content": "Modern 400G transceivers typically consume 5-8 watts. DR4 variants are more power-efficient at 5W, while ER4 variants consume up to 8W due to additional signal processing. Data center cooling requirements increase by 2-3% with 400G deployment at scale. Power budgets should be verified during capacity planning.",
        "source": "blog",
        "metadata": {}
      }
    ],
    "batch_size": 3
  }'
```

**Expected response:**
```json
{
  "job_id": "ingest-20260425-001",
  "status": "queued",
  "documents_submitted": 3,
  "estimated_time_sec": 5
}
```

Monitor ingestion progress:

```bash
# Check job status
curl http://localhost:3140/api/kg/ingest/status/ingest-20260425-001
```

**Expected response after completion:**
```json
{
  "job_id": "ingest-20260425-001",
  "status": "completed",
  "documents_processed": 3,
  "documents_failed": 0,
  "entities_extracted": 12,
  "entities_linked": 8,
  "timestamp": "2026-04-25T10:30:00Z"
}
```
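
Because ingestion is queued, a test script usually has to poll the status endpoint until the job finishes. The sketch below shows one way to do that under stated assumptions: only the `queued` and `completed` states appear in this guide, so any other state is treated as still in progress, and `wait_for_job` and `fetch_status_http` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:3140/api/kg"  # sidecar address used in this guide


def fetch_status_http(job_id: str) -> dict:
    """GET the ingestion status endpoint and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/ingest/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_sec: float = 60.0,
    interval_sec: float = 1.0,
) -> dict:
    """Poll until the job reports "completed" or the timeout expires.

    Any status other than "completed" is treated as still in progress,
    because this guide documents no failure state.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        status = fetch_status()
        if status.get("status") == "completed":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status.get('status')!r} after {timeout_sec}s")
        time.sleep(interval_sec)
```

Against a running sidecar, `wait_for_job(lambda: fetch_status_http("ingest-20260425-001"))` returns the final status document. Passing the fetch function as an argument keeps the polling logic testable without a network.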

### Phase 3: Hybrid Retrieval Testing

Test the query endpoint with various queries:

#### Query 1: Standard retrieval

```bash
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the differences between 400G transceiver form factors?",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": true,
    "min_relevance": 0.3
  }'
```

**Expected behavior:**
- Returns 2-3 of the ingested documents, including the "QSFP-DD vs OSFP" document
- `relevance_score` falls between 0.6 and 0.9 for relevant documents
- Latency is under 500 ms
- Extracted entities include "QSFP-DD", "OSFP", and "400G"

#### Query 2: Semantic search

```bash
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Power efficiency and thermal requirements for high-speed optics",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": false,
    "min_relevance": 0.4
  }'
```

**Expected behavior:**
- Retrieves the "Transceiver Power Consumption" document via semantic similarity
- Its BM25 rank may be low because the query shares few exact keywords with the document, but RRF (reciprocal rank fusion) should still place it near the top
- This demonstrates the benefit of the hybrid approach
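
To make the fusion step concrete, here is a minimal sketch of reciprocal rank fusion. It is an illustration, not the sidecar's implementation: the constant `k = 60` is the value commonly used in the RRF literature, the sidecar's actual setting is not documented here, and the document IDs and rankings are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings for Query 2: the power document is only second by
# keyword match (BM25) but is the closest semantic match (vector search).
bm25 = ["overview", "power", "form-factors"]
vector = ["power", "form-factors", "overview"]

for doc_id, score in rrf_fuse([bm25, vector]):
    print(f"{doc_id}: {score:.5f}")
```

In this example the power document comes out first because it ranks well in both lists, while the overview document is penalized for ranking last in the vector list. A document that one retriever misses entirely only collects a score from the other list, so fusion does not guarantee a top position.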

#### Query 3: Edge case - no results

```bash
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is quantum computing?",
    "domain": "transceiver",
    "top_k": 5
  }'
```

**Expected response:**
```json
{
  "results": [],
  "entities": [],
  "total_results": 0,
  "latency_ms": 50
}
```

### Phase 4: Entity Extraction Verification

Check extracted entities in database:

```bash
psql -h localhost -U tip_kg -d tip_lightrag << EOF
SELECT id, name, entity_type, confidence
FROM entities
WHERE domain = 'transceiver'
LIMIT 10;
EOF
```

**Expected output:**
```
                   id                   |  name   | entity_type | confidence
----------------------------------------+---------+-------------+------------
 550e8400-e29b-41d4-a716-446655440000   | 400G    | transceiver | 0.92
 550e8400-e29b-41d4-a716-446655440001   | QSFP-DD | standard    | 0.89
 550e8400-e29b-41d4-a716-446655440002   | Cisco   | vendor      | 0.95
```

### Phase 5: Evaluation Metrics

Run evaluation against sample queries:

```bash
curl -X POST http://localhost:3140/api/kg/eval \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "eval_set": "transceiver-test",
    "queries": [
      {
        "query": "What is QSFP-DD?",
        "ground_truth_doc_ids": ["<UUID-from-ingestion>"]
      },
      {
        "query": "How much power do 400G transceivers consume?",
        "ground_truth_doc_ids": ["<UUID-from-ingestion>"]
      }
    ],
    "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
    "compare_to": "baseline_fts"
  }'
```

**Expected response:**
```json
{
  "eval_set": "transceiver-test",
  "domain": "transceiver",
  "metrics": [
    {
      "metric": "precision@5",
      "value": 0.8,
      "baseline_value": 0.65,
      "improvement_pct": 23.1
    },
    ...
  ],
  "total_queries": 2,
  "latency_p95_ms": 234
}
```
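
The sample `improvement_pct` is consistent with a relative improvement over the baseline, rounded to one decimal place. The sketch below reproduces that arithmetic; whether the sidecar computes the field exactly this way is an assumption, and `improvement_pct` here is a hypothetical helper.

```python
def improvement_pct(value: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent, to one decimal."""
    return round((value - baseline) / baseline * 100, 1)


print(improvement_pct(0.8, 0.65))  # prints 23.1
```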

## Populating Evaluation Set

Once documents are ingested and queries are tested, populate the full evaluation set:

```bash
# Start sidecar in one terminal
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload

# In another terminal, run population script
cd llm-gateway/packages/lightrag-sidecar  # adjust to where the repository is checked out
python scripts/populate_eval_set.py
```

**Workflow:**
1. Script runs each query in `eval-transceiver-50qa.json`
2. For each query, it shows suggested document IDs from retrieval results
3. You verify/correct the ground truth (y/n/edit)
4. Script saves updated evaluation set with ground_truth_doc_ids populated

## Troubleshooting

### Issue: "Cannot connect to PostgreSQL"

```bash
# Verify PostgreSQL is accepting connections
pg_isready -h localhost

# On Linux with systemd, check the service status
sudo systemctl status postgresql

# Check connection string
echo "$DATABASE_URL"

# Test connection
psql "$DATABASE_URL" -c "SELECT 1"
```

### Issue: "Ollama timeouts during entity extraction"

```bash
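# Verify Ollama is responding (use localhost instead if Ollama runs on this machine)
curl http://192.168.178.213:11434/api/tags

# Check that the model is installed
ollama list

# Load the model if needed (opens an interactive prompt; type /bye to exit)
ollama run qwen2.5:14b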
```

### Issue: "Qdrant connection refused"

```bash
# Verify Qdrant is running
curl -s http://localhost:6333/healthz

# List collections
curl -s http://localhost:6333/collections

# Start Qdrant if not running
docker run -p 6333:6333 qdrant/qdrant:latest
```

### Issue: "Entity extraction returns empty"

Check Ollama logs:
```bash
# Monitor Ollama
tail -f ~/.ollama/logs/server.log

# Test Ollama directly
curl http://192.168.178.213:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "prompt": "Extract entities from: 400G QSFP-DD transceivers from Cisco",
    "stream": false
  }'
```

## Performance Validation

### Query Latency Benchmark

```bash
# Run 100 queries and measure latency
for i in {1..100}; do
  curl -s -X POST http://localhost:3140/api/kg/query \
    -H "Content-Type: application/json" \
    -d '{"query": "400G transceiver", "domain": "transceiver", "top_k": 5}' \
    | jq '.latency_ms'
done | awk '{sum+=$1; n++} END {print "Avg latency:", sum/n, "ms"}'
```

**Expected result:** Average latency <200ms
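
An average can hide slow outliers, and the evaluation response reports a p95 latency, so it is worth looking at both. The sketch below computes the mean, the median, and a nearest-rank 95th percentile; `latency_summary` is a hypothetical helper and the sample values are synthetic.

```python
import statistics


def latency_summary(samples_ms: list[float]) -> dict:
    """Mean, median, and nearest-rank 95th percentile of latency samples."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Nearest-rank method: ceil(0.95 * n) as a 1-based rank, in integer math.
    p95_index = (95 * len(ordered) + 99) // 100 - 1
    return {
        "avg_ms": statistics.fmean(ordered),
        "p50_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }


# Synthetic samples for illustration only: 90 fast requests and 10 slow ones.
samples = [100.0] * 90 + [900.0] * 10
print(latency_summary(samples))
```

Here the average is 180 ms, which passes the 200 ms target, while the p95 is 900 ms. To use real data, redirect the `jq` output of the loop above to a file and load one number per line.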

### Recall@10 Baseline

After populating evaluation set, run full evaluation:

```bash
python scripts/populate_eval_set.py  # Populates ground_truth_doc_ids for every query

curl -X POST http://localhost:3140/api/kg/eval \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "eval_set": "transceiver-50qa",
    "queries": "<load from eval-transceiver-50qa.json>",
    "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
    "compare_to": "baseline_fts"
  }'
```
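
The `"queries"` value in the request above is a placeholder and must be replaced with the actual query list. One way to assemble the request body is sketched below. It assumes that `eval-transceiver-50qa.json` holds a JSON array of objects with `query` and `ground_truth_doc_ids` keys, which this guide does not confirm; `load_entries` and `build_eval_payload` are hypothetical helpers.

```python
import json
from pathlib import Path

METRICS = ["precision@5", "recall@10", "mrr@5", "ndcg@10"]


def load_entries(path: Path) -> list[dict]:
    """Read the evaluation set; assumes a JSON array of query objects."""
    return json.loads(path.read_text(encoding="utf-8"))


def build_eval_payload(entries: list[dict]) -> dict:
    """Build the /api/kg/eval request body, refusing unlabeled queries."""
    missing = [e["query"] for e in entries if not e.get("ground_truth_doc_ids")]
    if missing:
        raise ValueError(f"{len(missing)} queries lack ground truth, e.g. {missing[0]!r}")
    return {
        "domain": "transceiver",
        "eval_set": "transceiver-50qa",
        "queries": [
            {"query": e["query"], "ground_truth_doc_ids": e["ground_truth_doc_ids"]}
            for e in entries
        ],
        "metrics": METRICS,
        "compare_to": "baseline_fts",
    }


# Inline entry instead of the real file; "doc-1" is a made-up document ID.
payload = build_eval_payload(
    [{"query": "What is QSFP-DD?", "ground_truth_doc_ids": ["doc-1"]}]
)
print(json.dumps(payload, indent=2))
```

With the real file, write `build_eval_payload(load_entries(Path("eval-transceiver-50qa.json")))` to a file such as `payload.json` and post it with `curl -d @payload.json`.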

**Target metrics:**
- Precision@5: ≥0.80 (vs 0.65 baseline)
- Recall@10: ≥0.85 (vs 0.72 baseline)
- MRR@5: ≥0.75 (vs 0.58 baseline)
- NDCG@10: ≥0.80 (vs 0.70 baseline)
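
For reference, the four metrics can be computed per query as sketched below, using the standard binary-relevance definitions. This is an illustration rather than the sidecar's evaluation code: the function names are hypothetical, rankings are assumed to contain no duplicate IDs, and MRR@5 is the mean of the per-query reciprocal ranks.

```python
import math


def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def reciprocal_rank_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1 / rank of the first relevant result in the top k, or 0 if none."""
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# Made-up example: two of three relevant documents are retrieved.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "z"}
print(precision_at_k(ranked, relevant, 5))        # prints 0.4
print(reciprocal_rank_at_k(ranked, relevant, 5))  # prints 0.5
```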

## Cleanup Between Tests

```bash
# Clear all data and restart fresh
psql -h localhost -U tip_kg -d tip_lightrag << EOF
TRUNCATE documents, entities, relations, query_logs, evaluation_results CASCADE;
EOF

# Delete the Qdrant collection for the transceiver domain
curl -X DELETE http://localhost:6333/collections/documents_transceiver

# Restart sidecar
# (stop and start uvicorn)
```

## Next: Erik Deployment

Once local testing passes all checks:

1. Verify all tests pass
2. Commit changes to Gitea
3. Follow DEPLOYMENT_CHECKLIST.md for Erik deployment
4. Monitor logs: `pm2 logs lightrag-sidecar`