Rene Fichtmueller a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (1024-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
- Query latency p95: <500ms
- Recall@10: ≥85% (vs 72% FTS baseline)
- Entity extraction accuracy: ≥90%
- Ingestion throughput: ≥100 docs/sec
- Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00

# LightRAG Sidecar Testing Guide
## Prerequisites
Ensure all services are running locally:
```bash
# PostgreSQL (verify the server accepts connections and the database exists)
pg_isready -h localhost
psql -l | grep tip_lightrag
# Qdrant (verify running)
curl http://localhost:6333/healthz
# Ollama (verify running and the model is available)
curl -s http://localhost:11434/api/tags | grep qwen2.5
# Sidecar (if not starting fresh); the [u] keeps grep from matching itself
ps aux | grep '[u]vicorn'
```
## Local Setup
### 1. Initialize Database
```bash
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway/packages/lightrag-sidecar
# Create virtual environment (if needed)
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Initialize database and schema
python scripts/init_db.py
```
**Expected output:**
```
Creating database 'tip_lightrag'...
✓ Database created (or already exists)
Initializing schema...
✓ Tables created: entities, relations, documents, query_logs, evaluation_results
```
### 2. Start Sidecar
```bash
# Start with auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload
```
**Expected output:**
```
INFO: Uvicorn running on http://0.0.0.0:3140
INFO: Application startup complete
```
## Testing Workflow
### Phase 1: Health & Dependency Check
Verify all dependencies are working:
```bash
curl http://localhost:3140/api/kg/health
```
**Expected response:**
```json
{
"status": "healthy",
"dependencies": {
"postgresql": "healthy",
"qdrant": "healthy",
"ollama": "healthy"
},
"latencies_ms": {
"postgresql": 5,
"qdrant": 8,
"ollama": 45
}
}
```
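The same check can be scripted so it fails loudly instead of relying on a visual scan. The sketch below assumes the response shape shown above; `health_problems` and its `max_latency_ms` budget are hypothetical names and values for illustration, not part of the sidecar.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:3140") -> dict:
    """GET /api/kg/health and decode the JSON body."""
    with urlopen(f"{base_url}/api/kg/health", timeout=5) as resp:
        return json.load(resp)


def health_problems(health: dict, max_latency_ms: float = 100.0) -> list:
    """Return human-readable problems found in a health payload.

    max_latency_ms is an assumed per-dependency budget, not a documented limit.
    """
    problems = []
    if health.get("status") != "healthy":
        problems.append(f"overall status is {health.get('status')!r}")
    for name, status in sorted(health.get("dependencies", {}).items()):
        if status != "healthy":
            problems.append(f"{name} is {status!r}")
    for name, latency in sorted(health.get("latencies_ms", {}).items()):
        if latency > max_latency_ms:
            problems.append(
                f"{name} latency {latency} ms exceeds {max_latency_ms} ms"
            )
    return problems
```

Calling `health_problems(fetch_health())` against a running sidecar should return an empty list when every dependency reports `healthy`.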
### Phase 2: Document Ingestion
Test the ingestion pipeline with sample documents:
```bash
curl -X POST http://localhost:3140/api/kg/ingest \
-H "Content-Type: application/json" \
-d '{
"domain": "transceiver",
"documents": [
{
"title": "400G Transceiver Overview",
"content": "400 gigabit per second transceivers are optical modules that transmit and receive data at 400 Gbps. Common form factors include QSFP-DD and OSFP. 400G transceivers use PAM4 modulation to achieve high speeds. Standard transmission distances range from 300m (DR4) to 10km (LR4) to 40km (ER4).",
"source": "blog",
"metadata": {}
},
{
"title": "QSFP-DD vs OSFP",
"content": "QSFP-DD (Quad Small Form-factor Pluggable Double Density) supports up to 400G over 8 lanes. OSFP (Octal Small Form-factor Pluggable) supports up to 800G over 8 lanes. Both are hot-swappable. Cisco and Arista prefer QSFP-DD, while Juniper and Infinera prefer OSFP. Compatibility between them is not guaranteed.",
"source": "blog",
"metadata": {}
},
{
"title": "Transceiver Power Consumption",
"content": "Modern 400G transceivers typically consume 5-8 watts. DR4 variants are more power-efficient at 5W, while ER4 variants consume up to 8W due to additional signal processing. Data center cooling requirements increase by 2-3% with 400G deployment at scale. Power budgets should be verified during capacity planning.",
"source": "blog",
"metadata": {}
}
],
"batch_size": 3
}'
```
**Expected response:**
```json
{
"job_id": "ingest-20260425-001",
"status": "queued",
"documents_submitted": 3,
"estimated_time_sec": 5
}
```
Monitor ingestion progress:
```bash
# Check job status
curl http://localhost:3140/api/kg/ingest/status/ingest-20260425-001
```
**Expected response after completion:**
```json
{
"job_id": "ingest-20260425-001",
"status": "completed",
"documents_processed": 3,
"documents_failed": 0,
"entities_extracted": 12,
"entities_linked": 8,
"timestamp": "2026-04-25T10:30:00Z"
}
```
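Rather than re-running the status call by hand, the job can be polled until it reaches a terminal state. This is a minimal sketch: `wait_for_job` is a hypothetical helper, and the `"failed"` state name is an assumption (only `queued` and `completed` appear in the responses above).

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# "completed" comes from the documented response; "failed" is assumed.
TERMINAL_STATES = {"completed", "failed"}


def fetch_job_status(job_id: str, base_url: str = "http://localhost:3140") -> dict:
    """GET /api/kg/ingest/status/<job_id> and decode the JSON body."""
    with urlopen(f"{base_url}/api/kg/ingest/status/{job_id}", timeout=5) as resp:
        return json.load(resp)


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_sec: float = 60.0,
    interval_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the job is terminal or the timeout expires."""
    deadline = clock() + timeout_sec
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status.get('status')!r} after {timeout_sec}s")
        sleep(interval_sec)
```

For example, `wait_for_job(lambda: fetch_job_status("ingest-20260425-001"))` blocks until the job finishes; `sleep` and `clock` are injectable so the loop can be exercised without real delays.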
### Phase 3: Hybrid Retrieval Testing
Test the query endpoint with various queries:
#### Query 1: Standard retrieval
```bash
curl -X POST http://localhost:3140/api/kg/query \
-H "Content-Type: application/json" \
-d '{
"query": "What are the differences between 400G transceiver form factors?",
"domain": "transceiver",
"top_k": 5,
"entity_links": true,
"min_relevance": 0.3
}'
```
**Expected behavior:**
- Returns 2-3 of the ingested documents, with the "QSFP-DD vs OSFP" document among the top results
- `relevance_score` falls between 0.6 and 0.9 for relevant documents
- Latency stays under 500 ms
- Extracted entities include "QSFP-DD", "OSFP", and "400G"
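The expectations above can be turned into an automated check on the decoded response. This sketch assumes each entry in `results` carries a `relevance_score` field and that the top level carries `latency_ms`, as the examples in this guide suggest; `check_query_response` is a hypothetical helper, not part of the sidecar.

```python
def check_query_response(
    response: dict,
    min_relevance: float = 0.3,
    max_latency_ms: float = 500.0,
    min_results: int = 1,
) -> list:
    """Return a list of violated expectations; an empty list means the check passed."""
    problems = []
    results = response.get("results", [])
    if len(results) < min_results:
        problems.append(f"expected at least {min_results} result(s), got {len(results)}")
    for index, result in enumerate(results):
        score = result.get("relevance_score")  # assumed per-result field name
        if score is None or score < min_relevance:
            problems.append(f"result {index} has relevance_score {score!r}")
    latency = response.get("latency_ms")
    if latency is None or latency >= max_latency_ms:
        problems.append(f"latency_ms is {latency!r}, budget is {max_latency_ms}")
    return problems
```

Passing `min_relevance` equal to the value sent in the request verifies that the server-side filter was actually applied.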
#### Query 2: Semantic search
```bash
curl -X POST http://localhost:3140/api/kg/query \
-H "Content-Type: application/json" \
-d '{
"query": "Power efficiency and thermal requirements for high-speed optics",
"domain": "transceiver",
"top_k": 5,
"entity_links": false,
"min_relevance": 0.4
}'
```
**Expected behavior:**
- Retrieves the "Transceiver Power Consumption" document via semantic similarity
- BM25 may rank it low because the query shares few exact keywords with the document, but RRF fusion with the vector ranking should still place it near the top
- Demonstrates why the hybrid approach helps
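To make the fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion using the k=60 and 0.4/0.6 values from the commit notes. It is an illustration, not the RetrievalService code: mapping 0.4 to BM25 and 0.6 to the vector ranking is an assumption based on the order in which they are listed, and the document IDs are invented.

```python
def weighted_rrf(bm25_ranked, vector_ranked, k=60, w_bm25=0.4, w_vector=0.6):
    """Fuse two ranked lists of document IDs with weighted RRF.

    Each list contributes weight / (k + rank) per document, with rank starting
    at 1. Documents missing from a list simply receive no contribution from it.
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores = {}
    for weight, ranking in ((w_bm25, bm25_ranked), (w_vector, vector_ranked)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Sort by score, breaking ties by doc_id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# BM25 puts the power document last; the vector ranking puts it first.
bm25 = ["overview", "formfactors", "power"]
vector = ["power", "formfactors", "overview"]
fused = weighted_rrf(bm25, vector)
print([doc_id for doc_id, _ in fused])
```

With these inputs the power document ends up first after fusion, even though BM25 alone ranked it last, because the vector ranking carries the larger weight.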
#### Query 3: Edge case - no results
```bash
curl -X POST http://localhost:3140/api/kg/query \
-H "Content-Type: application/json" \
-d '{
"query": "What is quantum computing?",
"domain": "transceiver",
"top_k": 5
}'
```
**Expected response:**
```json
{
"results": [],
"entities": [],
"total_results": 0,
"latency_ms": 50
}
```
### Phase 4: Entity Extraction Verification
Check extracted entities in database:
```bash
psql -h localhost -U tip_kg -d tip_lightrag << EOF
SELECT id, name, entity_type, confidence
FROM entities
WHERE domain = 'transceiver'
LIMIT 10;
EOF
```
**Expected output:**
```
 id                                   | name    | entity_type | confidence
--------------------------------------+---------+-------------+------------
 550e8400-e29b-41d4-a716-446655440000 | 400G    | transceiver |       0.92
 550e8400-e29b-41d4-a716-446655440001 | QSFP-DD | standard    |       0.89
 550e8400-e29b-41d4-a716-446655440002 | Cisco   | vendor      |       0.95
### Phase 5: Evaluation Metrics
Run evaluation against sample queries:
```bash
curl -X POST http://localhost:3140/api/kg/eval \
-H "Content-Type: application/json" \
-d '{
"domain": "transceiver",
"eval_set": "transceiver-test",
"queries": [
{
"query": "What is QSFP-DD?",
"ground_truth_doc_ids": ["<UUID-from-ingestion>"]
},
{
"query": "How much power do 400G transceivers consume?",
"ground_truth_doc_ids": ["<UUID-from-ingestion>"]
}
],
"metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
"compare_to": "baseline_fts"
}'
```
**Expected response:**
```json
{
"eval_set": "transceiver-test",
"domain": "transceiver",
"metrics": [
{
"metric": "precision@5",
"value": 0.8,
"baseline_value": 0.65,
"improvement_pct": 23.1
},
...
],
"total_queries": 2,
"latency_p95_ms": 234
}
```
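For reference when reading these numbers, the sketch below shows the textbook binary-relevance definitions of the four metric families for a single query, plus the relative improvement calculation. It is not taken from EvaluationService, which may differ in details such as graded relevance or tie handling.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank_at_k(ranked, relevant, k):
    """1/rank of the first relevant hit in the top k; MRR@K averages this over queries."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """DCG of the ranking divided by the DCG of an ideal ranking (binary gains)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def improvement_pct(value, baseline):
    """Relative improvement over the baseline, in percent, rounded to one decimal."""
    return round((value - baseline) / baseline * 100, 1)
```

With the sample response above, `improvement_pct(0.8, 0.65)` gives 23.1, which matches the `improvement_pct` field.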
## Populating Evaluation Set
Once documents are ingested and queries are tested, populate the full evaluation set:
```bash
# Start sidecar in one terminal
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload
# In another terminal, run population script
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway/packages/lightrag-sidecar
python scripts/populate_eval_set.py
```
**Workflow:**
1. Script runs each query in `eval-transceiver-50qa.json`
2. For each query, it shows suggested document IDs from retrieval results
3. You verify/correct the ground truth (y/n/edit)
4. Script saves updated evaluation set with ground_truth_doc_ids populated
## Troubleshooting
### Issue: "Cannot connect to PostgreSQL"
```bash
# Verify PostgreSQL is accepting connections
pg_isready -h localhost
# On Linux with systemd, check the service itself
sudo systemctl status postgresql
# Check connection string
echo "$DATABASE_URL"
# Test connection
psql "$DATABASE_URL" -c "SELECT 1"
```
### Issue: "Ollama timeouts during entity extraction"
```bash
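# Verify Ollama is responding (adjust the host to match your sidecar config;
# the prerequisites above assume localhost:11434)
curl http://192.168.178.213:11434/api/tags
# Check that the model has been pulled
ollama list
# Check which models are currently loaded in memory
ollama ps
# Load the model if needed (opens an interactive session; exit with /bye)
ollama run qwen2.5:14b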
```
### Issue: "Qdrant connection refused"
```bash
# Verify Qdrant is running
curl http://localhost:6333/healthz
# List collections
curl http://localhost:6333/collections
# Start Qdrant if not running
docker run -p 6333:6333 qdrant/qdrant:latest
```
### Issue: "Entity extraction returns empty"
Check Ollama logs:
```bash
# Monitor Ollama
tail -f ~/.ollama/logs/server.log
# Test Ollama directly
curl http://192.168.178.213:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5:14b",
"prompt": "Extract entities from: 400G QSFP-DD transceivers from Cisco",
"stream": false
}'
```
## Performance Validation
### Query Latency Benchmark
```bash
# Run 100 queries and measure latency
for i in {1..100}; do
curl -s -X POST http://localhost:3140/api/kg/query \
-H "Content-Type: application/json" \
-d '{"query": "400G transceiver", "domain": "transceiver", "top_k": 5}' \
| jq '.latency_ms'
done | awk '{sum+=$1; n++} END {print "Avg latency:", sum/n, "ms"}'
```
**Expected result:** Average latency <200ms
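The loop above reports only the mean, while the performance target is stated as a p95 latency below 500ms. A small sketch for deriving p95 from the collected `latency_ms` values follows; it uses the nearest-rank definition, which is one of several percentile conventions, and `summarize_latencies` is a hypothetical helper name.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs avoid float rounding surprises.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Return the mean and p95 of a list of latency samples in milliseconds."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile_nearest_rank(latencies_ms, 95),
    }
```

To use it with the benchmark loop, redirect the `jq` output to a file (one value per line) and pass the parsed numbers to `summarize_latencies`.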
### Recall@10 Baseline
After populating evaluation set, run full evaluation:
```bash
python scripts/populate_eval_set.py # Ensures all docs are in ground_truth
curl -X POST http://localhost:3140/api/kg/eval \
-H "Content-Type: application/json" \
-d '{
"domain": "transceiver",
"eval_set": "transceiver-50qa",
"queries": "<load from eval-transceiver-50qa.json>",
"metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
"compare_to": "baseline_fts"
}'
```
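The `"queries"` value in the request above is a placeholder and cannot be sent as-is. One way to build the real payload is sketched below. It assumes `eval-transceiver-50qa.json` is either a JSON array of entries or an object with a `"queries"` array, and that each entry has `query` and `ground_truth_doc_ids` keys (the shape the eval endpoint accepts); the actual file layout may differ, and both helper names are hypothetical.

```python
import json
from pathlib import Path
from urllib.request import Request, urlopen


def load_eval_entries(path):
    """Read the evaluation file and return its list of entries (layout assumed)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["queries"] if isinstance(data, dict) else data


def build_eval_payload(entries, domain="transceiver", eval_set="transceiver-50qa"):
    """Build the /api/kg/eval request body, skipping entries without ground truth."""
    queries = [
        {"query": e["query"], "ground_truth_doc_ids": e["ground_truth_doc_ids"]}
        for e in entries
        if e.get("ground_truth_doc_ids")
    ]
    return {
        "domain": domain,
        "eval_set": eval_set,
        "queries": queries,
        "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
        "compare_to": "baseline_fts",
    }


def post_eval(payload, base_url="http://localhost:3140"):
    """POST the payload to /api/kg/eval and decode the JSON response."""
    request = Request(
        f"{base_url}/api/kg/eval",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

A run then looks like `post_eval(build_eval_payload(load_eval_entries("eval-transceiver-50qa.json")))`. Entries whose ground truth has not been populated yet are dropped so they do not drag the metrics down artificially.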
**Target metrics:**
- Precision@5: 0.80 (vs 0.65 baseline)
- Recall@10: 0.85 (vs 0.72 baseline)
- MRR@5: 0.75 (vs 0.58 baseline)
- NDCG@10: 0.80 (vs 0.70 baseline)
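These targets can be checked mechanically against the eval response. The sketch below assumes the response shape shown in Phase 5 (a `metrics` list of objects with `metric` and `value` keys); `failed_targets` is a hypothetical helper.

```python
# Target values copied from the list above.
TARGETS = {
    "precision@5": 0.80,
    "recall@10": 0.85,
    "mrr@5": 0.75,
    "ndcg@10": 0.80,
}


def failed_targets(eval_response, targets=TARGETS):
    """Return {metric: observed value} for every target that is missed.

    A metric absent from the response is reported with the value None.
    """
    observed = {m["metric"]: m["value"] for m in eval_response.get("metrics", [])}
    failures = {}
    for name, target in targets.items():
        value = observed.get(name)
        if value is None or value < target:
            failures[name] = value
    return failures
```

An empty dictionary means every target was met; anything else lists the metrics that still need work before deployment.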
## Cleanup Between Tests
```bash
# Clear all data and restart fresh
psql -U tip_kg -d tip_lightrag << EOF
TRUNCATE documents, entities, relations, query_logs, evaluation_results CASCADE;
EOF
# Delete the Qdrant collection (it must be recreated before the next ingestion)
curl -X DELETE http://localhost:6333/collections/documents_transceiver
# Restart sidecar
# (stop and start uvicorn)
```
## Next: Erik Deployment
Once local testing passes all checks:
1. Verify all tests pass
2. Commit changes to Gitea
3. Follow DEPLOYMENT_CHECKLIST.md for Erik deployment
4. Monitor logs: `pm2 logs lightrag-sidecar`