Rene Fichtmueller a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (1024-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
- Query latency p95: <500ms
- Recall@10: ≥85% (vs 72% FTS baseline)
- Entity extraction accuracy: ≥90%
- Ingestion throughput: ≥100 docs/sec
- Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00

LightRAG Sidecar Testing Guide

Prerequisites

Ensure all services are running locally:

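# PostgreSQL (verify the server is accepting connections)
pg_isready -h localhost
# tip_lightrag is listed only after step 1 of Local Setup has run
psql -l | grep tip_lightrag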

# Qdrant (verify running)
curl http://localhost:6333/healthz

# Ollama (verify running)
curl http://localhost:11434/api/tags | grep qwen2.5

# Sidecar (if not starting fresh)
ps aux | grep uvicorn

Local Setup

1. Initialize Database

cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway/packages/lightrag-sidecar

# Create virtual environment (if needed)
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Initialize database and schema
python scripts/init_db.py

Expected output:

Creating database 'tip_lightrag'...
✓ Database created (or already exists)
Initializing schema...
✓ Tables created: entities, relations, documents, query_logs, evaluation_results

2. Start Sidecar

# Start with auto-reload for development
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload

Expected output:

INFO:     Uvicorn running on http://0.0.0.0:3140
INFO:     Application startup complete

Testing Workflow

Phase 1: Health & Dependency Check

Verify all dependencies are working:

curl http://localhost:3140/api/kg/health

Expected response:

{
  "status": "healthy",
  "dependencies": {
    "postgresql": "healthy",
    "qdrant": "healthy",
    "ollama": "healthy"
  },
  "latencies_ms": {
    "postgresql": 5,
    "qdrant": 8,
    "ollama": 45
  }
}
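
When this check runs in a script rather than by eye, the response can be reduced to a list of failing dependencies. The following is a minimal sketch, assuming the payload has exactly the shape shown above; the helper names `unhealthy_dependencies` and `slowest_dependency` are illustrative and not part of the sidecar.

```python
import json


def unhealthy_dependencies(payload: dict) -> list[str]:
    """Return the names of dependencies whose status is not "healthy"."""
    deps = payload.get("dependencies", {})
    return sorted(name for name, state in deps.items() if state != "healthy")


def slowest_dependency(payload: dict) -> tuple[str, float]:
    """Return (name, latency_ms) of the slowest dependency in the payload."""
    latencies = payload.get("latencies_ms", {})
    if not latencies:
        raise ValueError("payload contains no latencies_ms entries")
    name = max(latencies, key=latencies.get)
    return name, latencies[name]


# The expected response from this guide, used as sample input.
SAMPLE = json.loads("""
{
  "status": "healthy",
  "dependencies": {"postgresql": "healthy", "qdrant": "healthy", "ollama": "healthy"},
  "latencies_ms": {"postgresql": 5, "qdrant": 8, "ollama": 45}
}
""")

print(unhealthy_dependencies(SAMPLE))  # empty list when everything is healthy
print(slowest_dependency(SAMPLE))
```

To use it against a live sidecar, load the body of GET /api/kg/health with `json.load` and pass the resulting dict to the same functions.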

Phase 2: Document Ingestion

Test the ingestion pipeline with sample documents:

curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "documents": [
      {
        "title": "400G Transceiver Overview",
        "content": "400 gigabit per second transceivers are optical modules that transmit and receive data at 400 Gbps. Common form factors include QSFP-DD and OSFP. 400G transceivers use PAM4 modulation to achieve high speeds. Standard transmission distances range from 300m (DR4) to 10km (LR4) to 40km (ER4).",
        "source": "blog",
        "metadata": {}
      },
      {
        "title": "QSFP-DD vs OSFP",
        "content": "QSFP-DD (Quad Small Form-factor Pluggable Double Density) supports up to 400G over 8 lanes. OSFP (Octal Small Form-factor Pluggable) supports up to 800G over 8 lanes. Both are hot-swappable. Cisco and Arista prefer QSFP-DD, while Juniper and Infinera prefer OSFP. Compatibility between them is not guaranteed.",
        "source": "blog",
        "metadata": {}
      },
      {
        "title": "Transceiver Power Consumption",
        "content": "Modern 400G transceivers typically consume 5-8 watts. DR4 variants are more power-efficient at 5W, while ER4 variants consume up to 8W due to additional signal processing. Data center cooling requirements increase by 2-3% with 400G deployment at scale. Power budgets should be verified during capacity planning.",
        "source": "blog",
        "metadata": {}
      }
    ],
    "batch_size": 3
  }'

Expected response:

{
  "job_id": "ingest-20260425-001",
  "status": "queued",
  "documents_submitted": 3,
  "estimated_time_sec": 5
}

Monitor ingestion progress:

# Check job status
curl http://localhost:3140/api/kg/ingest/status/ingest-20260425-001

Expected response after completion:

{
  "job_id": "ingest-20260425-001",
  "status": "completed",
  "documents_processed": 3,
  "documents_failed": 0,
  "entities_extracted": 12,
  "entities_linked": 8,
  "timestamp": "2026-04-25T10:30:00Z"
}
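
Because ingestion is queued, a test script usually has to poll the status endpoint until the job finishes. The sketch below shows one way to do that, under two assumptions: "completed" is the only terminal status confirmed by this guide ("failed" is a guess), and the timeout and interval defaults are arbitrary. The fetch function is injected so the loop can be exercised without a running sidecar.

```python
import json
import time
import urllib.request
from typing import Callable

# "completed" appears in this guide; "failed" is an assumed terminal status.
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status_http(job_id: str, base_url: str = "http://localhost:3140") -> dict:
    """Fetch the job status from the endpoint shown above."""
    url = f"{base_url}/api/kg/ingest/status/{job_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    timeout_sec: float = 60.0,
    interval_sec: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        status = fetch_status(job_id)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')!r} after {timeout_sec}s"
            )
        sleep(interval_sec)
```

A typical call would be `wait_for_job(fetch_status_http, "ingest-20260425-001")`, followed by a check that `documents_failed` is 0.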

Phase 3: Hybrid Retrieval Testing

Test the query endpoint with various queries:

Query 1: Standard retrieval

curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the differences between 400G transceiver form factors?",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": true,
    "min_relevance": 0.3
  }'

Expected behavior:

  • Should return 2-3 of the ingested documents, including the QSFP-DD vs OSFP document
  • relevance_score should range from 0.6-0.9 for relevant docs
  • Latency should be <500ms
  • Should extract entities like "QSFP-DD", "OSFP", "400G"

Query 2: Semantic similarity

curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Power efficiency and thermal requirements for high-speed optics",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": false,
    "min_relevance": 0.4
  }'

Expected behavior:

  • Should retrieve the Power Consumption document via semantic similarity
  • BM25 rank may be low (few exact keyword matches), but RRF fusion with the vector ranking should still place it near the top
  • Demonstrates hybrid approach effectiveness
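
To see why fusion can lift a document that BM25 ranks poorly, the sketch below applies weighted Reciprocal Rank Fusion to two toy rankings. It uses k=60 and the 0.4/0.6 weights named in the commit summary; assigning 0.4 to BM25 and 0.6 to the vector ranking is an assumption, as are the document IDs. It is not the sidecar's RetrievalService code.

```python
def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over sources of weight / (k + rank)."""
    scores: dict[str, float] = {}
    for source, ranked_ids in rankings.items():
        weight = weights[source]
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy rankings: BM25 puts the power document last, the vector search puts it first.
rankings = {
    "bm25": ["qsfp-vs-osfp", "overview", "power"],
    "vector": ["power", "overview", "qsfp-vs-osfp"],
}
weights = {"bm25": 0.4, "vector": 0.6}  # assumed mapping of the 0.4/0.6 weights

for doc_id, score in weighted_rrf(rankings, weights):
    print(f"{doc_id}: {score:.6f}")
```

With these inputs the power document comes out first even though BM25 ranked it last. Raw RRF scores are small (around 0.016 here), so the relevance_score values in this guide presumably come from a normalization step that is not described here.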

Query 3: Edge case - no results

curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is quantum computing?",
    "domain": "transceiver",
    "top_k": 5
  }'

Expected response:

{
  "results": [],
  "entities": [],
  "total_results": 0,
  "latency_ms": 50
}

Phase 4: Entity Extraction Verification

Check extracted entities in database:

psql -h localhost -U tip_kg -d tip_lightrag << EOF
SELECT id, name, entity_type, confidence 
FROM entities 
WHERE domain = 'transceiver' 
LIMIT 10;
EOF

Expected output:

                   id                   |  name   | entity_type | confidence
----------------------------------------+---------+-------------+------------
 550e8400-e29b-41d4-a716-446655440000   | 400G    | transceiver | 0.92
 550e8400-e29b-41d4-a716-446655440001   | QSFP-DD | standard    | 0.89
 550e8400-e29b-41d4-a716-446655440002   | Cisco   | vendor      | 0.95

Phase 5: Evaluation Metrics

Run evaluation against sample queries:

curl -X POST http://localhost:3140/api/kg/eval \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "eval_set": "transceiver-test",
    "queries": [
      {
        "query": "What is QSFP-DD?",
        "ground_truth_doc_ids": ["<UUID-from-ingestion>"]
      },
      {
        "query": "How much power do 400G transceivers consume?",
        "ground_truth_doc_ids": ["<UUID-from-ingestion>"]
      }
    ],
    "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
    "compare_to": "baseline_fts"
  }'

Expected response:

{
  "eval_set": "transceiver-test",
  "domain": "transceiver",
  "metrics": [
    {
      "metric": "precision@5",
      "value": 0.8,
      "baseline_value": 0.65,
      "improvement_pct": 23.1
    },
    ...
  ],
  "total_queries": 2,
  "latency_p95_ms": 234
}
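
For sanity-checking the numbers in this response, the four metrics can be computed by hand for a single query. The functions below are a sketch using the standard binary-relevance definitions; whether EvaluationService uses exactly these variants (for example, dividing Precision@K by K when fewer than K results come back) is an assumption. Per-query values would then be averaged over all queries.

```python
import math


def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant document."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mrr_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-gain NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def improvement_pct(value: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent, rounded to 0.1."""
    return round((value - baseline) / baseline * 100, 1)


print(improvement_pct(0.8, 0.65))  # matches the improvement_pct in the response above
```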

Populating Evaluation Set

Once documents are ingested and queries are tested, populate the full evaluation set:

# Start sidecar in one terminal
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload

# In another terminal, run population script
cd /Users/renefichtmueller/Desktop/Claude\ Code/llm-gateway/packages/lightrag-sidecar
python scripts/populate_eval_set.py

Workflow:

  1. Script runs each query in eval-transceiver-50qa.json
  2. For each query, it shows suggested document IDs from retrieval results
  3. You verify/correct the ground truth (y/n/edit)
  4. Script saves updated evaluation set with ground_truth_doc_ids populated

Troubleshooting

Issue: "Cannot connect to PostgreSQL"

# Verify PostgreSQL is running
pg_isready -h localhost
# On Linux with systemd only:
sudo systemctl status postgresql

# Check connection string
echo $DATABASE_URL

# Test connection
psql $DATABASE_URL -c "SELECT 1"

Issue: "Ollama timeouts during entity extraction"

# Verify Ollama is responding (substitute the host from your sidecar configuration if Ollama runs remotely)
curl http://localhost:11434/api/tags

# Check if model is loaded
ollama list

# Reload model if needed
ollama run qwen2.5:14b

Issue: "Qdrant connection refused"

# Verify Qdrant is running
curl http://localhost:6333/healthz

# List collections
curl http://localhost:6333/collections

# Start Qdrant if not running
docker run -p 6333:6333 qdrant/qdrant:latest

Issue: "Entity extraction returns empty"

Check Ollama logs:

# Monitor Ollama
tail -f ~/.ollama/logs/server.log

# Test Ollama directly
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "prompt": "Extract entities from: 400G QSFP-DD transceivers from Cisco",
    "stream": false
  }'
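
Empty extraction results often mean the model answered in prose rather than parseable JSON. The sketch below builds a request that sets Ollama's `format` field to `"json"` and parses the `response` field defensively. The prompt wording and the `{"entities": [...]}` shape are assumptions for illustration; the sidecar's actual prompt is not shown in this guide.

```python
import json


def build_extraction_request(text: str, model: str = "qwen2.5:14b") -> dict:
    """Build a body for POST /api/generate that asks for JSON output."""
    prompt = (
        "Extract named entities from the text below. "
        'Respond with JSON of the form {"entities": [{"name": "...", "type": "..."}]}.\n\n'
        f"Text: {text}"
    )
    # "format": "json" asks Ollama to constrain the output to valid JSON.
    return {"model": model, "prompt": prompt, "format": "json", "stream": False}


def parse_entities(generate_response: dict) -> list[dict]:
    """Return the entity list, or [] if the model output is not usable JSON."""
    try:
        parsed = json.loads(generate_response.get("response") or "")
    except json.JSONDecodeError:
        return []
    entities = parsed.get("entities") if isinstance(parsed, dict) else None
    if not isinstance(entities, list):
        return []
    return [e for e in entities if isinstance(e, dict) and e.get("name")]


body = build_extraction_request("400G QSFP-DD transceivers from Cisco")
print(json.dumps(body, indent=2))
```

If `parse_entities` returns an empty list for the direct Ollama call as well, the problem lies with the model output rather than with the sidecar's ingestion code.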

Performance Validation

Query Latency Benchmark

# Run 100 queries and measure latency
for i in {1..100}; do
  curl -s -X POST http://localhost:3140/api/kg/query \
    -H "Content-Type: application/json" \
    -d '{"query": "400G transceiver", "domain": "transceiver", "top_k": 5}' \
    | jq '.latency_ms'
done | awk '{sum+=$1; n++} END {print "Avg latency:", sum/n, "ms"}'

Expected result: Average latency <200ms
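
The loop above reports only the average, while the performance target is stated as a p95. The following sketch computes both from a list of latency values, using the nearest-rank percentile definition (one of several common definitions). Saving the `jq` output to a file, one value per line, and reading it back is an assumed workflow.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float]) -> dict:
    """Average and p95 of a latency sample, both in milliseconds."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
    }


# Illustrative sample, not measured data.
sample = [120, 80, 450, 95, 110]
print(summarize(sample))
```

A single slow outlier barely moves the average but dominates the p95, which is why the two figures should be checked separately against the <200ms and <500ms expectations.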

Recall@10 Baseline

After populating evaluation set, run full evaluation:

python scripts/populate_eval_set.py  # Ensures every query has ground_truth_doc_ids populated

curl -X POST http://localhost:3140/api/kg/eval \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "eval_set": "transceiver-50qa",
    "queries": "<load from eval-transceiver-50qa.json>",
    "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
    "compare_to": "baseline_fts"
  }'
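
The `"queries"` value in the command above is a placeholder, so the request body has to be assembled from the evaluation file. This sketch builds it in Python, assuming the file is either a JSON list of objects with `query` and `ground_truth_doc_ids` keys or an object holding that list under `queries`; the real layout of eval-transceiver-50qa.json is not shown in this guide.

```python
import json
from pathlib import Path


def load_queries(path: str) -> list[dict]:
    """Load the evaluation queries (file layout is an assumption, see above)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["queries"] if isinstance(data, dict) else data


def build_eval_request(
    queries: list[dict],
    domain: str = "transceiver",
    eval_set: str = "transceiver-50qa",
) -> dict:
    """Build the /api/kg/eval body, refusing queries without ground truth."""
    missing = [q["query"] for q in queries if not q.get("ground_truth_doc_ids")]
    if missing:
        raise ValueError(
            f"{len(missing)} queries lack ground_truth_doc_ids, e.g. {missing[0]!r}"
        )
    return {
        "domain": domain,
        "eval_set": eval_set,
        "queries": [
            {"query": q["query"], "ground_truth_doc_ids": q["ground_truth_doc_ids"]}
            for q in queries
        ],
        "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
        "compare_to": "baseline_fts",
    }
```

The resulting dict can be written to a file with `json.dump` and sent with `curl -d @file.json`, or posted directly with `urllib.request`. Failing early on missing ground truth avoids an evaluation run whose metrics would be silently deflated.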

Target metrics:

  • Precision@5: ≥0.80 (vs 0.65 baseline)
  • Recall@10: ≥0.85 (vs 0.72 baseline)
  • MRR@5: ≥0.75 (vs 0.58 baseline)
  • NDCG@10: ≥0.80 (vs 0.70 baseline)
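
Comparing the response against these targets can be automated. A minimal sketch, assuming the response lists metrics as objects with `metric` and `value` keys, as in the Phase 5 example; the function name `failed_targets` is illustrative.

```python
from typing import Optional

# Targets from the list above.
TARGETS = {"precision@5": 0.80, "recall@10": 0.85, "mrr@5": 0.75, "ndcg@10": 0.80}


def failed_targets(
    eval_response: dict, targets: dict[str, float] = TARGETS
) -> dict[str, Optional[float]]:
    """Return metrics below target (or missing, reported as None)."""
    reported = {m["metric"]: m["value"] for m in eval_response.get("metrics", [])}
    failures: dict[str, Optional[float]] = {}
    for name, target in targets.items():
        value = reported.get(name)
        if value is None or value < target:
            failures[name] = value
    return failures


# Illustrative response fragment, not measured results.
example = {"metrics": [{"metric": "precision@5", "value": 0.8},
                       {"metric": "mrr@5", "value": 0.7}]}
print(failed_targets(example))
```

An empty result means every target was met; a `None` value means the metric was absent from the response.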

Cleanup Between Tests

# Clear all data and restart fresh
psql -U tip_kg -d tip_lightrag << EOF
TRUNCATE documents, entities, relations, query_logs, evaluation_results CASCADE;
EOF

# Clear Qdrant collections
curl -X DELETE http://localhost:6333/collections/documents_transceiver

# Restart sidecar
# (stop and start uvicorn)

Next: Erik Deployment

Once local testing passes all checks:

  1. Verify all tests pass
  2. Commit changes to Gitea
  3. Follow DEPLOYMENT_CHECKLIST.md for Erik deployment
  4. Monitor logs: pm2 logs lightrag-sidecar