Rene Fichtmueller a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
 Query latency p95: <500ms
 Recall@10: ≥85% (vs 72% FTS baseline)
 Entity extraction accuracy: ≥90%
 Ingestion throughput: ≥100 docs/sec
 Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00


# LightRAG Sidecar Implementation
## Architecture
The LightRAG sidecar is a FastAPI-based Python microservice that handles knowledge graph indexing, entity extraction, and hybrid retrieval (BM25 + vector search).
```
llm-gateway (Fastify :3103)
    ↓
lightrag-sidecar (FastAPI :3140)
    ├── PostgreSQL (entities, relations, documents, query logs, eval results)
    ├── Qdrant :6333 (vector indexing for hybrid search)
    └── Ollama :11434 (entity extraction with qwen2.5:14b)
```
## Components
### Services
#### RetrievalService (`app/services/retrieval_service.py`)
Implements hybrid retrieval combining BM25 and vector search:
- **`_bm25_search()`**: Full-text search using PostgreSQL `to_tsvector()` and `ts_rank()`
- **`_vector_search()`**: Vector similarity search using Qdrant with bge-m3 384-dim embeddings
- **`_rrf_merge()`**: Reciprocal Rank Fusion to combine rankings (k=60, weights: 0.4 BM25 / 0.6 vector)
- **`_extract_entities_from_results()`**: Extract linked entities and relations from retrieved documents
- **`_log_query()`**: Store queries for evaluation dataset building
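The fusion step can be illustrated with a minimal sketch of weighted Reciprocal Rank Fusion, using the documented k=60 and the 0.4/0.6 weights. The function name `rrf_merge`, its signature, and the sample document IDs are assumptions made for illustration; the actual `_rrf_merge()` may differ.

```python
from collections import defaultdict


def rrf_merge(bm25_ids, vector_ids, k=60, w_bm25=0.4, w_vector=0.6):
    """Weighted Reciprocal Rank Fusion over two ranked lists of document IDs.

    Each list contributes weight / (k + rank) per document, with ranks
    starting at 1. Hypothetical helper, not the service's own code.
    """
    scores = defaultdict(float)
    for weight, ranking in ((w_bm25, bm25_ids), (w_vector, vector_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative rankings: doc-a and doc-b appear in both lists.
fused = rrf_merge(["doc-a", "doc-b", "doc-c"], ["doc-b", "doc-d", "doc-a"])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both rankings accumulate two contributions, so they tend to outrank documents found by only one retriever. Because k=60 dominates the denominator, rank differences near the top of a list change the score only slightly.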
#### IngestionService (`app/services/ingestion_service.py`)
Processes documents through the knowledge graph pipeline:
1. **Entity Extraction**: Use Ollama (qwen2.5:14b) to extract named entities from document text
2. **Entity Linking**: Match extracted entities to existing entities or create new ones
3. **Embedding**: Embed document content and entities using bge-m3
4. **Storage**:
   - Store in PostgreSQL (documents, entities, relations)
   - Index in Qdrant for vector search
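The entity linking step (step 2) might look roughly like the sketch below. It assumes exact matching on a normalized name, keyed the same way as the `UNIQUE(domain, entity_type, name)` constraint in the schema. The helper names, the shape of the extracted entities, and the in-memory `index` are all hypothetical; the real service may match on embeddings or query PostgreSQL instead.

```python
import uuid


def normalize_name(name):
    """Case-fold and collapse whitespace so near-identical surface forms match."""
    return " ".join(name.casefold().split())


def link_entities(extracted, index, domain):
    """Map extracted entities to existing IDs, creating new IDs when unseen.

    `extracted` is a list of {"name": ..., "type": ...} dicts (assumed shape of
    the LLM output). `index` maps (domain, entity_type, normalized_name) to an
    entity ID and is updated in place.
    """
    linked = []
    for entity in extracted:
        key = (domain, entity["type"], normalize_name(entity["name"]))
        if key not in index:
            index[key] = str(uuid.uuid4())  # first sighting: create a new entity
        linked.append(index[key])
    return linked


index = {}
ids = link_entities(
    [
        {"name": "QSFP-DD 400G", "type": "transceiver"},
        {"name": "qsfp-dd   400g", "type": "transceiver"},  # same entity, different spelling
        {"name": "Cisco", "type": "vendor"},
    ],
    index,
    "transceiver",
)
print(len(index), "distinct entities for", len(ids), "mentions")
```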
#### EvaluationService (`app/services/evaluation_service.py`)
Calculates retrieval quality metrics:
- **Precision@K**: % of top-K results that are relevant
- **Recall@K**: % of relevant documents that appear in top-K
- **MRR@K**: Mean Reciprocal Rank (reciprocal of the rank of the first relevant result within the top K, averaged over queries)
- **NDCG@K**: Normalized Discounted Cumulative Gain
Compares each metric against a baseline (FTS) and records the improvement percentage.
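For reference, the four metrics can be written out as below. This is a sketch assuming binary relevance (a document is either in the ground truth or not) and a relative improvement formula of `(value - baseline) / baseline`. The function names are illustrative and are not taken from `evaluation_service.py`.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k if k else 0.0


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank_at_k(retrieved, relevant, k):
    """1 / rank of the first relevant result in the top-k, else 0.

    MRR@K is the mean of this value over all queries in the evaluation set.
    """
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def improvement_pct(value, baseline):
    """Relative improvement over the baseline, in percent."""
    return (value - baseline) / baseline * 100.0 if baseline else 0.0


# Illustrative ranking; the relevant IDs mirror the /api/kg/eval example.
retrieved = ["doc-9", "doc-123", "doc-7", "doc-456", "doc-2"]
relevant = {"doc-123", "doc-456"}
print(precision_at_k(retrieved, relevant, 5), recall_at_k(retrieved, relevant, 5))
print(reciprocal_rank_at_k(retrieved, relevant, 5), ndcg_at_k(retrieved, relevant, 5))
```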
### Routes
#### Query (`/api/kg/query`)
Perform hybrid retrieval:
```bash
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What 400G transceivers work with Cisco Nexus 9300-GX?",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": true,
    "min_relevance": 0.5
  }'
```
Returns: documents with relevance scores, extracted entities, relations, latency
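The same request can be built from Python with only the standard library. This is a sketch: `build_query_request` is a hypothetical helper, the payload fields are copied from the curl example above, and the response is left unparsed because its exact field names are not specified here.

```python
import json
import urllib.request


def build_query_request(query, domain, top_k=5, base_url="http://localhost:3140"):
    """Build (but do not send) a POST request for /api/kg/query."""
    payload = {
        "query": query,
        "domain": domain,
        "top_k": top_k,
        "entity_links": True,
        "min_relevance": 0.5,
    }
    return urllib.request.Request(
        f"{base_url}/api/kg/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "What 400G transceivers work with Cisco Nexus 9300-GX?", "transceiver"
)
print(request.get_method(), request.full_url)

# To send it against a running sidecar:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.load(response)
```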
#### Ingestion (`/api/kg/ingest`)
Submit documents for knowledge graph indexing:
```bash
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "documents": [
      {
        "title": "400G Transceiver Guide",
        "content": "...",
        "source": "blog",
        "metadata": {}
      }
    ],
    "batch_size": 10
  }'
```
Returns: job_id for tracking background processing
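As an illustration of what `batch_size` controls, the sketch below splits a document list into consecutive batches. It assumes the background job processes documents in order, in slices of `batch_size`; `iter_batches` is a hypothetical helper, not part of the service.

```python
def iter_batches(documents, batch_size=10):
    """Yield consecutive slices of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# 25 placeholder documents in the shape used by /api/kg/ingest.
documents = [
    {"title": f"Doc {i}", "content": "...", "source": "blog", "metadata": {}}
    for i in range(25)
]
batch_sizes = [len(batch) for batch in iter_batches(documents, batch_size=10)]
print(batch_sizes)
```

The last batch is simply shorter when the document count is not a multiple of `batch_size`.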
#### Evaluation (`/api/kg/eval`)
Evaluate retrieval quality using evaluation sets:
```bash
curl -X POST http://localhost:3140/api/kg/eval \
-H "Content-Type: application/json" \
-d '{
"domain": "transceiver",
"eval_set": "transceiver-50qa",
"queries": [
{
"query": "What 400G transceivers work with Cisco Nexus 9300-GX?",
"ground_truth_doc_ids": ["doc-123", "doc-456"]
}
],
"metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
"compare_to": "baseline_fts"
}'
```
Returns: metric results with improvement vs baseline
#### Health (`/api/kg/health`)
Check dependency health:
```bash
curl http://localhost:3140/api/kg/health
```
Returns: PostgreSQL, Qdrant, and Ollama status with latencies
## Database Schema
### Entities Table
```sql
CREATE TABLE entities (
    id UUID PRIMARY KEY,
    domain VARCHAR(100) NOT NULL,
    name VARCHAR(500) NOT NULL,
    description TEXT,
    entity_type VARCHAR(100),  -- transceiver, vendor, standard, etc.
    embedding VECTOR(384),     -- bge-m3 embeddings
    confidence FLOAT DEFAULT 1.0,
    created_at TIMESTAMP,
    UNIQUE (domain, entity_type, name)
);
```
### Relations Table
```sql
CREATE TABLE relations (
    source_id UUID REFERENCES entities(id),
    relation_type VARCHAR(100),  -- supported_by, manufactured_by, etc.
    target_id UUID REFERENCES entities(id),
    strength FLOAT DEFAULT 1.0,  -- confidence in relation
    created_at TIMESTAMP,
    PRIMARY KEY (source_id, relation_type, target_id)
);
```
### Documents Table
```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    domain VARCHAR(100) NOT NULL,
    title VARCHAR(500),
    content TEXT,
    source VARCHAR(100),    -- blog, datasheet, standard
    entity_ids UUID[],      -- linked entity IDs
    embedding VECTOR(384),  -- document embedding
    token_count INTEGER,    -- whole-number count
    created_at TIMESTAMP
);
```
### QueryLog Table
```sql
CREATE TABLE query_logs (
    id UUID PRIMARY KEY,
    domain VARCHAR(100),
    query_text TEXT,
    retrieved_doc_ids UUID[],
    ground_truth_doc_ids UUID[],
    relevance_scores FLOAT[],
    latency_ms FLOAT,
    entity_count INTEGER,  -- whole-number count
    created_at TIMESTAMP
);
```
### EvaluationResults Table
```sql
CREATE TABLE evaluation_results (
    id UUID PRIMARY KEY,
    domain VARCHAR(100),
    eval_set_name VARCHAR(100),
    metric_name VARCHAR(100),
    metric_value FLOAT,
    baseline_value FLOAT,
    improvement_pct FLOAT,
    sample_count INTEGER,  -- whole-number count
    created_at TIMESTAMP
);
```
## Configuration
Environment variables in `.env`:
```env
# Server
LIGHTRAG_PORT=3140
ENVIRONMENT=production
# LLM Backend
OLLAMA_URL=http://192.168.178.213:11434
OLLAMA_MODEL=qwen2.5:14b
# Vector Database
QDRANT_URL=http://localhost:6333
EMBEDDING_MODEL=bge-m3
# PostgreSQL
DATABASE_URL=postgresql://tip_kg:password@localhost:5432/tip_lightrag
DB_POOL_SIZE=10
# Hybrid Retrieval
HYBRID_RETRIEVAL_WEIGHTS={'bm25': 0.4, 'vector': 0.6}
```
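The weights value is a dict literal rather than a plain scalar, so it needs explicit parsing. The sketch below shows one way a loader could handle it, accepting either strict JSON (double quotes) or a Python-style literal (single quotes) and normalizing the weights to sum to 1. `parse_weights` is a hypothetical helper; how the sidecar actually reads this variable is not documented here.

```python
import ast
import json


def parse_weights(raw):
    """Parse a weights dict from JSON or a Python literal and normalize it."""
    try:
        weights = json.loads(raw)
    except json.JSONDecodeError:
        # literal_eval only evaluates literals, so it is safe for config
        # strings, and it tolerates single-quoted keys that JSON rejects.
        weights = ast.literal_eval(raw)
    if not isinstance(weights, dict) or not weights:
        raise ValueError("weights must be a non-empty mapping")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {name: value / total for name, value in weights.items()}


print(parse_weights("{'bm25': 0.4, 'vector': 0.6}"))
print(parse_weights('{"bm25": 2, "vector": 3}'))
```

Normalizing at load time means values such as `2` and `3` behave the same as `0.4` and `0.6`.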
## Deployment
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Initialize database
python scripts/init_db.py
# Run sidecar
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload
```
### Erik Deployment
```bash
# Copy to Erik
scp -r packages/lightrag-sidecar/ erik:/opt/llm-gateway/packages/
# Install on Erik
cd /opt/llm-gateway/packages/lightrag-sidecar
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Initialize database on Erik
python scripts/init_db.py
# Start with PM2
pm2 start ecosystem.config.cjs
# Bootstrap with TIP data
LIGHTRAG_SIDECAR_URL=http://localhost:3140 python scripts/bootstrap_tip_data.py
```
### Docker (Optional)
```bash
docker-compose up -d lightrag-sidecar
```
## Performance Targets
- **Query Latency**: <500 ms at p95
- **Recall@10**: ≥85% (vs. 72% for the FTS baseline)
- **Entity Linking Accuracy**: ≥90%
- **Throughput**: ≥100 docs/sec ingestion
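The latency target is a p95 figure, so it should be checked against a percentile of the logged latencies, not the mean. A minimal sketch using the nearest-rank method follows. The helper name and the sample latencies are made up for illustration; in practice the samples would come from `query_logs.latency_ms`.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative latencies in milliseconds (not measured data).
latencies_ms = [120, 180, 95, 410, 230, 150, 520, 300, 210, 175]
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"p95 = {p95} ms, target met: {p95 < 500}")
```

With only ten samples, p95 is simply the slowest request, which is why a meaningful check needs a much larger sample.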
## Testing
```bash
# Run health check
curl http://localhost:3140/api/kg/health
# Test query
curl -X POST http://localhost:3140/api/kg/query \
-H "Content-Type: application/json" \
-d '{"query": "test", "domain": "transceiver"}'
# Check status
curl http://localhost:3140/api/kg/status
# List evaluation datasets
curl http://localhost:3140/api/kg/eval/datasets
```
## Known Limitations
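1. **Async/Await**: Some async code paths call SQLAlchemy synchronously, which blocks the event loop while the query runs
2. **Ollama Timeout**: Entity extraction may time out for long documents (>2000 chars)
3. **Qdrant ID Hashing**: Document IDs are hashed to 32-bit integers for Qdrant, so collisions are possible; by the birthday bound, a uniform 32-bit hash reaches about a 50% chance of at least one collision at roughly 77,000 documents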
4. **Batch Size**: Default batch size of 10 docs; adjust `INGEST_BATCH_SIZE` for larger/smaller batches
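To put the ID hashing limitation in numbers, the sketch below estimates the probability of at least one collision with the birthday-bound approximation, assuming the hash is uniform over 32 bits. It also shows one possible alternative: Qdrant accepts UUIDs as point IDs, so a deterministic `uuid5` derived from the document ID avoids the narrow 32-bit space. Both helper names and the choice of namespace are assumptions, not the sidecar's current behaviour.

```python
import math
import uuid


def collision_probability(n_items, bits=32):
    """Birthday-bound approximation: P ≈ 1 - exp(-n(n-1) / (2 * 2**bits))."""
    space = 2 ** bits
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))


def stable_point_id(doc_id):
    """Deterministic UUID (version 5) derived from the document ID string."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, doc_id))


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} docs: P(collision) ≈ {collision_probability(n):.4f}")

print(stable_point_id("doc-123"))
```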
## Next Steps
1. **Evaluation Dataset**: Populate ground-truth document IDs for the 50 transceiver Q&A pairs in `eval-transceiver-50qa.json` (see `populate_eval_set.py`)
2. **Integration Tests**: E2E tests for complete pipeline (ingest → query → evaluate)
3. **Performance Tuning**: Benchmark query latency, optimize RRF weights
4. **Multi-Domain Support**: Test with multiple domains (switch, standard, etc)
5. **TypeScript Client**: Create query client in llm-gateway for easy integration