# LightRAG Sidecar Implementation

## Architecture

The LightRAG sidecar is a FastAPI-based Python microservice that handles knowledge graph indexing, entity extraction, and hybrid retrieval (BM25 + vector search).

```
llm-gateway (Fastify :3103)
    ↓
lightrag-sidecar (FastAPI :3140)
    ↓
    ├── PostgreSQL (entities, relations, documents, query logs, eval results)
    ├── Qdrant :6333 (vector indexing for hybrid search)
    └── Ollama :11434 (entity extraction with qwen2.5:14b)
```

## Components

### Services

#### RetrievalService (`app/services/retrieval_service.py`)

Implements hybrid retrieval combining BM25 and vector search:

- **`_bm25_search()`**: Full-text search using PostgreSQL `to_tsvector()` and `ts_rank()`
- **`_vector_search()`**: Vector similarity search using Qdrant with bge-m3 384-dim embeddings
- **`_rrf_merge()`**: Reciprocal Rank Fusion to combine rankings (k=60, weights: 0.4 BM25 / 0.6 vector)
- **`_extract_entities_from_results()`**: Extract linked entities and relations from retrieved documents
- **`_log_query()`**: Store queries for evaluation dataset building

#### IngestionService (`app/services/ingestion_service.py`)

Process documents through knowledge graph pipeline:

1. **Entity Extraction**: Use Ollama (qwen2.5:14b) to extract named entities from document text
2. **Entity Linking**: Match extracted entities to existing entities or create new ones
3. **Embedding**: Embed document content and entities using bge-m3
4. **Storage**:
   - Store in PostgreSQL (documents, entities, relations)
   - Index in Qdrant for vector search

#### EvaluationService (`app/services/evaluation_service.py`)

Calculate retrieval quality metrics:

- **Precision@K**: % of top-K results that are relevant
- **Recall@K**: % of relevant documents that appear in top-K
- **MRR@K**: Mean Reciprocal Rank (inverse rank of first relevant result)
- **NDCG@K**: Normalized Discounted Cumulative Gain

Compares against baselines (FTS) and tracks improvement percentage.
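The scoring steps named above (the weighted RRF merge in RetrievalService and the ranking metrics in EvaluationService) can be illustrated with a minimal, standalone sketch. It is not the sidecar's actual code: the function names, the dictionary-based inputs, and the assumption that the 0.4/0.6 weights multiply each list's `1 / (k + rank)` contribution are all hypothetical, and relevance is treated as binary.

```python
import math
from collections import defaultdict


def rrf_merge(rankings, weights, k=60):
    """Weighted Reciprocal Rank Fusion (assumed form: sum of w / (k + rank)).

    rankings: dict mapping a retriever name to its ranked list of doc IDs.
    weights:  dict mapping the same names to their fusion weight.
    """
    scores = defaultdict(float)
    for name, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):  # ranks are 1-based
            scores[doc_id] += weights[name] / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots filled with relevant documents."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank_at_k(retrieved, relevant, k):
    """1 / rank of the first relevant hit in the top k (0.0 if none).

    MRR@K is the mean of this value over all evaluation queries.
    """
    for rank, d in enumerate(retrieved[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(retrieved[:k], start=1)
        if d in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


merged = rrf_merge(
    {"bm25": ["a", "b", "c"], "vector": ["c", "a", "d"]},
    {"bm25": 0.4, "vector": 0.6},
)
relevant = {"a", "d"}
print(merged)
print(precision_at_k(merged, relevant, 2), recall_at_k(merged, relevant, 3))
```

In this toy input, document `a` ranks first because it sits near the top of both lists, while `d` overtakes `b` because the vector list carries the larger weight.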
### Routes

#### Query (`/api/kg/query`)

Perform hybrid retrieval:

```bash
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What 400G transceivers work with Cisco Nexus 9300-GX?",
    "domain": "transceiver",
    "top_k": 5,
    "entity_links": true,
    "min_relevance": 0.5
  }'
```

Returns: documents with relevance scores, extracted entities, relations, latency

#### Ingestion (`/api/kg/ingest`)

Submit documents for knowledge graph indexing:

```bash
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "documents": [
      {
        "title": "400G Transceiver Guide",
        "content": "...",
        "source": "blog",
        "metadata": {}
      }
    ],
    "batch_size": 10
  }'
```

Returns: job_id for tracking background processing

#### Evaluation (`/api/kg/eval`)

Evaluate retrieval quality using evaluation sets:

```bash
curl -X POST http://localhost:3140/api/kg/eval \
  -H "Content-Type: application/json" \
  -d '{
    "domain": "transceiver",
    "eval_set": "transceiver-50qa",
    "queries": [
      {
        "query": "What 400G transceivers work with Cisco Nexus 9300-GX?",
        "ground_truth_doc_ids": ["doc-123", "doc-456"]
      }
    ],
    "metrics": ["precision@5", "recall@10", "mrr@5", "ndcg@10"],
    "compare_to": "baseline_fts"
  }'
```

Returns: metric results with improvement vs baseline

#### Health (`/api/kg/health`)

Check dependency health:

```bash
curl http://localhost:3140/api/kg/health
```

Returns: PostgreSQL, Qdrant, and Ollama status with latencies

## Database Schema

### Entities Table

```sql
CREATE TABLE entities (
    id UUID PRIMARY KEY,
    domain VARCHAR(100) NOT NULL,
    name VARCHAR(500) NOT NULL,
    description TEXT,
    entity_type VARCHAR(100),  -- transceiver, vendor, standard, etc
    embedding VECTOR(384),     -- bge-m3 embeddings
    confidence FLOAT DEFAULT 1.0,
    created_at TIMESTAMP,
    UNIQUE(domain, entity_type, name)
);
```

### Relations Table

```sql
CREATE TABLE relations (
    source_id UUID REFERENCES entities(id),
    relation_type VARCHAR(100),  -- supported_by, manufactured_by, etc
    target_id UUID REFERENCES entities(id),
    strength FLOAT DEFAULT 1.0,  -- confidence in relation
    created_at TIMESTAMP,
    PRIMARY KEY (source_id, relation_type, target_id)
);
```

### Documents Table

```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    domain VARCHAR(100) NOT NULL,
    title VARCHAR(500),
    content TEXT,
    source VARCHAR(100),    -- blog, datasheet, standard
    entity_ids UUID[],      -- linked entity IDs
    embedding VECTOR(384),  -- document embedding
    token_count FLOAT,
    created_at TIMESTAMP
);
```

### QueryLog Table

```sql
CREATE TABLE query_logs (
    id UUID PRIMARY KEY,
    domain VARCHAR(100),
    query_text TEXT,
    retrieved_doc_ids UUID[],
    ground_truth_doc_ids UUID[],
    relevance_scores FLOAT[],
    latency_ms FLOAT,
    entity_count FLOAT,
    created_at TIMESTAMP
);
```

### EvaluationResults Table

```sql
CREATE TABLE evaluation_results (
    id UUID PRIMARY KEY,
    domain VARCHAR(100),
    eval_set_name VARCHAR(100),
    metric_name VARCHAR(100),
    metric_value FLOAT,
    baseline_value FLOAT,
    improvement_pct FLOAT,
    sample_count FLOAT,
    created_at TIMESTAMP
);
```

## Configuration

Environment variables in `.env`:

```env
# Server
LIGHTRAG_PORT=3140
ENVIRONMENT=production

# LLM Backend
OLLAMA_URL=http://192.168.178.213:11434
OLLAMA_MODEL=qwen2.5:14b

# Vector Database
QDRANT_URL=http://localhost:6333
EMBEDDING_MODEL=bge-m3

# PostgreSQL
DATABASE_URL=postgresql://tip_kg:password@localhost:5432/tip_lightrag
DB_POOL_SIZE=10

# Hybrid Retrieval
HYBRID_RETRIEVAL_WEIGHTS={'bm25': 0.4, 'vector': 0.6}
```

## Deployment

### Local Development

```bash
# Install dependencies
pip install -r requirements.txt

# Initialize database
python scripts/init_db.py

# Run sidecar
uvicorn app.main:app --host 0.0.0.0 --port 3140 --reload
```

### Erik Deployment

```bash
# Copy to Erik
scp -r packages/lightrag-sidecar/ erik:/opt/llm-gateway/packages/

# Install on Erik
cd /opt/llm-gateway/packages/lightrag-sidecar
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Initialize database on Erik
python scripts/init_db.py

# Start with PM2
pm2 start ecosystem.config.cjs

# Bootstrap with TIP data
LIGHTRAG_SIDECAR_URL=http://localhost:3140 python scripts/bootstrap_tip_data.py
```

### Docker (Optional)

```bash
docker-compose up -d lightrag-sidecar
```

## Performance Targets

- **Query Latency**: <500ms p95
- **Recall@10**: ≥85% (vs baseline FTS)
- **Entity Linking Accuracy**: ≥90%
- **Throughput**: ≥100 docs/sec ingestion

## Testing

```bash
# Run health check
curl http://localhost:3140/api/kg/health

# Test query
curl -X POST http://localhost:3140/api/kg/query \
  -H "Content-Type: application/json" \
  -d '{"query": "test", "domain": "transceiver"}'

# Check status
curl http://localhost:3140/api/kg/status

# List evaluation datasets
curl http://localhost:3140/api/kg/eval/datasets
```

## Known Limitations

1. **Async/Await**: Some async operations use thread-blocking SQLAlchemy calls
2. **Ollama Timeout**: Entity extraction may time out for long documents (>2000 chars)
3. **Qdrant ID Hashing**: Document IDs are hashed to 32-bit integers for Qdrant (collisions become likely with very large datasets)
4. **Batch Size**: Default batch size is 10 docs; adjust `INGEST_BATCH_SIZE` for larger or smaller batches

## Next Steps

1. **Evaluation Dataset**: Create 50 Q&A pairs for transceiver domain with ground truth
2. **Integration Tests**: E2E tests for complete pipeline (ingest → query → evaluate)
3. **Performance Tuning**: Benchmark query latency, optimize RRF weights
4. **Multi-Domain Support**: Test with multiple domains (switch, standard, etc.)
5. **TypeScript Client**: Create query client in llm-gateway for easy integration
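
To put a number on the Qdrant ID hashing limitation listed under Known Limitations, the sketch below estimates how likely it is that at least two documents share a 32-bit hashed ID. It uses the standard birthday-bound approximation and assumes the hash spreads IDs uniformly; the function name and the sample corpus sizes are illustrative and not part of the sidecar.

```python
import math


def collision_probability(n_docs: int, id_bits: int = 32) -> float:
    """Approximate chance that at least two of n_docs hashed IDs collide.

    Birthday-bound approximation, assuming a uniform hash:
        p ~= 1 - exp(-n * (n - 1) / (2 * 2**id_bits))
    """
    id_space = 2 ** id_bits
    return 1.0 - math.exp(-n_docs * (n_docs - 1) / (2 * id_space))


# Illustrative corpus sizes (hypothetical, not measured from any deployment).
for n in (1_000, 10_000, 77_163, 1_000_000):
    print(f"{n:>9} docs: {collision_probability(n):.4f}")
```

Under these assumptions the risk is about 1% at 10,000 documents and reaches roughly 50% near 77,000 documents. Qdrant point IDs may also be 64-bit unsigned integers or UUIDs, so a wider ID is one possible way to reduce the risk.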