# LightRAG Sidecar: Knowledge Graph Integration

FastAPI sidecar running on Erik (192.168.178.82:3140), providing hybrid knowledge graph RAG capabilities for the LLM Gateway learning engine.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  llm-gateway Learning Pipeline (Fastify :3103)                  │
│  - packages/learning/src/prompt-optimizer/                      │
│  - packages/learning-integration/src/feedback.ts                │
│  + TypeScript KG Query Client                                   │
└──────────────────────────────┬──────────────────────────────────┘
                               │ HTTP POST
                               │ /api/kg/query
                               │ /api/kg/ingest
                               │ /api/kg/eval
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│  LightRAG Python Sidecar (FastAPI :3140)                        │
│  - Entity extraction + linking (LLM-powered)                    │
│  - Hybrid retrieval (BM25 + vector)                             │
│  - Qdrant vector index (Erik :6333)                             │
│  - PostgreSQL knowledge graph (Erik pg)                         │
└─────────────────────────────────────────────────────────────────┘
```

## Key Features

**Hybrid Retrieval**:
- BM25 full-text search over PostgreSQL (entity text, descriptions)
- Qdrant vector similarity (bge-m3 embeddings, 1024-dim)
- Reciprocal Rank Fusion (RRF) to combine results

**Multilingual Support**:
- bge-m3 embeddings (English + German)
- Entity linking across language variants
- Query expansion in both languages

**Quality Metrics**:
- Precision@5, Recall@10 per domain
- Latency tracking (target <500ms p95)
- Entity coverage % (entities found / total)
- Confidence scoring per retrieval

## Domains (Phase 1: TIP)

### Transceiver Domain

**Entities**:
- Transceiver Models (SFP28, QSFP28, QSFP-DD, OSFP)
- Specifications (wavelength, distance, form factor)
- Vendors (Cisco, Juniper, Arista, etc.)
- Pricing & Availability
- Compatibility Matrix

**Relations**:
- `supported_by` (Transceiver → Switch)
- `complies_with` (Transceiver → Standard like SFF-8024)
- `manufactured_by` (Transceiver → Vendor)
- `price_tracked_by` (Transceiver → Source)
- `compatible_with` (Transceiver → Alternative Optics)

**Knowledge Base**:
- 100 blog posts (blog-training-data/)
- SFF-8024 standard specs
- Vendor datasheets & compatibility lists
- Pricing history (fs.com, competitors)
- Industry standards (IEEE 802.3)

## API Routes

### Query Operations

**POST /api/kg/query**

```json
{
  "query": "What 400G transceiver options work with Cisco Nexus 9300-GX?",
  "domain": "transceiver",
  "top_k": 5,
  "entity_links": true
}
```

Response includes:
- `results`: ranked documents with relevance scores
- `entities`: extracted entities with confidence
- `relations`: entity relationships from the knowledge graph
- `sources`: citations to blog posts and datasheets
- `latency_ms`: retrieval time

**POST /api/kg/ingest**

```json
{
  "source": "blog",
  "domain": "transceiver",
  "documents": [...],
  "batch_size": 10
}
```

Triggers the async ingestion pipeline:
1. Entity extraction (LLM)
2. Entity linking (fuzzy + vector similarity)
3. Relation extraction
4. Embedding + Qdrant indexing
5. PostgreSQL graph storage

### Evaluation Operations

**POST /api/kg/eval**

```json
{
  "eval_set": "transceiver-50qa",
  "metrics": ["precision@5", "recall@10", "mrr@5"],
  "compare_to": "baseline_fts"
}
```

Returns:
- KG vs FTS comparison
- Per-question breakdown
- Entity coverage %
- Latency percentiles

### Admin Operations

**POST /api/kg/rebuild**
- Full reindex of Qdrant + PostgreSQL
- Used after schema changes

**GET /api/kg/health**
- Qdrant, PostgreSQL, LLM service status

## Configuration

**Environment Variables** (set on Erik):

```bash
LIGHTRAG_DOMAIN=transceiver              # Active domain
LIGHTRAG_PORT=3140                       # FastAPI port
LLM_BACKEND=ollama                       # Extraction model backend
OLLAMA_URL=http://192.168.178.213:11434  # Mac Studio Ollama
QDRANT_URL=http://localhost:6333         # Local Qdrant (Erik)
DATABASE_URL=postgresql://tip_kg:...@localhost/tip_lightrag
EMBEDDING_MODEL=bge-m3                   # 1024-dim multilingual
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4                            # Concurrent ingestion
EVAL_Q_PER_DOMAIN=50
```

**PostgreSQL Schema** (tip_lightrag database):

```sql
-- The VECTOR type is provided by the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Entities: uniquely identified concepts
CREATE TABLE entities (
  id UUID PRIMARY KEY,
  domain TEXT NOT NULL,
  name TEXT NOT NULL,
  description TEXT,
  entity_type TEXT,        -- 'transceiver', 'standard', 'vendor', etc.
  embedding VECTOR(1024),  -- bge-m3 dense embedding
  confidence FLOAT,
  created_at TIMESTAMP
);

-- Relations: directed edges in the knowledge graph
CREATE TABLE relations (
  source_id UUID REFERENCES entities,
  relation_type TEXT,      -- 'supported_by', 'manufactured_by', etc.
  target_id UUID REFERENCES entities,
  strength FLOAT,          -- confidence in relation
  PRIMARY KEY (source_id, relation_type, target_id)
);

-- Documents: ingested content
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  domain TEXT,
  source TEXT,             -- 'blog', 'datasheet', 'standard'
  title TEXT,
  content TEXT,
  entities UUID[],         -- linked entity IDs
  embedding VECTOR(1024),
  created_at TIMESTAMP
);

-- Queries: audit trail for evaluation
CREATE TABLE queries (
  id UUID PRIMARY KEY,
  domain TEXT,
  query TEXT,
  retrieved_docs UUID[],
  ground_truth_docs UUID[],
  relevance_scores FLOAT[],
  latency_ms INT,
  created_at TIMESTAMP
);
```

## Deployment

**On Erik** (production):

```bash
# 1. Create database
createdb tip_lightrag
psql tip_lightrag < schema.sql

# 2. Start Qdrant (if not running)
docker run -d --name qdrant -p 6333:6333 \
  -v /data/qdrant:/qdrant/storage \
  qdrant/qdrant

# 3. Start sidecar
pm2 start ecosystem.config.js --name lightrag-sidecar

# 4. Ingest TIP data
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d @tip-bootstrap.json
```

**Local Development** (Mac):

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run with SQLite for testing
LIGHTRAG_DB=sqlite:///test.db \
QDRANT_URL=http://localhost:6333 \
python -m uvicorn app.main:app --reload --port 3140
```

## Performance Targets

- **Query Latency**: <500ms p95 (including entity extraction)
- **Ingestion**: 10-50 docs/sec depending on complexity
- **Recall@10**: 85%+ vs baseline FTS
- **Entity Linking Accuracy**: 90%+
- **Index Size**: <1GB per domain

## Phase 1 Success Criteria

- [x] Sidecar deployment on Erik
- [ ] TIP blog posts fully indexed
- [ ] 50-Q eval set baseline established
- [ ] KG retrieval shows 2-3x improvement in MRR vs FTS
- [ ] Entity extraction 90%+ accurate
- [ ] Latency <500ms p95 for typical queries

## Next Phases

**Phase 1b** (Week 2):
- Fine-tune entity extraction on transceiver domain
- Optimize entity linking disambiguation
- Extend eval set to 100 Q&A pairs

**Phase 2** (Week 3-4):
- EO Global Pulse integration (contacts, companies, events)
- Multilingual expansion (German technical terms)
- Dashboard for query/retrieval analytics

**Phase 3+**:
- Fine-grained relation extraction
- Temporal reasoning (pricing trends, release dates)
- Autonomous knowledge update (news → KG)
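## Sketch: Reciprocal Rank Fusion

The Reciprocal Rank Fusion step listed under Key Features merges the BM25 and vector rankings by summing reciprocal ranks per document. The following is a minimal sketch of that idea, not the sidecar's actual code: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample document IDs are all assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. The default k=60 is an assumption.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF works on ranks rather than raw scores, the BM25 and cosine-similarity scales never need to be normalized against each other; a document that appears in both lists (here `doc_a` and `doc_b`) rises above one that appears in only one.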
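## Sketch: Retrieval Metrics

The metrics accepted by `/api/kg/eval` (`precision@5`, `recall@10`, `mrr@5`) have standard definitions that can be checked by hand. The sketch below uses those textbook definitions with hypothetical helper names and sample IDs; how the sidecar computes them internally is not specified in this document.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def mrr_at_k(results, k):
    """Mean reciprocal rank of the first relevant hit within the top k.

    `results` is a list of (retrieved, relevant) pairs, one per question.
    A question with no relevant hit in the top k contributes 0.
    """
    if not results:
        return 0.0
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)


# Hypothetical eval data: two questions with ground-truth document IDs.
q1 = (["d1", "d7", "d3", "d9", "d4"], {"d3", "d4", "d8"})
q2 = (["d2", "d5"], {"d2"})

print(precision_at_k(*q1, k=5))  # 2 of the top 5 are relevant
print(recall_at_k(*q1, k=10))    # 2 of 3 relevant docs were retrieved
print(mrr_at_k([q1, q2], k=5))   # first hits at rank 3 and rank 1
```

The `retrieved_docs` and `ground_truth_docs` columns of the `queries` table hold the inputs these functions would need for each logged question.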