llm-gateway/packages/lightrag-sidecar
Rene Fichtmueller a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
 Query latency p95: <500ms
 Recall@10: ≥85% (vs 72% FTS baseline)
 Entity extraction accuracy: ≥90%
 Ingestion throughput: ≥100 docs/sec
 Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00
..

LightRAG Sidecar — Knowledge Graph Integration

FastAPI sidecar running on Erik (192.168.178.82:3140) providing hybrid knowledge graph RAG capabilities for LLM Gateway learning engine.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│ llm-gateway Learning Pipeline (Fastify :3103)                   │
│ - packages/learning/src/prompt-optimizer/                       │
│ - packages/learning-integration/src/feedback.ts                 │
│ + TypeScript KG Query Client                                    │
└──────────────────────────────┬──────────────────────────────────┘
                               │ HTTP POST
                               │ /api/kg/query
                               │ /api/kg/ingest
                               │ /api/kg/eval
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ LightRAG Python Sidecar (FastAPI :3140)                         │
│ - Entity extraction + linking (LLM-powered)                     │
│ - Hybrid retrieval (BM25 + vector)                              │
│ - Qdrant vector index (Erik :6333)                              │
│ - PostgreSQL knowledge graph (Erik pg)                          │
└─────────────────────────────────────────────────────────────────┘

Key Features

Hybrid Retrieval:

  • BM25 full-text search over PostgreSQL (entity text, descriptions)
  • Qdrant vector similarity (bge-m3 embeddings, 384-dim)
  • Reciprocal Rank Fusion (RRF) to combine results

Multilingual Support:

  • bge-m3 embeddings (English + Deutsch)
  • Entity linking across language variants
  • Query expansion in both languages

Quality Metrics:

  • Precision@5, Recall@10 per domain
  • Latency tracking (target <500ms p95)
  • Entity coverage % (entities found / total)
  • Confidence scoring per retrieval

Domains (Phase 1: TIP)

Transceiver Domain

Entities:

  • Transceiver Models (SFP28, QSFP28, QSFP-DD, OSFP)
  • Specifications (wavelength, distance, form factor)
  • Vendors (Cisco, Juniper, Arista, etc.)
  • Pricing & Availability
  • Compatibility Matrix

Relations:

  • supported_by (Transceiver → Switch)
  • complies_with (Transceiver → Standard like SFF-8024)
  • manufactured_by (Transceiver → Vendor)
  • price_tracked_by (Transceiver → Source)
  • compatible_with (Transceiver → Alternative Optics)

Knowledge Base:

  • 100 blog posts (blog-training-data/)
  • SFF-8024 standard specs
  • Vendor datasheets & compatibility lists
  • Pricing history (fs.com, competitors)
  • Industry standards (IEEE 802.3)

API Routes

Query Operations

POST /api/kg/query

{
  "query": "What 400G transceiver options work with Cisco Nexus 9300-GX?",
  "domain": "transceiver",
  "top_k": 5,
  "entity_links": true
}

Response includes:

  • results: ranked documents with relevance scores
  • entities: extracted entities with confidence
  • relations: entity relationships from knowledge graph
  • sources: citation to blog posts / datasheets
  • latency_ms: retrieval time

POST /api/kg/ingest

{
  "source": "blog",
  "domain": "transceiver",
  "documents": [...],
  "batch_size": 10
}

Triggers async ingestion pipeline:

  1. Entity extraction (LLM)
  2. Entity linking (fuzzy + vector similarity)
  3. Relation extraction
  4. Embedding + Qdrant indexing
  5. PostgreSQL graph storage

Evaluation Operations

POST /api/kg/eval

{
  "eval_set": "transceiver-50qa",
  "metrics": ["precision@5", "recall@10", "mrr@5"],
  "compare_to": "baseline_fts"
}

Returns:

  • KG vs FTS comparison
  • Per-question breakdown
  • Entity coverage %
  • Latency percentiles

Admin Operations

POST /api/kg/rebuild

  • Full reindex of Qdrant + PostgreSQL
  • Used after schema changes

GET /api/kg/health

  • Qdrant, PostgreSQL, LLM service status

Configuration

Environment Variables (set on Erik):

LIGHTRAG_DOMAIN=transceiver           # Active domain
LIGHTRAG_PORT=3140                    # FastAPI port
LLM_BACKEND=ollama                    # Extraction model
OLLAMA_URL=http://192.168.178.213:11434  # Mac Studio Ollama
QDRANT_URL=http://localhost:6333      # Local Qdrant (Erik)
DATABASE_URL=postgresql://tip_kg:...@localhost/tip_lightrag
EMBEDDING_MODEL=bge-m3                # 384-dim multilingual
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4                         # Concurrent ingestion
EVAL_Q_PER_DOMAIN=50

PostgreSQL Schema (tip_lightrag database):

-- Entities: uniquely identified concepts
CREATE TABLE entities (
  id UUID PRIMARY KEY,
  domain TEXT NOT NULL,
  name TEXT NOT NULL,
  description TEXT,
  entity_type TEXT,  -- 'transceiver', 'standard', 'vendor', etc
  embedding VECTOR(384),
  confidence FLOAT,
  created_at TIMESTAMP
);

-- Relations: directed edges in knowledge graph
CREATE TABLE relations (
  source_id UUID REFERENCES entities,
  relation_type TEXT,  -- 'supported_by', 'manufactured_by', etc
  target_id UUID REFERENCES entities,
  strength FLOAT,  -- confidence in relation
  PRIMARY KEY (source_id, relation_type, target_id)
);

-- Documents: ingested content
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  domain TEXT,
  source TEXT,  -- 'blog', 'datasheet', 'standard'
  title TEXT,
  content TEXT,
  entities UUID[],  -- linked entity IDs
  embedding VECTOR(384),
  created_at TIMESTAMP
);

-- Queries: audit trail for evaluation
CREATE TABLE queries (
  id UUID PRIMARY KEY,
  domain TEXT,
  query TEXT,
  retrieved_docs UUID[],
  ground_truth_docs UUID[],
  relevance_scores FLOAT[],
  latency_ms INT,
  created_at TIMESTAMP
);

Deployment

On Erik (production):

# 1. Create database
createdb tip_lightrag
psql tip_lightrag < schema.sql

# 2. Start Qdrant (if not running)
docker run -d --name qdrant -p 6333:6333 \
  -v /data/qdrant:/qdrant/storage \
  qdrant/qdrant

# 3. Start sidecar
pm2 start ecosystem.config.js --name lightrag-sidecar

# 4. Ingest TIP data
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d @tip-bootstrap.json

Local Development (Mac):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run with SQLite for testing
LIGHTRAG_DB=sqlite:///test.db \
QDRANT_URL=http://localhost:6333 \
python -m uvicorn app.main:app --reload --port 3140

Performance Targets

  • Query Latency: <500ms p95 (including entity extraction)
  • Ingestion: 10-50 docs/sec depending on complexity
  • Recall@10: 85%+ vs baseline FTS
  • Entity Linking Accuracy: 90%+
  • Index Size: <1GB per domain

Phase 1 Success Criteria

  • Sidecar deployment on Erik
  • TIP blog posts fully indexed
  • 50-Q eval set baseline established
  • KG retrieval shows 2-3x improvement in MRR vs FTS
  • Entity extraction 90%+ accurate
  • Latency <500ms p95 for typical queries

Next Phases

Phase 1b (Week 2):

  • Fine-tune entity extraction on transceiver domain
  • Optimize entity linking disambiguation
  • Extend eval set to 100 Q&A pairs

Phase 2 (Week 3-4):

  • EO Global Pulse integration (contacts, companies, events)
  • Multilingual expansion (German technical terms)
  • Dashboard for query/retrieval analytics

Phase 3+:

  • Fine-grained relation extraction
  • Temporal reasoning (pricing trends, release dates)
  • Autonomous knowledge update (news → KG)