llm-gateway/packages/lightrag-sidecar/README.md

# LightRAG Sidecar — Knowledge Graph Integration

FastAPI sidecar running on Erik (192.168.178.82:3140) that provides hybrid knowledge graph RAG capabilities for the LLM Gateway learning engine.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ llm-gateway Learning Pipeline (Fastify :3103)                   │
│ - packages/learning/src/prompt-optimizer/                       │
│ - packages/learning-integration/src/feedback.ts                 │
│ + TypeScript KG Query Client                                    │
└──────────────────────────────┬──────────────────────────────────┘
                               │ HTTP POST
                               │ /api/kg/query
                               │ /api/kg/ingest
                               │ /api/kg/eval
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ LightRAG Python Sidecar (FastAPI :3140)                         │
│ - Entity extraction + linking (LLM-powered)                     │
│ - Hybrid retrieval (BM25 + vector)                              │
│ - Qdrant vector index (Erik :6333)                              │
│ - PostgreSQL knowledge graph (Erik pg)                          │
└─────────────────────────────────────────────────────────────────┘
```

## Key Features

**Hybrid Retrieval**:
- BM25 full-text search over PostgreSQL (entity text, descriptions)
- Qdrant vector similarity (bge-m3 dense embeddings, 1024-dim)
- Reciprocal Rank Fusion (RRF) to combine results
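
The fusion step can be illustrated with a minimal sketch. This is not the sidecar's actual code: the function name, the document IDs, and the smoothing constant `k=60` (a commonly used default for RRF) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


bm25_hits = ["doc-a", "doc-b", "doc-c"]    # hypothetical BM25 ranking
vector_hits = ["doc-b", "doc-d", "doc-a"]  # hypothetical Qdrant ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF only uses rank positions, the BM25 and cosine scores never need to be normalized onto a common scale.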

**Multilingual Support**:
- bge-m3 embeddings (English + Deutsch)
- Entity linking across language variants
- Query expansion in both languages

**Quality Metrics**:
- Precision@5, Recall@10 per domain
- Latency tracking (target <500ms p95)
- Entity coverage % (entities found / total)
- Confidence scoring per retrieval
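
As a rough sketch of how these metrics could be computed (the function names and sample values are hypothetical, and the p95 uses the nearest-rank method, which may differ from what the sidecar reports):

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots filled with relevant documents."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


retrieved = ["d1", "d2", "d3", "d4", "d5"]   # made-up retrieval result
relevant = {"d2", "d5", "d9"}                # made-up ground truth
latencies_ms = [120, 180, 210, 250, 300, 320, 350, 400, 450, 900]

p_at_5 = precision_at_k(retrieved, relevant, 5)
r_at_10 = recall_at_k(retrieved, relevant, 10)
p95_ms = percentile(latencies_ms, 95)
```

With only ten samples, the nearest-rank p95 is simply the slowest request, so a single outlier like the 900 ms value would already miss the 500 ms target.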

## Domains (Phase 1: TIP)

### Transceiver Domain
**Entities**:
- Transceiver Models (SFP28, QSFP28, QSFP-DD, OSFP)
- Specifications (wavelength, distance, form factor)
- Vendors (Cisco, Juniper, Arista, etc.)
- Pricing & Availability
- Compatibility Matrix

**Relations**:
- `supported_by` (Transceiver → Switch)
- `complies_with` (Transceiver → Standard like SFF-8024)
- `manufactured_by` (Transceiver → Vendor)
- `price_tracked_by` (Transceiver → Source)
- `compatible_with` (Transceiver → Alternative Optics)

**Knowledge Base**:
- 100 blog posts (blog-training-data/)
- SFF-8024 standard specs
- Vendor datasheets & compatibility lists
- Pricing history (fs.com, competitors)
- Industry standards (IEEE 802.3)

## API Routes

### Query Operations

**POST /api/kg/query**
```json
{
  "query": "What 400G transceiver options work with Cisco Nexus 9300-GX?",
  "domain": "transceiver",
  "top_k": 5,
  "entity_links": true
}
```

Response includes:
- `results`: ranked documents with relevance scores
- `entities`: extracted entities with confidence
- `relations`: entity relationships from knowledge graph
- `sources`: citations to blog posts and datasheets
- `latency_ms`: retrieval time
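
A minimal client sketch for this route, using only the Python standard library. The helper names, the default base URL, and the validation rules are assumptions; only the request fields and the top-level response keys come from the description above.

```python
import json
import urllib.request

SIDECAR_URL = "http://localhost:3140"  # assumption: sidecar reachable locally
RESPONSE_FIELDS = ("results", "entities", "relations", "sources", "latency_ms")


def build_query_payload(query, domain="transceiver", top_k=5, entity_links=True):
    """Build the JSON body for POST /api/kg/query."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "query": query,
        "domain": domain,
        "top_k": top_k,
        "entity_links": entity_links,
    }


def query_kg(payload, base_url=SIDECAR_URL, timeout=5.0):
    """Send the payload to the sidecar and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/api/kg/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def missing_fields(response):
    """List documented top-level keys that are absent from a response."""
    return [field for field in RESPONSE_FIELDS if field not in response]


payload = build_query_payload(
    "What 400G transceiver options work with Cisco Nexus 9300-GX?"
)
# response = query_kg(payload)  # requires a running sidecar
```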

**POST /api/kg/ingest**
```json
{
  "source": "blog",
  "domain": "transceiver",
  "documents": [...],
  "batch_size": 10
}
```

Triggers async ingestion pipeline:
1. Entity extraction (LLM)
2. Entity linking (fuzzy + vector similarity)
3. Relation extraction
4. Embedding + Qdrant indexing
5. PostgreSQL graph storage
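
Step 2 (entity linking) could look roughly like the sketch below, which blends a fuzzy name match with embedding similarity. The equal weighting, the 0.75 threshold, the toy 3-dimensional vectors, and the function names are all illustrative assumptions, not the sidecar's actual settings.

```python
import math
from difflib import SequenceMatcher


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_entity(mention, mention_vec, candidates, name_weight=0.5, threshold=0.75):
    """Return (entity_name, score) for the best candidate, or (None, score).

    candidates maps known entity names to their embedding vectors.
    """
    best_name, best_score = None, 0.0
    for name, vec in candidates.items():
        fuzzy = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        score = name_weight * fuzzy + (1 - name_weight) * cosine(mention_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score  # below threshold: treat as a new entity


known = {  # toy vectors standing in for real embeddings
    "QSFP-DD": [1.0, 0.0, 0.0],
    "QSFP28": [0.0, 1.0, 0.0],
    "Cisco": [0.0, 0.0, 1.0],
}
linked, score = link_entity("QSFP DD", [0.9, 0.1, 0.0], known)
unlinked, _ = link_entity("Nokia", [0.3, 0.3, 0.3], known)
```

Combining both signals lets a spelling variant such as "QSFP DD" resolve to an existing entity, while an unrelated mention falls below the threshold and would be stored as a new entity.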

### Evaluation Operations

**POST /api/kg/eval**
```json
{
  "eval_set": "transceiver-50qa",
  "metrics": ["precision@5", "recall@10", "mrr@5"],
  "compare_to": "baseline_fts"
}
```

Returns:
- KG vs FTS comparison
- Per-question breakdown
- Entity coverage %
- Latency percentiles
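
To make the KG vs FTS comparison concrete, here is a small sketch of MRR@k over two retrieval runs. The run data and function names are invented for illustration and do not reflect measured results.

```python
def reciprocal_rank(retrieved, relevant, k):
    """1 / rank of the first relevant document in the top k, else 0.0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs, k=5):
    """Mean reciprocal rank over (retrieved_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel, k) for ret, rel in runs) / len(runs)


# Two made-up questions, each with a set of ground-truth document IDs.
ground_truth = [{"d1"}, {"d7"}]
kg_runs = [(["d1", "d2"], ground_truth[0]), (["d3", "d7"], ground_truth[1])]
fts_runs = [(["d2", "d4", "d1"], ground_truth[0]), (["d5", "d6"], ground_truth[1])]

kg_mrr = mrr_at_k(kg_runs)
fts_mrr = mrr_at_k(fts_runs)
```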

### Admin Operations

**POST /api/kg/rebuild**
- Full reindex of Qdrant + PostgreSQL
- Used after schema changes

**GET /api/kg/health**
- Qdrant, PostgreSQL, LLM service status

## Configuration

**Environment Variables** (set on Erik):
```bash
LIGHTRAG_DOMAIN=transceiver           # Active domain
LIGHTRAG_PORT=3140                    # FastAPI port
LLM_BACKEND=ollama                    # Extraction model
OLLAMA_URL=http://192.168.178.213:11434  # Mac Studio Ollama
QDRANT_URL=http://localhost:6333      # Local Qdrant (Erik)
DATABASE_URL=postgresql://tip_kg:...@localhost/tip_lightrag
EMBEDDING_MODEL=bge-m3                # 1024-dim multilingual
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4                         # Concurrent ingestion
EVAL_Q_PER_DOMAIN=50
```
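
One way the sidecar might read these variables at startup is sketched below. The `SidecarConfig` class and `load_config` function are hypothetical; the defaults mirror the values listed above, and `DATABASE_URL` is treated as required because it has no safe default.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SidecarConfig:
    domain: str
    port: int
    llm_backend: str
    ollama_url: str
    qdrant_url: str
    database_url: str
    embedding_model: str
    embedding_batch_size: int
    max_workers: int
    eval_q_per_domain: int


def load_config(env=None):
    """Build a SidecarConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return SidecarConfig(
        domain=env.get("LIGHTRAG_DOMAIN", "transceiver"),
        port=int(env.get("LIGHTRAG_PORT", "3140")),
        llm_backend=env.get("LLM_BACKEND", "ollama"),
        ollama_url=env.get("OLLAMA_URL", "http://192.168.178.213:11434"),
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        database_url=env["DATABASE_URL"],  # raises KeyError if unset
        embedding_model=env.get("EMBEDDING_MODEL", "bge-m3"),
        embedding_batch_size=int(env.get("EMBEDDING_BATCH_SIZE", "32")),
        max_workers=int(env.get("MAX_WORKERS", "4")),
        eval_q_per_domain=int(env.get("EVAL_Q_PER_DOMAIN", "50")),
    )


# Placeholder values for illustration only.
example = load_config(
    {"DATABASE_URL": "postgresql://localhost/tip_lightrag", "LIGHTRAG_PORT": "3141"}
)
```

Parsing the integers once at startup means a malformed value such as `MAX_WORKERS=four` fails immediately rather than during ingestion.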

**PostgreSQL Schema** (tip_lightrag database):
```sql
-- The VECTOR columns below require the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Entities: uniquely identified concepts
CREATE TABLE entities (
  id UUID PRIMARY KEY,
  domain TEXT NOT NULL,
  name TEXT NOT NULL,
  description TEXT,
  entity_type TEXT,  -- 'transceiver', 'standard', 'vendor', etc.
  embedding VECTOR(1024),  -- bge-m3 dense embedding size
  confidence FLOAT,
  created_at TIMESTAMP
);

-- Relations: directed edges in knowledge graph
CREATE TABLE relations (
  source_id UUID REFERENCES entities,
  relation_type TEXT,  -- 'supported_by', 'manufactured_by', etc
  target_id UUID REFERENCES entities,
  strength FLOAT,  -- confidence in relation
  PRIMARY KEY (source_id, relation_type, target_id)
);

-- Documents: ingested content
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  domain TEXT,
  source TEXT,  -- 'blog', 'datasheet', 'standard'
  title TEXT,
  content TEXT,
  entities UUID[],  -- linked entity IDs
  embedding VECTOR(1024),  -- bge-m3 dense embedding size
  created_at TIMESTAMP
);

-- Queries: audit trail for evaluation
CREATE TABLE queries (
  id UUID PRIMARY KEY,
  domain TEXT,
  query TEXT,
  retrieved_docs UUID[],
  ground_truth_docs UUID[],
  relevance_scores FLOAT[],
  latency_ms INT,
  created_at TIMESTAMP
);
```
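
The sketch below shows how the `entities` and `relations` tables could be joined to list an entity's outgoing edges. It runs against an in-memory SQLite stand-in with a simplified schema (text IDs, no vector columns); the sample rows and the `outgoing_relations` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entities (
      id TEXT PRIMARY KEY,
      domain TEXT NOT NULL,
      name TEXT NOT NULL,
      entity_type TEXT
    );
    CREATE TABLE relations (
      source_id TEXT REFERENCES entities(id),
      relation_type TEXT,
      target_id TEXT REFERENCES entities(id),
      strength REAL,
      PRIMARY KEY (source_id, relation_type, target_id)
    );
    """
)
# Toy rows only; they do not describe real products or vendors.
conn.executemany(
    "INSERT INTO entities VALUES (?, ?, ?, ?)",
    [
        ("e1", "transceiver", "QSFP-DD", "transceiver"),
        ("e2", "transceiver", "Cisco", "vendor"),
        ("e3", "transceiver", "SFF-8024", "standard"),
    ],
)
conn.executemany(
    "INSERT INTO relations VALUES (?, ?, ?, ?)",
    [("e1", "manufactured_by", "e2", 0.9), ("e1", "complies_with", "e3", 0.8)],
)


def outgoing_relations(connection, entity_name, min_strength=0.0):
    """Return (relation_type, target_name, strength) rows, strongest first."""
    return connection.execute(
        """
        SELECT r.relation_type, t.name, r.strength
        FROM relations r
        JOIN entities s ON s.id = r.source_id
        JOIN entities t ON t.id = r.target_id
        WHERE s.name = ? AND r.strength >= ?
        ORDER BY r.strength DESC
        """,
        (entity_name, min_strength),
    ).fetchall()


edges = outgoing_relations(conn, "QSFP-DD")
```

The same join should work against PostgreSQL, although a driver such as psycopg expects `%s` placeholders rather than `?`.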

## Deployment

**On Erik** (production):
```bash
# 1. Create database
createdb tip_lightrag
psql tip_lightrag < schema.sql

# 2. Start Qdrant (if not running)
docker run -d --name qdrant -p 6333:6333 \
  -v /data/qdrant:/qdrant/storage \
  qdrant/qdrant

# 3. Start sidecar
pm2 start ecosystem.config.js --name lightrag-sidecar

# 4. Ingest TIP data
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d @tip-bootstrap.json
```

**Local Development** (Mac):
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run with SQLite for testing
LIGHTRAG_DB=sqlite:///test.db \
QDRANT_URL=http://localhost:6333 \
python -m uvicorn app.main:app --reload --port 3140
```

## Performance Targets

- **Query Latency**: <500ms p95 (including entity extraction)
- **Ingestion**: 10-50 docs/sec depending on complexity
- **Recall@10**: 85%+ vs baseline FTS
- **Entity Linking Accuracy**: 90%+
- **Index Size**: <1GB per domain

## Phase 1 Success Criteria

- [x] Sidecar deployment on Erik
- [ ] TIP blog posts fully indexed
- [ ] 50-Q eval set baseline established
- [ ] KG retrieval shows 2-3x improvement in MRR vs FTS
- [ ] Entity extraction 90%+ accurate
- [ ] Latency <500ms p95 for typical queries

## Next Phases

**Phase 1b** (Week 2):
- Fine-tune entity extraction on transceiver domain
- Optimize entity linking disambiguation
- Extend eval set to 100 Q&A pairs

**Phase 2** (Weeks 3-4):
- EO Global Pulse integration (contacts, companies, events)
- Multilingual expansion (German technical terms)
- Dashboard for query/retrieval analytics

**Phase 3+**:
- Fine-grained relation extraction
- Temporal reasoning (pricing trends, release dates)
- Autonomous knowledge update (news → KG)