llm-gateway/packages/lightrag-sidecar
Rene Fichtmueller 7599f33866 feat: integrate OpenAI Codex and ChatGPT as primary LLM providers via subscription
- Add openai-bridge service (port 3251) for ChatGPT and Codex integration
- Update external-providers.ts with openai and chatgpt provider definitions
- Add GPT-4 Turbo, GPT-4, and GPT-3.5 Turbo models to provider registry
- Modify getApiKey() to handle bridge provider authentication
- Modify getBaseUrl() to construct URLs from env vars
- Update ecosystem.config.cjs with OPENAI_BRIDGE_URL and OPENAI_API_KEY config
- Add openai-bridge PM2 service configuration (port 3251)
- Support both claude-bridge (port 3250) and openai-bridge (port 3251) as subscription services
- Extend fallback chain: claude → openai/chatgpt → cerebras → groq → mistral → nvidia → cloudflare

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-04-25 12:29:55 +02:00

LightRAG Sidecar — Knowledge Graph Integration

FastAPI sidecar running on Erik (192.168.178.82:3140) that provides hybrid knowledge-graph RAG for the LLM Gateway learning engine.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│ llm-gateway Learning Pipeline (Fastify :3103)                   │
│ - packages/learning/src/prompt-optimizer/                       │
│ - packages/learning-integration/src/feedback.ts                 │
│ + TypeScript KG Query Client                                    │
└──────────────────────────────┬──────────────────────────────────┘
                               │ HTTP POST
                               │ /api/kg/query
                               │ /api/kg/ingest
                               │ /api/kg/eval
                               ▼
┌─────────────────────────────────────────────────────────────────┐
│ LightRAG Python Sidecar (FastAPI :3140)                         │
│ - Entity extraction + linking (LLM-powered)                     │
│ - Hybrid retrieval (BM25 + vector)                              │
│ - Qdrant vector index (Erik :6333)                              │
│ - PostgreSQL knowledge graph (Erik pg)                          │
└─────────────────────────────────────────────────────────────────┘

Key Features

Hybrid Retrieval:

  • BM25 full-text search over PostgreSQL (entity text, descriptions)
  • Qdrant vector similarity (bge-m3 dense embeddings, 1024-dim)
  • Reciprocal Rank Fusion (RRF) to combine results
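Reciprocal Rank Fusion merges the BM25 and Qdrant result lists by rank position alone, so the two incompatible score scales never need to be normalized. A minimal sketch, assuming each retriever returns an ordered list of document IDs; the constant k=60 is the value commonly used for RRF, and whether the sidecar uses the same value is an assumption.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc-a", "doc-b", "doc-c"]    # hypothetical BM25 ranking
vector_hits = ["doc-c", "doc-a", "doc-d"]  # hypothetical Qdrant ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])
```

Documents found by both retrievers (doc-a, doc-c) rise above those found by only one, which is the behavior hybrid retrieval relies on.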

Multilingual Support:

  • bge-m3 embeddings (English + German)
  • Entity linking across language variants
  • Query expansion in both languages

Quality Metrics:

  • Precision@5, Recall@10 per domain
  • Latency tracking (target <500ms p95)
  • Entity coverage % (entities found / total)
  • Confidence scoring per retrieval
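The retrieval metrics above follow standard definitions. The sketch below shows how Precision@k, Recall@k, and entity coverage could be computed for one question; the function names are hypothetical, and returning 0.0 when there are no relevant items is a convention chosen here, not something the README specifies.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant documents (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def entity_coverage(found: set[str], expected: set[str]) -> float:
    """Entities found / total expected entities, as a percentage."""
    if not expected:
        return 0.0
    return 100.0 * len(found & expected) / len(expected)


retrieved = ["d1", "d7", "d3", "d9", "d2"]
relevant = {"d1", "d2", "d4"}
print(precision_at_k(retrieved, relevant, 5), recall_at_k(retrieved, relevant, 10))
```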

Domains (Phase 1: TIP)

Transceiver Domain

Entities:

  • Transceiver Models (SFP28, QSFP28, QSFP-DD, OSFP)
  • Specifications (wavelength, distance, form factor)
  • Vendors (Cisco, Juniper, Arista, etc.)
  • Pricing & Availability
  • Compatibility Matrix

Relations:

  • supported_by (Transceiver → Switch)
  • complies_with (Transceiver → Standard, e.g. SFF-8024)
  • manufactured_by (Transceiver → Vendor)
  • price_tracked_by (Transceiver → Source)
  • compatible_with (Transceiver → Alternative Optics)
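Each relation above has a fixed source and target entity type, so the list can double as a type signature: a relation the LLM extracts is kept only if its endpoints have the expected types. A sketch under stated assumptions: the entity_type strings "switch" and "source" are hypothetical (only 'transceiver', 'standard', and 'vendor' appear in the schema comment), and alternative optics are assumed to be typed as transceivers.

```python
from typing import NamedTuple

# Hypothetical (source entity_type, target entity_type) per relation type,
# read off the relation list above.
RELATION_SIGNATURES: dict[str, tuple[str, str]] = {
    "supported_by": ("transceiver", "switch"),
    "complies_with": ("transceiver", "standard"),
    "manufactured_by": ("transceiver", "vendor"),
    "price_tracked_by": ("transceiver", "source"),
    "compatible_with": ("transceiver", "transceiver"),
}


class ExtractedRelation(NamedTuple):
    source_name: str
    source_type: str
    relation_type: str
    target_name: str
    target_type: str


def is_valid_relation(rel: ExtractedRelation) -> bool:
    """Reject unknown relation types and relations whose endpoint types do not match."""
    expected = RELATION_SIGNATURES.get(rel.relation_type)
    return expected == (rel.source_type, rel.target_type)
```

Filtering at extraction time keeps obviously malformed edges (for example, a vendor "manufactured_by" a transceiver) out of the relations table.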

Knowledge Base:

  • 100 blog posts (blog-training-data/)
  • SFF-8024 standard specs
  • Vendor datasheets & compatibility lists
  • Pricing history (fs.com, competitors)
  • Industry standards (IEEE 802.3)

API Routes

Query Operations

POST /api/kg/query

{
  "query": "What 400G transceiver options work with Cisco Nexus 9300-GX?",
  "domain": "transceiver",
  "top_k": 5,
  "entity_links": true
}

Response includes:

  • results: ranked documents with relevance scores
  • entities: extracted entities with confidence
  • relations: entity relationships from knowledge graph
  • sources: citations to the underlying blog posts and datasheets
  • latency_ms: retrieval time
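The learning pipeline calls this route from TypeScript; for scripting and debugging, the same request can be issued from Python with only the standard library. A minimal sketch: the defaults mirror the example body above, while the helper names and the 5-second timeout are hypothetical. The response is returned as a plain dict because its exact field layout beyond the names listed above is not specified here.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass

SIDECAR_URL = "http://192.168.178.82:3140"  # Erik, as listed above


@dataclass
class KGQuery:
    query: str
    domain: str = "transceiver"
    top_k: int = 5
    entity_links: bool = True


def build_request(q: KGQuery, base_url: str = SIDECAR_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /api/kg/query request with a JSON body."""
    body = json.dumps(asdict(q)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/kg/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_kg(q: KGQuery, base_url: str = SIDECAR_URL, timeout: float = 5.0) -> dict:
    """Send the query and decode the JSON response (results, entities, relations, ...)."""
    with urllib.request.urlopen(build_request(q, base_url), timeout=timeout) as resp:
        return json.load(resp)


# Usage (requires the sidecar to be reachable):
# answer = query_kg(KGQuery("What 400G transceiver options work with Cisco Nexus 9300-GX?"))
# print(answer["latency_ms"])
```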

POST /api/kg/ingest

{
  "source": "blog",
  "domain": "transceiver",
  "documents": [...],
  "batch_size": 10
}

Triggers the asynchronous ingestion pipeline:

  1. Entity extraction (LLM)
  2. Entity linking (fuzzy + vector similarity)
  3. Relation extraction
  4. Embedding + Qdrant indexing
  5. PostgreSQL graph storage
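Step 2 combines a string-similarity signal (catching spelling variants such as "qsfp-dd" vs "QSFP-DD") with embedding similarity (catching paraphrases and cross-language variants). The sketch below blends the two with a weight and links to an existing entity only above a threshold. It is illustrative only: difflib's ratio stands in for whatever fuzzy matcher the sidecar actually uses, the toy vectors stand in for bge-m3 embeddings, and alpha=0.5 and threshold=0.75 are hypothetical values.

```python
import math
from difflib import SequenceMatcher
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_entity(
    mention: str,
    mention_vec: list[float],
    candidates: dict[str, list[float]],
    alpha: float = 0.5,
    threshold: float = 0.75,
) -> Optional[str]:
    """Return the best-matching existing entity name, or None to create a new entity.

    score = alpha * fuzzy string ratio + (1 - alpha) * cosine similarity
    """
    best_name, best_score = None, 0.0
    for name, vec in candidates.items():
        fuzzy = SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        score = alpha * fuzzy + (1 - alpha) * cosine(mention_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


known = {"QSFP-DD": [1.0, 0.0, 0.0], "OSFP": [0.0, 1.0, 0.0]}
print(link_entity("qsfp-dd", [0.9, 0.1, 0.0], known))
```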

Evaluation Operations

POST /api/kg/eval

{
  "eval_set": "transceiver-50qa",
  "metrics": ["precision@5", "recall@10", "mrr@5"],
  "compare_to": "baseline_fts"
}

Returns:

  • KG vs FTS comparison
  • Per-question breakdown
  • Entity coverage %
  • Latency percentiles
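The KG vs FTS comparison reduces to computing the same metric over both systems' answers to the eval set. A sketch for MRR@5 and p95 latency, assuming each question yields a (retrieved IDs, relevant IDs) pair; the function names and toy runs are hypothetical. `statistics.quantiles` needs at least two latency samples.

```python
import statistics


def reciprocal_rank_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(retrieved[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mrr_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Mean reciprocal rank over (retrieved, relevant) pairs, one pair per question."""
    return statistics.fmean(reciprocal_rank_at_k(r, rel, k) for r, rel in runs)


def p95(latencies_ms: list[float]) -> float:
    """95th percentile: index 94 of the 99 cut points from statistics.quantiles."""
    return statistics.quantiles(latencies_ms, n=100)[94]


kg_runs = [(["d2", "d1"], {"d1"}), (["d3"], {"d3"})]
fts_runs = [(["d5", "d6", "d1"], {"d1"}), (["d4", "d3"], {"d3"})]
print("MRR@5 improvement:", mrr_at_k(kg_runs) / mrr_at_k(fts_runs))
```

A per-question breakdown is the list of `reciprocal_rank_at_k` values for each system, side by side.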

Admin Operations

POST /api/kg/rebuild

  • Full reindex of Qdrant + PostgreSQL
  • Used after schema changes

GET /api/kg/health

  • Qdrant, PostgreSQL, LLM service status

Configuration

Environment Variables (set on Erik):

LIGHTRAG_DOMAIN=transceiver           # Active domain
LIGHTRAG_PORT=3140                    # FastAPI port
LLM_BACKEND=ollama                    # LLM backend for entity extraction
OLLAMA_URL=http://192.168.178.213:11434  # Mac Studio Ollama
QDRANT_URL=http://localhost:6333      # Local Qdrant (Erik)
DATABASE_URL=postgresql://tip_kg:...@localhost/tip_lightrag
EMBEDDING_MODEL=bge-m3                # 1024-dim multilingual
EMBEDDING_BATCH_SIZE=32
MAX_WORKERS=4                         # Concurrent ingestion
EVAL_Q_PER_DOMAIN=50
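How the sidecar reads these variables is not shown here; a stdlib-only sketch of a typed settings object could look like the following. Defaults mirror the values listed above, except DATABASE_URL, which is deliberately required because it carries credentials. The class and field names are hypothetical; a FastAPI project might equally use a settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    domain: str
    port: int
    llm_backend: str
    ollama_url: str
    qdrant_url: str
    database_url: str
    embedding_model: str
    embedding_batch_size: int
    max_workers: int
    eval_q_per_domain: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Raises KeyError if DATABASE_URL is missing; all other keys fall back to defaults.
        return cls(
            domain=env.get("LIGHTRAG_DOMAIN", "transceiver"),
            port=int(env.get("LIGHTRAG_PORT", "3140")),
            llm_backend=env.get("LLM_BACKEND", "ollama"),
            ollama_url=env.get("OLLAMA_URL", "http://192.168.178.213:11434"),
            qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
            database_url=env["DATABASE_URL"],
            embedding_model=env.get("EMBEDDING_MODEL", "bge-m3"),
            embedding_batch_size=int(env.get("EMBEDDING_BATCH_SIZE", "32")),
            max_workers=int(env.get("MAX_WORKERS", "4")),
            eval_q_per_domain=int(env.get("EVAL_Q_PER_DOMAIN", "50")),
        )
```

Passing a plain dict to `from_env` makes the loader easy to test without touching the real environment.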

PostgreSQL Schema (tip_lightrag database):

-- VECTOR columns require the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Entities: uniquely identified concepts
CREATE TABLE entities (
  id UUID PRIMARY KEY,
  domain TEXT NOT NULL,
  name TEXT NOT NULL,
  description TEXT,
  entity_type TEXT,  -- 'transceiver', 'standard', 'vendor', etc
  embedding VECTOR(1024),  -- bge-m3 dense embedding
  confidence FLOAT,
  created_at TIMESTAMP
);

-- Relations: directed edges in knowledge graph
CREATE TABLE relations (
  source_id UUID REFERENCES entities,
  relation_type TEXT,  -- 'supported_by', 'manufactured_by', etc
  target_id UUID REFERENCES entities,
  strength FLOAT,  -- confidence in relation
  PRIMARY KEY (source_id, relation_type, target_id)
);

-- Documents: ingested content
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  domain TEXT,
  source TEXT,  -- 'blog', 'datasheet', 'standard'
  title TEXT,
  content TEXT,
  entities UUID[],  -- linked entity IDs
  embedding VECTOR(1024),  -- bge-m3 dense embedding
  created_at TIMESTAMP
);

-- Queries: audit trail for evaluation
CREATE TABLE queries (
  id UUID PRIMARY KEY,
  domain TEXT,
  query TEXT,
  retrieved_docs UUID[],
  ground_truth_docs UUID[],
  relevance_scores FLOAT[],
  latency_ms INT,
  created_at TIMESTAMP
);
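Because the relations table stores directed, weighted edges, a query that hits an entity can be expanded one hop along its outgoing relations before documents are ranked. In production this would be a SQL query over the relations table; the sketch below runs the same logic over plain tuples so it needs no database. The readable IDs stand in for UUIDs, and the min_strength cutoff of 0.5 is a hypothetical value.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Relation(NamedTuple):
    source_id: str
    relation_type: str
    target_id: str
    strength: float


def one_hop_neighbors(
    relations: Iterable[Relation], seeds: set[str], min_strength: float = 0.5
) -> dict[str, list[tuple[str, str]]]:
    """For each seed entity, list (relation_type, target_id) edges with strength >= min_strength.

    Only outgoing edges are followed, matching the directed relations table.
    """
    out: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rel in relations:
        if rel.source_id in seeds and rel.strength >= min_strength:
            out[rel.source_id].append((rel.relation_type, rel.target_id))
    return dict(out)


rows = [
    Relation("qsfp-dd-1", "manufactured_by", "cisco", 0.9),
    Relation("qsfp-dd-1", "complies_with", "sff-8024", 0.4),
    Relation("osfp-1", "manufactured_by", "arista", 0.95),
]
print(one_hop_neighbors(rows, {"qsfp-dd-1"}))
```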

Deployment

On Erik (production):

# 1. Create database
createdb tip_lightrag
psql tip_lightrag < schema.sql

# 2. Start Qdrant (if not running)
docker run -d --name qdrant -p 6333:6333 \
  -v /data/qdrant:/qdrant/storage \
  qdrant/qdrant

# 3. Start sidecar
pm2 start ecosystem.config.js --only lightrag-sidecar

# 4. Ingest TIP data
curl -X POST http://localhost:3140/api/kg/ingest \
  -H "Content-Type: application/json" \
  -d @tip-bootstrap.json
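The contents of tip-bootstrap.json are not shown. One way to produce it is to wrap the blog posts from blog-training-data/ in the ingest request body described under POST /api/kg/ingest. A sketch under loud assumptions: the per-document fields (`title`, `content`) and the `*.md` file extension are hypothetical, since the README elides the document schema.

```python
import json
from pathlib import Path
from typing import Iterable


def payload_from_posts(posts: Iterable[tuple[str, str]], batch_size: int = 10) -> dict:
    """Wrap (title, content) pairs in a POST /api/kg/ingest body.

    The per-document field names are assumptions.
    """
    return {
        "source": "blog",
        "domain": "transceiver",
        "documents": [{"title": title, "content": content} for title, content in posts],
        "batch_size": batch_size,
    }


def load_posts(blog_dir: Path) -> list[tuple[str, str]]:
    """Read every *.md file in blog_dir, using the file stem as the title."""
    return [
        (path.stem, path.read_text(encoding="utf-8"))
        for path in sorted(blog_dir.glob("*.md"))
    ]


# Usage:
# payload = payload_from_posts(load_posts(Path("blog-training-data")))
# Path("tip-bootstrap.json").write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
```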

Local Development (Mac):

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run with SQLite for testing
LIGHTRAG_DB=sqlite:///test.db \
QDRANT_URL=http://localhost:6333 \
python -m uvicorn app.main:app --reload --port 3140

Performance Targets

  • Query Latency: <500ms p95 (including entity extraction)
  • Ingestion: 10-50 docs/sec depending on complexity
  • Recall@10: 85%+ vs baseline FTS
  • Entity Linking Accuracy: 90%+
  • Index Size: <1GB per domain

Phase 1 Success Criteria

  • Sidecar deployment on Erik
  • TIP blog posts fully indexed
  • 50-Q eval set baseline established
  • KG retrieval shows 2-3x improvement in MRR vs FTS
  • Entity extraction 90%+ accurate
  • Latency <500ms p95 for typical queries

Next Phases

Phase 1b (Week 2):

  • Fine-tune entity extraction on transceiver domain
  • Optimize entity linking disambiguation
  • Extend eval set to 100 Q&A pairs

Phase 2 (Week 3-4):

  • EO Global Pulse integration (contacts, companies, events)
  • Multilingual expansion (German technical terms)
  • Dashboard for query/retrieval analytics

Phase 3+:

  • Fine-grained relation extraction
  • Temporal reasoning (pricing trends, release dates)
  • Autonomous knowledge update (news → KG)