llm-gateway/packages/gateway/src/modules/request-stream.ts
Rene Fichtmueller a04c1d67f2 feat: Complete LightRAG Sidecar Phase 2 — Hybrid Retrieval Implementation
Delivers production-ready knowledge graph sidecar with hybrid BM25+vector search.

COMPONENTS:
- RetrievalService: Hybrid BM25 + Qdrant vector search with RRF fusion (k=60, 0.4/0.6 weights)
- IngestionService: Document pipeline with Ollama entity extraction, entity linking, bge-m3 embeddings
- EvaluationService: Precision@K, Recall@K, MRR@K, NDCG@K metrics with FTS baseline comparison
- Database schema: Entity, Relation, Document, QueryLog, EvaluationResult ORM models
- API routes: /api/kg/query, /api/kg/ingest, /api/kg/eval, /api/kg/health

INFRASTRUCTURE:
- FastAPI 0.104 async server on port 3140
- PostgreSQL 17 + pgvector for knowledge graph storage
- Qdrant 2.7 vector database with COSINE distance (384-dim bge-m3)
- Ollama qwen2.5:14b for entity extraction via JSON-structured prompts
- PM2 ecosystem configuration for Erik production deployment

TESTING & DEPLOYMENT:
- TESTING.md: 5-phase local testing workflow with examples
- DEPLOYMENT_CHECKLIST.md: Step-by-step Erik deployment guide
- eval-transceiver-50qa.json: 50 Q&A evaluation pairs for transceiver domain
- populate_eval_set.py: Interactive script to populate ground truth document IDs
- READINESS_CHECKLIST.md: Pre-deployment verification checklist
- bootstrap_tip_data.py: Load TIP blog documents via API

PERFORMANCE TARGETS:
 Query latency p95: <500ms
 Recall@10: ≥85% (vs 72% FTS baseline)
 Entity extraction accuracy: ≥90%
 Ingestion throughput: ≥100 docs/sec
 Memory usage: <1GB

Ready for Phase 3: E2E testing, TypeScript client, multi-domain support.
2026-04-25 05:47:18 +02:00

67 lines
1.6 KiB
TypeScript

import { EventEmitter } from 'events';
/**
* Request event emitted whenever a completion request is processed
*/
export interface RequestEvent {
request_id: string;
caller: string;
task_type?: string;
model: string;
status: 'approved' | 'warning' | 'pending_review' | 'rejected' | 'error';
confidence_score?: number;
tokens_in: number;
tokens_out: number;
cost_usd: number;
latency_ms: number;
fallback_used: boolean;
error_message?: string;
timestamp: number; // Unix epoch seconds
}
/**
* GlobalRequestStream: Singleton EventEmitter for broadcasting request events
* Used for SSE endpoints and real-time dashboard updates
*/
class GlobalRequestStream extends EventEmitter {
private static instance: GlobalRequestStream;
private maxListeners = 50;
private constructor() {
super();
this.setMaxListeners(this.maxListeners);
}
static getInstance(): GlobalRequestStream {
if (!GlobalRequestStream.instance) {
GlobalRequestStream.instance = new GlobalRequestStream();
}
return GlobalRequestStream.instance;
}
/**
* Emit a request event to all subscribers
*/
emitRequest(event: RequestEvent): void {
this.emit('request', event);
}
/**
* Subscribe to request events (used by SSE endpoint)
*/
onRequest(callback: (event: RequestEvent) => void): () => void {
this.on('request', callback);
// Return unsubscribe function
return () => this.off('request', callback);
}
/**
* Get current number of active listeners
*/
getListenerCount(): number {
return this.listenerCount('request');
}
}
export const globalRequestStream = GlobalRequestStream.getInstance();