Docling-powered OCR pipeline: PDF → markdown → chunks → Ollama embed → Qdrant.
News embedding seeder for news_embeddings collection.
Document and news semantic search API endpoints.

- embeddings/ocr-pipeline.ts: Docling convert → chunk → embed pipeline
- embeddings/seed-news.ts: Batch embed news_articles into Qdrant
- routes/documents.ts: POST /api/documents/process, GET /api/documents
- routes/search.ts: GET /search/documents, GET /search/news endpoints
- sql/005-documents.sql: Add chunks_count, processed_at to documents table
- Ollama + nomic-embed-text installed on Erik (CPU mode)
- 89 products + 40 datasheet chunks + 33 news articles in Qdrant
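A rough sketch of the chunk → embed → Qdrant steps described above, not the actual contents of embeddings/ocr-pipeline.ts. The function names (chunkMarkdown, embedChunk, upsertPoints), the character-window chunking strategy, the chunk sizes, the localhost URLs, and the collection name parameter are all assumptions made for illustration. It assumes Node 18+ for the global fetch, Ollama's /api/embeddings endpoint, and Qdrant's REST point-upsert endpoint.

```typescript
// Hypothetical helper: split markdown into overlapping character windows.
// The real pipeline may chunk by heading or token count instead.
function chunkMarkdown(markdown: string, maxChars = 1000, overlap = 100): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("require 0 <= overlap < maxChars");
  }
  const text = markdown.trim();
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += maxChars - overlap) {
    chunks.push(text.slice(start, start + maxChars));
    // Stop once the current window reaches the end of the text.
    if (start + maxChars >= text.length) break;
  }
  return chunks;
}

// Embed one chunk with nomic-embed-text through Ollama's HTTP API.
// The base URL is an assumed default for a local install.
async function embedChunk(text: string, ollamaUrl = "http://localhost:11434"): Promise<number[]> {
  const res = await fetch(`${ollamaUrl}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);
  const data = (await res.json()) as { embedding: number[] };
  return data.embedding;
}

interface QdrantPoint {
  id: number | string; // Qdrant accepts unsigned integers or UUID strings
  vector: number[];
  payload: Record<string, unknown>;
}

// Upsert points into a collection through Qdrant's REST API.
// The collection name and base URL are supplied by the caller (assumptions here).
async function upsertPoints(
  collection: string,
  points: QdrantPoint[],
  qdrantUrl = "http://localhost:6333",
): Promise<void> {
  const res = await fetch(`${qdrantUrl}/collections/${collection}/points?wait=true`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ points }),
  });
  if (!res.ok) throw new Error(`Qdrant upsert failed: HTTP ${res.status}`);
}
```

Chunk overlap is used here so that a sentence split across a window boundary still appears whole in at least one chunk; whether the real pipeline does this is not stated in the commit.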
-- Add OCR pipeline columns to existing documents table
ALTER TABLE documents ADD COLUMN IF NOT EXISTS chunks_count INT DEFAULT 0;
ALTER TABLE documents ADD COLUMN IF NOT EXISTS processed_at TIMESTAMPTZ;
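A possible way the pipeline could fill these two columns once a document has been chunked and embedded. This is a sketch only: the function name markProcessedQuery and the id primary-key column are assumptions not confirmed by the migration. The returned object uses the { text, values } shape that node-postgres accepts as a parameterized query config.

```typescript
// Shape of a parameterized query, as accepted by node-postgres' query().
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Hypothetical helper: build the UPDATE that records OCR results.
// Assumes the documents table has an `id` primary key.
function markProcessedQuery(
  documentId: number,
  chunksCount: number,
  processedAt: Date = new Date(),
): QueryConfig {
  if (!Number.isInteger(chunksCount) || chunksCount < 0) {
    throw new RangeError("chunksCount must be a non-negative integer");
  }
  return {
    text: "UPDATE documents SET chunks_count = $1, processed_at = $2 WHERE id = $3",
    values: [chunksCount, processedAt, documentId],
  };
}
```

Passing values as parameters rather than interpolating them keeps the statement safe from SQL injection, and a Date value maps naturally onto the TIMESTAMPTZ column.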