4 Commits

Author SHA1 Message Date
Rene Fichtmueller
4b452ab49e feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
  scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD listings with GBP pricing,
  max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
  category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI
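
One way the "every 8h, staggered" registration could look is sketched here. The commit does not show the scheduler code, so the helper `staggeredCron`, the even spacing between jobs, and the use of cron expressions are all assumptions for illustration.

```typescript
// Hypothetical helper: the real scheduler's staggering scheme is not shown in the commit.
// Returns a 5-field cron expression that fires every `intervalHours`,
// shifted so that `jobCount` jobs are spread evenly across one interval.
function staggeredCron(index: number, jobCount: number, intervalHours: number): string {
  if (intervalHours <= 0 || 24 % intervalHours !== 0) {
    throw new Error("intervalHours must be a positive divisor of 24");
  }
  if (jobCount <= 0 || index < 0 || index >= jobCount) {
    throw new Error("index must be in [0, jobCount)");
  }
  // Example: 2 jobs at an 8h interval get offsets of 0h and 4h.
  const offset = Math.floor((intervalHours * index) / jobCount);
  const hours: number[] = [];
  for (let h = offset; h < 24; h += intervalHours) {
    hours.push(h);
  }
  return `0 ${hours.join(",")} * * *`;
}

// Assumed job names, taken from the scraper file names in the commit.
const schedule = {
  atgbics: staggeredCron(0, 2, 8),
  prolabs: staggeredCron(1, 2, 8),
};
```

With two jobs, this spacing keeps the crawlers four hours apart, so they never start in the same hour.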

MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)
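
The bearer-token check and health payload described above can be sketched as framework-free functions. This is a minimal illustration, not the code in http-server.ts: `isAuthorized`, `constantTimeEqual`, and `healthPayload` are hypothetical names, and rejecting every request when no secret is configured is an assumption.

```typescript
// Compare two strings without returning early on the first mismatching character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical check for an "Authorization: Bearer <token>" header.
// `secret` would come from the MCP_SECRET environment variable.
// Assumption: an empty secret rejects everything rather than disabling auth.
function isAuthorized(authHeader: string | undefined, secret: string): boolean {
  if (secret.length === 0 || authHeader === undefined) return false;
  const match = /^Bearer\s+(\S+)$/i.exec(authHeader.trim());
  const token = match?.[1];
  return token !== undefined && constantTimeEqual(token, secret);
}

// Shape of the GET /health response as stated in the commit message.
function healthPayload(toolCount: number): { status: "ok"; tools: number } {
  return { status: "ok", tools: toolCount };
}
```

In an Express app, a middleware could call `isAuthorized(req.headers.authorization, secret)` and respond with 401 when it returns false.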

SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00
Rene Fichtmueller
0a63307505 feat: Phase 6 — FAQ + troubleshooting knowledge base embeddings
19 curated FAQ entries covering form factors, fiber types, reach,
compatibility, WDM, power, and emerging tech (CPO, LPO, 400ZR).
10 troubleshooting guides with symptom/cause/solution format.
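
A symptom/cause/solution guide could be modeled and flattened into embedding text roughly as follows. The field names, the `title` field, and the sample entry are illustrative assumptions; none of them are taken from the seeded guides.

```typescript
// Hypothetical shape for one troubleshooting guide.
interface TroubleshootingGuide {
  title: string;
  symptom: string;
  cause: string;
  solution: string;
}

// Flatten a guide into one labeled text block suitable for an embedding model.
function guideToEmbeddingText(guide: TroubleshootingGuide): string {
  return [
    `Title: ${guide.title}`,
    `Symptom: ${guide.symptom}`,
    `Cause: ${guide.cause}`,
    `Solution: ${guide.solution}`,
  ].join("\n");
}

// Illustrative entry only; not one of the 10 guides from the commit.
const exampleGuide: TroubleshootingGuide = {
  title: "Link stays down after inserting a transceiver",
  symptom: "Port reports no link although the module is detected.",
  cause: "Multimode optic connected to single-mode fiber.",
  solution: "Match the optic type to the installed fiber.",
};

const exampleText = guideToEmbeddingText(exampleGuide);
```

Keeping the labels in the embedded text lets a query phrased as a symptom land near the matching guide.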

All 6 Qdrant collections now populated:
- product_embeddings: 89 transceivers
- datasheet_chunks: 40 chunks (OCR pipeline)
- faq_embeddings: 19 FAQ entries
- troubleshooting_embeddings: 10 guides
- news_embeddings: 33 articles
- manual_chunks: 0 (pending manual ingestion)
2026-03-28 00:24:50 +13:00
Rene Fichtmueller
122ca8444d feat: Phase 5 — OCR pipeline + document/news search
Docling-powered OCR pipeline: PDF → markdown → chunks → Ollama embed → Qdrant.
News embedding seeder for news_embeddings collection.
Document and news semantic search API endpoints.

- embeddings/ocr-pipeline.ts: Docling convert → chunk → embed pipeline
- embeddings/seed-news.ts: Batch embed news_articles into Qdrant
- routes/documents.ts: POST /api/documents/process, GET /api/documents
- routes/search.ts: GET /search/documents, GET /search/news endpoints
- sql/005-documents.sql: Add chunks_count, processed_at to documents table
- Ollama + nomic-embed-text installed on Erik (CPU mode)
- 89 products + 40 datasheet chunks + 33 news articles in Qdrant
2026-03-28 00:22:01 +13:00
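
The "chunks" step of the Phase 5 pipeline above could be as simple as a sliding window over the converted markdown. The sketch below assumes fixed-size character windows with overlap. The real ocr-pipeline.ts may instead split on headings or tokens, and `chunkMarkdown` and its default sizes are hypothetical.

```typescript
// Hypothetical chunker: splits text into windows of at most `maxChars`
// characters, each sharing `overlap` characters with the previous window.
function chunkMarkdown(markdown: string, maxChars = 1000, overlap = 100): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new Error("require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < markdown.length; start += step) {
    chunks.push(markdown.slice(start, start + maxChars));
    // Stop once this window has reached the end of the text.
    if (start + maxChars >= markdown.length) break;
  }
  return chunks;
}

// Small demonstration with tiny window sizes.
const demoChunks = chunkMarkdown("abcdefghij", 4, 1);
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, provided it is shorter than the overlap.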
Rene Fichtmueller
0260d0b365 feat: Phase 4 — Vector embeddings + semantic search
Ollama nomic-embed-text (768 dim) → Qdrant vector search pipeline.
Embeds all 89 transceivers with rich text representation and payload
filters (form_factor, speed_gbps, fiber_type, wdm_type).

- embeddings/client.ts: Ollama embed + Qdrant upsert/search
- embeddings/seed-products.ts: Batch seeder for product_embeddings
- routes/search.ts: GET /api/search, /search/products, /search/stats
- 6 Qdrant collections: products, datasheets, FAQs, manuals, troubleshooting, news
2026-03-28 00:05:29 +13:00
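
The Phase 4 embed-then-search flow can be illustrated by building the two request bodies involved. This sketch assumes the Ollama REST endpoint `POST /api/embeddings` and the Qdrant REST endpoint `POST /collections/product_embeddings/points/search`. The helper names are hypothetical, and embeddings/client.ts may use client libraries rather than raw HTTP.

```typescript
const EMBEDDING_DIM = 768; // from the commit: nomic-embed-text yields 768-dim vectors

// Payload filter fields named in the commit message.
type ProductFilters = {
  form_factor?: string;
  speed_gbps?: number;
  fiber_type?: string;
  wdm_type?: string;
};

type MatchCondition = { key: string; match: { value: string | number } };

// Body for Ollama's POST /api/embeddings.
function buildEmbedBody(text: string): { model: string; prompt: string } {
  return { model: "nomic-embed-text", prompt: text };
}

// Turn the set filters into a Qdrant "must" clause (all conditions must match).
function buildQdrantFilter(filters: ProductFilters): { must: MatchCondition[] } {
  const keys = ["form_factor", "speed_gbps", "fiber_type", "wdm_type"] as const;
  const must: MatchCondition[] = [];
  for (const key of keys) {
    const value = filters[key];
    if (value !== undefined) must.push({ key, match: { value } });
  }
  return { must };
}

// Body for Qdrant's POST /collections/<name>/points/search.
function buildSearchBody(vector: number[], filters: ProductFilters, limit = 10) {
  if (vector.length !== EMBEDDING_DIM) {
    throw new Error(`expected a ${EMBEDDING_DIM}-dim vector, got ${vector.length}`);
  }
  return { vector, filter: buildQdrantFilter(filters), limit, with_payload: true };
}
```

A search would first post `buildEmbedBody(query)` to Ollama, then pass the returned `embedding` array to `buildSearchBody` together with any filters.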