45 Commits

Author SHA1 Message Date
Rene Fichtmueller
4b1734379a fix: Finder 404 shows helpful message + fuzzy switch name matching
- api() helper now parses JSON body on non-2xx responses so error.suggestion
  is available in catch blocks
- runFinder() catch shows 'Switch not found' + suggestion instead of 'Error: HTTP 404'
- finder.ts: normalized search (removes hyphens/spaces) + token-based fallback
  so 'sg350-28' → 'SG350-28', 'N9K-C93180' → Nexus 93180, etc.
2026-04-01 22:17:07 +02:00
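Note on 4b1734379a: the normalized search with token fallback might look roughly like the sketch below. It is not the code in finder.ts; `SwitchRow`, `normalizeName`, `tokenize` and `findSwitch` are hypothetical names, and alias mapping (for example N9K to Nexus) is not attempted here.

```typescript
// Hypothetical sketch; names and data shape are assumptions, not finder.ts itself.
interface SwitchRow {
  model: string;
}

// Uppercase and drop hyphens, spaces and underscores: "sg350-28" -> "SG35028".
function normalizeName(s: string): string {
  return s.toUpperCase().replace(/[\s\-_]+/g, "");
}

// Split into alphanumeric tokens: "catalyst 9300" -> ["CATALYST", "9300"].
function tokenize(s: string): string[] {
  return s.toUpperCase().split(/[^A-Z0-9]+/).filter((t) => t.length > 0);
}

function findSwitch(query: string, rows: SwitchRow[]): SwitchRow | undefined {
  const q = normalizeName(query);
  if (q.length === 0) return undefined;

  // Pass 1: exact match on the normalized name.
  const exact = rows.find((r) => normalizeName(r.model) === q);
  if (exact) return exact;

  // Pass 2: token fallback. Pick the row that contains the most query tokens.
  const tokens = tokenize(query);
  let best: SwitchRow | undefined;
  let bestScore = 0;
  for (const row of rows) {
    const target = normalizeName(row.model);
    const score = tokens.filter((t) => target.includes(t)).length;
    if (score > bestScore) {
      best = row;
      bestScore = score;
    }
  }
  return best;
}
```

A miss returns `undefined`, which is the case where the API would answer 404 with a suggestion.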
Rene Fichtmueller
dad4750a86 feat: Changelog — CHANGELOG_PENDING.md, /api/changelog route, Overview tab widget
- CHANGELOG_PENDING.md: 26 entries from v0.1.0 to today in JSON-line format
- GET /api/changelog: parses and serves entries as JSON array
- Overview tab: changelog card with type badges (FEAT/FIX/UI/DATA/AI/INFRA),
  dates, show recent/all toggle
2026-04-01 22:14:14 +02:00
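Note on dad4750a86: parsing a JSON-line changelog into an array could be sketched as follows. The field names in `ChangelogEntry` are assumptions, since the log does not show the actual entry shape in CHANGELOG_PENDING.md.

```typescript
// Hypothetical entry shape; the real fields are not shown in the commit log.
interface ChangelogEntry {
  version: string;
  date: string;
  type: string;
  text: string;
}

function parseChangelog(raw: string): ChangelogEntry[] {
  const entries: ChangelogEntry[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const trimmed = line.trim();
    // Skip headings, blank lines and prose; only JSON object lines are entries.
    if (!trimmed.startsWith("{")) continue;
    try {
      entries.push(JSON.parse(trimmed) as ChangelogEntry);
    } catch {
      // Ignore malformed lines rather than failing the whole response.
    }
  }
  return entries;
}
```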
Rene Fichtmueller
681da54523 feat: Procurement Intelligence Engine (WS0c)
- Migration 019: stock_snapshots, abc_classification, reorder_signals,
  product_lifecycle_events, market_intelligence, crawler_llm_log tables
- Seeded 7 market intel events (OFC 2026, AWS/Azure CapEx, Coherent lead times,
  EU TED tenders, ECOC 2026, IEEE 802.3df)
- Seeded 4 lifecycle events (Cisco SFP-10G-LR EOL, Juniper EOL,
  400ZR ratified, 800G MSA draft)
- Crawler LLM: core.ts (Ollama-based extractor), stock-schema.ts (typed schemas
  + vendor profiles for Flexoptix/FS.com/10Gtek/ATGBICS/ProLabs/Farnell/Mouser),
  validator.ts (rule-based sanity checks + cross-validation)
- market-intelligence.ts scraper: OFC/ECOC, LightReading, IEEE 802.3, EU TED,
  Farnell/Mouser lead times, FierceTelecom — weekly via pg-boss
- computeAbcClassification(): dynamic A/B/C classification from price obs +
  compat count + vendor breadth
- computeReorderSignals(): buy_now/wait/hold/monitor with reasons + signal strength
- API: GET /api/procurement/overview|signals|signals/:id|abc|market-intel|
  stock-trends/:id|lifecycle
- Dashboard: Procurement Intel tab with Reorder Signals, ABC table,
  Market Intel cards, Lifecycle Events
2026-04-01 22:04:33 +02:00
Rene Fichtmueller
7fd9fd3c8a feat: competitor price comparison in transceiver detail
- API: also returns comparable_prices from technically equivalent products
  (same form_factor + speed_gbps + reach ±25%, different vendor, last 30 days)
- Dashboard: direct prices shown first, then separator + comparable products
- Comparable entries show vendor + exact part number scraped from their site
- Verified badge = real URL + observed within 7 days (strict)
2026-04-01 21:08:09 +02:00
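Note on 7fd9fd3c8a: the "technically equivalent" rule (same form_factor + speed_gbps, reach within ±25%, different vendor, last 30 days) can be written as a single predicate. This is a sketch under assumed types; `Product`, `PriceObs` and `isComparable` are illustrative names, and the real query presumably runs in SQL.

```typescript
// Assumed shapes for illustration only.
interface Product {
  vendor: string;
  formFactor: string;
  speedGbps: number;
  reachM: number;
}

interface PriceObs {
  product: Product;
  priceEur: number;
  observedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function isComparable(ref: Product, obs: PriceObs, now: Date): boolean {
  const c = obs.product;
  const ageDays = (now.getTime() - obs.observedAt.getTime()) / DAY_MS;
  return (
    c.vendor !== ref.vendor &&
    c.formFactor === ref.formFactor &&
    c.speedGbps === ref.speedGbps &&
    Math.abs(c.reachM - ref.reachM) <= 0.25 * ref.reachM && // reach within ±25%
    ageDays >= 0 &&
    ageDays <= 30 // observed in the last 30 days
  );
}
```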
Rene Fichtmueller
f91d2a15b9 feat: switch Flexoptix recommendations, switch verified labels, stronger verification check
- getCompatibleTransceivers: adds vendor_name, price, verification fields; Flexoptix sorted first
- Switch detail: data quality bar (Image/Product Page/Datasheet confirmed)
- Switch detail: Flexoptix Recommended section with prices, verified badges, shop links
- Switch detail: other vendors section shows 100% badge on slugs
- Transceiver detail: verification condition explicit === true (cache-safe)
- Transceiver detail: fallback text when no verification data exists yet
2026-04-01 20:59:30 +02:00
Rene Fichtmueller
3811b3b953 feat: temp range display, verification badges, competitor prices, tag tooltips
- Temperature Range: COM→'0-70°C (COM)', IND→'-40-85°C (IND)'
- GET /api/transceivers/:id returns competitor_prices[] from price_observations
- Detail view: verification summary bar (★ 100% VERIFIED / partial)
- Detail view: Current Prices section with vendor, price, verified badge, date, link
- Detail view: tag tooltips on vendor/category/market_status chips
- List view: new Verified column with 100% stamp or price check
- Optical Budget: TX Power Min/Max labels clarified
2026-04-01 20:47:02 +02:00
Rene Fichtmueller
2b683dadfb feat: Verified Price + 100% Verified stamp system
DB (017-verification-tags.sql):
- New columns: price_verified, price_verified_eur, price_verified_url, price_verified_at
- New columns: image_verified, details_verified, fully_verified, fully_verified_at
- compute_transceiver_verification(uuid): per-product verification logic
  • price_verified: real scraped URL + price > 0 + observed in last 30 days
  • image_verified: R2 stored OR image_url from known vendor CDNs (flexoptix.net, fs.com, etc.), no placeholder
  • details_verified: product_page_url + all core fields (form_factor, speed, reach, fiber_type, part_number) populated
  • fully_verified: all three true simultaneously
- recompute_all_verification(): bulk recompute, returns stats
- Initial run: 3575 price_verified, 1173 image_verified, 1380 details_verified, 258 fully_verified
- Indexes on price_verified, fully_verified for fast filtering
- v_verified_products view

API finder.ts:
- SELECT now includes all verification fields
- Response maps: price_verified, price_verified_eur, price_verified_url, image_verified, details_verified, fully_verified

API health.ts:
- verification block: counts + coverage percentages in /api/health

Dashboard Finder:
- 'Verified Price': green checkmark ✓ next to price, tooltip explains source
- '100% Verified' stamp: dark green gradient badge top of card, card gets green border
- 'price source ↗' link to original scraped URL
- Summary bar: 'X × 100% Verified · Y with verified prices'
2026-04-01 17:43:48 +02:00
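Note on 2b683dadfb: the verification rules live in a SQL function, compute_transceiver_verification(uuid). The TypeScript below only mirrors two of the stated rules (price_verified and fully_verified) to make the logic concrete; the input shape and function names are assumptions.

```typescript
// Assumed input shape; the real logic is implemented in SQL.
interface VerificationInput {
  priceUrl: string | null;
  priceEur: number | null;
  priceObservedAt: Date | null;
  imageVerified: boolean;
  detailsVerified: boolean;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// price_verified: real scraped URL + price > 0 + observed in the last 30 days.
function isPriceVerified(v: VerificationInput, now: Date): boolean {
  if (!v.priceUrl || v.priceEur === null || v.priceObservedAt === null) return false;
  const ageDays = (now.getTime() - v.priceObservedAt.getTime()) / MS_PER_DAY;
  return v.priceEur > 0 && ageDays >= 0 && ageDays <= 30;
}

// fully_verified: price, image and details all true at the same time.
function isFullyVerified(v: VerificationInput, now: Date): boolean {
  return isPriceVerified(v, now) && v.imageVerified && v.detailsVerified;
}
```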
Rene Fichtmueller
174078efdb feat: 100% verified data — no invented prices, part numbers, or designations
gatherBlogData():
- Fetches real prices from price_observations (last 30 days) per product
- Filters transceivers by speed extracted from topic keywords
- Enriches every product with verified_prices array + has_verified_price flag
- Joins DB products with vector search results (DB first — they have real prices)

contextData injection (blog.ts):
- [PRODUCT] lines: exact standard_name, form_factor, speed, reach, connector, dBm specs, Watts
- [VERIFIED PRICE] lines: real EUR/USD price, vendor, observed date, source URL
- [NO VERIFIED PRICE IN DB]: explicit tag — LLM must not invent a number
- [NO PRODUCT DATA AVAILABLE]: fallback when DB returns nothing

fo-blog-pipeline.ts system prompt:
- DATA INTEGRITY RULES block: prices/part numbers/vendors ONLY from context
- Never approximate with ~€350 or 'typically $200-600' for specific products
- Power specs only from [PRODUCT] data or REFERENCE VALUES

STEP4 context instructions:
- Explicit rules on how to use [VERIFIED PRICE] vs [NO VERIFIED PRICE]
- Invented data = HARD FAIL in QA

STEP9 QA — 3 new hard fail checks (30, 31, 32):
- Check 30: invented prices → remove or replace with flexoptix.net reference
- Check 31: invented part numbers → remove, use class name instead
- Check 32: invented vendor names → remove if not in known list
2026-04-01 17:27:55 +02:00
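Note on 174078efdb: the tag strings [VERIFIED PRICE] and [NO VERIFIED PRICE IN DB] come from the commit; how blog.ts lays out the rest of each line is not shown. The sketch below assumes a simple pipe-separated layout and a hypothetical `VerifiedPrice` shape.

```typescript
// Illustrative only: tag strings are from the commit, the line layout is an assumption.
interface VerifiedPrice {
  amount: number;
  currency: "EUR" | "USD";
  vendor: string;
  observed: string; // ISO date of the observation
  url: string;
}

function priceContextLines(prices: VerifiedPrice[]): string[] {
  // An explicit tag tells the LLM that no number may be invented.
  if (prices.length === 0) return ["[NO VERIFIED PRICE IN DB]"];
  return prices.map(
    (p) =>
      `[VERIFIED PRICE] ${p.amount.toFixed(2)} ${p.currency} | ${p.vendor} | observed ${p.observed} | ${p.url}`
  );
}
```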
Rene Fichtmueller
ee8b3c0779 feat: hot topics daily rotation — 30+ topic pool, seeded shuffle, next-refresh countdown
- Expanded research pool to 9 topics (was 3), evergreen to 12 (was 3)
- Conference topics: added Photonics West, CIOE, NFOEC follow-up, year-end review
- Standards topics: 3 rotating variants (IEEE tracker, SFF-8024 registry, OIF CEI-112G)
- seededShuffle(): day-of-year as seed — stable within the day, different every day
- API response adds refreshes_at (next midnight UTC) for frontend countdown
- Dashboard subtitle shows 'rotates daily · next refresh in Xh'
- Hot topic cards now pass full title + angle into generateBlog() correctly
2026-04-01 11:12:38 +02:00
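Note on ee8b3c0779: a day-of-year seeded shuffle plus the refreshes_at value could be sketched as below. The commit does not say which PRNG seededShuffle() uses; mulberry32 is swapped in here as a small deterministic generator, and `dayOfYearUtc` and `nextMidnightUtc` are hypothetical helper names.

```typescript
// mulberry32-style PRNG: deterministic for a given 32-bit seed (assumed choice).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// 1-based day of year in UTC, used as the seed.
function dayOfYearUtc(d: Date): number {
  const start = Date.UTC(d.getUTCFullYear(), 0, 1);
  const today = Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate());
  return Math.floor((today - start) / 86400000) + 1;
}

// Fisher-Yates shuffle driven by the seeded PRNG; the input is not mutated.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const out = items.slice();
  const rand = mulberry32(seed);
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Next midnight UTC, suitable for a refreshes_at field.
function nextMidnightUtc(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() + 1));
}
```

The same seed always yields the same order, so the list is stable within a UTC day and is reshuffled with a new seed on the next one.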
Rene Fichtmueller
580df8be01 blog: calibration v8 — AI phrasing blacklist, STEP8 6-step rewrite, Flexoptix author identity
- STRICTLY FORBIDDEN section: comprehensive AI phrasing blacklist (leverage, utilize,
  Furthermore, robust, seamless, delve into, in conclusion, etc.)
- STEP8 rewritten into 6 structured steps: hunt AI words, break rhythm, add human element,
  fix label formats, structural cleanup, ensure Flexoptix identity
- Banned sentence structures added (parallel triplets, same-length paragraphs)
- STEP6 author identity: reader should know this came from Flexoptix
- Version bump to 0.3.0
2026-04-01 00:43:38 +02:00
Rene Fichtmueller
52a04129e2 blog: calibration v7 — remove Cause/Fix/Example labels, integrate as prose narrative 2026-04-01 00:10:50 +02:00
Rene Fichtmueller
6b77b18842 fix(blog): extract article from QA, status badge ready/step X/10, calibration v6 Flexoptix balance 2026-03-31 23:52:56 +02:00
Rene Fichtmueller
ef0b0bb148 fix(llm): add 429 retry with exponential backoff + ollamaQueue concurrency guard 2026-03-31 21:45:46 +02:00
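Note on ef0b0bb148: retrying on HTTP 429 with exponential backoff might look like the sketch below. The retry count, base delay and cap are assumed values, `withRetry` and `backoffDelayMs` are hypothetical names, and the ollamaQueue concurrency guard is not covered.

```typescript
// Delay doubles with each attempt and is capped: 500, 1000, 2000, ...
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Treat any thrown object carrying status 429 as "rate limited" (assumed error shape).
function isRateLimited(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 429;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 4,
  baseMs = 500,
  maxMs = 8000,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms))
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only 429 is retried; other errors and exhausted retries propagate.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, baseMs, maxMs));
    }
  }
}
```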
Rene Fichtmueller
01ad16464d blog: calibration v5 — anti-consulting-prose, correct loss budget math, vendor lock-in specifics 2026-03-31 21:26:39 +02:00
Rene Fichtmueller
58a26116b1 fix(blog): 3s delay between queued LLM pipelines to prevent nginx 429 bursts 2026-03-31 19:40:40 +02:00
Rene Fichtmueller
45abd15fe4 blog: calibration v4 — technical accuracy + structure limits
- Fix SR4/DR4 fiber count: both use 8 fibers (4TX+4RX), difference is MMF vs SMF
- Fix power per port: 400G=10-15W/port, 800G=15-25W/port (not 1kW/port)
- Fix pricing context: always distinguish OEM ($1-5K) vs compatible ($200-600)
- Add HARD RULES 15-21: fiber count, power, pricing, no markdown headers,
  max 6 sections, no repeated topics, flow over format
- Add QA CALIBRATION FAILS 16-21: same rules enforced at QA step
- Add fiber/power reference tables with correct values
- Strip markdown (##/###/####/**) from all output — plain text only
- Add Style B gold example (10/10 validated prose article)
- Update STEP5 reality injection with correct SR4→DR4 description
- Update STEP8 kill-AI-tone to strip markdown headers + merge duplicates
2026-03-31 17:27:51 +02:00
Rene Fichtmueller
315a988775 feat(blog): add Style B prose calibration — 10/10 narrative flow standard
- CALIBRATION_GOLD_STANDARD now covers two validated styles: A (structured) and B (prose)
- Style B: no headers, no bullets, 1-3 sentence paragraphs, reframe ending
- STEP8_KILL_AI_TONE: prose conversion option for over-structured articles
- STEP4_MASTER_DRAFT: explicit style choice instruction (A vs B based on angle)
- Gold standard includes exact prose rhythm patterns from 10/10 human-reviewed article
- Wrong patterns expanded: symmetric sections, checklist endings, transition clichés
2026-03-31 16:48:10 +02:00
Rene Fichtmueller
f71ef2b20c feat(blog): regenerate button, SEO hashtags, calibration engine v2
- POST /api/blog/:id/regenerate — re-runs full 10-step LLM pipeline on existing draft
- Regenerate button visible when quality_issues present or status=review
- SEO keywords now displayed as clickable #hashtags (copy-to-clipboard)
- fo-blog-pipeline: added PoE misuse, DR4 mislabeling, ZR/DR4 conflation as hard QA fails
- fo-blog-pipeline: 14 hard rules in system prompt (was 10)
- fo-blog-pipeline: CALIBRATION_GOLD_STANDARD + withCalibration() from 10/10 human review
- System prompt now includes gold standard example on every pipeline run
2026-03-31 16:46:25 +02:00
Rene Fichtmueller
12d12aab4f feat(v0.2.6): hot topics + pipeline lock + blog delete + clean external JS
Hot Topics:
- Dynamic topics from /api/hot-topics loaded in Blog Engine tab
- 7 data sources (prices, competitors, hype cycle, news, conferences, research, evergreen)
- Urgency badges: BREAKING (red), HOT (orange), TRENDING (yellow), EMERGING (green)

Pipeline Lock:
- Only 1 generation at a time, 'Pipeline Busy' toast on double-click
- Progress bar with step names (external hot-topics.js, no inline hacks)

Blog Delete:
- DELETE /api/blog/:id endpoint
- Delete button (✕) on each blog in list
- 'Delete All Templates' button to clean up test drafts

Fix: dashboard JS extracted to external hot-topics.js to avoid sed quote hell
2026-03-31 09:54:33 +02:00
Rene Fichtmueller
3132b58309 feat(v0.2.5): hot topics engine + pipeline lock + UX fixes
Hot Topics Engine (GET /api/hot-topics):
- 7 data sources: price movements, competitor alerts, hype cycle transitions,
  news articles, conference calendar, research trends, evergreen topics
- Auto-discovers BREAKING/HOT/TRENDING/EMERGING topics
- Dashboard loads topics dynamically with urgency badges and source labels
- Click any topic → generates blog with that angle

Pipeline Lock (critical UX fix):
- Only 1 blog generation at a time (blogPipelineRunning flag)
- 'Pipeline Busy' toast if user clicks while generating
- Lock released on completion, timeout, or error

Dashboard:
- Static 3 cards replaced with dynamic hot topics grid
- 'Refresh Topics' button
- Topics show urgency color (red=breaking, orange=hot, yellow=trending, green=emerging)
- Auto-loads when Blog Engine tab opens
2026-03-31 09:49:43 +02:00
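Note on 3132b58309: a single-flight lock released on completion, timeout, or error can be sketched with a flag and a `finally` block. The flag name blogPipelineRunning is from the commit; `runExclusive` and the error messages are assumptions.

```typescript
let blogPipelineRunning = false; // flag name taken from the commit message

async function runExclusive<T>(job: () => Promise<T>, timeoutMs: number): Promise<T> {
  // A second caller is rejected immediately; the UI would show a "Pipeline Busy" toast.
  if (blogPipelineRunning) throw new Error("Pipeline Busy");
  blogPipelineRunning = true;
  let timer: ReturnType<typeof setTimeout> | undefined;
  try {
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error("Pipeline timeout")), timeoutMs);
    });
    return await Promise.race([job(), timeout]);
  } finally {
    // Runs on completion, timeout, or error, so the lock cannot get stuck.
    if (timer !== undefined) clearTimeout(timer);
    blogPipelineRunning = false;
  }
}
```

Note that a timed-out job keeps running in the background in this sketch; only the lock is released.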
Rene Fichtmueller
278207078b feat(v0.2.4): blog generation UX overhaul — live progress bar
When you click Generate:
- Dark overlay with orange progress bar shows pipeline status
- Live step counter: 'Step 3/10: Outline Generation — decision-driven structure'
- Percentage updates every 15 seconds via API polling
- When done: shows word count + QA score, auto-opens the article
- No more silent template dump — user sees the entire pipeline working
2026-03-31 09:44:29 +02:00
Rene Fichtmueller
4233118505 fix(v0.2.3): dashboard polling for LLM blog pipeline
Root cause: pollBlogLlm() checked for 'llm' in generated_by but pipeline
sets 'fo-blog-engine-v3'. Dashboard showed template forever.

Fixes:
- Poll check: now detects any non-template generated_by
- Poll timeout: 20s interval × 60 attempts = 20 min (pipeline takes ~10 min)
- Status toast shows pipeline step progress (Step X/10)
- Generation message tells user LLM runs ~10 min in background
- Version bump to v0.2.3
2026-03-31 09:41:20 +02:00
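Note on 4233118505: the poll check and timeout budget can be expressed as below. The constants follow the commit (20 s × 60 attempts); the assumption that template drafts carry the literal value "template" in generated_by is mine, as is the name `isLlmResult`.

```typescript
const POLL_INTERVAL_MS = 20 * 1000; // 20 s between polls
const MAX_POLL_ATTEMPTS = 60; // 60 attempts
const POLL_BUDGET_MIN = (POLL_INTERVAL_MS * MAX_POLL_ATTEMPTS) / 60000; // 20 minutes

// Any non-template generator counts as an LLM result, so a rename such as
// 'fo-blog-engine-v3' no longer breaks detection.
// Assumption: template drafts are marked with generated_by = "template".
function isLlmResult(generatedBy: string | null | undefined): boolean {
  return typeof generatedBy === "string" && generatedBy.length > 0 && generatedBy !== "template";
}
```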
Rene Fichtmueller
9bb2f549f8 fix(v0.2.2): OLLAMA_URL pointed to localhost instead of .213 via WireGuard
Blog engine was falling back to template because qwen2.5:14b is on Mac Studio (.213),
not on Erik (localhost). Fixed ecosystem.config.js to use 192.168.178.213:11434.
This was the root cause why the 10-step pipeline never executed.
2026-03-31 09:28:34 +02:00
Rene Fichtmueller
c01d69e02e fix(blog): harden pipeline prompts based on v0.2.1 blog review feedback
System prompt: 10 HARD RULES (non-negotiable, article fails QA without them)
- Mandatory WHAT BREAKS IN PRODUCTION section (2+ specific failures with symptoms/cause/fix)
- Mandatory HIDDEN COSTS section (cleaning, troubleshooting time, cabling redesign, training)
- Mandatory WHEN NOT TO USE section for every recommendation
- Absolute-statement rule: NEVER state absolutes without conditions/context
- Cabling reality: MPO polarity, SR4→DR4 migration, cleaning requirements
- Brutal hook requirement: not 'If you're still...' but 'You're about to sign a PO. Stop.'
- Minimum 2500 words (was 2000)

Step 5 (Reality Injection): Now checks for ALL mandatory sections and adds if missing
Step 9 (QA Check): Hard fail checks — article is NOT publishable without production failures + hidden costs
Feedback source: Human expert review scoring 7.5/10, targeting 9.5+
2026-03-31 09:24:08 +02:00
Rene Fichtmueller
6bd168e958 chore: bump version to v0.2.1 2026-03-31 09:19:38 +02:00
Rene Fichtmueller
eec42e4818 feat: wire 10-step FO Blog Pipeline into blog generation route
Replaces old 2-pass pipeline with full Flexoptix Style 10-step generation:
1. Topic Expansion (real scenarios + wrong assumptions)
2. Angle Selection (single strong angle + audience)
3. Outline Generation (decision-driven, no generic sections)
4. Master Draft (Flexoptix voice, 2000+ words)
5. Reality Injection (failure scenarios, operational pain)
6. Technical Deepening (specific optics, power, density)
7. Opinion Layer (clear positions, no neutrality)
8. Kill AI Tone (remove all AI fingerprints)
9. QA Check (technical accuracy verification)
10. Quality Score (1-10 auto-rating, saved as self-feedback)

Feedback loop active:
- Accumulated feedback injected into system prompt
- Auto QA scores saved to blog_feedback table
- Training data export via GET /api/blog/feedback/training-data
2026-03-31 09:16:23 +02:00
Rene Fichtmueller
d1d23ce31d feat(v0.2.1): data confidence tracking + validation + blog feedback system
- Migration 016: data_confidence column (vendor_verified/enriched_estimated/scraped_unverified)
- Migration 015: blog_feedback table with 8 quality scores + free text
- Validation script: 8 physics-based rules (wavelength↔fiber, reach plausibility, power limits)
- Blog feedback API: POST /api/blog/:id/feedback + training data export
- FO Blog Pipeline v3: 10-step Flexoptix Style prompts (Less bullshit. More engineering.)
- Auto-fix: wavelength↔fiber mismatches corrected automatically
2026-03-31 09:12:37 +02:00
Rene Fichtmueller
531e25b327 chore: bump version to 0.2.0 in health endpoint 2026-03-31 08:59:00 +02:00
Rene Fichtmueller
1f8176bf8e fix: UUID cast in datasheet routes — use slug-first lookup 2026-03-31 08:58:26 +02:00
Rene Fichtmueller
24a9eba9ce feat(v0.2.0): datasheets + adoption roadmap + all routes registered
- GET /api/datasheets/transceiver/:id — Full datasheet with power budget, pricing, compatibility, HTML export
- GET /api/datasheets/switch/:id — Switch datasheet with compatible transceivers
- GET /api/adoption — Full technology roadmap with maturity indicators
- GET /api/adoption/:technology — Detailed adoption analysis, migration paths, risks, timelines
- All v0.2.0 routes registered in index.ts
2026-03-31 08:57:03 +02:00
Rene Fichtmueller
a69acc4588 feat(v0.2.0): Sales Intelligence Engine — Phase 0+A
New API routes:
- GET /api/finder — Switch→Flexoptix transceiver finder with FlexBox coding
- GET /api/competitor-alerts — Competitor intelligence (price changes, new products, stock)
- GET /api/forecast/:technology — Sales forecast 3/9/12/18 months + buy/wait/hold signal
- POST /api/transport/plan — Transport system planner (city→city BOM with fiber providers)

New MCP tools:
- find_flexoptix_for_switch — Customer switch → Flexoptix products
- get_competitor_alerts — Competitor monitoring
- plan_transport — Network transport planning
- forecast_sales — Volume/revenue prediction
- generate_blog — Enhanced blog generation

New DB tables (migration 013):
- competitor_alerts, price_changes, flexoptix_product_map
- sales_forecasts, fiber_providers, fiber_routes, cities
- generated_datasheets, blog_series
- Views: v_price_coverage, v_image_coverage, v_switch_flexoptix_finder

Seed data (migration 014):
- 25 European cities with IX/DC locations + coordinates
- 15 fiber providers (euNetworks, Telia, DTAG, Colt, Zayo, etc.)
- 16 fiber routes with pricing (Germany focus)

Infrastructure:
- Scraper scheduler: 2h Flexoptix, 4h FS.com/Optcore (was 6-8h)
- Change detector for competitor price/stock monitoring
- Image downloader utility with coverage tracking
2026-03-31 08:51:22 +02:00
Rene Fichtmueller
814325b349 feat: dashboard v2, blog expansion, market/cable MCP tools, switch asset scrapers, scraper utilities 2026-03-30 08:07:12 +02:00
Rene Fichtmueller
615a7e50c7 fix: remove non-existent vendor URL columns, fix text=uuid cast in transceiver lookup 2026-03-30 07:49:54 +02:00
Rene Fichtmueller
39dc5a4ab4 fix: add trust proxy for Cloudflare — fixes ERR_ERL_UNEXPECTED_X_FORWARDED_FOR in rate limiter 2026-03-30 06:41:36 +02:00
Rene Fichtmueller
6f7c834752 feat(scrapers+mcp): ATGBICS + ProLabs scrapers, MCP HTTP/SSE server
Scrapers:
- atgbics.ts: PlaywrightCrawler for UK vendor ATGBICS (Shopify store),
  scrapes SFP/SFP+/SFP28/QSFP+/QSFP28/QSFP-DD in GBP, max 50 pages/run
- prolabs.ts: HttpCrawler for ProLabs (Legrand subsidiary), USD pricing,
  category-driven crawl with reach/fiber/speed detection
- Both registered in scheduler (every 8h, staggered) and index.ts CLI

MCP HTTP Server:
- packages/mcp-server/src/http-server.ts: Express + SSEServerTransport
- Exposes all 12 TIP tools via GET /sse + POST /message
- Bearer token auth (MCP_SECRET env), CORS-configurable
- GET /health → { status: "ok", tools: 12 }
- Port: MCP_HTTP_PORT (default 3201)

SQL + tools:
- sql/006-009: seed scripts for whitebox switches, vendors, assets
- switch-docs.ts: MCP tool for switch documentation queries
2026-03-29 02:26:45 +08:00
Rene Fichtmueller
280bf8f50a feat: calibrate regional adoption model with research-backed parameters
Update REGIONAL_LAGS with data from LightCounting, vendor earnings,
OFC market sessions, and Chinese IPO prospectuses. Add price index
per region and segment mix (hyperscaler/telco/enterprise) for
more accurate regional revenue modeling.
2026-03-28 02:34:29 +13:00
Rene Fichtmueller
70447def02 feat: massive scraper expansion + hype cycle engine + lifecycle prediction
New scrapers:
- GBICS.com (BigCommerce, GBP prices, 10 categories, 78 products)
- Juniper HCT (Next.js SSR parser, 475 transceivers with specs/EOL)
- SFPcables.com (Magento store, 16 categories, 78 products)
- Fluxlight (BigCommerce, 6 pages, 118 products)
- Champion ONE (compatible vendor scraper)

Scraper fixes:
- 10Gtek: rewritten to parse HTML spec tables (152 products)
- Flexoptix: fix price extraction from Magento Hyva HTML
- Register all scrapers in CLI (--gbics, --juniper, --sfpcables, etc.)

Hype Cycle Engine enhancements:
- Data-driven enrichment from scraped vendor/price data
- Revenue lifecycle prediction (peak year, decline, revenue index)
- Regional adoption model (NA, China, APAC, Europe, RoW with lag coefficients)
- New API endpoints: /enriched, /lifecycle, /regional/:tech

DB growth: 89 → 1,168 transceivers, 0 → 416 prices, 6 vendors
Qdrant: 1,162 products embedded with nomic-embed-text

Research: Norton-Bass model, standards-to-market timelines, hype signals
2026-03-28 02:30:19 +13:00
Rene Fichtmueller
312c5cb815 fix: hype cycle findTechnology matched wrong tech (1G instead of 1.6T)
findTechnology used loose includes() matching — '1.6T OSFP-XD' matched
'1G SFP' first because query contained '1'. Now matches exact name first,
then by speed prefix with proper unit parsing (G/T).
2026-03-28 01:00:52 +13:00
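Note on 312c5cb815: matching the exact name first and then falling back to a parsed speed (with G/T units) avoids the loose includes() problem. The sketch below is an assumed reconstruction; `Tech`, `parseSpeedGbps` and the regex are illustrative, not the committed code.

```typescript
interface Tech {
  name: string;
}

// Parse a leading speed such as "1.6T" or "400G" into Gbps; undefined if none is found.
function parseSpeedGbps(text: string): number | undefined {
  const m = /(\d+(?:\.\d+)?)\s*([GT])\b/i.exec(text);
  if (!m) return undefined;
  const value = parseFloat(m[1]);
  const factor = m[2].toUpperCase() === "T" ? 1000 : 1;
  return Math.round(value * factor);
}

function findTechnology(query: string, techs: Tech[]): Tech | undefined {
  const q = query.trim().toLowerCase();
  // Pass 1: exact (case-insensitive) name match.
  const exact = techs.find((t) => t.name.toLowerCase() === q);
  if (exact) return exact;
  // Pass 2: compare parsed speeds, so "1.6T" (1600 Gbps) never matches "1G" (1 Gbps).
  const speed = parseSpeedGbps(query);
  if (speed === undefined) return undefined;
  return techs.find((t) => parseSpeedGbps(t.name) === speed);
}
```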
Rene Fichtmueller
a6f2b2ef9e feat: Phase 8 — Dashboard frontend + static serving
Single-file dashboard with 6 tabs: Overview, Semantic Search,
Hype Cycle, Transceivers, News, Blog Engine. Dark theme, no
build step, served as static HTML from Express.

- Overview: health stats, vector collection counts, recent news
- Semantic Search: query across all 6 Qdrant collections
- Hype Cycle: Norton-Bass table with phase colors + position bars
- Transceivers: searchable table with form factor/speed/reach
- News: semantic news search with source links
- Blog: generate drafts from templates, view draft history

Live at: https://transceiver-db.context-x.org/dashboard/
2026-03-28 00:37:10 +13:00
Rene Fichtmueller
274b80a4f1 feat: Phase 7 — Blog generator + scraper scheduler activation
Blog draft engine generates structured markdown from all Qdrant
collections (products, news, FAQ, troubleshooting). Supports 4
topic types: hype_cycle, comparison, new_product, tutorial.

- routes/blog.ts: POST /api/blog/generate, GET/PUT endpoints
- ecosystem.config.js: Added tip-scraper PM2 process
- Scraper scheduler (pg-boss) now running on Erik with 8 job queues
- News scraper running every 6 hours on Erik
2026-03-28 00:32:08 +13:00
Rene Fichtmueller
4cb2db6455 feat: Phase 6 — FAQ + troubleshooting knowledge base embeddings
19 curated FAQ entries covering form factors, fiber types, reach,
compatibility, WDM, power, and emerging tech (CPO, LPO, 400ZR).
10 troubleshooting guides with symptom/cause/solution format.

All 6 Qdrant collections now populated:
- product_embeddings: 89 transceivers
- datasheet_chunks: 40 chunks (OCR pipeline)
- faq_embeddings: 19 FAQ entries
- troubleshooting_embeddings: 10 guides
- news_embeddings: 33 articles
- manual_chunks: 0 (pending manual ingestion)
2026-03-28 00:24:50 +13:00
Rene Fichtmueller
8bb3b586f3 feat: Phase 5 — OCR pipeline + document/news search
Docling-powered OCR pipeline: PDF → markdown → chunks → Ollama embed → Qdrant.
News embedding seeder for news_embeddings collection.
Document and news semantic search API endpoints.

- embeddings/ocr-pipeline.ts: Docling convert → chunk → embed pipeline
- embeddings/seed-news.ts: Batch embed news_articles into Qdrant
- routes/documents.ts: POST /api/documents/process, GET /api/documents
- routes/search.ts: GET /search/documents, GET /search/news endpoints
- sql/005-documents.sql: Add chunks_count, processed_at to documents table
- Ollama + nomic-embed-text installed on Erik (CPU mode)
- 89 products + 40 datasheet chunks + 33 news articles in Qdrant
2026-03-28 00:22:01 +13:00
Rene Fichtmueller
6d3e5cc04a feat: Phase 4 — Vector embeddings + semantic search
Ollama nomic-embed-text (768 dim) → Qdrant vector search pipeline.
Embeds all 89 transceivers with rich text representation and payload
filters (form_factor, speed_gbps, fiber_type, wdm_type).

- embeddings/client.ts: Ollama embed + Qdrant upsert/search
- embeddings/seed-products.ts: Batch seeder for product_embeddings
- routes/search.ts: GET /api/search, /search/products, /search/stats
- 6 Qdrant collections: products, datasheets, FAQs, manuals, troubleshooting, news
2026-03-28 00:05:29 +13:00
Rene Fichtmueller
eb875f37d2 feat: Phase 3 — Norton-Bass Hype Cycle Engine
Implements the full Norton-Bass Multigenerational Diffusion Model for
transceiver technology lifecycle forecasting.

Math: Bass diffusion F(t) + logistic adoption S(t) = L / (1 + e^(-k(t-t0)))
Parameters: p (innovation ~0.03), q (imitation ~0.3-0.5), m (market potential)

Phase Classification Engine (composite score):
  30% Port shipment share + 20% ASP decline rate + 15% Standards maturity
  + 15% Interop validation + 10% Vendor trajectory + 10% Media sentiment

11 technologies tracked: 1G → 10G → 25G → 40G → 100G → 400G → 800G → 1.6T
  + CPO, LPO, 400ZR Coherent
5-year adoption forecast per technology

API: GET /api/hype-cycle (all) + GET /api/hype-cycle/:tech (detail)
Live: https://transceiver-db.context-x.org/api/hype-cycle
2026-03-27 23:35:57 +13:00
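Note on eb875f37d2: the building blocks named in the commit can be sketched as below, using the standard closed form of the Bass cumulative adoption fraction, the logistic curve as written above, and the stated composite weights. How the engine combines these is not described in the log, and the assumption that each signal is normalized to 0..1 is mine.

```typescript
// Bass cumulative adoption fraction: F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t}).
function bassCumulative(t: number, p: number, q: number): number {
  if (t <= 0) return 0;
  const e = Math.exp(-(p + q) * t);
  return (1 - e) / (1 + (q / p) * e);
}

// Logistic adoption: S(t) = L / (1 + e^{-k (t - t0)}).
function logistic(t: number, L: number, k: number, t0: number): number {
  return L / (1 + Math.exp(-k * (t - t0)));
}

// Weights from the commit message; they sum to 100%.
const PHASE_WEIGHTS = {
  portShipmentShare: 0.3,
  aspDeclineRate: 0.2,
  standardsMaturity: 0.15,
  interopValidation: 0.15,
  vendorTrajectory: 0.1,
  mediaSentiment: 0.1,
} as const;

type Signal = keyof typeof PHASE_WEIGHTS;

// Weighted sum of signals; each signal is assumed to be normalized to 0..1.
function compositeScore(signals: Record<Signal, number>): number {
  let score = 0;
  for (const key of Object.keys(PHASE_WEIGHTS) as Signal[]) {
    score += PHASE_WEIGHTS[key] * signals[key];
  }
  return score;
}
```

Units of sold product at time t would follow from multiplying F(t) by the market potential m.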
Rene Fichtmueller
e9fb50a248 feat: TIP Phase 0+1 — monorepo, DB schema, API, scraper engine
Phase 0 - Foundation:
- Restructure into npm workspace monorepo (packages/core, api, scraper)
- PostgreSQL 17 + TimescaleDB schema (15 tables incl. hypertables)
- Docker Compose for local dev (PostgreSQL on 5433 + Qdrant)
- Express 5 API on port 3200 with 6 routes
- Seed script to migrate 159 transceivers + 42 standards from npm package
- Erik server setup script + PM2 ecosystem config

Phase 1 - Scraper Engine:
- Crawlee + Playwright framework with pg-boss scheduler
- FS.com scraper (PlaywrightCrawler, anti-bot workaround)
- Optcore.net scraper (WP REST API enumeration + PlaywrightCrawler)
  - Uses /wp-json/wp/v2/product to get 2000+ product URLs
  - Playwright renders individual product pages for price extraction
- Cisco TMG Matrix scraper (compatibility data)
- News RSS aggregator (optics.org, SPIE, Network World, Nature Photonics)
  - Keyword relevance scoring for transceiver/fiber topics
  - xml2js with malformed XML sanitization
- SHA-256 content hashing for change detection (skip unchanged records)
- pg-boss v10 with explicit queue creation before scheduling
2026-03-27 16:27:31 +13:00
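Note on e9fb50a248: SHA-256 content hashing for change detection could be sketched as follows, using Node's built-in crypto module. Which fields the scraper actually hashes is not stated; `contentHash` and `hasChanged` are hypothetical helpers that hash a flat record with its keys sorted.

```typescript
import { createHash } from "node:crypto";

// Hash a flat record. Keys are sorted so property order does not affect the digest.
function contentHash(record: Record<string, unknown>): string {
  const canonical = JSON.stringify(
    Object.keys(record)
      .sort()
      .map((k) => [k, record[k]])
  );
  return createHash("sha256").update(canonical).digest("hex");
}

// A record is written only when its hash differs from the stored one.
function hasChanged(previousHash: string | undefined, record: Record<string, unknown>): boolean {
  return previousHash !== contentHash(record);
}
```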