- fetchUrlContent() now extracts OG/meta tags (og:title, og:description,
name="description", og:site_name) as fallback content for JS-rendered SPAs
- Returns spaDetected=true when body text < 300 chars after stripping scripts
- from-url endpoint skips gatherBlogData() product injection when SPA detected,
preventing fo-blog-v10 from defaulting to optical networking domain
- additionalContext now includes SPA warning instructing LLM not to default
to optical transceiver topics unless the page is actually about that
- generated_by in pipeline UPDATE query now uses active model name instead of
hardcoded 'fo-blog-engine-v7' (reads getLlmProvider().ollamaModel)
- Dashboard shows SPA warning toast when spa_detected=true in response
- Response now includes spa_detected field for client awareness
New POST /api/blog/from-url endpoint:
- Accepts url + topic in request body
- Fetches page server-side (no CORS, 20s timeout, redirect-follow)
- Strips script/style/nav/footer/svg; extracts readable text (~5000 chars)
- Extracts page title from <title> or <h1>
- Passes extracted content as structured additional_context to the
existing 16-step FO blog pipeline (same as manual generation)
- Returns immediately; LLM pipeline runs async
- Validated: smoke test fetched flexoptix.net/en/blog/, 5040 chars,
pipeline launched with llm_enhancing=true
New "🔗 Blog aus URL generieren" panel in dashboard:
- URL input (Enter key triggers generation)
- Blog-Typ dropdown (same 8 types as manual panel)
- Button shows loading state "⏳ Fetching…" during API call
- Status line shows extracted char count after success
- Reuses pollBlogLlm() for step-by-step progress polling
- Inline status field for error display without toast spam
CODEX-TASK-flexoptix-reference-matching.md — comprehensive plan to fix
zero-match gap for ATGBICS/NADDOD/10Gtek/ShopFiber24 (8.260+ products
with 0 Flexoptix equivalences).
Root cause: 30-day price_observation window excludes vendors whose
scrapers ran >30 days ago. Solution: catalog-reconcile robot (full
bulk match, no time limit), form_factor normalization (SQL 108),
30→90 day window fix in nightly matcher, on-demand API endpoint.
Expected: coverage from 22% → 45-60% after one reconcile run.
Blog draft engine generates structured markdown from all Qdrant
collections (products, news, FAQ, troubleshooting). Supports 4
topic types: hype_cycle, comparison, new_product, tutorial.
- routes/blog.ts: POST /api/blog/generate, GET/PUT endpoints
- ecosystem.config.js: Added tip-scraper PM2 process
- Scraper scheduler (pg-boss) now running on Erik with 8 job queues
- News scraper running every 6 hours on Erik