transceiver-db/blog-training-data
Rene Fichtmueller b8e6a62c7b feat: add 4 more gold-standard blog training articles for BlogLLM
Adding diverse topic coverage:
- blog-008: buying_guide — OEM vs compatible real cost numbers
- blog-009: migration_guide — 100G→400G what actually breaks
- blog-010: technology_deep_dive — QSFP-DD vs OSFP form factor reality
- blog-011: tutorial — transceiver procurement checklist

All follow FO rules: no markdown headers in body, no bullet lists,
one thesis, engineer voice, ~1000 words. Total training set: 11 articles.
2026-04-06 02:55:10 +02:00

BlogLLM Training Data — Flexoptix Reference Articles

These are gold-standard blog posts generated by Claude Sonnet (claude-sonnet-4-20250514) under the strict FO Blog Pipeline rules. They serve as reference examples for fine-tuning and training the BlogLLM.

Articles

| File | Title | Type | Score |
| --- | --- | --- | --- |
| blog-001-400g-dr4-price-war.md | 400G DR4 Prices Are Moving... | market_alert | 9/10 |
| blog-002-vendor-lock-in-optics.md | The Hidden Tax in Your Transceiver Budget | comparison | 9/10 |
| blog-003-silicon-photonics.md | Silicon Photonics Is Shipping... | technology_deep_dive | 9/10 |
| blog-004-400g-migration-fiber-plant.md | Your 100G Fiber Plant Is Not Ready for 400G | tutorial | 9/10 |
| blog-005-coherent-400zr-reality.md | 400ZR Is Not What the Vendor Presentations Said | technology_deep_dive | 9/10 |
| blog-006-dom-diagnostics.md | Reading DOM Data Correctly | tutorial | 9/10 |
| blog-007-800g-readiness.md | 800G Is Shipping. Your Infrastructure Probably Isn't Ready. | hype_cycle | 9/10 |

Quality Rules Met (per article)

All articles were generated under strict constraints:

  • No markdown headers (##, ###) anywhere in body
  • No bullet lists as structural elements
  • No LaTeX formulas
  • No banned AI phrases ("leverage", "optimize", "game-changer", etc.)
  • No spec dumps or comparison tables
  • No OEM pricing presented as compatible pricing
  • No sales language ("BUY / AVOID", verdict blocks)
  • DR4 connector: MPO-12 (never LC)
  • DR4 wavelength: 1310nm (never 1550nm)
  • 400ZR and DR4 treated as distinct technologies
  • No per-port power figures >25W
  • No made-up part numbers
  • Only CMOS/physics-grounded values
  • One core thesis per article
  • Flexoptix FINAL OUTCOME TEST: reader finishes ready to validate properly, not defaulting to OEM
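
Some of the rules above are mechanical enough to check automatically. The following is a minimal sketch of such a check, not the logic in fo-blog-pipeline.ts: `checkArticle`, `RuleViolation`, the rule labels, and the regular expressions are all hypothetical, and the phrase list contains only the three phrases named above. The domain rules (DR4 connector and wavelength, pricing, power figures, part numbers) are left out on purpose, since they likely need semantic review rather than pattern matching.

```typescript
// Hypothetical lint sketch. Function names, rule labels, and the shortened
// phrase list are assumptions, not the checks used in fo-blog-pipeline.ts.
interface RuleViolation {
  rule: string;
  line: number; // 1-based line number within the article body
}

// Only the three phrases the rules list names explicitly; the full list is longer.
const BANNED_PHRASES = ["leverage", "optimize", "game-changer"];

function checkArticle(body: string): RuleViolation[] {
  const violations: RuleViolation[] = [];
  body.split("\n").forEach((text, index) => {
    const line = index + 1;
    // Markdown headers: one to six '#' characters followed by whitespace.
    if (/^\s*#{1,6}\s/.test(text)) {
      violations.push({ rule: "markdown-header", line });
    }
    // Bullet markers at the start of a line.
    if (/^\s*[-*•]\s+/.test(text)) {
      violations.push({ rule: "bullet-list", line });
    }
    // LaTeX heuristic: display-math delimiters or a few common commands.
    // A single '$' is not flagged, because price figures use it.
    if (/\$\$|\\(frac|begin|sqrt)\b/.test(text)) {
      violations.push({ rule: "latex", line });
    }
    // Whole-word, case-insensitive match; inflected forms are not caught.
    for (const phrase of BANNED_PHRASES) {
      if (new RegExp("\\b" + phrase + "\\b", "i").test(text)) {
        violations.push({ rule: "banned-phrase:" + phrase, line });
      }
    }
  });
  return violations;
}
```

Calling `checkArticle(articleText)` returns an empty array when none of these four checks fire. An empty result only means the formatting rules passed; it says nothing about the technical accuracy rules.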

Usage for BlogLLM Training

  1. Import these as positive examples into the fine-tuning dataset
  2. Each article is ~800-1200 words (production blog length)
  3. The Type field maps to the generation template types in fo-blog-pipeline.ts
  4. These articles define the output quality gate: score generated articles by comparing them against these references
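
The import step above could look roughly like the sketch below, which turns an article into a typed training record and serializes records as JSON Lines. Everything here is an assumption for illustration: the `TrainingExample` shape, the function names, the JSONL output format, and the decision to treat the ~800-1200 word range as a hard limit. The type names are the ones that appear in this README and its commit log; the actual template types live in fo-blog-pipeline.ts and may differ.

```typescript
// Hypothetical import sketch; record shape and JSONL format are assumptions.
const ARTICLE_TYPES = [
  "market_alert",
  "comparison",
  "technology_deep_dive",
  "tutorial",
  "hype_cycle",
  "buying_guide",
  "migration_guide",
] as const;
type ArticleType = (typeof ARTICLE_TYPES)[number];

interface TrainingExample {
  file: string;
  type: ArticleType;
  words: number;
  text: string;
}

// Counts whitespace-separated tokens; an empty string yields 0.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Validates the type and word count, then returns a training record.
function toTrainingExample(
  file: string,
  type: string,
  text: string,
  minWords: number = 800,
  maxWords: number = 1200
): TrainingExample {
  if ((ARTICLE_TYPES as readonly string[]).indexOf(type) === -1) {
    throw new Error("unknown article type: " + type);
  }
  const words = countWords(text);
  if (words < minWords || words > maxWords) {
    throw new Error(file + ": " + words + " words is outside " + minWords + "-" + maxWords);
  }
  return { file, type: type as ArticleType, words, text };
}

// One JSON object per line.
function toJsonl(examples: TrainingExample[]): string {
  return examples.map((e) => JSON.stringify(e)).join("\n");
}
```

Rejecting out-of-range articles at import time keeps the positive examples close to production blog length; loosening `minWords` and `maxWords` is a one-argument change if the range turns out to be too strict.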

Adding More Training Data

Generate new articles via the API: send POST /api/blog/generate with use_llm: "fo_pipeline" and the Claude provider selected, then export the results from the DB as additional training examples.
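
As a rough illustration of the generation request, the sketch below builds the URL and options for that call. Only the endpoint path and `use_llm: "fo_pipeline"` come from this README. The `provider` field name, its `"claude"` value, the JSON content type, and the `buildGenerateRequest` helper are assumptions, so check the API handler for the real request schema before relying on this.

```typescript
// Hypothetical request builder; the body fields other than use_llm are guesses.
interface GenerateRequestInit {
  method: "POST";
  headers: { "Content-Type": string };
  body: string;
}

function buildGenerateRequest(
  baseUrl: string,
  provider: string = "claude" // assumed field value, not confirmed by the source
): { url: string; init: GenerateRequestInit } {
  // Strip trailing slashes so the path is joined exactly once.
  const url = baseUrl.replace(/\/+$/, "") + "/api/blog/generate";
  const body = JSON.stringify({ use_llm: "fo_pipeline", provider });
  return {
    url,
    init: { method: "POST", headers: { "Content-Type": "application/json" }, body },
  };
}
```

The result can be passed to `fetch(url, init)` in a runtime that provides `fetch`. Keeping the builder separate from the network call makes the request shape easy to inspect and adjust once the real schema is known.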