transceiver-db/blog-training-data
Rene Fichtmueller 80aa85961b feat: add 7 gold-standard blog training articles for BlogLLM
Reference quality articles covering: 400G DR4 pricing, vendor lock-in,
silicon photonics, fiber plant readiness, 400ZR reality check,
DOM diagnostics, 800G readiness. All follow strict FO Blog Pipeline
rules — no markdown headers, no spec dumps, one thesis per article.
2026-04-06 01:58:05 +02:00

BlogLLM Training Data — Flexoptix Reference Articles

Gold-standard blog posts generated by Claude Sonnet (claude-sonnet-4-20250514) under the strict FO Blog Pipeline rules. They serve as reference examples for fine-tuning BlogLLM.

Articles

File | Title | Type | Score
--- | --- | --- | ---
blog-001-400g-dr4-price-war.md | 400G DR4 Prices Are Moving... | market_alert | 9/10
blog-002-vendor-lock-in-optics.md | The Hidden Tax in Your Transceiver Budget | comparison | 9/10
blog-003-silicon-photonics.md | Silicon Photonics Is Shipping... | technology_deep_dive | 9/10
blog-004-400g-migration-fiber-plant.md | Your 100G Fiber Plant Is Not Ready for 400G | tutorial | 9/10
blog-005-coherent-400zr-reality.md | 400ZR Is Not What the Vendor Presentations Said | technology_deep_dive | 9/10
blog-006-dom-diagnostics.md | Reading DOM Data Correctly | tutorial | 9/10
blog-007-800g-readiness.md | 800G Is Shipping. Your Infrastructure Probably Isn't Ready. | hype_cycle | 9/10

Quality Rules Met (per article)

All articles were generated under strict constraints:

  • No markdown headers (##, ###) anywhere in the body
  • No bullet lists as structural elements
  • No LaTeX formulas
  • No banned AI phrases ("leverage", "optimize", "game-changer", etc.)
  • No spec dumps or comparison tables
  • No OEM pricing presented as compatible pricing
  • No sales language ("BUY / AVOID", verdict blocks)
  • DR4 connector: MPO-12 (never LC)
  • DR4 wavelength: 1310 nm (never 1550 nm)
  • 400ZR and DR4 treated as distinct technologies
  • No per-port power figures above 25 W
  • No made-up part numbers
  • Only CMOS/physics-grounded values
  • One core thesis per article
  • Flexoptix FINAL OUTCOME TEST: the reader finishes the article ready to validate properly rather than defaulting to OEM optics
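Several of these rules are mechanical enough to pre-check automatically before a human review. The sketch below is a hypothetical lint pass, not the pipeline's actual implementation: the function name checkArticle, the rule ids, the short phrase list and every regex are assumptions. The DR4 and wattage checks are crude sentence-level heuristics that flag text for review rather than prove a violation (a sentence contrasting DR4 with 400ZR wavelengths would also be flagged).

```typescript
// Hypothetical lint pass over an article body. It approximates a few of the
// FO Blog Pipeline rules with regex heuristics; rule ids, regexes and the
// phrase list are assumptions, not the pipeline's actual implementation.

interface Violation {
  rule: string;
  detail: string;
}

// Only the three phrases quoted in the README; the real banned list is longer.
const BANNED_PHRASES = ["leverage", "optimize", "game-changer"];

function checkArticle(body: string): Violation[] {
  const violations: Violation[] = [];

  // Structural rules are checked line by line.
  body.split("\n").forEach((line, i) => {
    if (/^\s*#{2,3}\s/.test(line)) {
      violations.push({ rule: "no-markdown-headers", detail: `line ${i + 1}` });
    }
    if (/^\s*([-*•]|\d+\.)\s/.test(line)) {
      violations.push({ rule: "no-bullet-lists", detail: `line ${i + 1}` });
    }
  });

  // Display math, \( or \[ delimiters, or a \frac command suggest LaTeX.
  if (/\$\$|\\\(|\\\[|\\frac/.test(body)) {
    violations.push({ rule: "no-latex", detail: "LaTeX markup found" });
  }

  // Case-insensitive substring match, so "leveraged" is caught as well.
  const lower = body.toLowerCase();
  for (const phrase of BANNED_PHRASES) {
    if (lower.indexOf(phrase) !== -1) {
      violations.push({ rule: "banned-phrase", detail: phrase });
    }
  }

  // DR4 facts: flag any sentence that pairs DR4 with LC or 1550 nm.
  for (const sentence of body.split(/[.!?]\s+/)) {
    if (!/\bDR4\b/.test(sentence)) continue;
    if (/\bLC\b/.test(sentence)) {
      violations.push({ rule: "dr4-connector", detail: sentence.trim() });
    }
    if (/\b1550\s?nm\b/.test(sentence)) {
      violations.push({ rule: "dr4-wavelength", detail: sentence.trim() });
    }
  }

  // Any wattage above 25 W is flagged; a human decides if it is per-port.
  const wattRe = /(\d+(?:\.\d+)?)\s?W\b/g;
  let m: RegExpExecArray | null;
  while ((m = wattRe.exec(body)) !== null) {
    const watts = parseFloat(m[1]);
    if (watts > 25) {
      violations.push({ rule: "power-over-25w", detail: `${watts} W` });
    }
  }

  return violations;
}

// Usage: a draft that breaks several rules at once.
const draft =
  "## Pricing\nDR4 ships with an LC connector. This will leverage 30 W per port.";
console.log(checkArticle(draft).map((v) => v.rule));
```

Rules that need judgment, such as "one core thesis" or the final outcome test, are left out on purpose; a regex cannot decide them.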

Usage for BlogLLM Training

  1. Import these articles as positive examples into the fine-tuning dataset.
  2. Each article runs roughly 800 to 1,200 words, matching production blog length.
  3. The Type field maps to the generation template types in fo-blog-pipeline.ts.
  4. These articles define the output quality gate: score newly generated articles by comparing them against these references.
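Steps 1 to 3 could be wired together roughly as follows. This is a sketch under assumptions: toTrainingLine is a hypothetical helper, the prompt/completion record shape and metadata fields are illustrative, and treating 800 to 1,200 words as a hard bound is a stricter reading than the README's "roughly". The ArticleType union simply mirrors the five Type values in the Articles table.

```typescript
// Hypothetical import helper: turns one reference article into a JSONL line
// for a fine-tuning dataset. The prompt/completion shape and metadata fields
// are assumptions; match them to whatever the training job actually expects.

// Mirrors the Type values listed in the Articles table.
type ArticleType =
  | "market_alert"
  | "comparison"
  | "technology_deep_dive"
  | "tutorial"
  | "hype_cycle";

interface ReferenceArticle {
  file: string;
  title: string;
  type: ArticleType;
  body: string;
}

// The README gives roughly 800 to 1,200 words; enforcing it strictly is an assumption.
const MIN_WORDS = 800;
const MAX_WORDS = 1200;

function wordCount(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

function toTrainingLine(article: ReferenceArticle): string {
  const words = wordCount(article.body);
  if (words < MIN_WORDS || words > MAX_WORDS) {
    throw new Error(
      `${article.file}: ${words} words, expected ${MIN_WORDS}-${MAX_WORDS}`
    );
  }
  const record = {
    prompt: `Write a ${article.type} blog article titled "${article.title}".`,
    completion: article.body,
    // Every reference article is imported as a positive example (step 1).
    metadata: { source: article.file, type: article.type, label: "positive" },
  };
  // One JSON object per line (JSONL convention).
  return JSON.stringify(record);
}
```

Running this over all seven files and joining the results with newlines yields a JSONL file in which each example keeps its template type in the metadata.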

Adding More Training Data

Generate new articles via the API by calling POST /api/blog/generate with use_llm: "fo_pipeline" and the Claude provider, then export them from the database as additional training examples.
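A scripted version of that generation call might look like this. Only the POST path and use_llm: "fo_pipeline" come from this README; the base URL, the provider field name and its "claude" value, and the untyped JSON response are assumptions to check against the actual API before use.

```typescript
// Hypothetical client for POST /api/blog/generate. Only the path and
// use_llm: "fo_pipeline" come from the README; the `provider` field, its
// "claude" value and the response shape are assumptions.

interface GenerateRequest {
  use_llm: "fo_pipeline";
  provider: string; // assumed name of the field that selects the Claude provider
}

interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGenerateRequest(baseUrl: string): { url: string; init: PostInit } {
  const payload: GenerateRequest = { use_llm: "fo_pipeline", provider: "claude" };
  return {
    // Strip trailing slashes so "http://host/" and "http://host" behave alike.
    url: baseUrl.replace(/\/+$/, "") + "/api/blog/generate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}

// Sends the request with the global fetch (Node 18+ or a browser).
function generateArticle(baseUrl: string): Promise<unknown> {
  const { url, init } = buildGenerateRequest(baseUrl);
  return fetch(url, init).then((res) => {
    if (!res.ok) {
      throw new Error(`generate failed: HTTP ${res.status}`);
    }
    return res.json();
  });
}
```

Keeping buildGenerateRequest separate from the network call makes the payload easy to inspect or test without a running server.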