transceiver-db/blog-training-data
Rene Fichtmueller 2c3cc69a78 feat: BlogLLM training corpus expansion — 127 articles across 18 phases
Comprehensive B2B technical blog training dataset combining deep optical
networking domain expertise (Articles 102-180) with scientific content
engineering (Articles 181-228).

Coverage:
- Phase 1 (Foundation): Optical diagnostics, transceiver validation,
  DWDM strategy, vendor lock-in, vertical markets, 5G/6G optics
- Phase 2 (Deep Technical): 400G/800G coherent, PAM-4/8 modulation,
  silicon photonics, troubleshooting mastery
- Phase 3 (Vertical Markets): FinTech, CDN, government, manufacturing,
  edge computing, telco carrier-grade, quantum networking
- Phase 4 (Specialized/Emerging): CXL/RoCE, observability, DR/BCP,
  capacity planning, DCI design
- Phase 5 (Operations/Management): Testing, vendor relationships,
  zero trust, program management, troubleshooting scenarios
- Phase 6-9 (Synthesis): OSI model, security layers, manufacturers,
  competitive landscape, practical building, project management
- Phase 11-12 (Content Engineering): NLP persuasion, blog writing
  science, hook engineering, visual design, B2B psychology,
  A/B testing, AI prompt engineering
- Phase 13-15 (Strategic Excellence): SEO, brand voice, case studies,
  newsletters, analytics, analyst relations, webinars, advocacy,
  product launches, crisis comms, internationalization, community
- Phase 16-18 (Advanced/Final): ABM, marketing automation, employee
  advocacy, interactive content, original research, AI ethics,
  governance, IR content, generative AI future, privacy, accessibility

Stats: 127 files, ~57,977 lines, ~700,000 words, quality_score: 9
Frontmatter: YAML with training_data:true flag for fine-tuner pipeline
Target: BlogLLM fine-tuning via packages/fine-tuner → GGUF → Ollama
2026-05-12 23:21:39 +02:00

BlogLLM Training Data — Flexoptix Reference Articles

Gold-standard blog posts generated by Claude Sonnet (claude-sonnet-4-20250514) under the strict FO Blog Pipeline rules. They serve as reference examples for fine-tuning the BlogLLM.

Articles

| File | Title | Type | Score |
| --- | --- | --- | --- |
| blog-001-400g-dr4-price-war.md | 400G DR4 Prices Are Moving... | market_alert | 9/10 |
| blog-002-vendor-lock-in-optics.md | The Hidden Tax in Your Transceiver Budget | comparison | 9/10 |
| blog-003-silicon-photonics.md | Silicon Photonics Is Shipping... | technology_deep_dive | 9/10 |
| blog-004-400g-migration-fiber-plant.md | Your 100G Fiber Plant Is Not Ready for 400G | tutorial | 9/10 |
| blog-005-coherent-400zr-reality.md | 400ZR Is Not What the Vendor Presentations Said | technology_deep_dive | 9/10 |
| blog-006-dom-diagnostics.md | Reading DOM Data Correctly | tutorial | 9/10 |
| blog-007-800g-readiness.md | 800G Is Shipping. Your Infrastructure Probably Isn't Ready. | hype_cycle | 9/10 |
| blog-008-oem-vs-compatible-real-numbers.md | OEM vs Compatible Transceivers: The Numbers Nobody Publishes | buying_guide | 9/10 |
| blog-009-100g-to-400g-migration-what-breaks.md | 100G to 400G Migration: What Actually Breaks and Why | migration_guide | 9/10 |
| blog-010-qsfp-dd-vs-osfp-form-factor-reality.md | QSFP-DD vs OSFP: The Form Factor War That Already Ended | technology_deep_dive | 9/10 |
| blog-011-transceiver-procurement-checklist.md | The Transceiver Procurement Checklist Nobody Gave You | tutorial | 9/10 |
| blog-012-coherent-vs-direct-detect-decision.md | Coherent vs. Direct Detect: The Decision Your Network Will Make for the Next Decade | technology_deep_dive | 9/10 |
| blog-013-price-drop-timing-when-to-buy.md | When to Buy: Reading the Transceiver Price Cycle Before It Reads You | market_alert | 9/10 |
| blog-014-800g-new-products-what-ships.md | 800G Is Shipping: What's Actually Available and What You Can Deploy Today | new_product | 9/10 |
| blog-015-compatible-vendor-comparison-who-to-trust.md | Compatible Transceiver Vendors in 2026: Who Does the Testing and Who Just Says They Do | competitor_analysis | 9/10 |

Quality Rules Met (per article)

All articles were generated under strict constraints:

  • No markdown headers (##, ###) anywhere in body
  • No bullet lists as structural elements
  • No LaTeX formulas
  • No banned AI phrases ("leverage", "optimize", "game-changer", etc.)
  • No spec dumps or comparison tables
  • No OEM pricing presented as compatible pricing
  • No sales language ("BUY / AVOID", verdict blocks)
  • DR4 connector: MPO-12 (never LC)
  • DR4 wavelength: 1310nm (never 1550nm)
  • 400ZR and DR4 treated as distinct technologies
  • No per-port power figures >25W
  • No made-up part numbers
  • Only CMOS/physics-grounded values
  • One core thesis per article
  • Flexoptix FINAL OUTCOME TEST: reader finishes ready to validate properly, not defaulting to OEM
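
Several of the rules above are mechanical enough to check with a script before an article enters the dataset. The following is a minimal sketch, not the pipeline's actual validator: the function name `check_article` is hypothetical, the banned-phrase list contains only the three phrases named above (the full list is not given here), and the LaTeX check looks only for display-math and environment markers, so that dollar prices in the text are not flagged.

```python
import re

# Only the three phrases the rules name explicitly; the real list is longer ("etc.").
BANNED_PHRASES = ("leverage", "optimize", "game-changer")

# Markers that indicate LaTeX. A single "$" is deliberately ignored because
# pricing articles contain dollar amounts.
LATEX_MARKERS = re.compile(r"\$\$|\\\(|\\\[|\\begin\{")


def check_article(body: str) -> list:
    """Return human-readable rule violations found in an article body."""
    violations = []
    lines = body.splitlines()

    # Rule: no markdown headers anywhere in the body.
    if any(re.match(r"^#{1,6}\s", line) for line in lines):
        violations.append("markdown header in body")

    # Rule: no bullet lists as structural elements.
    if any(re.match(r"^\s*[-*•]\s", line) for line in lines):
        violations.append("bullet list in body")

    # Rule: no LaTeX formulas.
    if LATEX_MARKERS.search(body):
        violations.append("LaTeX markup in body")

    # Rule: no banned AI phrases (whole-word, case-insensitive match).
    for phrase in BANNED_PHRASES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", body, re.IGNORECASE):
            violations.append("banned phrase: " + phrase)

    return violations


if __name__ == "__main__":
    sample = "## Heading\nWe leverage optics.\n- item"
    for problem in check_article(sample):
        print(problem)
```

The domain rules (DR4 connector and wavelength, power ceilings, invented part numbers) depend on context and are better left to human review or a dedicated fact check than to pattern matching.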

Usage for BlogLLM Training

  1. Import these as positive examples into the fine-tuning dataset
  2. Each article is ~800-1200 words (production blog length)
  3. Type field maps to generation template types in fo-blog-pipeline.ts
  4. These articles define the output quality gate: score generated articles by comparing them against these references
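
The import step can be sketched as a small converter that turns one article file into one training record. This is an illustration under stated assumptions, not the packages/fine-tuner implementation: it assumes each file opens with a `---` delimited YAML frontmatter block containing flat `key: value` pairs, including the `training_data: true` flag from the commit notes, and that `title` and `type` keys exist. The record layout and the function names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a '---' delimited frontmatter block from the article body."""
    if not text.startswith("---"):
        return {}, text.strip()
    parts = text.split("---", 2)  # ['', frontmatter, body]
    if len(parts) < 3:
        return {}, text.strip()
    meta: Dict[str, str] = {}
    for line in parts[1].strip().splitlines():
        # Flat "key: value" pairs only; nested YAML is out of scope for this sketch.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    return meta, parts[2].strip()


def to_training_record(text: str) -> Optional[dict]:
    """Return a record for flagged articles, or None if the flag is absent."""
    meta, body = split_frontmatter(text)
    if meta.get("training_data") != "true":
        return None
    return {
        "title": meta.get("title", ""),  # assumed frontmatter key
        "type": meta.get("type", ""),    # assumed frontmatter key
        "text": body,
        "word_count": len(body.split()),
    }


if __name__ == "__main__":
    sample = (
        "---\n"
        'title: "Reading DOM Data Correctly"\n'
        "type: tutorial\n"
        "training_data: true\n"
        "---\n"
        "Body text here."
    )
    print(json.dumps(to_training_record(sample)))
```

Writing one JSON object per line (JSONL) keeps the dataset easy to stream and to filter by `type` or `word_count` before fine-tuning.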

Adding More Training Data

Generate new articles via the API: send POST /api/blog/generate with use_llm: "fo_pipeline" and the Claude provider selected, then export the results from the DB as additional training examples.
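
A request to the generation endpoint might be built as follows. Only the path `/api/blog/generate` and the `use_llm: "fo_pipeline"` value come from this README. The base URL, the `topic` and `type` fields, and the function name are assumptions for illustration, and how the Claude provider is selected is not documented here, so the sketch leaves it out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: replace with the real host


def build_generate_request(topic: str, article_type: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one generated article."""
    payload = {
        "use_llm": "fo_pipeline",  # value given in the README
        "topic": topic,            # hypothetical field name
        "type": article_type,      # hypothetical field name
    }
    return urllib.request.Request(
        BASE_URL + "/api/blog/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("400G DR4 pricing", "market_alert")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send the request; it is left out here
    # because the server address above is a placeholder.
```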