Compare commits
25 Commits
main...feature/v0
| Author | SHA1 | Date | |
|---|---|---|---|
| | aa977abc97 | | |
| | b238815cb5 | | |
| | 9d9d9ed8ae | | |
| | 5a0cbed5a2 | | |
| | f940bf2cd4 | | |
| | 891bd018a8 | | |
| | 1853d1c9f1 | | |
| | 4b452ab49e | | |
| | 83f4acc976 | | |
| | 66b722a5e4 | | |
| | c6308e93c0 | | |
| | d43b98e91b | | |
| | 46af736db3 | | |
| | 9a5b21a19a | | |
| | a4f738d093 | | |
| | 94c6b7f42d | | |
| | 1b0b602aa4 | | |
| | f48a809e40 | | |
| | 0a63307505 | | |
| | 122ca8444d | | |
| | 0260d0b365 | | |
| | a6f7968393 | | |
| | ae411cb575 | | |
| | 649e6a9796 | | |
| | b43bdd3060 | | |
@@ -1 +0,0 @@
{"version":"0.0.1","configurations":[{"name":"dashboard","runtimeExecutable":"npx","runtimeArgs":["serve","-p","5555","packages/dashboard"],"port":5555}]}
11
.gitignore
vendored
@@ -5,19 +5,8 @@ dist/
 .env*
 .dev.vars
 *.local
-wrangler.toml
-ecosystem.config.js
 
 # Downloaded product assets (images, PDFs)
 assets/images/
 assets/datasheets/
 assets/manuals/
-
-# Crawlee runtime artifacts (all scraper-specific storage dirs)
-storage/
-storage-*/
-.crawlee/
-
-# Local credentials (never commit)
-.tip/.env
-run-fs-scraper-mac.sh.local
@@ -1,298 +0,0 @@
# TIP Changelog
Format: `{"d":"YYYY-MM-DD","t":"TYPE","m":"Description"}`
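The entry format above lends itself to a mechanical check. The following is a minimal sketch, not code from the repository: `parseChangelogLine` and `ChangelogEntry` are hypothetical names, and the set of allowed types is taken from the "Types:" line of this changelog.

```typescript
// Entry types as listed in the changelog ("Types: FEAT · FIX · UI · DATA · AI · INFRA").
const ENTRY_TYPES = ["FEAT", "FIX", "UI", "DATA", "AI", "INFRA"] as const;
type EntryType = (typeof ENTRY_TYPES)[number];

interface ChangelogEntry {
  d: string; // date, YYYY-MM-DD
  t: EntryType; // entry type
  m: string; // free-text description
}

// Hypothetical helper: returns the parsed entry, or null for headings,
// blank lines, and anything that does not match the documented shape.
function parseChangelogLine(line: string): ChangelogEntry | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("{")) return null;

  let raw: unknown;
  try {
    raw = JSON.parse(trimmed);
  } catch {
    return null; // not valid JSON
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { d, t, m } = raw as Record<string, unknown>;
  if (typeof d !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(d)) return null;
  if (typeof t !== "string" || !(ENTRY_TYPES as readonly string[]).includes(t)) return null;
  if (typeof m !== "string") return null;

  return { d, t: t as EntryType, m };
}
```

Calling `parseChangelogLine` on each line of the file and counting the `null` results for lines that start with `{` would give a quick lint for malformed entries.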
{"d":"2026-05-14","t":"FEAT","m":"Equivalences Explorer: new dashboard tab '🔀 Equivalences' — search 63,362 cross-brand mappings (46 vendors, 7,516 competitor products → 846 Flexoptix alternatives, Ø 93.9% confidence). APIs: GET /api/equivalences (search), /api/equivalences/transceiver/:id (per-product), /api/equivalences/stats, /api/equivalences/top-vendors. Transceiver detail modal now shows equivalences panel (FX alternatives or competitor products) + SVG price history sparklines (30-day, per source vendor) from 392k+ price observations."}
{"d":"2026-05-14","t":"FEAT","m":"LinkedIn Distribution Status: Blog tab shows DRY_RUN badge, posted/dry_run/skipped/failed counters, history table with live URN links. GET /api/blog/linkedin/history reads blog_linkedin_distribution table + detects DRY_RUN mode from ecosystem config."}
{"d":"2026-05-14","t":"FEAT","m":"MCP Server: 2 new tools — find_equivalences (search 63k+ verified cross-brand mappings with confidence filter, returns FX alternatives + competitor matches formatted for LLM) + get_price_history (392k+ obs, daily series, per-vendor min/max/avg, cheapest source identification). Total: 21 MCP tools."}
{"d":"2026-05-14","t":"FIX","m":"Blog from URL: SPA-aware content extraction. fetchUrlContent() now extracts OG/meta tags (og:title, og:description, name=description, og:site_name) as fallback for JavaScript SPAs. Returns spaDetected=true when body text < 300 chars. from-url endpoint skips gatherBlogData() product injection when SPA detected — prevents fo-blog model from defaulting to optical networking domain on non-networking URLs. additionalContext now includes explicit SPA warning + meta content. generated_by in pipeline UPDATE uses active model name (no more hardcoded 'fo-blog-engine-v7'). Dashboard shows SPA warning toast + spa_detected field in response."}
{"d":"2026-05-14","t":"FEAT","m":"Blog Engine: URL → Blog feature. POST /api/blog/from-url fetches any URL server-side (20s timeout, redirect-follow), strips scripts/nav/footer/SVG, extracts readable text (~5000 chars) + page title, passes as structured additional_context to the 16-step FO blog pipeline. Dashboard: new '🔗 Blog aus URL generieren' panel with URL input (Enter key supported), Blog-Typ selector, loading state, and char count confirmation. Same pollBlogLlm() polling reused for step progress."}
{"d":"2026-05-14","t":"UI","m":"Switch modal Flexoptix section: (1) Speed formatting fixed — 1600.00G → 1.6T, 400G clean integer (fmtSpeed() helper, ≥1000 Gbps → T). (2) Lagerbestand badges added per transceiver row: DE-Lager (green), Global-Lager (blue), Zulauf with ETA date (yellow). Data sourced from stock_observations via LEFT JOIN LATERAL in getFlexoptixSuggestions(). Badges hidden when quantities are null/0 (scraper not yet populating Flexoptix warehouse columns — shows automatically once scraper is extended)."}
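The speed formatting rule in the entry above (values of 1000 Gbps and more shown in T, trailing zeros dropped) can be sketched as follows. This is an illustration under assumptions, not the dashboard's actual `fmtSpeed()`: the input type, the two-decimal rounding, and the empty-string fallback for invalid input are all guesses.

```typescript
// Sketch of a speed formatter: Gbps in, "400G" / "1.6T" style label out.
// Assumption: input may arrive as a number or as a string such as "1600.00".
function fmtSpeed(value: number | string): string {
  const gbps = Number(value);
  if (!Number.isFinite(gbps) || gbps <= 0) return ""; // assumed fallback

  if (gbps >= 1000) {
    // toFixed(2) rounds, parseFloat drops trailing zeros: 1.60 -> 1.6
    return `${parseFloat((gbps / 1000).toFixed(2))}T`;
  }
  return `${parseFloat(gbps.toFixed(2))}G`;
}
```

With this sketch, `fmtSpeed("1600.00")` yields `1.6T` and `fmtSpeed(400)` yields `400G`, matching the two cases named in the entry.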
{"d":"2026-05-14","t":"FEAT","m":"Stock velocity API: GET /api/stock/velocity (paginated, filterable by vendor_id/confidence/stockout_days/min_sell_rate/part_number) + GET /api/stock/velocity/:id (per-product velocity summary + sell/zulauf event history). Both routes live in packages/api/src/routes/stock.ts, compiled + deployed to tip-api PM2 id 24, port 3201."}
{"d":"2026-05-14","t":"DATA","m":"Demo data cleanup: deleted 2133 demo rows from reorder_signals (is_demo_data=true). Stock observation coverage expanded: atgbics.ts + optcore.ts now call upsertStockObservation after each price observation (binary in/out stock, confidence=1). FS.com scraper already runs 3x daily from Mac (02:00/10:00/18:00) with full DE-Lager/Global-Lager/Nachlieferung breakdown. Competitor stock audit: QSFPTEK (confidence=2, real quantities), FS.COM (confidence=3, per-warehouse breakdown) are highest fidelity; ATGBICS/Optcore added at confidence=1 (binary); sfpcables/prolabs/wiitek hardcode or lack stock — not added."}
{"d":"2026-05-13","t":"FIX","m":"BlogLLM model version sync: dashboard FO_BlogLLM card now dynamically reflects the active Ollama model via /api/blog/llm/status (was hardcoded to fo-blog-v7). TIP ecosystem.config.js OLLAMA_LLM_MODEL + BLOG_LLM_MODEL bumped fo-blog-v7 → fo-blog-v10 (Mac Studio Magatama training adopted 2026-05-13 00:33 UTC). Persisted /opt/tip/blog-llm-settings.json overrode env — also updated. tip-api restarted, PM2 state saved."}
{"d":"2026-05-13","t":"FEAT","m":"BlogLLM auto-discovery: client.ts now probes Ollama at startup + every 10 min, reconciles configured fo-blog-vN against actual available tags, auto-falls to highest available version when configured model no longer exists. Magatama-aware sort: base 'fo-blog-vN' tag wins over '-rM' revisions within same N (matches Magatama adoption convention where -rM is intermediate adapter save, base is production alias). New POST /api/blog/llm/refresh-discovery endpoint for manual trigger. Eliminates 3-step manual sync after every Magatama training."}
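The fallback rule described in the entry above (keep the configured model if it exists, otherwise take the highest `fo-blog-vN`, with the base tag beating `-rM` revisions of the same N) could look roughly like this. It is a sketch, not the code in client.ts: `parseTag` and `pickModel` are hypothetical names, the handling of a `:tag` suffix is an assumption, and so is the ordering among `-rM` revisions (higher M first).

```typescript
interface ParsedTag {
  tag: string; // original tag as reported by the model server
  version: number; // N in fo-blog-vN
  revision: number | null; // M in -rM, or null for the base tag
}

function parseTag(tag: string): ParsedTag | null {
  // Assumption: names may carry a ":latest"-style suffix; ignore it for matching.
  const name = tag.split(":")[0] ?? tag;
  const m = /^fo-blog-v(\d+)(?:-r(\d+))?$/.exec(name);
  if (m === null) return null;
  return {
    tag,
    version: Number(m[1]),
    revision: m[2] === undefined ? null : Number(m[2]),
  };
}

// Returns the configured model if it is available, otherwise the best
// available fo-blog model, or null if none exists.
function pickModel(configured: string, available: string[]): string | null {
  if (available.includes(configured)) return configured;

  const candidates = available
    .map((tag) => parseTag(tag))
    .filter((p): p is ParsedTag => p !== null);

  candidates.sort((a, b) => {
    if (a.version !== b.version) return b.version - a.version; // highest N first
    if (a.revision === null) return b.revision === null ? 0 : -1; // base tag wins
    if (b.revision === null) return 1;
    return b.revision - a.revision; // assumed: newer revision first
  });

  return candidates[0]?.tag ?? null;
}
```

For example, with `fo-blog-v7` configured and `fo-blog-v10-r2`, `fo-blog-v10`, and `fo-blog-v9` available, this sketch selects `fo-blog-v10`.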
{"d":"2026-05-13","t":"FIX","m":"Ollama Modelfile bug for fo-blog-v10: Mac Studio adoption registered model with template '{{.Prompt}}' instead of Qwen2.5 chat template — model returned empty responses to /api/chat. Recreated fo-blog-v10 via Ollama /api/create with correct ChatML template ({{- if .System}}<|im_start|>system ... <|im_end|>...), num_ctx=8192, stop=<|im_end|>, temperature=0.3. Smoke test: 45 tokens generated cleanly. Magatama-side adoption logic should be patched to emit correct template by default."}
{"d":"2026-05-13","t":"DATA","m":"Competitive naming sanitization + Anti-Naming-Policy training: (1) Sanitization sweep across all 244 JSONL training files: 97 Fs.com/FiberStore replacements with neutral 'unnamed third-party MSA-compatible vendor' across 15 active files (fo_blogllm, tip_llm pools + RunPod exports + historical RunPod pod-runs). All affected files backed up to .bak-fs-final/.bak-YYYYMMDD-HHMMSS. Post-sanitization verification: 0 assistant-content mentions of competitor brands across all 5 lanes (fo_blogllm, pulso_llm, tip_llm, magatamallm, contact_llm). Remaining FS mentions live only in system-prompt prohibition lists (anti-naming policy) and one magatamallm user-message context for legitimate internal SKU-matching research. (2) Anti-Naming-Policy training pairs added: 4 deep pairs for fo_blogllm (third-party market analysis, procurement strategy, coherent component stack), 3 pairs for pulso_llm (competitor-inquiry deflection, price-compare without naming, internal sales guidance), 1 pair for tip_llm (public research blog output with neutral language). All new system prompts contain explicit COMPETITIVE NAMING POLICY clause forbidding named mentions of Fs.com/FiberStore/Approved Networks/Cablexa/ProLabs/FluxLight + component suppliers Accelink/InnoLight/Lumentum/Coherent/II-VI/Eoptolink/Source Photonics. Switch and router OEMs (Cisco/Arista/Juniper/Nokia/Ciena/HPE/Dell/Mellanox/Extreme/Huawei) explicitly permitted as integration partners. Post-rebuild manifests: fo_blogllm 18757 effective, pulso_llm 3242 effective, tip_llm 2181 effective."}
{"d":"2026-05-13","t":"DATA","m":"FO_BlogLLM training corpus deep-quality expansion: 8 new training files in pulso_llm pool with 22 long-form (700-1000 word) blog pairs targeting fo-blog-v10 failure modes. Categories: (1) Connector Authority — MPO Type A/B/C polarity, IEC 61300-3-35 endface inspection, MPO-12 vs MPO-16, LC vs MPO architecture mapping; (2) Transceiver Taxonomy — full 100G/400G/800G variant matrix with reach, connector, lane structure, IEEE clauses; (3) Coherent Depth — coherent vs direct-detect crossover, OSNR engineering for ZR+, FEC types (cFEC/oFEC) and pre-FEC BER reality; (4) Power & Reach Ground Truth — accurate per-module power numbers 2026, OTDR commissioning workflow; (5) Operations Troubleshooting — pre-FEC BER climbs diagnostic walkthrough, module detection / coding mismatch fixes; (6) Topic Adherence — exact MPO Connector Survival Guide blog (the test prompt that failed in v10), Fiber Inspection Probes, Cable Routing for spine-leaf; (7) Standards Map — IEEE 802.3ba/cd/cu/df clause map, CMIS register layout; (8) Myth Corrections — DR ≠ Long Reach, LR vs ER vs ZR taxonomy, MPO-parallel vs LC-WDM architecture. All pairs include IEEE/OIF/MSA citations, real datasheet-equivalent numbers (TX/RX power, sensitivity, power consumption per module class). Pool now: 17936 train + 2018 eval = 19954 total after dedupe (123 duplicates removed). Next fo_blogllm training run picks up automatically."}
{"d":"2026-05-13","t":"FIX","m":"Magatama Mac Studio adoption template root cause patched: /opt/magatama/packages/fine-tuner/train.py register_ollama() built Modelfiles without TEMPLATE directive (only FROM/SYSTEM/PARAMETER) — Ollama defaulted to '{{.Prompt}}' which silently breaks /api/chat. Both modelfile_lines blocks (GGUF and fallback) now include the Qwen2.5 ChatML TEMPLATE plus full PARAMETER set (temperature 0.3 (was 0.1), top_p 0.9, num_ctx 8192, stop <|im_end|>). End-to-end test against Ollama API confirmed: model registers + /api/chat returns expected tokens. Future fo-blog-vN trainings (and any Magatama lane via Mac Studio path) will no longer produce silent-failure models. Backup at /opt/magatama/packages/fine-tuner/train.py.bak-20260513-153306. Local checkout synced. The other adoption path (/opt/llm-gateway/.../converter.py used by RunPod artifact import) already had TEMPLATE correct — no change there."}
{"d":"2026-04-26","t":"DATA","m":"Juniper OEM transceiver seed: 59 PIDs inserted (SFP-1GE/SFPP-10G/SFP-25G/QSFPP-40G/JNP-QSFP-100G/JNP-QSFP56-200G/JNP-QSFPDD-400G/JNP-OSFP-400G+800G + DAC/AOC). Scheduler: daily 04:15."}
{"d":"2026-04-26","t":"FIX","m":"BlueOptics scraper: force HTTP/1.1 via Node.js https.get() to bypass empty-body HTTP/2 server bug; updated catalog path to /Transceivers_1 (changed 2026)."}
{"d":"2026-04-26","t":"DATA","m":"Cisco TMG scraper: upsert logic fixed (market_status EOL + temp_range IND normalization). Full run in progress: 300+ switches, 15000+ compat matches written to switch_transceiver_compat."}
{"d":"2026-04-28","t":"INFRA","m":"Gitea repo tip-training-data created (https://gitea.context-x.org/rene/tip-training-data). Generated full-scope token via gitea admin CLI on 192.168.178.196."}
{"d":"2026-04-28","t":"AI","m":"crawler-llm/spec-validator.ts: transceiver physical plausibility validator — form factor↔speed matrix, wavelength↔fiber consistency, reach limits, IEEE standard cross-check, DAC/AOC rules. Outputs SpecValidationResult with tier (high/medium/low/rejected) + confidence_delta."}
{"d":"2026-04-28","t":"AI","m":"crawler-llm/training-data-writer.ts: TIPLLM SFT training data writer — generates spec_qa/crawl_reasoning/validation/discovery JSONL pairs from crawler extractions, git-commits and pushes to Gitea tip-training-data repo in batches of 50."}
{"d":"2026-04-28","t":"AI","m":"crawler-llm/vendor-discovery-crawler.ts: intelligent PlaywrightCrawler — vendor catalog URL → LLM extraction (core.ts) → spec validation → DB persist (findOrCreateScrapedTransceiver) + Gitea SFT training pairs. 8 vendor configs: Cisco/Juniper/Arista/FS.com/Flexoptix/Nokia/Huawei/II-VI."}
{"d":"2026-04-28","t":"INFRA","m":"scheduler.ts: 8 weekly vendor discovery jobs registered (discover:vendor:*), staggered Sun 20:00 – Mon 10:00 UTC. Total workers: 102."}
Types: FEAT · FIX · UI · DATA · AI · INFRA
{"d":"2026-04-25","t":"FEAT","m":"Standards Audit + Form Factors Reference: expanded standards from 40 to 63 (+23 new: full 200G tier SR4/DR4/FR4/LR4/ER4/CR4, PON family GPON/XG-PON1/NG-PON2/25G-PON, copper DAC variants CR4 for 25G/40G/100G/400G, 800G emerging FR4/LR8/CR8, 1.6TBASE-DR16 emerging). All 63 standards have bilingual plain-language descriptions (DE+EN, for non-technical colleagues). New form_factors table (migration 101) with 20 entries: SFP family SFP→SFP112, QSFP family QSFP+→QSFP-DD800, OSFP family OSFP→OSFP224, CFP family, legacy XFP/CXP — with full names, channel count, max speed, hot-swap flag, supersedes chain, status, and bilingual descriptions. New GET /api/form-factors endpoint. Dashboard Standards tab: descriptions shown as table row subtitles, Form Factors grid section with family color coding, speed/channel info, openFormFactorDetail panel."}
{"d":"2026-04-25","t":"FEAT","m":"Flexoptix Internal Demand Intelligence: imported real sales velocity (8.585 SKUs, 1.288 with demand>0) from AES-256-CBC encrypted XLSX export into flexoptix_internal_demand table (PostgreSQL RLS enabled, is_internal guard). 279 SKUs cross-referenced with transceiver catalog. New /api/internal/demand/* endpoints (by-speed, velocity, hype-weights, forecast-input) — localhost/LAN only + JWT auth. Forecast engine now calibrated with real Flexoptix run-rates (demand_calibrated:true). Dashboard Warehouse tab updated: real Flexoptix Sales Velocity panel with momentum indicators replaces DEMO DATA. Fast movers: 70 SKUs ≥100/mo (SFP 1G + SFP+ 10G dominate). Total throughput: 63.328 units/month (12m basis). Data never leaves private infrastructure."}
{"d":"2026-04-25","t":"DATA","m":"Migration 099: flexoptix_internal_demand schema — table + RLS policies + indexes + v_demand_by_speed aggregated view. Security: raw SKU rows never exposed publicly; RLS enforces is_internal=TRUE; IP restriction middleware on all internal API routes."}
{"d":"2026-04-25","t":"DATA","m":"Migration 098: +5 Cisco ASR 9903/9900 line card images (A9903-8HG-PEC, A9903-8HG-PEC-FC, A9903-20HG-PEC-FC, A99-12X100GE-FC, A99-32HG-FC) via eBay CDN (i.ebayimg.com, 171–302KB JPEGs). Coverage 663 → 668 (98.7%). Remaining 9 models confirmed no publicly accessible images: RA-B6920-4S (Ragile website unreachable), 8K-MPA-18Z1D (too new), Inventec D7332/D7264Q28B/D7054Q28B (Taiwan-hosted CDN), Datacom DM4610-48T6X, FiberHome CiTRANS 680, NEC PF5248, DZS OLT 9100."}
{"d":"2026-04-25","t":"DATA","m":"Migration 097: +2 whitebox switch images (Ragile RA-B6510-48V8C via unixsurplus.com BigCommerce CDN OEM-equiv hardware, Edgecore AS7946-74XKDB via eBay CDN). Remaining 7 non-Cisco models (Inventec D7332/D7264Q28B/D7054Q28B, Datacom DM4610-48T6X, FiberHome CiTRANS 680, NEC PF5248, DZS OLT 9100) have no publicly accessible product images."}
{"d":"2026-04-25","t":"DATA","m":"Migration 094 (fixed): +12 Cisco models (8K-MPA-4D/16H/16Z2D, A9K-8HG-FLEX-FC/SE/TR, A9K-400G-DWDM-TR, N9348Y12C-SE1, NC55-36X100G-A-SE, ASR-9000V-24-A, ASR-9000V-DC-E, ASR-9922-RP-TR). Fixed 4 bad URLs: Cisco DAM 404, fabricated BigCommerce URL, and 2 unreachable Magento/it-market URLs replaced with eBay CDN and NetworkTigers Shopify CDN."}
{"d":"2026-04-25","t":"FIX","m":"Umami analytics unreachable — UMAMI_PASS in /opt/tip/ecosystem.config.js was AES-256-CBC encrypted but TIP API passed it as plaintext to Umami → 401. Fixed: reset Umami admin password in DB with bcrypt hash, updated ecosystem.config.js UMAMI_PASS to plaintext, restarted tip-api with --update-env. Both http://localhost:3150 and https://analytics.fichtmueller.org now authenticate successfully. Dashboard sync-umami button no longer shows Fehler toast."}
{"d":"2026-04-25","t":"DATA","m":"Migration 093: NCS 5500 main line card image backfill — 6 models: NC55-18H18F (it-market.com 7KB), NC55-24X100G-SE (networkgenetics.net BigCommerce 317KB PNG), NC55-32T16Q4H-A (hummingbirdnetworks.com BigCommerce 32KB), NC55-36X100G-S (dedicatednetworksinc.com WordPress 1.2MB PNG), NC55-MOD-A-S (stack-systems.com Magento 5.5KB), NC55-MOD-A-SE-S (core92.com Odoo 52KB). Coverage: 644 → 650 (95.1% → 96.0%)."}
{"d":"2026-04-25","t":"DATA","m":"Migration 092: Cisco remaining image backfill — 48 models: A99-* ASR9900 line cards (27 models via router-switch.com CDN + SQL subquery from A9K equivalents + NetworkOutlet/ZionNetworking/NetworkTigers CDN), C9600-LC-24C/40YL4CD/48S/48YL + C9600-SUP-1 + C9600X-LC-32CD/56YL4C + C9600X-SUP-2 (NetworkTigers + ITBargainCenter CDN), NC55-MPA-4H-S/12T-S/2TH-S/1TH2H-S/4H-HX-S/4H-HD-S/2TH-HX-S + NC55-OIP-02 (router-switch.com 157KB), NC55-24H12F-SE (networkgenetics.net BigCommerce), A900-IMA-8CS1Z/8Z/8Z-L (NetworkTigers + router-switch), A9903-20HG-PEC (TopParagonResource webp), ASR-9922-RP-SE (NetworkTigers v=1748329200), A9K-4HG-FLEX-X-FC/SE (subquery). Coverage: 597 → 644 (88.2% → 95.1%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 091: Arista + Juniper SONiC HCL models — 7 models: 7060CX-32S (networktigers.com DCS-7060CX-32S-F), 7050QX-32 (networktigers.com DCS-7050QX-32-F), 7050QX-32S (networktigers.com DCS-7050QX-32S-F), 7170-32CD (networktigers.com DCS-7170-32C-F-new), 7280CR3-32D4 (networktigers.com DCS-7280CR3K-32D4-F, CR3K same chassis), QFX5200-32C-S (networktigers.com QFX5200-32C-AFO), QFX5210-64C (networktigers.com QFX5210-64C-AFO). All from SONiC HCL device list. Coverage: 616 → 623 (estimated)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 090: Edgecore AS-series SONiC switches — 7 models: AS7312-54X (stordis.com WebP, XS chassis successor), AS7312-54XS (stordis.com 64KB WebP), AS7326-56X (edge-core.com DCS203-F 83KB PNG), AS7716-32X (stordis.com 50KB WebP), AS7816-64X (edge-core.com DCS500-A 99KB PNG), AS9716-32D (edge-core.com DCS510-A 78KB PNG), AS7512-32X (epsglobal.com 26KB JPEG). All from SONiC HCL Accton/Edgecore vendor. Estimated coverage: 609 → 616 (speculative, pending DB query)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 089: Arista/Cisco/Juniper batch — 8 models: 7800R4-36D2-LC (Arista, arista.com official LC image 15KB PNG), 8101-32FH (Cisco 8000, router-switch.com 57KB JPEG), 8111-32EH (Cisco 8000, stack-systems.com Magento CDN 9.6KB JPEG), C9300X-24Y (networktigers.com 64KB JPEG), C9500-48Y4C (networktigers.com 50KB JPEG), N9K-C93108TC-FX3P (networktigers.com full-res 78KB JPEG), PTX10001-36MR (juniper.net image library Azure CDN 112KB JPEG), PTX10004 (juniper.net image library lbox variant 138KB JPEG). Coverage: 601 → 609 (89.6% → 90.8%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 088: Ubiquiti/Phoenix Contact — 3 models: USW-Enterprise-48-PoE (cdn.ecomm.ui.com 331KB PNG), USW-Aggregation (cdn.ecomm.ui.com 285KB PNG), FL SWITCH 7528-2S (rspsupply.com distributor CDN 94KB JPEG, Phoenix Contact product ID 2891026). Coverage: 598 → 601 (89.1% → 89.6%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 087: Cisco ASR 9000 license-variant reuse — 3 models: A9K-4HG-FLEX-FC (FC license = same hardware as -TR), A9K-8X100G-LB-TR (TR license = same hardware as -SE), A9K-4X100GE base (pre-licensing-split catalog number = same hardware as -SE). All physically identical to covered siblings. Coverage: 595 → 598 (88.7% → 89.1%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 086: Mixed vendor batch — 8 models: CX732Q-N/CX564P-N (Asterfusion, asterfusion.com WP CDN, 267KB+768KB PNG), 7130-48LB (Arista, Alta Technologies reseller CDN, 70KB PNG), ICX 7850-48FS (Ruckus, networktigers.com, 55KB JPEG), E810-CQDA2 (Intel, esaitech.com Shopify CDN, 78KB JPEG), FSP 3000 CloudConnect (Adtran/ADVA, nwrusa.com Shopify CDN, 1.8MB JPEG), MediorNet MicroN UHD (Riedel, riedel.net fileadmin CDN, 279KB PNG), nGeniusONE InfiniStreamNG (Netscout, netscout.com Pantheon CDN, 85KB WebP). Coverage: 587 → 595 (87.5% → 88.7%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 085: Mixed straggler batch — 6 new models + 3 Celestica CDN upgrades. New: N540-24Q2C2DD-SYS (Cisco TD CDN 524146.jpg, 213KB), Apollo 9900 Series/Ribbon (ribboncommunications.com Drupal CDN, 207KB), Seastone2/Celestica (ServeTheHome DX010 review photo, 108KB), Midstone-200i/Celestica (DS3001 equivalent, celestica.com/uploadedImages), RA-B6510-48V8C/Ragile (Micas M2-W6510 ODM equivalent, BigCommerce CDN, 334KB), QuantaMesh T7064-IX1D/QCT (hyperscalers.com T7064-IX4 EOL family image). Upgrades: DS3000/DS4000/DS5000 foleon CDN → celestica.com/uploadedImages. Coverage: 581 → 587 (86.6% → 87.5%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 084: Cisco ASR 9000 bulk — 62 A9K-* models: 8T/4-B/E/L (slash-format), 16T/8-B, 1X/2X/8X/16X 100GE line cards, 8X100G-LB-SE, 20HG-FLEX-FC/SE/TR, 24X10GE-1G-FC/SE/TR, 24X10GE-SE/TR, 2T20GE-B/E/L, 36X10GE-SE/TR, 40GE-B/E/L/SE/TR, 48X10GE-1G-FC/SE/TR, RSP440-LT/SE/TR, RSP880-SE/TR, SIP-700, MPA-1X100GE/1X200GE/1X40GE/20X10GE/20X1GE/2X100GE/2X10GE/2X40GE/32X1GE/4X10GE/8X10GE + 6 FC-license MPA variants (same hardware). Sources: networktigers.com Shopify CDN + cdn.shopify.com (networktigers raw CDN) + router-switch.com Magento CDN + cloudappliances.co.uk (WebP 20HG-FLEX). Coverage: 519 → 581 (77.3% → 86.6%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 083: Cisco NCS 5700 line cards (NC57-18DD-SE/24DD/36H-SE/36H6D-S/48Q2D-S/48Q2D-SE-S/MOD-S/MOD-SE-S) + MPA modules (NC57-MPA-12L-S/1FH1D-S/2D4H-S) + NCS 1014 optical transport (NCS1K14-2.4T-K9/L-K9/X-K9/TXL-K9) — 15 models. Sources: cisco.com/c/dam TD CDN (numbered image IDs 520004–523807) + Cisco datasheet-c78-742016 JCR renditions (embedded PNG from NCS 5700 product datasheet). Coverage: 504 → 519 (75.1% → 77.3%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 082: Cisco ASR 9000 line cards (A9K-4HG-FLEX-TR/SE, A9K-4T-B/E/L, A9K-4T16GE-SE/TR, A9K-4X100GE-FC/SE/TR, A9K-MOD400-SE/TR, A9K-MOD80-SE/TR) — 13 models (4 skipped: A9K-8T-B/E/L use slash-format names; fixed in 084). Sources: networktigers.com Shopify CDN + router-switch.com Magento CDN. Coverage: 491 → 504 (73.2% → 75.1%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 081: Cisco NCS 560 IMA modules (N560-IMA-1W/2C/2C-DD/8Q/4L), NCS 540 fixed (N540-24Q8L2DD/6Z14S/6Z18G-A/D/FH-AGG/FH-CSR), NCS 540X (N540X-4Z14G2Q/6Z18G/8Z16G/12Z16G/16Z8Q2C/ACC), ASR-9900-RP-SE/TR — 22 new + upgrade ASR-9902/9903 NCS1001/1002 to official Cisco CDN. Sources: cisco.com/c/dam TD CDN, manualslib.com, eBay CDN, signellent.com, brightstarsystems.com. Coverage: 469 → 491 (69.9% → 73.2%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 080: Avaya (ERS 4950GTS-PWR+), Advantech (EKI-7720G-4FI, EKI-9516G-4GMXP) — 3 images. Sources: planetrefurbished.com Shopify CDN + advdownload.advantech.com official product CDN. Coverage: 466 → 469 (69.4% → 69.9%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 079: WAGO (852-1505), WatchGuard (Firebox M5800), ADTRAN (NetVanta 1560-48P), Phoenix Contact (FL SWITCH 4808E-16FX-4GC) — 4 images. Sources: gilautomation.com Shopify CDN, watchguard.com help-center CDN (1MB PNG), portal.adtran.com ProductCatalog, rspsupply.com. Coverage: 462 → 466 (68.9% → 69.4%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 078: Cisco ASR-9001-S, ASR-9000V-AC/DC-A, A9KV-V2-DC-A/DC-E (specific images), NCS1001-K9, NCS1002-K9, NCS1K-EDFA, and 13 NCS1K4/NCS1004 chassis variants — 21 models. Sources: networktigers.com Shopify CDN + router-switch.com Magento CDN (145KB NCS1004 chassis). Coverage: 443 → 462 (66.0% → 68.9%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 077: Barracuda Networks (CloudGen Firewall F900), Peplink (SD Switch 24-Port), Westermo (Lynx 5612-F4G-T8G) — 3 images. Sources: cdn.blueally.com partner CDN + westermo.eworldme.com Shopify CDN. Coverage: 440 → 443 (65.6% → 66.0%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 076: QCT/QuantaMesh (T7032-IX1, T7064-IX4, T9032-IX9), QNAP (QSW-M5216-1T), Sophos (CS210-48FP) — 5 images. Sources: qct.io CDN, hyperscalers.com (T7064-IX4 EOL), cdn.blueally.com qnapworks, Sophos ContentStack CDN. Coverage: 435 → 440 (64.8% → 65.6%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 075: Cisco Nexus 9300 SE1 series (N9324C-SE1U, N9336C-SE1, N9348Y2C6D-SE1U, N9364E-SG2-O, N9364E-SG2-Q, N9396T12C-SE1, N9396Y12C-SE1) — 7 images. Sources: cisco.com/content/dam/cisco-cdc poster-image CDN + cisco.com/c/dam/assets support CDN. Coverage: 428 → 435 (63.8% → 64.8%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 074: Extreme Networks (SLX 9740-40C, X695-48Y-8C, 5520-48T), Ruijie (RG-S6920-4C, RG-S5760C-24SFP/8GT8XS-X), Ruckus (ICX 7150-48PF, ICX 7550-48ZP), ZTE (ZXR10 5960-56PM-H, ZXR10 9908), Edgecore (AS7712-32X) — 10 images. Coverage: 418 → 428 (62.3% → 63.8%)."}
{"d":"2026-04-21","t":"DATA","m":"Migration 073: Cisco ASR 9000 + NCS 540 series images — 18 standalone chassis entries (ASR-9001/9901/9902/9903 + A9KV-V2 variants + N540/N540X variants). Sources: networktigers.com Shopify CDN + tempestns.com WP CDN. Coverage: 400 → 418 (59.6% → 62.3%)."}
{"d":"2026-04-21","t":"INFRA","m":"12-hour DB backup to home server (Mac Studio): /opt/tip/backup-db.sh script + cron 0 0,12 * * * on Erik. First backup: 32MB gzipped pg_dump, rsync over WireGuard VPN, keeps last 10 remote + 5 local."}
{"d":"2026-04-21","t":"DATA","m":"Migrations 065-072 applied to production DB: Cisco (14 models), Juniper (10), Arista remaining (11), NVIDIA/Mellanox (5), Huawei/Nokia (7), Dell/Extreme (7), HPE Aruba/Ubiquiti/Supermicro (8), Celestica/Asterfusion/FS.com/Edgecore (10). Total: +73 switches with images on live DB."}
{"d":"2026-04-21","t":"FIX","m":"Volume data loss prevention: docker-compose.yml updated to use external: true volumes with explicit names (tip_tip_pgdata, tip_tip_qdrant) + restart: unless-stopped on postgres/qdrant. Prevents accidental volume recreation on compose up."}
{"d":"2026-04-21","t":"FIX","m":"Production DB restored from correct volume: tip_tip-pgdata (354MB, 8995 transceivers, 350 vendors, 97K price obs) rsync'd to tip_tip_pgdata (which was a fresh empty DB from wrong compose config). DB now has correct data. Standards (029-seed-standards.sql) applied: 40 rows."}
{"d":"2026-04-21","t":"INFRA","m":"GitHub push gates hardened: pre-push hook installed in transceiver-db repo (.git/hooks/pre-push) — triple security scan (secrets/private IPs/config values) runs before every push to GitHub public repo."}
{"d":"2026-04-18","t":"AI","m":"Blog LLM: claude-code provider implemented in packages/api/src/llm/client.ts — routes BLOG_LLM_PROVIDER=claude-code to claude-bridge (http://localhost:3250/api/generate) on Erik using Claude Code flat-rate subscription. No API billing. checkHealth() pings /health endpoint. Dashboard updated: added claude-code card (EMPFOHLEN, AKTIV), fo-blog-v3-qwen7b card replaced with fo-blog-v5, loadBlogLLMStatus() now handles claude-code provider with correct badge/border highlighting. ecosystem.config.js + .env updated: OLLAMA_LLM_MODEL=fo-blog-v5, BLOG_LLM_PROVIDER=claude-code confirmed active via pm2 env."}
{"d":"2026-04-18","t":"FIX","m":"Cloudflare Tunnel DNS mass-update: after deleting phantom eo-pulse tunnel and creating main-prod (90c22eb0), 31 context-x.org + 7 fichtmueller.org DNS records still pointed to the deleted 641c39a5 tunnel → 530 on all services. Bulk-patched via Cloudflare API: all records now point to main-prod. Created missing admin.magatama.fichtmueller.org CNAME. TIP cloudflared-tip.service restart policy changed to Restart=always (was on-failure, so clean exits caused permanent outage). peercortex.org remains 530 — DNS is in a separate inaccessible Cloudflare account (NS: fattouche/elisabeth.ns.cloudflare.com); needs manual login."}
{"d":"2026-04-18","t":"DATA","m":"Image backfill: GBICS og:image + QSFPTEK backfill scripts run on Erik — 226 new images added (671 → 897 total, 17.5% → 23.4% coverage). OSFP form factor: 0 → 68 images. QSFPTEK og:image URL bug fixed (double-hostname prefix stripped). OSFP-DR8-800G manually set to GBICS-compatible image (cdn11.bigcommerce.com DR8 product photo)."}
{"d":"2026-04-18","t":"FIX","m":"FS.com scraper: all 247 prices written as €79 (wrong) — root cause: 'Gratis Versand ab 79 € (ohne MwSt.)' free-shipping banner appears on every FS.com product page. PRICE_QUALIFIED bodyText regex matched this banner text before reaching the actual product price. Fix: (1) DOM-based price extraction added to page.evaluate — targets [class*='price-value']/[class*='product-price'] etc., skipping elements inside shipping/banner/footer parents; (2) bodyText qualified patterns now check 200-char context for versand/shipping/gratis keywords and skip matches that appear in shipping context; (3) waitForSelector for price elements added before evaluate; (4) deleted 247 invalid €79 observations from DB."}
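Point (2) of the fix above, skipping price matches that sit in shipping context, can be illustrated with a small body-text scanner. This is a sketch under assumptions rather than the scraper's code: the price pattern assumes German-style amounts followed by `€`, the "200-char context" is read here as 100 characters on each side of the match, and `firstNonShippingPrice` is a hypothetical name.

```typescript
const SHIPPING_KEYWORDS = /versand|shipping|gratis/i;
const CONTEXT_RADIUS = 100; // assumed: 200-char window centred on the match

// Returns the first price whose surrounding text does not mention shipping,
// or null if every candidate sits in shipping context.
function firstNonShippingPrice(bodyText: string): number | null {
  // Assumed format: "79 €", "12,50 €", "1.234,56 €".
  const pricePattern = /(\d+(?:\.\d{3})*(?:,\d{1,2})?)\s*€/g;
  let match: RegExpExecArray | null;

  while ((match = pricePattern.exec(bodyText)) !== null) {
    const start = match.index;
    const end = start + match[0].length;
    const context = bodyText.slice(Math.max(0, start - CONTEXT_RADIUS), end + CONTEXT_RADIUS);
    if (SHIPPING_KEYWORDS.test(context)) continue; // banner or shipping text, skip

    // Convert German notation to a JS number: drop thousands dots, comma -> dot.
    const digits = (match[1] ?? "").replace(/\./g, "").replace(",", ".");
    const value = Number(digits);
    if (Number.isFinite(value)) return value;
  }
  return null;
}
```

A real product price that happens to sit close to the banner text would also be skipped by a window check like this one, so a body-text scan of this kind is best treated as a fallback behind the DOM-based extraction from point (1).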
{"d":"2026-04-18","t":"FIX","m":"has_image flag desync: 671 transceivers had image_url set but has_image=false. Fixed: (1) db.ts findOrCreateScrapedTransceiver now sets has_image=true, image_verified=true on both INSERT (ON CONFLICT DO UPDATE) and UPDATE path; (2) DB bulk UPDATE SET has_image=true WHERE image_url IS NOT NULL AND has_image=false (632 rows fixed)."}
{"d":"2026-04-18","t":"FIX","m":"Fiber type missing for 400G/800G parallel-optic modules (DR8/SR8/FR8 etc.): spec-updater parseSpecTable did not recognize standard abbreviations. Added DR/FR/LR/ER/ZR → SMF and SR → MMF patterns for both 'Fiber Type' field values and part-number-style keys. DB bulk UPDATE applied: 55 transceivers set to SMF, 20 to MMF."}
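The abbreviation mapping named in the entry above (DR/FR/LR/ER/ZR to SMF, SR to MMF) can be sketched as a single lookup over either a spec value or a part number. `inferFiberType` is a hypothetical name and the token-boundary rules are assumptions, so this is not the pattern set used in spec-updater.

```typescript
type FiberType = "SMF" | "MMF";

// Looks for a reach abbreviation that stands as its own token, optionally
// followed by a lane count (DR8, SR4) or a plus sign (ZR+).
function inferFiberType(value: string): FiberType | null {
  const v = value.toUpperCase();
  // Single-mode reach classes; must not be glued to other letters.
  if (/(^|[^A-Z])(DR|FR|LR|ER|ZR)\d*(\+|[^A-Z]|$)/.test(v)) return "SMF";
  // Short-reach modules run over multimode fiber.
  if (/(^|[^A-Z])SR\d*([^A-Z]|$)/.test(v)) return "MMF";
  return null;
}
```

For instance, `inferFiberType("OSFP-DR8-800G")` gives `SMF` and `inferFiberType("QSFP-DD-400G-SR8")` gives `MMF`, while a copper part such as `SFP-10G-T` returns `null` and is left for other rules to classify.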
{"d":"2026-04-18","t":"FIX","m":"Dashboard blog generation: both generateBlog() and generateBlogManual() were calling POST /api/blog/generate without Authorization: Bearer header. requireAuth middleware correctly returned 401, shown as 'Unauthorized — please log in' toast. Fixed: read loadToken() before each fetch and include token in header. Also added r.status===401 guard to redirect to login page on token expiry."}
{"d":"2026-04-18","t":"FIX","m":"PM2 SKIP_FS_SCRAPER env not picked up by tip-scraper-daemon: pm2 restart --update-env did not apply new ecosystem.config.js vars because PM2 loaded from its saved dump. Fixed: pm2 delete + pm2 start ecosystem.config.js --only tip-scraper-daemon + pm2 save. Daemon restarted fresh (ID 83, 0 restarts) with SKIP_FS_SCRAPER=true now confirmed live. FS.com job now correctly skips on Erik instead of failing with ENOENT."}
{"d":"2026-04-18","t":"FIX","m":"FS.com Mac scraper: suppress Crawlee post-run ENOENT unhandledRejection — Crawlee's FileSystemStorage fires a final _isTaskReadyFunction call after run() resolves, reading a request .json that was already processed/cleaned-up. This ENOENT triggered process.exit(1) before Phase 2 completed, causing 7 days of missing FS.com price data. Fixed: targeted unhandledRejection handler in require.main block swallows ENOENT from request_queues paths while re-raising real errors."}
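A targeted handler of the kind described above, one that swallows only ENOENT errors from `request_queues` paths and re-raises everything else, might be structured like this. It is a sketch: `isQueueEnoent`, `installRejectionFilter`, and the `RejectionSource` interface are hypothetical, and reading the path from the error's `path` or `message` field is an assumption about how the rejection is shaped.

```typescript
// Minimal shape of the part of `process` used here, so the sketch does not
// depend on Node's type definitions.
interface RejectionSource {
  on(event: "unhandledRejection", listener: (reason: unknown) => void): unknown;
}

// True only for file-not-found errors that point into a request_queues directory.
function isQueueEnoent(reason: unknown): boolean {
  if (typeof reason !== "object" || reason === null) return false;
  const err = reason as { code?: unknown; path?: unknown; message?: unknown };
  if (err.code !== "ENOENT") return false;
  const where =
    typeof err.path === "string" ? err.path :
    typeof err.message === "string" ? err.message : "";
  return where.includes("request_queues");
}

function installRejectionFilter(source: RejectionSource): void {
  source.on("unhandledRejection", (reason) => {
    if (isQueueEnoent(reason)) {
      console.warn("Ignoring ENOENT from request_queues cleanup");
      return; // known post-run artifact, do not exit
    }
    throw reason; // anything else stays fatal
  });
}
```

In a Node.js entry point this would be wired up as `installRejectionFilter(process)` before the crawler starts. Keeping the predicate separate from the registration makes the filter easy to test without triggering real rejections.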
{"d":"2026-04-18","t":"FIX","m":"FS.com Mac scraper: PID lock (/tmp/tip-fs-scraper.lock) added to run-fs-scraper-mac.sh — prevents concurrent instances when launchd 2am fire overlaps with a still-running earlier run. Previous concurrent instances caused rmSync(storage-fs-phase1) race (one instance deletes the storage dir while another is using it), crashing Phase 2."}
{"d":"2026-04-18","t":"FIX","m":"Scraper health monitor: tiered alerts replacing false-positive 6h threshold. Old: fired every 3h for any vendor with 0 new prices (including stable prices). New: 🔴 CRITICAL (last price >7 days), 🟡 WARNING (last price 48h-7 days), ✅ STABLE (0 new prices but last price ≤48h — content hash dedup, scraper running OK). Shows pg-boss job state+time for faster root-cause."}
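The three tiers above can be expressed as a small classifier. This is a sketch under assumptions: the function name `classifyVendor` and its signature are hypothetical, and it only covers the age thresholds from the entry (more than 7 days, 48h to 7 days, 48h or less) for a vendor with zero new prices.

```typescript
type HealthTier = "CRITICAL" | "WARNING" | "STABLE";

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical classifier for a vendor that produced 0 new prices in the
// current window, based only on the age of its last price observation.
function classifyVendor(lastPriceAt: Date, now: Date): HealthTier {
  const ageMs = now.getTime() - lastPriceAt.getTime();
  if (ageMs > 7 * 24 * HOUR_MS) return "CRITICAL"; // last price older than 7 days
  if (ageMs > 48 * HOUR_MS) return "WARNING";      // between 48h and 7 days
  return "STABLE";                                 // 48h or less: dedup, scraper OK
}
```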
{"d":"2026-04-18","t":"FIX","m":"Daemon stability: global unhandledRejection handler in scheduler index.ts — Crawlee post-run lock-file ENOENT (request_queues path) was crashing the daemon process (process.exit(1)) which killed all active pg-boss jobs and triggered PM2 restart loops. Fix: swallow ENOENT from request_queues paths at the scheduler level; re-raise all other rejections. Also: FS.com scheduler worker now skips (SKIP_FS_SCRAPER=true env var) on Erik where Cloudflare WAF blocks datacenter IPs; Mac launchd handles FS.com scraping. Created missing Crawlee storage dirs: storage-fs-phase1, storage-fs-phase2, storage-ebay-transceivers, storage-fs. Health monitor pg-boss lookup extended from 12h → 26h; added completedMap; vendors with recent job completion + historical prices classified as STABLE (not CRITICAL) — eliminates ATGBICS/Fluxlight false-positive alerts."}
{"d":"2026-04-18","t":"FIX","m":"Playwright chromium_headless_shell-1217 installed on Erik (/opt/playwright-browsers/): ATGBICS and FS.com PlaywrightCrawler were throwing BrowserLaunchError on every run since Crawlee browser-pool requires chromium_headless_shell binary, not regular chromium. Fixed by: PLAYWRIGHT_BROWSERS_PATH=/opt/playwright-browsers npx playwright install chromium --with-deps on Erik."}
{"d":"2026-04-18","t":"FIX","m":"Crawlee withIsolatedStorage global env-var race condition eliminated: scheduler.ts removed all withIsolatedStorage() wrappers (were mutating process.env.CRAWLEE_STORAGE_DIR globally, causing concurrent scrapers to pick up wrong storage dirs). All Crawlee scrapers now use makeCrawleeConfig(name) instance-level Configuration. fs-com.ts migrated to fs-phase1/fs-phase2 storage names with rmSync cleanup before each phase. switch-assets-crawler.ts and switch-assets-playwright.ts now pass makeCrawleeConfig. Fixed: ATGBICS, community-issues, market-intel, FS.com."}
{"d":"2026-04-18","t":"FIX","m":"FS.com Mac launchd scraper (org.tip.fs-scraper): was failing with exit code 126 (Operation not permitted) because macOS TCC blocks launchd agents from accessing ~/Desktop/Claude Code/ path. Fixed: script moved to ~/.tip/run-fs-scraper-mac.sh, plist WorkingDirectory changed to ~/.tip. FS.com is now scraping from residential Mac IP again."}
{"d":"2026-04-18","t":"FIX","m":"NADDOD stockLevel 'unknown' -> 'on_request': invalid value for price_observations_stock_level_check constraint — was causing all NADDOD price insertions to fail."}
{"d":"2026-04-18","t":"FIX","m":"Crawlee makeCrawleeConfig: clear request_queues/default before each run — Crawlee FileSystemStorage marks URLs as HANDLED (state=4, orderNo=null) after processing. With purgeOnStart=false these entries persisted, so next crawler.run(startUrls) deduplicated all startUrls → requestsTotal=0 → immediate finish with 0 scraped pages. Fix: rmSync(request_queues/default) at start of makeCrawleeConfig(). Safe: session pool lives in key_value_stores/, not request_queues/. ATGBICS confirmed fixed: now scrapes 6 categories, 78 products, 33 unique with prices."}
{"d":"2026-04-18","t":"FIX","m":"Optcore scraper: add SKIP_OPTCORE_SCRAPER guard — optcore.net Cloudflare WAF blocks Erik IP (82.165.222.127). WP REST API returns 403/HTML block page, catch handler returns 0 URLs → 0 products every run. Set SKIP_OPTCORE_SCRAPER=true in ecosystem.config.js. Pattern mirrors SKIP_FS_SCRAPER. Residential IP (Mac launchd) required for Optcore."}
{"d":"2026-04-18","t":"FIX","m":"10Gtek scraper: was finding 152 products but 0 prices — 10gtek.com main site only shows technical spec tables (no prices). Rewrote scraper to target sfpcables.com (10Gtek's own retail store, same company) which exposes Magento product listings with Model: <part> + US$X.XX prices. Added Magento loop-detection via seen-part dedup (stops pagination when all products on a page were already seen). XFP title-after-pipe fallback for part number extraction. Removed QSFP-DD (not on sfpcables.com). Result: 50 products, 49 prices on first live run. Health monitor CRITICAL alert resolved."}
{"d":"2026-04-18","t":"FEAT","m":"Price Comparison Dashboard: public /api/price-comparison (summary, list top-50 SKUs by vendor coverage, per-SKU detail). Express Router, no auth required. New '💲 Price Comparison' dashboard tab with stat cards, form-factor breakdown table, top-50 SKU table (clickable rows), and SKU detail lookup with per-vendor prices + stock + spread %."}
{"d":"2026-04-18","t":"DATA","m":"Eoptolink OEM catalog scraper: harvests 93 product-solution pages from eoptolink.com, extracts part numbers (EOLO-*/EOLQ-* format), seeds transceivers table as manufacturer=Eoptolink entries with form_factor/speed/fiber/category. No prices (B2B OEM). Scheduled every 4h (40 */4 * * *)."}
{"d":"2026-04-18","t":"FIX","m":"stock_observations repopulated after TRUNCATE: storage-fs/request_queues/default/ directory re-created on Erik; NADDOD scraper manual-triggered; 4+ prices confirmed written within 20s."}
{"d":"2026-04-17","t":"FEAT","m":"MCP Server v0.2.0: wired finder.ts (find_flexoptix_for_switch, get_competitor_alerts), switch-docs (get_switch_docs, get_switch_image), analyze_market_with_llm (qwen2.5:14b via Ollama, enriched with live hype cycle + pricing + news), generate_blog_post (fo-blog-v5 fine-tuned model with qwen2.5:14b fallback + live pricing enrichment). OLLAMA_BASE_URL env var for Ollama endpoint."}
{"d":"2026-04-17","t":"UI","m":"Stock dashboard: 6th stat card (Multi-Vendor SKUs), confidence quality badge column in vendor breakdown (🟢 L3 per-warehouse / 🟡 L2 aggregated / ⚪ L1 boolean), new Multi-Vendor Price Comparison table with min/max/avg per SKU. Subtitle updated to mention QSFPTEK + NADDOD."}
{"d":"2026-04-17","t":"FEAT","m":"/api/stock/summary enhanced: vendor_breakdown adds avg_confidence + currencies + confidence breakdown (conf_per_warehouse/aggregated/boolean); new price_comparison endpoint (top 50 SKUs tracked by 2+ vendors with price spread); totals adds multi_vendor_skus count."}
{"d":"2026-04-17","t":"DATA","m":"Cisco TMG expanded to 17 platform families (+5 new: 8000 Series, NCS5500, NCS540, NCS560, NCS1000). Per-device query strategy replaces family-level search: iterates all switch IDs from filter → 58 switches per N9300 vs 1 before. 856 compat entries / 174 switches after re-run."}
{"d":"2026-04-17","t":"DATA","m":"Juniper HCT scraper run: 475 Juniper-brand transceivers seeded into transceivers table (form factor, speed, reach, fiber type from apps.juniper.net/hct). No prices (OEM). Scheduled to run at 6:15 + 18:15 daily."}
{"d":"2026-04-17","t":"DATA","m":"Competitor research: QSFPTEK shows real-time aggregated stock count (e.g. '5507 in real-time stock, 17 Apr 2026') + USD prices; NADDOD shows exact per-product counts ('In Stock: 543') via Astro SSR. Both scraped publicly, no login required. Flexoptix confirmed exact Lagerbestand + EUR prices. FS.com: EUR prices yes, exact counts no."}
{"d":"2026-04-17","t":"DATA","m":"stock_observations selective cleanup + schema upgrade: TRUNCATE stock_observations (186 FS.com test-run rows cleared, will repopulate on next launchd run). Added 4 new quality columns via migration 038: stock_confidence (1=boolean/2=aggregated/3=per-warehouse), price_currency CHAR(3), price_includes_tax BOOLEAN, stock_vendor_ts TIMESTAMPTZ."}
{"d":"2026-04-17","t":"FEAT","m":"Migration 028 retroactively committed to repo (028-stock-observations-warehouse-columns.sql) — documents the 10 warehouse columns applied directly to Erik DB. Guards with IF NOT EXISTS for safe re-application."}
{"d":"2026-04-17","t":"FEAT","m":"upsertStockObservation upgraded: new optional params stockConfidence (1|2|3), priceCurrency (ISO 4217), priceIncludesTax (boolean), stockVendorTs (timestamptz). FS.com now writes stockConfidence=3+priceCurrency=EUR+priceIncludesTax=false. Delta detection now also checks quantity_available changes."}
{"d":"2026-04-17","t":"FEAT","m":"QSFPTEK scraper v2: Phase 1 uses existing /mall/commodity/list API for product catalog (880+ products from sitemap). Phase 2 fetches /en/product/XXXXX.html detail pages to extract 'X in real-time stock, DATE' — writes stock_observations with stockConfidence=2 + stockVendorTs. Up to 500 detail pages per run at 2s rate limit."}
{"d":"2026-04-17","t":"FEAT","m":"NADDOD scraper v2: complete rewrite — migrated from WooCommerce category scraping to Astro sitemap-based discovery (/sitemaps/products.xml, /products/XXXXX.html). Extracts 'In Stock: X' exact counts from server-rendered HTML. Writes both price_observations (USD) and stock_observations (stockConfidence=1 or 2 depending on data visibility)."}
{"d":"2026-04-17","t":"DATA","m":"FS.com first warehouse data load: 268 products scraped, 186 stock_observations written — DE-Lager 128,428 units, Global-Lager 156,052 units, Backorder 37,495, 53.4M units sold total. Top seller: SFP-10GSR-85 with 14M units sold."}
{"d":"2026-04-17","t":"FIX","m":"upsertStockObservation: skip condition now includes backorder_qty — backorder-only products (DE=0 GL=0 BO>0) like coherent ZR/ZRH were silently dropped instead of being recorded"}
{"d":"2026-04-17","t":"FIX","m":"FS.com price extraction: broad fallback regex now only accepts prices >€100 to reject FS.com's €79 'Preis auf Anfrage' placeholder — prevents fake price observations on 1G/10G/25G/40G/100G transceivers"}
{"d":"2026-04-17","t":"UI","m":"Dashboard: stock observations count in header stats bar + warehouse stock summary card in Overview tab (hidden until stock_observations populated); both driven by /api/health stock block"}
{"d":"2026-04-17","t":"FEAT","m":"Health API: /api/health now includes stock block — total_observations, transceivers_with_stock, vendors_with_stock, total_de_qty, total_global_qty, last_observation_at from stock_observations"}
{"d":"2026-04-17","t":"INFRA","m":"FS.com Mac-side runner: launchd plist at 02:00/10:00/18:00 + run-fs-scraper-mac.sh via SSH tunnel to Erik DB port 5433 — residential IP required, datacenter IP blocked by FS.com Cloudflare WAF"}
{"d":"2026-04-17","t":"FEAT","m":"Stock API: GET /api/stock, /api/stock/summary, /api/stock/:id — warehouse breakdowns (DE-Lager, Global-Lager, Nachlieferung, units_sold) per transceiver/vendor"}
{"d":"2026-04-17","t":"DATA","m":"upsertStockObservation() in db.ts — writes 10 new stock_observations columns (warehouse_de_qty, warehouse_global_qty, backorder_qty, units_sold, compatible_brands, price_net, product_url, delivery dates)"}
{"d":"2026-04-17","t":"DATA","m":"FS.com scraper v2: Playwright-based, extracts DE-Lager + Global-Lager + Nachlieferung + Verkauft counts, German number/date parsing, 120-URL pre-queue, 12-category crawl, 12h dedup window"}
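German number parsing of the kind mentioned in this entry could be sketched as follows. The helper `parseGermanInt` is hypothetical and the input strings are assumed; the sketch relies only on the German convention of using a dot as the thousands separator.

```typescript
// Assumed pattern: either dot-grouped thousands (128.428) or a plain digit run (543).
const GERMAN_INT = /\d{1,3}(?:\.\d{3})+|\d+/;

// Hypothetical helper: extracts the first integer quantity from a German-formatted label.
function parseGermanInt(text: string): number | null {
  const match = GERMAN_INT.exec(text);
  if (!match) return null;
  // Drop the thousands separators before converting.
  return Number.parseInt(match[0].replace(/\./g, ""), 10);
}
```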
{"d":"2026-04-17","t":"FIX","m":"SmartOptics scraper v2: WooCommerce REST API fallback + 8 catalog categories + relative URL regex fix — was finding only 8 products, now discovers full catalog"}
---
{"d":"2026-04-12","t":"FIX","m":"DB functions compute_transceiver_verification() + compute_transceiver_verification(uuid): both now require competitor_verified as 4th criterion for fully_verified — was silently ignoring competitor check and granting ★ 100% badge based on only 3 criteria"}
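The four-criterion rule can be mirrored in TypeScript as below. The real logic lives in the PostgreSQL function `compute_transceiver_verification()`, which is not shown here; the interface and function names in this sketch are assumptions, while the column names match those used elsewhere in this changelog.

```typescript
// Flags as stored on the transceivers table (column names from the changelog).
interface VerificationFlags {
  price_verified: boolean;
  image_verified: boolean;
  details_verified: boolean;
  competitor_verified: boolean;
}

// Hypothetical mirror of the DB rule: the 100% badge needs all four criteria.
function isFullyVerified(flags: VerificationFlags): boolean {
  return (
    flags.price_verified &&
    flags.image_verified &&
    flags.details_verified &&
    flags.competitor_verified
  );
}
```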
{"d":"2026-04-12","t":"FEAT","m":"Scheduler: maintenance:reconcile-verification nightly job (01:00 UTC via pg-boss) — auto-resets competitor_verified=false where no non-Flexoptix price_observation in last 30 days, then recomputes fully_verified — eliminates recurring false ★ 100% badges without manual SQL intervention"}
{"d":"2026-04-12","t":"DATA","m":"Data quality: 608 transceivers had competitor_verified=true with NO actual non-Flexoptix price in last 30 days — all reset to false + fully_verified=false. ★ 100% badge now only shows when genuinely earned. Triggered by user catching false badges on 1.6T OSFP products."}
{"d":"2026-04-12","t":"FIX","m":"ATGBICS + FS.COM scrapers: PlaywrightCrawler useSessionPool=false added — eliminates SDK_SESSION_POOL_STATE.json crash on every run; withIsolatedStorage now pre-seeds empty session state file as belt-and-suspenders"}
{"d":"2026-04-12","t":"FIX","m":"Skylane scraper: pagination now breaks on zero NEW unique product URLs (was looping all 10 pages because Algolia returns same content regardless of ?page=N)"}
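The stop condition described here (break when a page yields no new unique product URLs) can be sketched as below. The function name `collectUniqueUrls` is hypothetical, and the page fetch is modelled as a synchronous callback purely to keep the example self-contained; a real scraper would await an HTTP request.

```typescript
// Hypothetical pagination loop: stops as soon as a page adds nothing new,
// which also covers sites that return the same content for every ?page=N.
function collectUniqueUrls(
  getPageUrls: (page: number) => string[],
  maxPages: number,
): string[] {
  const seen = new Set<string>();
  for (let page = 1; page <= maxPages; page++) {
    let added = 0;
    for (const url of getPageUrls(page)) {
      if (!seen.has(url)) {
        seen.add(url);
        added++;
      }
    }
    if (added === 0) break; // zero NEW URLs: further pages are repeats
  }
  return Array.from(seen);
}
```

With a source that repeats page 1 forever, the loop performs two fetches instead of ten.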
{"d":"2026-04-12","t":"FIX","m":"AscentOptics scraper fully rewritten: uses /product-list?is_render=1&category_id=CID JSON API (was hitting 404 on old /catalog/ URLs); hardcoded category IDs for 14 transceiver form factors; no prices (OEM Get Quote model)"}
{"d":"2026-04-12","t":"UI","m":"Dashboard transceiver table: VERIFIED column now shows all 4 individual criteria per row (✓/— P=Price, I=Image, D=Details, C=Competitor) in green/red — ★ 100% badge only when all 4 met; uses competitor_verified DB column"}
{"d":"2026-04-12","t":"FIX","m":"Data quality: 59 anomalous price observations deleted (FS.COM accessories EUR 1-18 misidentified as OSFP/QSFP-DD/QSFP28; ATGBICS QSFP-DD sub-$60) — 49 transceivers competitor_verified degraded to false, 1 fully_verified badge removed"}
{"d":"2026-04-12","t":"FIX","m":"upsertPriceObservation: hard floor $1.50 USD added before form-factor bounds check — catches accessories/cables misidentified as transceivers when form_factor defaults to SFP with loose [2,3000] bounds"}
{"d":"2026-04-12","t":"FIX","m":"GBICS scraper: attribute order changed on site — regex updated from aria-label→href→data-event-type to dual-pass href+aria-label (both orders), data-event-type no longer required; prices now correctly extracted"}
{"d":"2026-04-12","t":"FIX","m":"Scheduler: 11 missing boss.work() handlers added for lightweight scrapers (fluxlight, gbics, optcore, champion-one, sfpcables, blueoptics, fiber24, tscom, skylane, ascentoptics, gaotek) — jobs were queued by cron but never consumed; scrapers stale 24-48h"}
{"d":"2026-04-12","t":"FIX","m":"withIsolatedStorage: removed rmSync cleanup of Crawlee storage dir — dir deletion caused SDK_SESSION_POOL_STATE.json not found crash on every Playwright scraper restart (ATGBICS/FS.COM failed every 2h cycle)"}
{"d":"2026-04-12","t":"FEAT","m":"Scheduler: monitor:scraper-health job added (every 3h via pg-boss) — checks price_observations per vendor in last 6h, logs SCRAPER HEALTH ALERT to pm2 stderr for any vendor with 0 new prices"}
{"d":"2026-04-12","t":"FIX","m":"Health check vendor names corrected: SFPCables→SFPcables, Fiber24→ShopFiber24, T&S Com→T&S Communication to match actual vendor table values"}
{"d":"2026-04-12","t":"FIX","m":"FiberMall scraper: URL schema corrected — wrong /c/1g-sfp-transceiver/ paths (HTTP 404) replaced with actual /store-XXXXX-name.htm category URLs discovered via homepage navigation scrape"}
{"d":"2026-04-12","t":"FIX","m":"FiberMall parser: product card split on new_proList_mainListLi (Vue.js SSR), price extracted from <span class=currency_price data-price=X.XX> — fixed false-match on data-price=0.00 from SKU variant items that appears before real price in each card"}
{"d":"2026-04-12","t":"FIX","m":"FiberMall: also scrapes SKU brand variants from .sku_item divs within each product group (Cisco/Arista/Juniper compatible versions listed per product)"}
{"d":"2026-04-12","t":"FIX","m":"Flexoptix price parsing: EUR text regex /([\d.]+)\s*EUR/ matched only digits before thousand separator (2,921.60 EUR → 2 EUR) — fixed to /([\d,]+\.?\d*)\s*EUR/ with comma strip; affects all Flexoptix prices >999 EUR"}
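The corrected pattern from this entry, wrapped in a small parser for illustration. The function name `parseEurPrice` and the null-on-failure behaviour are assumptions; the regex is the fixed one quoted in the entry.

```typescript
// Fixed pattern from the entry: digits and thousands commas, optional decimals, then EUR.
const EUR_PRICE = /([\d,]+\.?\d*)\s*EUR/;

// Hypothetical wrapper: returns the numeric EUR amount, or null if none is found.
function parseEurPrice(text: string): number | null {
  const match = EUR_PRICE.exec(text);
  if (!match) return null;
  // Strip the thousands separators before parsing.
  const value = Number.parseFloat(match[1].replace(/,/g, ""));
  return Number.isFinite(value) ? value : null;
}
```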
{"d":"2026-04-12","t":"FIX","m":"Flexoptix catalog: O.138HG2.C.05 (1.6T OSFP224 2x DR4) price corrected 3009.60→2921.60 EUR (stale since 2026-04-09, Flexoptix.net shows FLEXBOX price 2921.60 via data-price-amount attribute)"}
{"d":"2026-04-12","t":"FEAT","m":"Flexoptix catalog: 4 new search queries added — OSFP224 1.6T, OSFP224, 1.6T DR4, 1.6T transceiver — covers new 1.6T form factor previously missing entirely from catalog scraper"}
{"d":"2026-04-12","t":"FIX","m":"Schema: competitor_verified + competitor_verified_at columns added to transceivers table (ALTER TABLE) — were referenced in db.ts upsertPriceObservation but not in schema, causing price writes to fail silently for all competitor vendors (FiberMall, QSFPTEK etc.)"}
{"d":"2026-04-11","t":"FEAT","m":"Scraper coverage expansion: 3 new scrapers added — FiberMall (fibermall.com, USD), Vcelink (vcelink.com, USD, Shopify), OpticsBay (opticsbay.com, USD, WooCommerce) — all wired into scheduler and Pi fleet"}
{"d":"2026-04-11","t":"FIX","m":"QSFPTEK scraper fully rewritten: site migrated from OpenCart to custom Java/Spring+Vue — old /c/*.html paths 404, now uses /mall/commodity/list API with attribute-based data rate filtering; 8 attribute IDs for 1G/10G/25G/40G/100G/200G/400G/800G"}
{"d":"2026-04-11","t":"INFRA","m":"Scheduler: 61 workers total, 53 cron schedules — FiberMall/Vcelink/OpticsBay added at :03, :07, :57 past even hours"}
{"d":"2026-04-09","t":"FEAT","m":"Price anomaly detection: PRICE_BOUNDS per form-factor in db.ts upsertPriceObservation — prices outside [min,max] USD range silently rejected to prevent garbage data (e.g. SFP+ [4, 5000], OSFP224 [200, 60000])"}
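A per-form-factor bounds check of this kind might look like the sketch below. Only the two example ranges come from the entry; the function name `isWithinPriceBounds`, the use of a `Map`, and the choice to accept unknown form factors are assumptions made for illustration.

```typescript
// Example ranges from the entry, in USD: [min, max] inclusive.
const PRICE_BOUNDS = new Map<string, [number, number]>([
  ["SFP+", [4, 5000]],
  ["OSFP224", [200, 60000]],
]);

// Hypothetical check: false means the observation would be rejected.
function isWithinPriceBounds(formFactor: string, priceUsd: number): boolean {
  if (!Number.isFinite(priceUsd)) return false;
  const bounds = PRICE_BOUNDS.get(formFactor);
  if (!bounds) return true; // assumption: no bounds defined means no rejection
  const [min, max] = bounds;
  return priceUsd >= min && priceUsd <= max;
}
```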
{"d":"2026-04-09","t":"UI","m":"Dashboard: LLM panel redesigned for light theme readability; LLM model selector added to Blog Engine tab"}
{"d":"2026-04-09","t":"INFRA","m":"Pi Starlink proxy-agent: scraper routes selected lightweight scrapers exclusively to Pi worker fleet via SOCKS5 — no Playwright traffic on Pi nodes"}
{"d":"2026-04-09","t":"DATA","m":"800G standards deep enrichment: migration 033 — IEEE 802.3df, OIF 800G IA, 800G MSA, OSFP MSA, QSFP-DD800 MSA with links, status, timeline"}
{"d":"2026-04-09","t":"FEAT","m":"Linecard system support: switches can have linecard slots; Cisco 8000 accuracy migration (031) with correct port count and linecard data"}
{"d":"2026-04-09","t":"FIX","m":"Qdrant init, switch column verification, crawler live status, demo data badges — stability audit fixes"}
{"d":"2026-04-08","t":"FEAT","m":"Scraper: SOCKS5 proxy rotation for FS.com, ATGBICS, GBICS via Pi fleet nodes — residential IPs for CloudFront WAF bypass"}
{"d":"2026-04-07","t":"AI","m":"Blog fine-tuning: 100 gold-standard training articles added (blog-001 to blog-100) for fo-blog-v2/v3 fine-tuning dataset"}
{"d":"2026-04-06","t":"FIX","m":"Scraper: FS.com switched to de.fs.com for EUR prices as primary source; parsePrice hardened (requires currency symbol, uses largest number)"}
{"d":"2026-04-06","t":"FIX","m":"Scraper: bot User-Agents replaced with Chrome UA; dead domain scrapers disabled"}
{"d":"2026-04-06","t":"FIX","m":"Blog: Claude API calls serialized via queue to prevent 429 rate-limit spam; claudeQueue deadlock from recursive 429 retry fixed"}
{"d":"2026-04-06","t":"FEAT","m":"Blog: Anthropic Claude provider added to LLM client — claude-bridge on Erik used as flat-rate backend"}
{"d":"2026-04-06","t":"DATA","m":"Migrations 026+027: price observation cleanup and FS.com EUR currency fix"}
{"d":"2026-04-06","t":"FIX","m":"Dashboard: verified badge logic corrected; comparable pricing shown properly; product images clickable"}
{"d":"2026-04-06","t":"AI","m":"Blog training: 13 gold-standard articles added to BlogLLM training set"}
{"d":"2026-04-05","t":"FEAT","m":"Blog Engine: AEM/APM pipeline steps + SLL context builder + LinkedIn v2 prompts; blog routes mounted (blogSllRouter + scraperRouter)"}
{"d":"2026-04-05","t":"FEAT","m":"Blog: Title Contract + Technical Sanity Check + Self-Heal + angle-aware LinkedIn generator; anti-repetition engine with 6 angle types and forbidden structures"}
{"d":"2026-04-05","t":"FEAT","m":"Blog: Post to Ghost + Post to LinkedIn buttons in dashboard"}
{"d":"2026-04-05","t":"FIX","m":"Blog: hard story blacklist in STEP4 + LinkedIn (2AM/dirty connector/lab-vs-prod stories banned); word target 1200–1600; power-budget false positive fix"}
{"d":"2026-04-05","t":"FEAT","m":"Dashboard: data verification status section added to Overview tab"}
{"d":"2026-04-05","t":"FEAT","m":"Scraper: all pricing scrapers unified to 2h 24/7 cycle — full competitor coverage with no scheduling gaps; 4th verification criterion (Competitor) added"}
{"d":"2026-04-11","t":"FIX","m":"Scraper: CRAWLEE_PURGE_ON_START=1 set in withIsolatedStorage — fixes FS.com + ATGBICS crash on startup (SDK_SESSION_POOL_STATE.json not found in fresh isolated storage dir)"}
{"d":"2026-04-11","t":"FEAT","m":"Scraper: NADDOD, QSFPTEK, AddOn Networks added to pg-boss scheduler (every 2h, slots :48/:52/:55) — 24 pricing queues total, 58 workers"}
{"d":"2026-04-11","t":"FIX","m":"Scraper: ProLabs rewritten from PlaywrightCrawler (blocked by CloudFront WAF TLS fingerprinting) to fetch-based sitemap scraper — catalog-only (B2B quote model, no public prices)"}
{"d":"2026-04-11","t":"FIX","m":"Scraper: startup zombie cleanup in index.ts — on daemon restart, active pg-boss jobs older than 5 min are marked failed to allow re-queueing at next cron tick"}
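The selection step of this cleanup can be sketched as a pure filter. The `ActiveJob` shape, the `startedAt` field, and the function name `findZombieJobs` are assumptions; the actual pg-boss job records and the query used in index.ts are not shown in this changelog. Only the 5-minute threshold comes from the entry.

```typescript
// Assumed minimal view of an active job; real pg-boss rows carry more fields.
interface ActiveJob {
  id: string;
  startedAt: Date;
}

const ZOMBIE_AGE_MS = 5 * 60 * 1000; // 5 minutes, per the entry

// Hypothetical selector: jobs still marked active but older than the threshold
// at daemon start are candidates to be marked failed.
function findZombieJobs(jobs: ActiveJob[], now: Date): ActiveJob[] {
  return jobs.filter((job) => now.getTime() - job.startedAt.getTime() > ZOMBIE_AGE_MS);
}
```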
{"d":"2026-04-11","t":"FIX","m":"Scraper: pre-existing TypeScript build errors fixed (findOrCreateScrapedTransceiver: removed invalid name/url/extractType params; ebay-enricher cheerio type mismatch; community-issues description→summary, publishedDate→published_at)"}
{"d":"2026-04-04","t":"AI","m":"Blog Engine v5: STEP8b replaced with Reduction Engine v1.0 (5-pass: Repetition Kill → Tech Prune → Flow Rebuild → Weight Correction → Humanization); target 700-1000 words; LaTeX hard delete in Pass 2; title/content alignment in Pass 4; word count range enforcement 600-1300 with warnings"}
{"d":"2026-04-04","t":"DATA","m":"Blog calibration: Gold Standard 5 added (market alert / pricing article — 2026-04-04; title matches body throughout; no LaTeX; DR4 = MPO-12 not LC duplex; ending lands on title topic not generic close)"}
{"d":"2026-04-04","t":"AI","m":"Blog Engine v5: system prompt + STEP9 QA hardened with LaTeX hard fail (\\[...\\] destroys flow), DR4 connector hard fail (DR4=MPO-12, FR4=LC duplex), title/content alignment check (12d); WRONG PATTERNS extended with 4 new entries"}
{"d":"2026-04-04","t":"AI","m":"Blog Engine v5: STEP4b Narrative Control (4-correction pass after draft — root cause assignment, anti-FUD filter, reality reframe, Flexoptix voice check); minimum words 1500→2500; reduction pass 25-35%→15-25%; pipeline now 14 steps, version v5-narrative-control"}
{"d":"2026-04-04","t":"AI","m":"Blog Engine v5: STEP_LINKEDIN_POST — generates LinkedIn post ≤2800 chars from final article (hook + 3-5 insights + CTA + hashtags); stored in blog_drafts.linkedin_post + linkedin_char_count; hard truncation at 2800 if LLM exceeds limit"}
{"d":"2026-04-04","t":"DATA","m":"Blog calibration: Gold Standard 4 added (compatible vs OEM narrative correction — 2026-04-04; optic = not root problem, exposes existing issues; correct Flexoptix framing: validation responsibility shifts to operator)"}
{"d":"2026-04-04","t":"DATA","m":"Migration 024: linkedin_post + linkedin_char_count columns in blog_drafts"}
{"d":"2026-04-04","t":"FIX","m":"Proxy Network: IP geo-lookup via ip-api.com on register/heartbeat (country_code + city now populated); heartbeat_count column + uptime_pct computed per heartbeat (was always 0.00); dedup fix — register returns existing token for same IP+port; heartbeat no longer overwrites registered IP (prevented IPv6 churn conflicts)"}
{"d":"2026-04-04","t":"DATA","m":"Proxy Network: migration 023 — heartbeat_count column added, existing node uptime_pct backfilled, duplicate registration from same IPv6 removed (4 nodes → 3)"}
{"d":"2026-04-04","t":"AI","m":"Blog Engine v4: STEP8b Reduction Pass (25-35% content cut, removes repeated concepts) + STEP8c Style Lock (tone consistency, scope/OPM fix, no inline SKUs) — pipeline now 12 steps, version v4-reduction-stylelock"}
{"d":"2026-04-04","t":"DATA","m":"Blog calibration: Gold Standard 3 added (Style B troubleshooting — 2026-04-04 field feedback, flowing narrative, zero sections, failure as behavior not scenario)"}
{"d":"2026-04-04","t":"FIX","m":"Flexoptix scraper: contentHash call fixed (was passing JSON.stringify string, now passes object directly)"}
{"d":"2026-04-03","t":"FEAT","m":"TIP Proxy Network (packages/proxy-agent): SOCKS5 residential proxy for CloudFront WAF bypass — node registration, heartbeat, load balancing with uptime+latency scoring"}
{"d":"2026-04-03","t":"FEAT","m":"proxy-agent CLI package (@tip/proxy-agent): tip-agent start/status/stop, configurable bandwidth cap, 30s heartbeat, graceful shutdown"}
{"d":"2026-04-03","t":"FIX","m":"DB utils: price_verified=true now set in content_hash early-return path (no new observation); image_verified=true auto-set on INSERT and on image_url update in findOrCreateScrapedTransceiver"}
{"d":"2026-04-03","t":"FIX","m":"pg-boss pool: max connections reduced to 4 + idle_in_transaction_session_timeout=30s — fixed PostgreSQL max_connections exceeded (100/100)"}
{"d":"2026-04-03","t":"DATA","m":"Image backfill: 178 Flexoptix images added via GraphQL small_image — Optcore images via Playwright gallery — findOrCreateScrapedTransceiver now updates image_url for existing records"}
{"d":"2026-04-03","t":"FEAT","m":"SmartOptics scraper: DWDM/coherent product catalog, og:image extraction, 8 products with form factor + reach detection"}
{"d":"2026-04-03","t":"FIX","m":"Fluxlight scraper: price extraction fixed for BigCommerce HTML (data-product-price-without-tax attribute)"}
{"d":"2026-04-03","t":"AI","m":"Blog Engine v3: STEP4 prose requirement (zero tolerance for ## headers, #### Scenario: patterns, bullet sections) — STEP3 outline as flow plan (3-4 beats) — STEP9 format violations as primary hard fail"}
{"d":"2026-04-03","t":"FIX","m":"Blog engine: DR4 wavelength corrected to 1310nm=0.35dB/km; scope description fixed (visual tool, not loss measurement device)"}
{"d":"2026-04-03","t":"FIX","m":"Blog engine: orphaned floating text in fo-blog-pipeline.ts removed (dead code outside template literal causing TypeScript build failure)"}
{"d":"2026-04-03","t":"FEAT","m":"NOG Talks scraper: DENOG/NANOG/RIPE/ENOG/NLNOG/Euro-IX conference talks — relevance scoring, optical keyword detection, weekly pg-boss job, CtxEvent cross-DB bridge via dblink"}
{"d":"2026-04-03","t":"FEAT","m":"Hot Topics v2: market_intelligence as SOURCE 3b (0.6+ relevance, urgency mapping per intel_type), NOG Talks as SOURCE 3c (grouped by event with speaker+abstract), limit 20 topics"}
{"d":"2026-04-03","t":"FIX","m":"Flexoptix scraper: 1G SFP coverage fixed (added SFP LX/SX/ZX queries); SKU suffix stripping (:Sx → base SKU); pagination cap removed (200-product limit was blocking full catalog); Phase 1→2 URL enrichment"}
{"d":"2026-04-03","t":"FEAT","m":"Prediction intelligence fully_verified trigger: PostgreSQL trigger trg_sync_fully_verified auto-computes fully_verified = price_verified AND image_verified AND details_verified — mass backfill: 258 → 3615 badges"}
{"d":"2026-04-02","t":"FEAT","m":"Auth: password-protected login page — HMAC-SHA256 signed token, requireAuth middleware on all API routes, dark TIP-themed login page"}
{"d":"2026-04-02","t":"INFRA","m":"Raspberry Pi fleet: 3x Pi nodes running 24/7 as lightweight scraper workers via WireGuard VPN, pg-boss multi-node queue sharing"}
{"d":"2026-04-02","t":"INFRA","m":"WireGuard VPN: Pi fleet tunnel for secure PostgreSQL access to production DB"}
{"d":"2026-04-02","t":"FEAT","m":"Prediction Intelligence System (migration 022): 7 new tables — hyperscaler_capex, distributor_lead_times, github_tech_signals, marketplace_velocity, ai_cluster_announcements, standards_activity, forecast_signals"}
{"d":"2026-04-02","t":"FEAT","m":"SEC EDGAR scraper: XBRL API, quarterly CapEx for Amazon/Microsoft/Alphabet/Meta — DC-share estimate + YoY growth"}
{"d":"2026-04-02","t":"FEAT","m":"GitHub Signals scraper: weekly repo_count/commit/stars for 400G/800G/ZR/CMIS/CPO/silicon-photonics tech adoption tracking"}
{"d":"2026-04-02","t":"FEAT","m":"eBay Velocity scraper: sold/active listing counts + avg price for 9 transceiver search terms — every 12h"}
{"d":"2026-04-02","t":"FEAT","m":"AI Cluster Announcements scraper: 6 RSS feeds (DataCenterKnowledge, DC Dynamics, Blocks&Files, Next Platform, ServeTheHome) — extracts company, MW, network speed, estimated transceivers"}
{"d":"2026-04-02","t":"FEAT","m":"Distributor Lead Times scraper: Mouser, Digi-Key, RS Components — in_stock, stock_qty, lead_time_weeks, price — daily"}
{"d":"2026-04-02","t":"FEAT","m":"Standards Tracker: IEEE 802.3 project table, OIF hot topics, IETF Datatracker API — tracks in-progress/ballot/published status — weekly"}
{"d":"2026-04-02","t":"FEAT","m":"Forecast Engine: weighted demand index (0-100) from 6 signal types — capex 0.30, ai_clusters 0.25, ebay_velocity 0.20, lead_times 0.15, github 0.06, standards 0.04 — 3/9/12/18 month horizons for 5 technologies"}
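The weighted index can be written out as a short calculation. The six weights are taken from the entry and sum to 1.00; the function name `demandIndex`, the assumption that each input signal is already normalized to 0-100, and the clamping and rounding are illustrative choices, not the engine's confirmed behaviour.

```typescript
// Weights per signal type, as listed in the entry (they sum to 1.00).
const SIGNAL_WEIGHTS = {
  capex: 0.30,
  ai_clusters: 0.25,
  ebay_velocity: 0.20,
  lead_times: 0.15,
  github: 0.06,
  standards: 0.04,
} as const;

type SignalType = keyof typeof SIGNAL_WEIGHTS;

// Hypothetical calculation: each score is assumed to be pre-normalized to 0-100.
function demandIndex(scores: Record<SignalType, number>): number {
  let total = 0;
  for (const key of Object.keys(SIGNAL_WEIGHTS) as SignalType[]) {
    const clamped = Math.min(100, Math.max(0, scores[key])); // keep inputs in range
    total += SIGNAL_WEIGHTS[key] * clamped;
  }
  return Math.round(total * 100) / 100; // two decimals
}
```

Because the weights sum to 1, the result stays on the same 0-100 scale as the inputs.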
{"d":"2026-04-02","t":"FEAT","m":"NAS sync: datasheet/manual download — PDFs from product_documents organized into switches/transceivers/whitepapers/other"}
{"d":"2026-04-02","t":"INFRA","m":"Scheduler: 50 total pg-boss jobs — 8 new prediction/forecast jobs with cron schedules"}
{"d":"2026-04-03","t":"FEAT","m":"TIP Proxy Network: residential proxy pool — contributor nodes donate bandwidth, SOCKS5 server (Node.js net only), register/heartbeat/next/rotate/stats API, round-robin routing with uptime+latency scoring"}
{"d":"2026-04-03","t":"FEAT","m":"@tip/proxy-agent: standalone CLI package — tip-agent start/status/stop, configurable bandwidth cap, 30s heartbeat, graceful shutdown"}
{"d":"2026-04-03","t":"UI","m":"Dashboard Network tab: node stats, join-the-network card with token generator, install command box, country breakdown table"}
{"d":"2026-04-03","t":"INFRA","m":"Mac Studio home node: tip-agent running on 192.168.178.213:1081, PROXY_URL=socks5://192.168.178.213:1081 set in PM2 env for scraper+api, ProLabs WAF bypass now active"}
{"d":"2026-04-01","t":"FEAT","m":"Product Intelligence Layer (migration 020): product_issues table (forum/community bugs), condition+marketplace on price_observations, features JSONB on switches+transceivers"}
{"d":"2026-04-01","t":"FEAT","m":"eBay Enricher: scrapes eBay.de for switch/transceiver listings — extracts features, description, refurbished prices, images — nightly via pg-boss"}
{"d":"2026-04-01","t":"FEAT","m":"Community Issues Scraper: extracts known bugs/incompatibilities from Reddit, ServeTheHome, Arista Community, Cisco Community, NetworkEngineering SE"}
{"d":"2026-04-01","t":"DATA","m":"7 pre-seeded community issues: Arista QSFP28 EOS compatibility, Cisco SFP DOM bug, Juniper QFX5120 config tip, SG350 SFP speed limit, MikroTik CRS326 QoS, DCS-7800R3 QSA, UniFi third-party warning"}
{"d":"2026-04-01","t":"FEAT","m":"API: GET /api/switches/:id/issues — known community issues with severity, tags, source links; GET /api/switches/:id/documents — official datasheets+manuals"}
{"d":"2026-04-01","t":"UI","m":"Switch detail modal: shows features array from DB, description, eBay refurbished price, known issues with severity color coding, datasheets with download links"}
{"d":"2026-04-01","t":"FEAT","m":"Datasheet Finder: discovers and links official PDF datasheets/manuals from vendor sites (Arista, Cisco, Juniper, HPE) for existing switches"}
{"d":"2026-04-01","t":"DATA","m":"SMB/campus switch seed: 26 models across Cisco SG/CBS 350/550/CBS350, HPE Aruba 1820/2530/2930F, Ubiquiti UniFi Pro/Aggregation, MikroTik CRS326/354/504, Netgear M4300/M4500, Zyxel XGS"}
{"d":"2026-04-01","t":"FIX","m":"forecast.ts: fixed fiveYearProjection accessor (hype.forecast.fiveYearProjection[n] instead of hype.forecast[n])"}
{"d":"2026-04-01","t":"FEAT","m":"Procurement Intelligence Engine: stock_snapshots, abc_classification, reorder_signals, product_lifecycle_events, market_intelligence tables"}
{"d":"2026-04-01","t":"FEAT","m":"Crawler LLM: Ollama-based two-stage extractor (page type detection + structured product extraction) with vendor profiles for 7 vendors"}
{"d":"2026-04-01","t":"FEAT","m":"ABC classification: dynamic A/B/C turnover scoring from price observations, compatibility breadth, vendor count — computed daily"}
{"d":"2026-04-01","t":"FEAT","m":"Reorder signals: buy_now/wait/hold/monitor with signal strength and reasons — computed daily from stock trends, price trends, lead times"}
{"d":"2026-04-01","t":"DATA","m":"Market intelligence seeded: OFC 2026, AWS CapEx $105B, Azure CapEx $80B+, Coherent 400G ZR+ lead times 16-20w, EU TED €2.1B tenders, ECOC 2026, IEEE 802.3df"}
{"d":"2026-04-01","t":"DATA","m":"Lifecycle events seeded: Cisco SFP-10G-LR EOL 2026-06-30, Juniper SFPP-10GE-ER EOL 2026-09-01, 400ZR ratified, 800G MSA draft"}
{"d":"2026-04-01","t":"UI","m":"Procurement Intel tab: Reorder Signals, ABC Classes table, Market Intel cards, Lifecycle Events — live on dashboard"}
{"d":"2026-04-01","t":"FEAT","m":"Market intelligence scraper: OFC/ECOC, IEEE 802.3, EU TED, Farnell/Mouser lead times, LightReading, FierceTelecom — weekly via pg-boss"}
{"d":"2026-04-01","t":"FIX","m":"Dashboard: garbage product names (scraped-*, All Optical Transceivers) no longer shown as product titles — isGarbageName() filter"}
{"d":"2026-04-01","t":"FIX","m":"Dashboard: competitor comparable prices shown as inline tooltip (ⓘ) instead of block element breaking price row layout"}
{"d":"2026-04-01","t":"UI","m":"Dashboard: 100% VERIFIED badge with white-on-green sub-items (Price ✓, Image ✓, Details ✓) — explicit === true checks, no false positives"}
{"d":"2026-04-01","t":"UI","m":"Dashboard list view: SKU + descriptive name on two lines, Verified column with ★ 100% badge"}
{"d":"2026-04-01","t":"UI","m":"Dashboard detail view: manufacturer product name above image, temperature range decoded (COM → 0–70°C), close button visible on light background"}
{"d":"2026-04-01","t":"DATA","m":"Migration 018: garbage data cleanup — marks scraped-* and category-page scrapes as data_confidence=garbage"}
{"d":"2026-04-01","t":"FEAT","m":"Migration 017: verification tags — price_verified, image_verified, details_verified, fully_verified columns + compute_transceiver_verification() function"}
{"d":"2026-03-31","t":"DATA","m":"Migration 016: data_confidence scoring (garbage/low/medium/high)"}
{"d":"2026-03-31","t":"FEAT","m":"Migration 013: v0.2.0 Sales Intelligence tables — competitor_alerts, price_changes, generated_datasheets, sales_forecasts, blog_posts_v2"}
{"d":"2026-03-31","t":"FEAT","m":"Transport planner route: GET /api/transport — city-pair fiber route recommendations with switch and transceiver BOM"}
{"d":"2026-03-31","t":"FEAT","m":"Blog Engine v2: market_alert, migration_guide, competitor_analysis, buying_guide types with data enrichment pipeline"}
{"d":"2026-03-31","t":"FEAT","m":"Competitor alerts route: GET /api/competitor-alerts — price changes, new products, stock events with acknowledge workflow"}
{"d":"2026-03-30","t":"FEAT","m":"Switch→Flexoptix Finder: GET /api/finder — enter switch model, get matching Flexoptix transceivers with prices and shop links"}
{"d":"2026-03-30","t":"FEAT","m":"MCP Server: 12 tools including find_transceiver, get_compatibility, get_hype_cycle, generate_blog, plan_transport"}
{"d":"2026-03-30","t":"FEAT","m":"Norton-Bass Hype Cycle engine: multigenerational diffusion model for 15 transceiver technologies with adoption curves"}
{"d":"2026-03-30","t":"DATA","m":"440 switches seeded including Cisco, Arista, Juniper, Edgecore, Mellanox, whitebox OCP switches"}
{"d":"2026-03-30","t":"DATA","m":"33,993 compatibility entries — transceiver↔switch compatibility matrix"}
{"d":"2026-03-30","t":"FEAT","m":"Price monitoring: 23 scrapers, 60+ data sources, pg-boss scheduler — permanent monitoring"}
{"d":"2026-03-30","t":"FEAT","m":"Qdrant vector DB integration: hybrid full-text + semantic search across products, FAQ, datasheets, news"}
{"d":"2026-03-30","t":"INFRA","m":"Stack deployed: PostgreSQL 17 + TimescaleDB, Qdrant, Cloudflare R2 for images, PM2"}
{"d":"2026-03-30","t":"DATA","m":"v0.1.0: 5,018 transceivers, 351 vendors seeded from 23 initial scrapers"}
{"d":"2026-04-17","t":"DATA","m":"Vendor cleanup: pruned 242 irrelevant OEM/manufacturer vendors with no transceiver or switch data — 348→106 vendors"}
{"d":"2026-04-18","t":"FEAT","m":"Mouser Electronics API scraper: OEM reference prices for Juniper/Cisco/Arista PIDs — scheduled daily 03:00, MOUSER_API_KEY env var required"}
{"d":"2026-04-18","t":"FEAT","m":"Hype Cycle Engine: Norton-Bass diffusion model fitted to 6 tech generations (10G/100G/400G-QSFP-DD/800G-OSFP/400G-ZR/1.6T). Bass params via grid search, Gartner phase detection, ASP log-linear projection. Seeded market_metrics + hype_cycle_analysis table. Scheduled daily 04:30. API: GET /api/hype-cycle/analysis"}
{"d":"2026-04-18","t":"DATA","m":"migration 039: hype_cycle_analysis table (Bass p/q/M params, phase, score, projected share 1y/3y, ASP current + decline %). market_metrics CHECK extended with hype_score type"}
{"d":"2026-04-20","t":"FEAT","m":"switch-image-fetcher.ts: og:image-based image discovery for all 86 seeded switches — covers Cisco, Arista, Juniper, NVIDIA, Edgecore, Celestica, Asterfusion, Dell, HPE, Huawei, Nokia, Extreme, MikroTik, Ubiquiti, FS.COM, Supermicro. Daily at 08:30 UTC."}
{"d":"2026-04-20","t":"FEAT","m":"flexoptix-compat.ts: Flexoptix compatibility scraper — maps switch models to compatible Flexoptix transceivers via search API (vendor_compat) with form-factor fallback (spec_match). Daily 09:00 UTC."}
{"d":"2026-04-20","t":"FEAT","m":"community-issues.ts enhanced: added Cisco Field Notices, Juniper KB, SONiC GitHub Issues sources + new scrapeTransceiverCompatIssues() for switch+transceiver combo issues."}
{"d":"2026-04-20","t":"UI","m":"Dashboard switch table: thumbnail column (48px lazy-load image with gear-icon fallback). Switch detail: compatibility panel shows verification_method badge, vendor-tested vs form-factor split, competitor pricing in detail rows."}
{"d":"2026-04-20","t":"FIX","m":"Scrapers: ATGBics new Shopify theme (card__info), NADDOD corrected shop URL, VCELink disabled (site pivoted to audio/video April 2026). Scheduler: 59 schedules, 78 workers."}
{"d":"2026-04-21","t":"FEAT","m":"switch-image-playwright.ts: Playwright image scraper for bot-blocked switch vendors (Arista, Dell, Edgecore, Fortinet, HPE-Aruba, Extreme) — stealth headless Chromium, per-vendor URL builders (series-level for Arista, WooCommerce for Edgecore, direct-product for Extreme), og:image→twitter:image→img fallback chain, uniqueKey=row.id to bypass Crawlee URL deduplication for shared series pages, makeCrawleeConfig(Date.now() suffix) per-run to avoid ENOENT from stale request-queue files."}
{"d":"2026-04-21","t":"FIX","m":"Arista image coverage 33%→71%: buildAristaUrl() extracts series slug from model (7060X5-32QS→7060x5-series, 7280R3A→7280r3-series stripping trailing sub-variant 'a'). uniqueKey=row.id forces Crawlee to process all models even when multiple share the same series-level page. 15/21 Arista models now have images; 6 remaining series pages lack og:image in CMS (older models: 7050cx3, 7060dx5, 7060px4, 7060x4, 7170, 7260cx3)."}
{"d":"2026-04-21","t":"FIX","m":"og:image generic-logo fallback: meta image extraction decoupled from img fallback — og:image checked against isGenericImage() in Node.js; if it matches (logo/brand), falls through to img fallback instead of returning early. Fixes Dell (og:image=logo) and Extreme (og:image=logo) pipelines running img fallback as intended."}
{"d":"2026-04-21","t":"FIX","m":"OneTrust/cookie consent image filter: cdn.cookielaw.org, cookiebot.com, trustarc.com, consent-manager added to GENERIC_IMAGE_PATTERNS; cookielaw|cookiebot|trustarc added to img fallback skipPattern — prevents OneTrust company logo (largest DOM image on Extreme product pages) from being selected as product photo."}
{"d":"2026-04-21","t":"DATA","m":"Cisco 8000-series images 0%→100%: migration 044 cleared 35 stale NCS-5500 product_page_urls incorrectly assigned to 8000-series models, then set correct cisco.com/site/us/en/ URLs. switch-image-fetcher.ts plain HTTP run: 32/32 Cisco 8000-series models now have images."}
{"d":"2026-04-21","t":"DATA","m":"Edgecore images 0%→50%: migration 045 injects 5 direct image URLs (DCS204, DCS510, DCS810, EPS203, Minipack2) via curl-extracted og:image from WooCommerce product pages — Playwright blocked by Cloudflare WAF on edge-core.com but plain curl succeeds. AS7xxx enterprise switches not listed on edge-core.com website."}
{"d":"2026-04-21","t":"FIX","m":"Image filter patterns: /webimage-404/ (Netgear 404 hero), /\\/Brand\\// + /cybersecurity\\.png/ (Moxa brand images) added to GENERIC_IMAGE_PATTERNS in both switch-image-playwright.ts and switch-image-fetcher.ts. Cleared 5 bad DB rows (Moxa Brand/cybersecurity.png x4, Netgear webimage-404 x1)."}
{"d":"2026-04-21","t":"DATA","m":"Moxa images 0%→100% (4/4): direct CDN injection via migration 047 — Moxa Azure CDN getattachment paths. Hotlink-protected (Referer: moxa.com required); R2 proxy needed for production display."}
{"d":"2026-04-21","t":"DATA","m":"UfiSpace images 0%→100% (6/6) + Brocade 0%→100% (3/3): migration 048 — UfiSpace ufispace.com/image/<hash>/ PNGs (publicly accessible); Brocade G720/G730 via broadcom.com og:image, ICX 7850-48FS via CommScope/Ruckus vistancenetworks.com ImageServer (rand param cache-bust only, ID hash stable)."}
{"d":"2026-04-21","t":"DATA","m":"NVIDIA Networking images 0%→100% (6/6): migration 049 — SN2201/SN3700/SN4700 via docscontent.nvidia.com official docs CDN, SN5400/SN5600 via k3-prod-nvidia-docs.s3 direct, SN3750-SX via uvation reseller CDN."}
{"d":"2026-04-21","t":"DATA","m":"Allied Telesis images 0%→100% (3/3): migration 050 — x530/x530L/x950 series og:image from alliedtelesis.com Drupal CMS static files. QCT T3048-LY8 image via migration 046. Overall coverage: 33.4%→36.2%+ across 671 switches."}
{"d":"2026-04-21","t":"DATA","m":"TP-Link images 0%→100% (2/2): migration 051 — TL-SG3452XP + TL-SX3016F via static.tp-link.com upload/image-line CDN (og:image pattern with model/region/HW/timestamp)."}
{"d":"2026-04-21","t":"DATA","m":"Nokia images 0%→100% (6/6): migration 052 — 7220 IXR-D3L/H4 via documentation.nokia.com SR Linux docs graphics; 7250 IXR-10 + 7750 SR-1 via tempestns.com model-specific reseller CDN; 7750 SR-14s via telecomcauliffe.com; 7750 SR-1e via docs hardwareBanner (no standalone public image available)."}
{"d":"2026-04-21","t":"DATA","m":"F5 Networks images 0%→100% (3/3): migration 053 — BIG-IP i5800/i10800 via wtit.com reseller CDN (model-specific PNGs), i15800 via cdn.blueally.com bigip-i15000-series composite."}
{"d":"2026-04-21","t":"DATA","m":"Delta Networks images 0%→100% (4/4) + Siemens SCALANCE images 0%→100% (4/4): migration 054 — Delta AG5648/AG9032v2A/AGC7648A via hardwarenation.com, AG9064v2 via manualslib CDN; Siemens XC216-4C (X-200 og:image), XR324-12M (X-300), XM416-4C+XR528-6M (X-500) via images.sw.cdn.siemens.com official DISW CDN."}
{"d":"2026-04-21","t":"DATA","m":"MikroTik CRS/CCR images (8 models) + NVIDIA ConnectX-7 400G: migration 055 — all MikroTik via cdn.mikrotik.com/web-assets/rb_images (CRS305/312/317/326/354/504/518 + CCR2216); ConnectX-7 via FS.com CDN. MikroTik now 100%."}
{"d":"2026-04-21","t":"DATA","m":"ALE OmniSwitch 0%→100% (3/3) + H3C 0%→100% (3/3) + Hirschmann 0%→100% (4/4) + Ciena 0%→100% (3/3) + Netberg 0%→100% (3/3): migration 056 — ALE via al-enterprise.com CDN, H3C via resource.h3c.com, Hirschmann via industrialcomms.com/icomtechinc, Ciena via ciena.com/__data, Netberg via netbergtw.com."}
{"d":"2026-04-21","t":"DATA","m":"Arista Networks new series images (6 models) + Edgecore AS-series 3 models: migration 057 — 7060X6/X5/7050X4/7280R3/7020R via arista.com CDN; AS7726(=DCS204)/AS9516(=DCS810)/AS7535(=CSR440) via edge-core.com WP uploads."}
{"d":"2026-04-21","t":"DATA","m":"Fortinet FortiSwitch 0%→100% (11/11) + Dell PowerSwitch (3) + Huawei (4): migration 058 — FortiSwitch 108F/124F/124F-POE/148F-POE/424E/448E/524D/548D/1024E/1048E/3032E via cdn.blueally.com/avfirewalls; Dell Z9332F/S5248F via i.dell.com CDN, Z9664F via reseller; Huawei S5731 via e.huawei.com og:image."}
{"d":"2026-04-21","t":"DATA","m":"D-Link 0%→100% (3/3) + Netgear 0%→100% (3/3) + Check Point 0%→100% (2/2) + Ruckus 0%→100% (2/2): migration 059 — D-Link via dlink.com media CDN; Netgear via assets.netgear.com + blueally CDN; Check Point via tecisoft.ca + blueally; Ruckus ICX via productresources.vistancenetworks.com."}
{"d":"2026-04-21","t":"DATA","m":"HPE Aruba CX 0%→100% (3/3) + Extreme Networks 0%→100% (3/3): migration 060 — Aruba CX 10000/9300/6300 via bigcommerce/kaseya/avendor CDNs; Extreme SLX 9740-40C + X695-48Y-8C + 5520-48T via extr-p-001.sitecorecontenthub.cloud (official Extreme CDN)."}
{"d":"2026-04-21","t":"DATA","m":"Cambium cnMatrix EX2028-P+EX2052-P + Gigamon GigaVUE-HC3+HC1-Plus + SonicWall NSa 6700 + Planet Technology GS-6322-24P4X+IGS-6325-8T8S4X + Palo Alto PA-3430/5430/7080 (series images) + Westermo Lynx 5612+Redfox 5728 + Zyxel XGS4600-52F+XS3800-28: migration 061. 14 models, 7 vendors, all official CDNs verified HTTP 200."}
{"d":"2026-04-21","t":"DATA","m":"Synology SA6400 + TRENDnet TPE-5048WS + Waystream ASR 8000 + Kemp Technologies LoadMaster LM-X40 + LANCOM Systems GS-4554XP: migration 062. 5 models, all official vendor CDNs verified HTTP 200."}
{"d":"2026-04-21","t":"DATA","m":"Sophos XGS 6500 via Contentstack CDN (images.contentstack.io, explicit '6500' filename, HTTP 200) + Zyxel XS3800-28 URL fix (migration 061 path returned 403; replaced with Banner_product_hero.png, HTTP 200): migration 063."}
{"d":"2026-04-26","t":"FIX","m":"Umami credentials — PM2 env cache invalidated via delete+start (not --update-env); 500 sessions now loading correctly in Beste Posting-Zeit."}
{"d":"2026-04-26","t":"FIX","m":"Local Train API: changed execFileAsync from sh -lc to bash [script] [lane] to avoid Ubuntu dash syntax errors; added try-catch to return JSON on non-zero exit instead of HTTP 500."}
{"d":"2026-04-26","t":"FIX","m":"Local Train dashboard: disabled buttons when TIP_LOCAL_TRAIN_COMMAND not configured; early-exit guard with setup instructions shown in log area."}
{"d":"2026-04-26","t":"FIX","m":"Build Pool npm script missing on Erik — synced package.json + tip-learning-pool-build.ts; npm run learning-pool:build now works."}
{"d":"2026-04-26","t":"FIX","m":"TIP_LLM 0 pairs on Erik — git reset --hard origin/main restored 10654 train pairs from Gitea."}
{"d":"2026-04-26","t":"FIX","m":"HuggingFace publish — installed huggingface_hub + datasets via pip3 --break-system-packages on Erik; HF_TOKEN added to ecosystem.config.js."}
{"d":"2026-04-26","t":"AI","m":"TIP_LLM 5-capability system prompt: CAP-1 Transceiver Research, CAP-2 Switch Research, CAP-3 Blog_LLM Data Evaluation, CAP-4 Crawler/Scraper Design, CAP-5 Hype Cycle Calculation — replaces single-line generic prompt."}
{"d":"2026-04-26","t":"DATA","m":"TIP_LLM capabilities training data: seed-tip-llm-capabilities.ts generates 34 SFT pairs (149KB) covering all 5 CAPs with real product names, Crawlee code, Norton-Bass model examples, scoring rubrics. Registered in tip-learning-pool-build.ts."}
{"d":"2026-04-26","t":"DATA","m":"TIP_LLM training pool rebuilt: 12141 raw pairs → 11872 training pairs (10684 train + 1188 eval). Published to HuggingFace renefichtmueller/tip-llm-sft. Blog_LLM: 11408 pairs published to renefichtmueller/blog-llm-sft."}
{"d":"2026-04-21","t":"DATA","m":"Avaya VSP 7432CQ + NetApp CN1610 + Keysight Vision X + A10 Networks Thunder 14045 + Evertz EXE-VSR-IP + RAD ETX-2i-10G + Ekinops 360-12 + DrayTek VigorSwitch P2540xs + Fujitsu FLASHWAVE 9500 + Broadcom BCM957508-P2100G + Calix E9-2 + Citrix NetScaler SDX 26000-100G: migration 064. 12 models, all HTTP 200 verified (mix of official CDNs and reseller CDNs)."}
{"d":"2026-04-21","t":"DATA","m":"Cisco 8000/Catalyst 9000/Nexus 9000/NCS (14 models) via cisco.com/c/dam/en/us/td/i/ doc CDN: migration 065. All HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"Juniper EX4100-48P/EX4400-48T/EX4650-48Y + MX10008/MX304 + QFX10008/5120/5130/5220/5700 (10 models) via juniper.net/content/dam/image-library: migration 066. All HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"Arista 7020R/7050CX3-32S/7050X4-32/7060DX5-32/7060PX4-32/7060X4-32/7060X5-64/7130-48/7170-64C/7260CX3-64/7800R3-36P-LC (11 models) via arista.com QSG CDN front-panel PNGs: migration 067. All HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"NVIDIA Networking SN2201/SN3700/SN4700/SN5400/SN5600 (5 models) via docscontent.nvidia.com dims4 CDN (Hardware User Manual front-panel images): migration 068. All HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"Huawei CE16808/CE6866-48S8CQ/CE8851-32CQ8DQ/NE40E-X8A/S5735-L48T4X-A (5 models via ycict.net WP CDN) + Nokia 7220 IXR-D3L (documentation.nokia.com) + Nokia 7750 SR-14s (telecomcauliffe.com SR series): migration 069. 7 models, all HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"Dell N3248TE-ON (networktigers) + S5248F-ON/S5296F-ON (i.dell.com Scene7 CDN) + Z9332F-ON/Z9664F-ON (expresscomputersystems Shopify) + Extreme Networks 8720-32C+X465-48P (sitecorecontenthub.cloud official CDN): migration 070. 7 models, all HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"HPE Aruba CX 6300M-48G/8100-48Y6C/8360-32Y4C (blueally.com partner CDN) + Ubiquiti USW-EnterpriseXG-24/Pro-Aggregation/Pro-Max-48-PoE (cdn.ecomm.ui.com official) + Supermicro SSE-C4632SRB/SSE-T7132SR (wiredzone.com): migration 071. 8 models, all HTTP 200 verified."}
{"d":"2026-04-21","t":"DATA","m":"Celestica DS3000/DS4000/DS5000 (foleon.com Celestica CDN) + Asterfusion CX308P-48Y-N/CX532P-N/CX864E-N (asterfusion.com WP + cloudswit.ch) + FS.com N8560-32C/S5860-48SC (resource.fs.com) + Edgecore DCS810/EPS203 (edge-core.com WP): migration 072. 10 models, all HTTP 200 verified."}
{"d":"2026-04-26","t":"DATA","m":"OEM seed scrapers batch 1-20: keysight(25), sycamore(17), ekinops(18), adva(19), coriant(17), casa-systems(22), harmonic(23), solarflare(25), marvell(26), broadcom(23), calix-access(20), ribbon-comms(20), infinera-groove(20), ciena-waveserver(22), commscope(20), teleste(19), tejas-networks(19), ericsson-transport(20), adtran-ta(20), isolan(18). Scheduler daily 20:00-00:45."}
{"d":"2026-04-26","t":"DATA","m":"OEM seed scrapers batch 21-40: telco-systems(18), rad(20), comtrend(18), packetfront(18), edgewater-networks(16), corning(18), ofs(18), kontron(18), ipinfusion(18), telrad(16), siklu(16), ceragon(16), datang(16), viptela(16), versa-networks(16), vmware(16), cimc(18), qlogic(20), emulex(18), netapp(20). Scheduler daily 01:00-05:45."}
{"d":"2026-04-26","t":"DATA","m":"OEM seed scrapers batch 41-60: pure-storage(16), hpe-storage(20), ibm-storage(20), dell-storage(18), hitachi-vantara(16), aws(16), azure(16), google-cloud(16), meta(16), nokia-access(20), huawei-access(20), zte-access(18), calix-gigapoint(16), samsung-networks(16), nokia-airscale(16), ericsson-ran(16), mavenir(14), ixia(18), exfo-network(18), cumulus-networks(16). Scheduler daily 06:00-10:45."}
{"d":"2026-04-26","t":"DATA","m":"OEM seed scrapers batch 61-80: sonic(16), h3c(20), ruijie(17), centec(16), supermicro(18), cisco-meraki(18), cisco-catalyst(20), cisco-nexus(20), cisco-asr(20), juniper-mx(20), juniper-qfx(20), aruba-cx(18), extreme-campus(18), arista-7000(20), pica8(16), pluribus(14), drivenets(15), phoenix-contact(18), beckhoff(16), omron(16). Scheduler daily 11:00-15:45."}
{"d":"2026-04-26","t":"DATA","m":"OEM seed scrapers batch 81-84: abb(16), siemens-scalance(18), schneider(16), rockwell(16), belden(16). Industrial category. Scheduler daily 16:00-17:00."}
{"d":"2026-04-26","t":"FEAT","m":"tip-llm-guided.ts: Structured inference engine for tip-llm-v1. Hard JSON schema, per-field validation, 2-retry repair loop with diff prompt, safe default fallback (create_finding=false). Temperature 0.1→0.05 on retry. Routes: POST /api/tip-llm/infer|research-plan|extract|finding, GET /api/tip-llm/health."}
{"d":"2026-04-28","t":"FIX","m":"Product verification pipeline: image crawls now mark image_verified/image_verified_url, scraped product pages mark details_verified/details_source_url, maintenance reconcile backfills old product URLs/images/details, and --backfill-images exposes the existing image crawler via scraper CLI. Migration 102 reconciles existing data."}
{"d":"2026-04-28","t":"FIX","m":"Blog Engine Hot Topics: diversified ranking with refresh shuffle/source caps/already-created-topic demotion, plus richer LLM context briefings passed into topic expansion and master-draft context via custom_title/additional_context."}
{"d":"2026-04-29","t":"FEAT","m":"TIPLLM robot learning loop: verification robot controller writes status, TIPLLM plans, queue dry-runs/enqueues and crawler outcomes into the Gitea-backed TIP training pool; learning-pool build imports qa-pairs from TIP_TRAINING_REPO into the tip_llm lane. Removed hardcoded Gitea token fallback; existing git remotes or env tokens are used."}
@ -1,737 +0,0 @@
# CODEX TASK: Flexoptix as Reference Catalog - Complete Equivalence Coverage

## Context & Problem

**Goal:** The Flexoptix catalog is the absolute anchor for ALL competitor prices in TIP.
No Flexoptix item may be left without competitor prices.

**Current state (as of 2026-05-13, from live DB analysis):**

```
Vendor          | Products priced | FO matches (approved)
----------------|-----------------|----------------------
ATGBICS         | 8,260           | 0     ← CRITICAL
NADDOD          | 744             | 0     ← CRITICAL
ShopFiber24     | 312             | 0     ← CRITICAL
10Gtek          | 49              | 0     ← CRITICAL
FiberMall       | 304             | 1,011 ← OK
QSFPTEK         | 206             | 162   ← OK
Fluxlight       | 119             | 604   ← OK
SFPcables       | 78              | 76    ← OK
GBICS           | 72              | 133   ← OK
```

**Root causes (diagnosed):**

1. **30-day filter in the matcher:** The query in `scheduler.ts` filters candidates via
   `po.time > NOW() - INTERVAL '30 days'`. ATGBICS/NADDOD/10Gtek/ShopFiber24 do have
   price observations, but they are older than 30 days, so these vendors are
   excluded entirely as candidates.

2. **Missing form_factor normalization:** Different scrapers write different
   values: `"SFP+"` vs `"SFP-Plus"` vs `"SFP+ (LC)"`. The matcher filters on an exact
   `t.form_factor = $1`, which yields 0 candidates whenever the spelling deviates.

3. **No initial reconcile:** The matcher only runs nightly for `competitor_verified = false`.
   There is no one-off bulk reconcile of the entire catalog overlap.
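
Cause 1 can be illustrated with a small TypeScript sketch. This is not the matcher's actual code: `isPriceCandidate` and its parameters are hypothetical names, and the function only approximates the `po.time > NOW() - INTERVAL '30 days'` condition quoted above as a pure function, so the effect of the window is easy to see.

```typescript
// Hypothetical helper (not from the TIP codebase): approximates the matcher's
// `po.time > NOW() - INTERVAL '30 days'` condition as a pure function.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Returns true if a product whose most recent price observation is
 * `lastObservedAt` would pass the age filter.
 * `maxAgeDays = null` disables the filter entirely.
 */
function isPriceCandidate(
  lastObservedAt: Date,
  now: Date,
  maxAgeDays: number | null
): boolean {
  if (maxAgeDays === null) return true;
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return lastObservedAt.getTime() > cutoff;
}

// Example: a price observation that is 45 days old.
const now = new Date("2026-05-13T00:00:00Z");
const staleObservation = new Date(now.getTime() - 45 * MS_PER_DAY);

console.log(isPriceCandidate(staleObservation, now, 30));   // false
console.log(isPriceCandidate(staleObservation, now, null)); // true
```

With a 30-day window the stale observation drops out, which is the situation described for ATGBICS, NADDOD, ShopFiber24 and 10Gtek; with the window disabled the same product remains a candidate.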

---

## Repository

```
Path (local):   /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db/
Path (Erik):    /opt/tip/
Stack:          TypeScript, Node.js, PostgreSQL 17, pg-boss
Main files:     packages/scraper/src/scheduler.ts   ← matcher, pg-boss jobs
                packages/scraper/src/robots/        ← new robot module goes here
                sql/                                ← migrations (currently 107 present)
```

---

## STEP 1: Diagnostic Migration - form_factor Normalization Overview

**Create file:** `sql/108-form-factor-normalization.sql`

```sql
-- Migration 108: normalize form_factor (all spellings → canonical)
-- Purpose: the matcher only finds candidates on an exact form_factor match,
--          but different scrapers write inconsistent values.

-- 1. Apply the canonical mapping
UPDATE transceivers SET
  form_factor = CASE
    -- SFP variants
    WHEN UPPER(TRIM(form_factor)) IN ('SFP', 'SFP (LC)', 'SFP DDM', 'SFP MODULE', '1G SFP', 'GLC', 'MINI-GBIC')
      THEN 'SFP'
    -- SFP+ variants
    WHEN UPPER(TRIM(form_factor)) IN ('SFP+', 'SFP+ (LC)', 'SFP-PLUS', 'SFP PLUS', 'SFP+ DDM',
                                      'SFP+ MODULE', '10G SFP+', 'SFP+ OPTICAL', '10GSFP+',
                                      'SFP+/SFP28 COMPATIBLE', 'SFP+(LC)')
      THEN 'SFP+'
    -- SFP28 variants (25G)
    WHEN UPPER(TRIM(form_factor)) IN ('SFP28', 'SFP-28', 'SFP 28', '25G SFP28', 'SFP28 (LC)',
                                      'SFP28 DDM', '25GSFP28')
      THEN 'SFP28'
    -- QSFP+ variants (40G)
    WHEN UPPER(TRIM(form_factor)) IN ('QSFP+', 'QSFP-PLUS', 'QSFP PLUS', '40G QSFP+',
                                      'QSFP+ (MPO)', 'QSFP+ MODULE', 'QSFP+ DDM', '40GQSFP+')
      THEN 'QSFP+'
    -- QSFP28 variants (100G)
    WHEN UPPER(TRIM(form_factor)) IN ('QSFP28', 'QSFP-28', 'QSFP 28', '100G QSFP28',
                                      'QSFP28 (LC)', 'QSFP28 MODULE', 'QSFP28 DDM', '100GQSFP28')
      THEN 'QSFP28'
    -- QSFP56 variants (200G)
    WHEN UPPER(TRIM(form_factor)) IN ('QSFP56', 'QSFP-56', '200G QSFP56')
      THEN 'QSFP56'
    -- QSFP-DD variants (400G)
    -- 'QSFP56-DD' is the 400G double-density form factor, so it is mapped here
    -- (consistent with 'QSFP56-DD 400G') rather than to the 200G QSFP56 bucket.
    WHEN UPPER(TRIM(form_factor)) IN ('QSFP-DD', 'QSFPDD', 'QSFP DD', '400G QSFP-DD',
                                      'QSFP-DD MODULE', 'QSFP56-DD', 'QSFP56-DD 400G')
      THEN 'QSFP-DD'
    -- QSFP-DD800 / 800G
    WHEN UPPER(TRIM(form_factor)) IN ('QSFP-DD800', 'QSFP-DD 800G', '800G QSFP-DD')
      THEN 'QSFP-DD800'
    -- OSFP variants
    -- 'OSFP-RHS' (riding heat sink) is an OSFP variant, not a QSFP-DD800 module.
    WHEN UPPER(TRIM(form_factor)) IN ('OSFP', 'OSFP MODULE', '400G OSFP', '800G OSFP', 'OSFP-RHS')
      THEN 'OSFP'
    -- CFP variants
    WHEN UPPER(TRIM(form_factor)) IN ('CFP', 'CFP2', 'CFP4', 'CFP-DCO', 'CFP2-DCO')
      THEN UPPER(TRIM(form_factor))
    -- XFP
    WHEN UPPER(TRIM(form_factor)) IN ('XFP', '10G XFP', 'XFP DDM')
      THEN 'XFP'
    -- X2 / XENPAK
    WHEN UPPER(TRIM(form_factor)) IN ('X2', 'XENPAK', 'X2 MODULE')
      THEN UPPER(TRIM(form_factor))
    -- DAC/AOC cable types (not optical modules, but normalize form_factor anyway).
    -- Order matters: '%SFP28%' also matches 'QSFP28', so the QSFP checks come first.
    WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%QSFP28%' THEN 'QSFP28-DAC'
    WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%QSFP+%'  THEN 'QSFP+-DAC'
    WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%SFP28%'  THEN 'SFP28-DAC'
    WHEN UPPER(form_factor) LIKE '%DAC%' AND UPPER(form_factor) LIKE '%SFP+%'   THEN 'SFP+-DAC'
    WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%QSFP28%' THEN 'QSFP28-AOC'
    WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%QSFP+%'  THEN 'QSFP+-AOC'
    WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%SFP28%'  THEN 'SFP28-AOC'
    WHEN UPPER(form_factor) LIKE '%AOC%' AND UPPER(form_factor) LIKE '%SFP+%'   THEN 'SFP+-AOC'
    ELSE form_factor  -- leave unknown values unchanged
  END
WHERE form_factor IS NOT NULL;

-- 2. Normalize speed_gbps (make sure there are no string artifacts)
-- Some scrapers store '10.0', '10.00', '1.0' instead of '10', '1': numerically equal but inconsistent.
-- Since speed_gbps is NUMERIC, normalize to a clean number of decimal places.
UPDATE transceivers SET
  speed_gbps = ROUND(speed_gbps::NUMERIC, 2)
WHERE speed_gbps IS NOT NULL;

-- 3. Loggable overview: which form_factor values are still unknown
DO $$
DECLARE
  rec RECORD;
BEGIN
  RAISE NOTICE '=== Unknown form_factor values (no normalization applied) ===';
  FOR rec IN
    SELECT form_factor, COUNT(*) AS cnt
    FROM transceivers
    WHERE form_factor NOT IN (
      'SFP','SFP+','SFP28','QSFP+','QSFP28','QSFP56','QSFP-DD','QSFP-DD800','OSFP',
      'CFP','CFP2','CFP4','CFP-DCO','CFP2-DCO','XFP','X2','XENPAK',
      'SFP-DAC','SFP+-DAC','SFP28-DAC','QSFP+-DAC','QSFP28-DAC',
      'SFP+-AOC','SFP28-AOC','QSFP+-AOC','QSFP28-AOC'
    )
    AND form_factor IS NOT NULL
    GROUP BY form_factor ORDER BY cnt DESC LIMIT 30
  LOOP
    RAISE NOTICE '  %: % transceivers', rec.form_factor, rec.cnt;
  END LOOP;
END;
$$;
```
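
The same canonical mapping could also be applied on the scraper side, before rows are written, so that new inconsistent spellings do not accumulate between migrations. The TypeScript sketch below is an assumption, not existing TIP code: `normalizeFormFactor` and `CANONICAL_ALIASES` are hypothetical names, and only a subset of the aliases from migration 108 is included for brevity.

```typescript
// Hypothetical scraper-side counterpart to migration 108 (illustrative subset
// of the aliases; names are not taken from the TIP codebase).
const CANONICAL_ALIASES: Record<string, string[]> = {
  "SFP": ["SFP", "SFP (LC)", "SFP DDM", "SFP MODULE", "1G SFP"],
  "SFP+": ["SFP+", "SFP+ (LC)", "SFP-PLUS", "SFP PLUS", "SFP+(LC)", "10G SFP+"],
  "SFP28": ["SFP28", "SFP-28", "SFP 28", "25G SFP28"],
  "QSFP+": ["QSFP+", "QSFP-PLUS", "QSFP PLUS", "40G QSFP+"],
  "QSFP28": ["QSFP28", "QSFP-28", "QSFP 28", "100G QSFP28"],
  "QSFP-DD": ["QSFP-DD", "QSFPDD", "QSFP DD", "400G QSFP-DD"],
};

// Invert the table once: alias → canonical value, for constant-time lookups.
const ALIAS_TO_CANONICAL = new Map<string, string>();
for (const canonical of Object.keys(CANONICAL_ALIASES)) {
  for (const alias of CANONICAL_ALIASES[canonical]) {
    ALIAS_TO_CANONICAL.set(alias, canonical);
  }
}

/**
 * Trims and upper-cases the raw value (approximating the SQL
 * UPPER(TRIM(form_factor))) and looks it up in the alias table.
 * Unknown values are returned unchanged, like the ELSE branch of the migration.
 */
function normalizeFormFactor(raw: string | null): string | null {
  if (raw === null) return null;
  const key = raw.trim().toUpperCase();
  return ALIAS_TO_CANONICAL.get(key) ?? raw;
}

console.log(normalizeFormFactor("SFP-Plus"));    // "SFP+"
console.log(normalizeFormFactor(" sfp+ (lc) ")); // "SFP+"
console.log(normalizeFormFactor("CXP"));         // "CXP" (unknown, unchanged)
```

Keeping the alias table as data rather than a chain of conditionals makes it easier to keep in step with the SQL `CASE` expression, although the two lists would still have to be maintained together.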

---

## STEP 2: Full-Catalog Reconcile Robot

**Create file:** `packages/scraper/src/robots/catalog-reconcile.ts`

This robot is a one-off bulk matcher that compares ALL Flexoptix products against
ALL competitors, with no 30-day window and no `competitor_verified` filter.
It uses the same logic as the existing nightly matcher, only with a broader scope.
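
The difference between the nightly matcher and the full reconcile comes down to which filters are applied to the candidate query. As a rough sketch, assuming a hypothetical helper `buildCandidateFilter` (the column names `t.form_factor` and `po.time` are taken from the diagnosis above; the helper, its option names, and the use of PostgreSQL's `make_interval` are illustrative and not the actual query in `scheduler.ts`):

```typescript
// Hypothetical sketch: builds the WHERE fragment and bound parameters for a
// candidate query. Not the actual query from scheduler.ts.
interface CandidateFilterOptions {
  formFactor: string;
  /** Max age of a price observation in days; null = no window. */
  maxPriceAgeDays: number | null;
}

interface SqlFragment {
  where: string;
  params: Array<string | number>;
}

function buildCandidateFilter(opts: CandidateFilterOptions): SqlFragment {
  const clauses: string[] = ["t.form_factor = $1"];
  const params: Array<string | number> = [opts.formFactor];

  if (opts.maxPriceAgeDays !== null) {
    params.push(opts.maxPriceAgeDays);
    // make_interval keeps the day count a bound parameter instead of
    // interpolating it into the SQL string.
    clauses.push(`po.time > NOW() - make_interval(days => $${params.length})`);
  }

  return { where: clauses.join(" AND "), params };
}

// Nightly matcher: 30-day window.
const nightly = buildCandidateFilter({ formFactor: "SFP+", maxPriceAgeDays: 30 });
console.log(nightly.where, nightly.params);

// Full reconcile: no window, so only the form_factor condition remains.
const full = buildCandidateFilter({ formFactor: "SFP+", maxPriceAgeDays: null });
console.log(full.where, full.params);
```

Treating the window as an optional parameter would let both jobs share one query path, with the reconcile simply passing `null`.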
|
|
||||||
|
|
||||||
```typescript
|
|
||||||
/**
 * Catalog Reconcile Robot
 *
 * Full bulk reconciliation of Flexoptix against ALL competitors.
 *
 * Differences from the nightly matcher (maintenance:find-equivalences):
 * - No 30-day window on price_observations: every product that has EVER had
 *   a price observed counts as a candidate
 * - No competitor_verified filter: FX products that are already matched
 *   also get new matches when new competitor products appear
 * - Each upsert runs as its own auto-committed statement via pool.query
 *   (the code below opens no explicit transaction and does no batching)
 * - Full report at the end
 *
 * Trigger: pg-boss job "catalog:reconcile" (on-demand or monthly)
 * Runtime: ~5–15 minutes with 1,000+ FX products
 */
|
|
||||||
import { pool } from "../utils/db";
|
|
||||||
|
|
||||||
export interface ReconcileResult {
|
|
||||||
flexoptixProcessed: number;
|
|
||||||
newAutoApproved: number;
|
|
||||||
newPending: number;
|
|
||||||
skippedLowConfidence: number;
|
|
||||||
skippedAlreadyMatched: number;
|
|
||||||
vendorBreakdown: Record<string, { autoApproved: number; pending: number }>;
|
|
||||||
durationMs: number;
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── Configuration constants ──────────────────────────────────────────────────

/** Minimum confidence for a pending entry (anything below this threshold is ignored) */
const CONFIDENCE_MIN = 0.50;

/** Confidence threshold for auto_approved */
const CONFIDENCE_AUTO_APPROVE = 0.73;

/**
 * Maximum number of days since the last price_observation.
 * null = no filter (all products with at least 1 observation).
 * For the full reconcile: null.
 */
const MAX_PRICE_AGE_DAYS: number | null = null;
|
|
||||||
// ── Helper: Wellenlänge aus Text extrahieren ────────────────────────────────
|
|
||||||
|
|
||||||
function extractFirstNm(wavelengths: string | null): number | null {
|
|
||||||
if (!wavelengths) return null;
|
|
||||||
const m = wavelengths.match(/(\d{3,4})/);
|
|
||||||
return m ? parseInt(m[1], 10) : null;
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── Haupt-Matching-Logik (identisch mit Nightly-Matcher) ────────────────────
|
|
||||||
|
|
||||||
function calcConfidence(
|
|
||||||
fx: { standard_name: string | null; fiber_type: string | null; reach_meters: number | null; wavelengths: string | null },
|
|
||||||
cand: { standard_name: string | null; fiber_type: string | null; reach_meters: number | null; wavelengths: string | null }
|
|
||||||
): { confidence: number; basis: string[] } {
|
|
||||||
// Max score: form_factor(25) + speed_gbps(20) + standard_name(30) +
// wavelength_nm(20) + fiber_type(10) + reach(10) = 115
// form_factor and speed_gbps are both already guaranteed by the SQL filter.
let score = 0;
|
|
||||||
const basis: string[] = [];
|
|
||||||
|
|
||||||
score += 25; basis.push("form_factor");
|
|
||||||
score += 20; basis.push("speed_gbps");
|
|
||||||
|
|
||||||
if (
|
|
||||||
fx.standard_name && cand.standard_name &&
|
|
||||||
fx.standard_name.trim().toUpperCase() === cand.standard_name.trim().toUpperCase()
|
|
||||||
) {
|
|
||||||
score += 30; basis.push("standard_name");
|
|
||||||
}
|
|
||||||
|
|
||||||
const fxNm = extractFirstNm(fx.wavelengths);
|
|
||||||
const candNm = extractFirstNm(cand.wavelengths);
|
|
||||||
if (fxNm !== null && candNm !== null) {
|
|
||||||
if (Math.abs(fxNm - candNm) <= 15) {
|
|
||||||
score += 20; basis.push(`wavelength_${fxNm}nm`);
|
|
||||||
} else {
|
|
||||||
score -= 20;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
if (fx.fiber_type && cand.fiber_type) {
|
|
||||||
if (fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase()) {
|
|
||||||
score += 10; basis.push("fiber_type");
|
|
||||||
} else {
|
|
||||||
score -= 15;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
if (fx.reach_meters && cand.reach_meters && fx.reach_meters > 0 && cand.reach_meters > 0) {
|
|
||||||
const diff = Math.abs(fx.reach_meters - cand.reach_meters);
|
|
||||||
const tolerance = Math.max(fx.reach_meters, 1) * 0.25;
|
|
||||||
if (diff <= tolerance) {
|
|
||||||
score += 10; basis.push("reach");
|
|
||||||
} else {
|
|
||||||
score -= 15;
|
|
||||||
}
|
|
||||||
} else if (!fx.reach_meters && !cand.reach_meters) {
|
|
||||||
score += 5; basis.push("reach_null");
|
|
||||||
}
|
|
||||||
|
|
||||||
const confidence = Math.max(0, Math.min(1, score / 115));
|
|
||||||
return { confidence, basis };
|
|
||||||
}
|
|
||||||
|
|
||||||
// ── Haupt-Funktion ───────────────────────────────────────────────────────────
|
|
||||||
|
|
||||||
export async function runCatalogReconcile(): Promise<ReconcileResult> {
|
|
||||||
const startMs = Date.now();
|
|
||||||
console.log("=== Catalog Reconcile Robot ===");
|
|
||||||
console.log(` Started: ${new Date().toISOString()}`);
|
|
||||||
console.log(` Mode: FULL (no 30-day window, all vendors)`);
|
|
||||||
|
|
||||||
const result: ReconcileResult = {
  flexoptixProcessed: 0,
  newAutoApproved: 0,
  newPending: 0,
  skippedLowConfidence: 0,
  skippedAlreadyMatched: 0,
  vendorBreakdown: {},
  durationMs: 0, // set once the run has finished
};
|
|
||||||
// ── Alle Flexoptix-Produkte laden ─────────────────────────────────────────
|
|
||||||
// Kein competitor_verified-Filter → wir reconcilen ALLES
|
|
||||||
const { rows: fxProducts } = await pool.query<{
|
|
||||||
id: string;
|
|
||||||
part_number: string;
|
|
||||||
standard_name: string | null;
|
|
||||||
form_factor: string | null;
|
|
||||||
speed_gbps: string | null;
|
|
||||||
fiber_type: string | null;
|
|
||||||
reach_meters: number | null;
|
|
||||||
wavelengths: string | null;
|
|
||||||
}>(`
|
|
||||||
SELECT t.id, t.part_number, t.standard_name, t.form_factor,
|
|
||||||
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths
|
|
||||||
FROM transceivers t
|
|
||||||
JOIN vendors v ON v.id = t.vendor_id
|
|
||||||
WHERE UPPER(v.name) LIKE '%FLEXOPTIX%'
|
|
||||||
AND t.form_factor IS NOT NULL
|
|
||||||
AND t.speed_gbps IS NOT NULL
|
|
||||||
ORDER BY t.part_number
|
|
||||||
`);
|
|
||||||
|
|
||||||
result.flexoptixProcessed = fxProducts.length;
|
|
||||||
console.log(` Flexoptix products to process: ${fxProducts.length}`);
|
|
||||||
|
|
||||||
const priceAgeFilter = MAX_PRICE_AGE_DAYS !== null
|
|
||||||
? `AND po.time > NOW() - INTERVAL '${MAX_PRICE_AGE_DAYS} days'`
|
|
||||||
: "";
|
|
||||||
|
|
||||||
for (const fx of fxProducts) {
|
|
||||||
if (!fx.form_factor || !fx.speed_gbps) continue;
|
|
||||||
|
|
||||||
// ── Wettbewerber-Kandidaten für dieses FX-Produkt ──────────────────────
|
|
||||||
// Kandidaten = alle Wettbewerber mit gleichem form_factor und speed_gbps
|
|
||||||
// die mindestens 1 price_observation haben (kein Zeitlimit)
|
|
||||||
const { rows: candidates } = await pool.query<{
|
|
||||||
competitor_id: string;
|
|
||||||
part_number: string;
|
|
||||||
standard_name: string | null;
|
|
||||||
form_factor: string | null;
|
|
||||||
speed_gbps: string | null;
|
|
||||||
fiber_type: string | null;
|
|
||||||
reach_meters: number | null;
|
|
||||||
wavelengths: string | null;
|
|
||||||
vendor_name: string;
|
|
||||||
last_price: Date | null;
|
|
||||||
price_count: string;
|
|
||||||
}>(`
|
|
||||||
SELECT t.id AS competitor_id, t.part_number, t.standard_name,
|
|
||||||
t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters,
|
|
||||||
t.wavelengths, v.name AS vendor_name,
|
|
||||||
MAX(po.time) AS last_price, COUNT(DISTINCT po.id) AS price_count
|
|
||||||
FROM transceivers t
|
|
||||||
JOIN vendors v ON v.id = t.vendor_id
|
|
||||||
JOIN price_observations po ON po.transceiver_id = t.id
|
|
||||||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
|
||||||
AND v.is_competitor = true
|
|
||||||
${priceAgeFilter}
|
|
||||||
AND UPPER(t.form_factor) = UPPER($1)
|
|
||||||
AND ROUND(t.speed_gbps::NUMERIC, 2) = ROUND($2::NUMERIC, 2)
|
|
||||||
AND t.id != $3
|
|
||||||
GROUP BY t.id, t.part_number, t.standard_name, t.form_factor,
|
|
||||||
t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths, v.name
|
|
||||||
HAVING COUNT(DISTINCT po.id) >= 1
|
|
||||||
`, [fx.form_factor, fx.speed_gbps, fx.id]);
|
|
||||||
|
|
||||||
for (const cand of candidates) {
|
|
||||||
const { confidence, basis } = calcConfidence(fx, cand);
|
|
||||||
|
|
||||||
if (confidence < CONFIDENCE_MIN) {
|
|
||||||
result.skippedLowConfidence++;
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
|
|
||||||
const status = confidence >= CONFIDENCE_AUTO_APPROVE ? "auto_approved" : "pending";
|
|
||||||
const notes =
|
|
||||||
`${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` +
|
|
||||||
`basis: ${basis.join(", ")} | reach: ${fx.reach_meters}m vs ${cand.reach_meters}m | ` +
|
|
||||||
`wavelength: ${fx.wavelengths ?? "?"} vs ${cand.wavelengths ?? "?"} | ` +
|
|
||||||
`last_price: ${cand.last_price?.toISOString() ?? "never"} | ` +
|
|
||||||
`source: catalog-reconcile`;
|
|
||||||
|
|
||||||
// Upsert: never overwrite rows that are already approved, rejected or auto_approved.
// A row that is still pending is refreshed and takes the newly computed status,
// so the counters below agree with what is stored.
const { rowCount } = await pool.query(`
  INSERT INTO transceiver_equivalences
    (flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
  VALUES ($1, $2, $3, $4, $5, $6)
  ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
    confidence = EXCLUDED.confidence,
    match_basis = EXCLUDED.match_basis,
    match_notes = EXCLUDED.match_notes,
    status = EXCLUDED.status,
    updated_at = NOW()
  WHERE transceiver_equivalences.status NOT IN ('approved', 'rejected', 'auto_approved')
`, [fx.id, cand.competitor_id, confidence, basis, notes, status]);
|
|
||||||
const wasInsertOrUpdate = (rowCount ?? 0) > 0;
|
|
||||||
if (!wasInsertOrUpdate) {
|
|
||||||
result.skippedAlreadyMatched++;
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Vendor-Breakdown tracken
|
|
||||||
if (!result.vendorBreakdown[cand.vendor_name]) {
|
|
||||||
result.vendorBreakdown[cand.vendor_name] = { autoApproved: 0, pending: 0 };
|
|
||||||
}
|
|
||||||
|
|
||||||
if (status === "auto_approved") {
|
|
||||||
result.newAutoApproved++;
|
|
||||||
result.vendorBreakdown[cand.vendor_name].autoApproved++;
|
|
||||||
|
|
||||||
// competitor_verified auf FX-Produkt setzen
|
|
||||||
await pool.query(`
|
|
||||||
UPDATE transceivers
|
|
||||||
SET competitor_verified = true,
|
|
||||||
competitor_verified_at = NOW(),
|
|
||||||
competitor_status = 'matched',
|
|
||||||
competitor_status_updated_at = NOW()
|
|
||||||
WHERE id = $1 AND competitor_verified = false
|
|
||||||
`, [fx.id]);
|
|
||||||
} else {
|
|
||||||
result.newPending++;
|
|
||||||
result.vendorBreakdown[cand.vendor_name].pending++;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
result.durationMs = Date.now() - startMs;
|
|
||||||
|
|
||||||
// ── Abschluss-Report ───────────────────────────────────────────────────────
|
|
||||||
console.log("\n=== Catalog Reconcile Results ===");
|
|
||||||
console.log(` Flexoptix processed: ${result.flexoptixProcessed}`);
|
|
||||||
console.log(` New auto_approved: ${result.newAutoApproved}`);
|
|
||||||
console.log(` New pending: ${result.newPending}`);
|
|
||||||
console.log(` Skipped (low confidence): ${result.skippedLowConfidence}`);
|
|
||||||
console.log(` Skipped (already matched): ${result.skippedAlreadyMatched}`);
|
|
||||||
console.log(` Duration: ${(result.durationMs / 1000).toFixed(1)}s`);
|
|
||||||
console.log("\n Vendor Breakdown:");
|
|
||||||
for (const [vendor, counts] of Object.entries(result.vendorBreakdown).sort(
|
|
||||||
(a, b) => (b[1].autoApproved + b[1].pending) - (a[1].autoApproved + a[1].pending)
|
|
||||||
)) {
|
|
||||||
console.log(` ${vendor.padEnd(20)} auto_approved=${counts.autoApproved} pending=${counts.pending}`);
|
|
||||||
}
|
|
||||||
console.log("=================================\n");
|
|
||||||
|
|
||||||
return result;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
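To make the thresholds in `calcConfidence` easier to reason about, here is a small worked example of the scoring arithmetic. It is a sketch only: the weights, the 115-point ceiling and the two thresholds are copied from the code above, while `classify` and the sample score totals are hypothetical and exist purely for illustration.

```typescript
// Ceiling and thresholds copied from the robot above.
const MAX_SCORE = 115;
const CONFIDENCE_MIN = 0.5;
const CONFIDENCE_AUTO_APPROVE = 0.73;

type Status = "auto_approved" | "pending" | "skipped";

// Hypothetical helper: turns a raw point total into the status the robot would assign.
function classify(score: number): { confidence: number; status: Status } {
  const confidence = Math.max(0, Math.min(1, score / MAX_SCORE));
  if (confidence < CONFIDENCE_MIN) return { confidence, status: "skipped" };
  return {
    confidence,
    status: confidence >= CONFIDENCE_AUTO_APPROVE ? "auto_approved" : "pending",
  };
}

// form_factor (25) + speed_gbps (20) are always granted by the SQL pre-filter.
const base = 25 + 20;

// Everything matches: 45 + 30 + 20 + 10 + 10 = 115
const perfect = classify(base + 30 + 20 + 10 + 10);
// No standard_name on either side, the rest matches: 45 + 20 + 10 + 10 = 85
const noStandard = classify(base + 20 + 10 + 10);
// Only standard_name matches, everything else is unknown: 45 + 30 = 75
const standardOnly = classify(base + 30);
// standard_name matches, wavelength differs by more than 15 nm: 45 + 30 - 20 = 55
const wrongWavelength = classify(base + 30 - 20);

console.log(perfect.status, noStandard.status, standardOnly.status, wrongWavelength.status);
```

Put differently, a pair needs at least 84 of 115 points to be auto-approved (0.73 × 115 = 83.95) and at least 58 points to be stored as pending (0.50 × 115 = 57.5).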
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 3: Register the new job in scheduler.ts

**Edit:** `packages/scraper/src/scheduler.ts`

### 3a: Add the queue to the pre-create list

Find the block containing `"sync:flexoptix-catalog"` in the queues list and add:

```typescript
|
|
||||||
"catalog:reconcile",
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3b: Register the worker

In the block where `boss.work()` is called, insert after the `sync:flexoptix-catalog` worker:

```typescript
|
|
||||||
// ── Catalog Reconcile — Full Bulk Match ─────────────────────────────────────
|
|
||||||
await boss.work("catalog:reconcile", async () => {
|
|
||||||
const ts = new Date().toISOString();
|
|
||||||
console.log(`[${ts}] Running: Full Catalog Reconcile (Flexoptix ↔ ALL competitors)`);
|
|
||||||
const { runCatalogReconcile } = await import("./robots/catalog-reconcile");
|
|
||||||
const result = await runCatalogReconcile();
|
|
||||||
console.log(`[catalog:reconcile] Done: ${result.newAutoApproved} auto_approved, ${result.newPending} pending`);
|
|
||||||
});
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3c: Register the schedule

After the `sync:flexoptix-catalog` schedule:

```typescript
// Catalog Reconcile: monthly, on the 1st at 04:00 UTC
// + on-demand via POST /api/maintenance/run { job: "catalog:reconcile" }
await boss.schedule("catalog:reconcile", "0 4 1 * *", {}, {
  retryLimit: 2,
  expireInSeconds: 3600,
});
```
|
|
||||||
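The cron expression `0 4 1 * *` reads as minute 0, hour 4, day-of-month 1, any month, any weekday, so the job fires on the first day of each month regardless of which weekday that is. The sketch below spells this out in plain TypeScript; `matchesMonthlyReconcile` and `nextMonthlyReconcile` are hypothetical helpers for illustration and do not use or replace pg-boss's own scheduler. UTC is assumed.

```typescript
// Field order of a five-field cron expression: minute hour day-of-month month day-of-week.
// Hypothetical helper: true when `date` (interpreted in UTC) matches "0 4 1 * *".
function matchesMonthlyReconcile(date: Date): boolean {
  return (
    date.getUTCMinutes() === 0 &&
    date.getUTCHours() === 4 &&
    date.getUTCDate() === 1
    // month "*" and day-of-week "*" place no restriction
  );
}

// Hypothetical helper: the next firing time strictly after `from`.
function nextMonthlyReconcile(from: Date): Date {
  const candidate = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), 1, 4, 0, 0));
  if (candidate.getTime() <= from.getTime()) {
    // This month's slot has already passed, so move to the 1st of the next month.
    candidate.setUTCMonth(candidate.getUTCMonth() + 1);
  }
  return candidate;
}
```

A weekly Sunday run would instead use the day-of-week field, for example `0 4 * * 0`.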
### 3d: Extend the nightly matcher (relax the 30-day filter)

In the existing `maintenance:find-equivalences` worker, change the candidate query:

**Before:**
```sql
|
|
||||||
JOIN price_observations po ON po.transceiver_id = t.id
|
|
||||||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
|
||||||
AND po.time > NOW() - INTERVAL '30 days'
|
|
||||||
AND t.form_factor = $1
|
|
||||||
AND t.speed_gbps = $2
|
|
||||||
```
|
|
||||||
|
|
||||||
**After (90-day window + UPPER-normalized):**
```sql
|
|
||||||
JOIN price_observations po ON po.transceiver_id = t.id
|
|
||||||
WHERE UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
|
||||||
AND po.time > NOW() - INTERVAL '90 days'
|
|
||||||
AND UPPER(t.form_factor) = UPPER($1)
|
|
||||||
AND ROUND(t.speed_gbps::NUMERIC, 2) = ROUND($2::NUMERIC, 2)
|
|
||||||
AND t.id != $3
|
|
||||||
```
|
|
||||||
|
|
||||||
With this change, ATGBICS/NADDOD/10Gtek/ShopFiber24, which were last scraped more than 30
but fewer than 90 days ago, show up as candidates from now on.
|
|
||||||
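The effect of widening the window can be illustrated with a tiny date comparison. This is a sketch under the assumption that one day equals 24 hours (PostgreSQL's `INTERVAL 'N days'` can differ around DST changes in non-UTC sessions); `isWithinWindow` and the sample dates are hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: mirrors `po.time > NOW() - INTERVAL 'N days'`.
function isWithinWindow(lastObserved: Date, now: Date, windowDays: number): boolean {
  return lastObserved.getTime() > now.getTime() - windowDays * MS_PER_DAY;
}

// A vendor whose last price observation is 45 days old.
const now = new Date("2026-05-14T00:00:00Z");
const scraped45DaysAgo = new Date(now.getTime() - 45 * MS_PER_DAY);

const keptByOldFilter = isWithinWindow(scraped45DaysAgo, now, 30); // 30-day window
const keptByNewFilter = isWithinWindow(scraped45DaysAgo, now, 90); // 90-day window
console.log({ keptByOldFilter, keptByNewFilter });
```

A product observed 45 days ago falls outside the 30-day window but inside the 90-day one, which is exactly the case described for the four vendors above.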
---
|
|
||||||
|
|
||||||
## STEP 4: API endpoint for on-demand reconcile

**Edit in:** `packages/api/src/routes/` (existing maintenance route or a new file)

Look for an existing maintenance/admin route file. If there is one,
add a new endpoint there. If not, create:
`packages/api/src/routes/maintenance.ts`

```typescript
|
|
||||||
import { FastifyPluginAsync } from "fastify";
|
|
||||||
import PgBoss from "pg-boss";
|
|
||||||
|
|
||||||
export const maintenanceRoutes: FastifyPluginAsync = async (fastify) => {
|
|
||||||
// POST /api/maintenance/run — startet einen pg-boss Job on-demand
|
|
||||||
fastify.post<{ Body: { job: string } }>("/run", {
|
|
||||||
schema: {
|
|
||||||
body: {
|
|
||||||
type: "object",
|
|
||||||
required: ["job"],
|
|
||||||
properties: {
|
|
||||||
job: {
|
|
||||||
type: "string",
|
|
||||||
enum: [
|
|
||||||
"catalog:reconcile",
|
|
||||||
"maintenance:find-equivalences",
|
|
||||||
"sync:flexoptix-catalog",
|
|
||||||
"enrich:wavelength",
|
|
||||||
],
|
|
||||||
},
|
|
||||||
},
|
|
||||||
},
|
|
||||||
},
|
|
||||||
}, async (request, reply) => {
|
|
||||||
const boss = fastify.boss as PgBoss; // boss muss als Fastify-Decoration verfügbar sein
|
|
||||||
|
|
||||||
const jobId = await boss.send(request.body.job, {});
|
|
||||||
return reply.code(202).send({
|
|
||||||
success: true,
|
|
||||||
job: request.body.job,
|
|
||||||
jobId,
|
|
||||||
message: `Job ${request.body.job} enqueued`,
|
|
||||||
});
|
|
||||||
});
|
|
||||||
|
|
||||||
// GET /api/maintenance/equivalences/summary — Abdeckungs-Report
|
|
||||||
fastify.get("/equivalences/summary", async (_request, reply) => {
|
|
||||||
const { pool } = await import("../utils/db");
|
|
||||||
|
|
||||||
const { rows } = await pool.query(`
|
|
||||||
SELECT
|
|
||||||
v.name AS vendor,
|
|
||||||
COUNT(DISTINCT t.id)::int AS products_with_prices,
|
|
||||||
COUNT(DISTINCT te.flexoptix_id)::int AS fo_matches_approved,
|
|
||||||
ROUND(
|
|
||||||
100.0 * COUNT(DISTINCT te.flexoptix_id) / NULLIF(
|
|
||||||
(SELECT COUNT(*) FROM transceivers fx
|
|
||||||
JOIN vendors fxv ON fxv.id = fx.vendor_id
|
|
||||||
WHERE UPPER(fxv.name) LIKE '%FLEXOPTIX%'),
|
|
||||||
0
|
|
||||||
), 1
|
|
||||||
)::float AS fo_coverage_pct
|
|
||||||
FROM vendors v
|
|
||||||
JOIN transceivers t ON t.vendor_id = v.id
|
|
||||||
JOIN price_observations po ON po.transceiver_id = t.id
|
|
||||||
LEFT JOIN transceiver_equivalences te ON te.competitor_id = t.id
|
|
||||||
AND te.status IN ('approved','auto_approved')
|
|
||||||
WHERE v.is_competitor = true
|
|
||||||
AND UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
|
||||||
GROUP BY v.id, v.name
|
|
||||||
HAVING COUNT(DISTINCT t.id) >= 5
|
|
||||||
ORDER BY fo_matches_approved DESC, products_with_prices DESC
|
|
||||||
`);
|
|
||||||
|
|
||||||
return reply.send({ success: true, data: rows });
|
|
||||||
});
|
|
||||||
};
|
|
||||||
```
|
|
||||||
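On the caller's side, the `enum` in the route schema can be mirrored as a TypeScript union so that invalid job names are caught before a request is sent. The sketch below is an assumption-laden illustration: `ALLOWED_JOBS` copies the four names from the schema, while `isMaintenanceJob`, `buildRunRequest` and the `baseUrl` parameter are hypothetical.

```typescript
// Job names copied from the `enum` in the route schema.
const ALLOWED_JOBS = [
  "catalog:reconcile",
  "maintenance:find-equivalences",
  "sync:flexoptix-catalog",
  "enrich:wavelength",
] as const;
type MaintenanceJob = (typeof ALLOWED_JOBS)[number];

// Hypothetical helper: narrows arbitrary input to one of the allowed job names.
function isMaintenanceJob(value: string): value is MaintenanceJob {
  return (ALLOWED_JOBS as readonly string[]).includes(value);
}

// Hypothetical helper: builds the POST request for /api/maintenance/run.
// `baseUrl` is an assumption; this task document uses http://localhost:3001.
function buildRunRequest(baseUrl: string, job: MaintenanceJob) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/maintenance/run`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ job }),
    },
  };
}

// Usage (needs a running API, so it is left commented out):
// const { url, init } = buildRunRequest("http://localhost:3001", "catalog:reconcile");
// const res = await fetch(url, init); // the route answers 202 with { success, job, jobId, message }
```

Keeping the request builder free of I/O makes it easy to unit-test; the actual `fetch` call stays a one-liner at the call site.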
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 5: Migration for the nightly-matcher bugfix

**Create file:** `sql/109-fix-nightly-matcher-time-window.sql`

```sql
|
|
||||||
-- Migration 109: documents the 30→90 day bugfix in the nightly matcher
-- This SQL is documentation only; the actual change lives in scheduler.ts
|
|
||||||
COMMENT ON TABLE transceiver_equivalences IS
|
|
||||||
'Flexoptix-zentrierter Equivalenz-Graph: flexoptix_id = Referenz-Anker,
|
|
||||||
competitor_id = äquivalentes Konkurrenzprodukt.
|
|
||||||
Status: pending (review nötig), auto_approved (Confidence ≥0.73),
|
|
||||||
approved (manuell), rejected (explizit ausgeschlossen).
|
|
||||||
KRITISCH: Matcher nutzt 90-Tage-Fenster (war: 30 Tage) damit Vendors
|
|
||||||
mit seltener Preisbeobachtung (ATGBICS/NADDOD/10Gtek/ShopFiber24) gefunden werden.';
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 6: Run the one-shot reconcile after deployment

**Once everything is deployed**, run once on Erik:

```bash
# On Erik: run migrations 108 and 109
psql -U tip -d transceiver_intelligence -f /opt/tip/sql/108-form-factor-normalization.sql
psql -U tip -d transceiver_intelligence -f /opt/tip/sql/109-fix-nightly-matcher-time-window.sql

# Restart the PM2 daemon
pm2 restart tip-scraper-daemon --update-env

# Wait 2 minutes until the pg-boss workers are registered
sleep 120

# Trigger the full catalog reconcile on demand
curl -s -X POST http://localhost:3001/api/maintenance/run \
  -H "Content-Type: application/json" \
  -d '{"job": "catalog:reconcile"}' | jq .

# Fetch the coverage report
curl -s http://localhost:3001/api/maintenance/equivalences/summary | jq '.data | .[:10]'
```
|
|
||||||
**Expected result of the reconcile:**
```
ATGBICS:     ~500-2,000 new matches (out of 8,260 products, assuming ~10-25% overlap)
NADDOD:      ~100-300 new matches
ShopFiber24: ~50-150 new matches
10Gtek:      ~10-30 new matches
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 7: Completeness Dashboard Query

Integrate **this SQL** directly into the TIP dashboard component (e.g. in `packages/frontend`
or as an API response for the admin dashboard):

```sql
|
|
||||||
-- Flexoptix Coverage Report
|
|
||||||
-- Shows how many FX products are matched per competitor
SELECT
|
|
||||||
v.name AS vendor,
|
|
||||||
COUNT(DISTINCT t.id) AS competitor_products,
|
|
||||||
COUNT(DISTINCT te.flexoptix_id) AS fo_products_matched,
|
|
||||||
ROUND(
|
|
||||||
100.0 * COUNT(DISTINCT te.flexoptix_id) /
|
|
||||||
NULLIF(fo_total.total, 0), 1
|
|
||||||
) AS coverage_pct,
|
|
||||||
MIN(po.time) AS first_observation,
|
|
||||||
MAX(po.time) AS last_observation
|
|
||||||
FROM vendors v
|
|
||||||
JOIN transceivers t ON t.vendor_id = v.id
|
|
||||||
JOIN price_observations po ON po.transceiver_id = t.id
|
|
||||||
LEFT JOIN transceiver_equivalences te ON te.competitor_id = t.id
|
|
||||||
AND te.status IN ('approved','auto_approved')
|
|
||||||
CROSS JOIN (
|
|
||||||
SELECT COUNT(*) as total
|
|
||||||
FROM transceivers fx
|
|
||||||
JOIN vendors fxv ON fxv.id = fx.vendor_id
|
|
||||||
WHERE UPPER(fxv.name) LIKE '%FLEXOPTIX%'
|
|
||||||
) fo_total
|
|
||||||
WHERE v.is_competitor = true
|
|
||||||
AND UPPER(v.name) NOT LIKE '%FLEXOPTIX%'
|
|
||||||
GROUP BY v.id, v.name, fo_total.total
|
|
||||||
HAVING COUNT(DISTINCT t.id) >= 5
|
|
||||||
ORDER BY coverage_pct DESC NULLS LAST, competitor_products DESC;
|
|
||||||
```
|
|
||||||
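The `coverage_pct` column divides the number of distinct matched Flexoptix products for one vendor by the total Flexoptix catalog size and rounds to one decimal. A minimal TypeScript sketch of the same arithmetic follows, in case the dashboard computes it client-side; `coveragePct` is a hypothetical helper and the sample numbers are made up for illustration.

```typescript
// Hypothetical helper: mirrors ROUND(100.0 * matched / NULLIF(total, 0), 1).
// Returns null when total is 0, just as NULLIF turns the division result into NULL.
function coveragePct(matched: number, total: number): number | null {
  if (total === 0) return null;
  return Math.round((1000 * matched) / total) / 10;
}

console.log(coveragePct(50, 200)); // a quarter of the catalog matched
console.log(coveragePct(1, 3));    // rounded to one decimal place
console.log(coveragePct(5, 0));    // empty catalog: no percentage
```

Returning `null` instead of `0` for an empty catalog keeps "no data" distinguishable from "nothing matched", which is also why the SQL sorts with `NULLS LAST`.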
|
|
||||||
---
|
|
||||||
|
|
||||||
## Execution order

```
1.  sql/108-form-factor-normalization.sql               → run via psql
2.  packages/scraper/src/robots/catalog-reconcile.ts    → create new file
3.  packages/scraper/src/scheduler.ts                   → 4 changes (queue, worker, schedule, 30→90 days)
4.  packages/api/src/routes/maintenance.ts              → new file (or add to an existing route)
5.  sql/109-fix-nightly-matcher-time-window.sql         → run via psql (documentation)
6.  npm run build -w packages/scraper                   → build
7.  npm run build -w packages/api                       → build
8.  git add -A && git commit -m "feat: Flexoptix reference matching overhaul"
9.  git push origin HEAD:main
10. On Erik: git fetch && git reset --hard origin/main
11. On Erik: run sql/108 and sql/109 via psql
12. pm2 restart tip-scraper-daemon --update-env
13. sleep 120 (let the pg-boss workers register)
14. curl -X POST .../api/maintenance/run '{"job":"catalog:reconcile"}'
15. Check the result: curl .../api/maintenance/equivalences/summary
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## Expected result after deployment

| Vendor | Before | After (estimated) |
|--------|--------|-------------------|
| ATGBICS | 0 matches | 300–1,500 matches |
| NADDOD | 0 matches | 80–300 matches |
| ShopFiber24 | 0 matches | 40–150 matches |
| 10Gtek | 0 matches | 10–40 matches |
| FiberMall | 1,011 matches | ~1,011 (unchanged) |
| Total coverage | ~22% | ~45–60% |

**Why not 100% coverage?**
Flexoptix has ~457 active products. Many competitor products are vendor-specific
compatibles (e.g. "Cisco-compatible SFP+") with no exact counterpart in the FX catalog.
The remaining gap comes from:
- Competitor products without an FX equivalent (niche specifications, OEM-only)
- Missing normalization fields on the competitor side (null fiber_type etc.)
- Products that FX has dropped from its range

---
|
|
||||||
|
|
||||||
## What is NOT changed

- Existing `transceiver_equivalences` entries with `status='approved'` or `status='rejected'`
  → are never overwritten (ON CONFLICT ... WHERE status NOT IN (...))
- All existing scraper logic → do not touch
- PM2 configuration, Cloudflare Tunnel, environment variables → do not touch
- `CODEX-TASK-zero-manual-review.md` tasks → separate task, do not overlap
|
|
||||||
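The first bullet relies on the `WHERE` clause of the `ON CONFLICT ... DO UPDATE` statement: when the existing row has a protected status, the update is filtered out and the statement affects zero rows. The following in-memory sketch illustrates that guard. It is not the real database code: the `Map` stands in for the table, `upsert` is a hypothetical function, and only the guard and the `confidence` column are modelled.

```typescript
type Status = "pending" | "auto_approved" | "approved" | "rejected";
interface Equivalence { confidence: number; status: Status }

// Statuses that the WHERE clause of the upsert excludes from updates.
const PROTECTED: ReadonlySet<Status> = new Set<Status>(["approved", "rejected", "auto_approved"]);

// Hypothetical in-memory stand-in for the table, keyed by "flexoptix_id:competitor_id".
// Returns true when a row was inserted or updated (rowCount > 0 in SQL terms).
function upsert(table: Map<string, Equivalence>, key: string, incoming: Equivalence): boolean {
  const existing = table.get(key);
  if (existing === undefined) {
    table.set(key, { ...incoming }); // no conflict: plain INSERT
    return true;
  }
  if (PROTECTED.has(existing.status)) {
    return false; // conflict, but the WHERE clause filters the row out
  }
  existing.confidence = incoming.confidence; // DO UPDATE SET confidence = EXCLUDED.confidence
  return true;
}

const table = new Map<string, Equivalence>([
  ["fx1:c1", { confidence: 0.9, status: "rejected" }],
  ["fx1:c2", { confidence: 0.55, status: "pending" }],
]);
console.log(upsert(table, "fx1:c1", { confidence: 0.8, status: "auto_approved" })); // protected row
console.log(upsert(table, "fx1:c2", { confidence: 0.6, status: "pending" }));       // pending row
```

A `false` return corresponds to the `skippedAlreadyMatched` counter in the robot: the pair already exists and a reviewed or auto-approved decision is left untouched.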
---
|
|
||||||
|
|
||||||
## Relationship to CODEX-TASK-zero-manual-review.md

| Task | Focus | Order |
|------|-------|-------|
| **this task** | Coverage: ATGBICS/NADDOD/10Gtek/ShopFiber24 from 0% to ~50% | First |
| zero-manual-review | Quality: replace confidence scores with a deterministic matcher | Afterwards |

Establish coverage first, then improve quality. There is no point in matching
the 0% vendors with a new matcher while they are not even in the candidate pool.
@ -1,346 +0,0 @@
|
|||||||
# CODEX TASK: Raspberry Pi Scraper Fleet Deployment

## Goal

Integrate one or more Raspberry Pis into the TIP infrastructure as dedicated scraper nodes,
with a WireGuard VPN to Erik's PostgreSQL and a SOCKS5 proxy for FS.COM.

**Why a Pi instead of Erik:**
- Residential IP → FS.COM and NADDOD block datacenter IPs (IONOS is detected)
- Starlink or cable IP = different IP range = higher scraper success rate
- No cost for separate proxies (€50-150/month with commercial services)
- A Pi 4B (4GB) costs ~€60 once and runs continuously

**Architecture after deployment:**
```
Raspberry Pi(s)                    Erik (82.165.222.127)
┌────────────────────────┐         ┌─────────────────────────┐
│  index-pi.ts           │   VPN   │ PostgreSQL + pg-boss    │
│  32 fetch-only scrapers│◄───────►│ tip-api, tip-mcp        │
│  (NADDOD, GBICS, etc.) │         │ tip-scraper (Playwright)│
│                        │         │ ↑ PROXY_URLS            │
│  dante SOCKS5 :1080    │◄────────│ socks5://Pi:1080        │
│  (WireGuard IP only)   │         │ for FS.COM scraper      │
└────────────────────────┘         └─────────────────────────┘
WG: 10.10.0.6                      WG: 10.10.0.1
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## Repository

```
Local path: /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db/
Erik path:  /opt/tip/
Gitea:      http://192.168.178.196:3000/rene/transceiver-db.git
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 1: Generate the WireGuard keypair for the Pi (locally or on Erik)

Run this on a Linux system (or on the Pi itself):

```bash
# Generate the WireGuard keypair for Pi1
wg genkey | tee /tmp/pi1-privkey | wg pubkey > /tmp/pi1-pubkey
cat /tmp/pi1-privkey   # → becomes PI_PRIVKEY during Pi setup
cat /tmp/pi1-pubkey    # → goes into Erik's wg0.conf
```

Note down:
- `PI_PRIVKEY` = private key (stays on the Pi, NEVER in Git)
- `PI_PUBKEY` = public key (configured on Erik)

---
|
|
||||||
|
|
||||||
## STEP 2: Add the Pi as a WireGuard peer on Erik

**SSH to Erik:**
```bash
ssh erik-cf
```

**Edit file:** `/etc/wireguard/wg0.conf`

Append at the end:
```ini
# Raspberry Pi 1: scraper node
[Peer]
PublicKey = <PI_PUBKEY from step 1>
AllowedIPs = 10.10.0.6/32
PersistentKeepalive = 25
```

**Reload WireGuard (without dropping connections):**
```bash
# sudo usually closes inherited file descriptors, so run the process substitution inside the root shell
sudo bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'
# or, for a complete reload:
sudo systemctl restart wg-quick@wg0
```

**Verify:**
```bash
sudo wg show wg0    # must list the Pi peer as "peer:"
```

**PostgreSQL firewall on Erik: allow the Pi IP:**
```bash
# UFW: allow WireGuard traffic from the Pi to the DB
sudo ufw allow in on wg0 from 10.10.0.6 to 10.10.0.1 port 5433 proto tcp comment "Pi1 → PostgreSQL"
sudo ufw status | grep 5433
```

**Adjust pg_hba.conf on Erik** (so the Pi can connect to the DB):
```bash
# If not already present:
echo "host transceiver_db tip 10.10.0.0/24 scram-sha-256" | sudo tee -a /etc/postgresql/17/main/pg_hba.conf
sudo systemctl reload postgresql
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 3: Set up the Pi physically

Flash Raspberry Pi OS Lite (64-bit) and enable SSH.

**Run on the Pi:**
```bash
# One-time: TIP scraper setup with WireGuard + SOCKS5
PI_NAME=pi1 \
DB_HOST=10.10.0.1 \
DB_PORT=5433 \
DB_USER=tip \
DB_PASS=<tip-db-password-from-env> \
DB_NAME=transceiver_db \
WG_PRIVKEY=<PI_PRIVKEY from step 1> \
WG_ADDR=10.10.0.6 \
PROXY_AGENT=1 \
bash <(curl -sL http://192.168.178.196:3000/rene/transceiver-db/raw/branch/main/scripts/pi-scraper-setup.sh)
```
|
|
||||||
**What the script does:**
1. Installs Node.js 22 + tsx + pm2
2. Clones the repo from Gitea to `/opt/tip-scraper/`
3. Runs npm install --ignore-scripts (no Playwright!)
4. Writes .env (DB via WireGuard: 10.10.0.1:5433)
5. Configures WireGuard (Pi → Erik)
6. Starts PM2 with `index-pi.ts` (32 fetch-only scrapers)
7. Starts the dante SOCKS5 proxy on 10.10.0.6:1080

**Verify on the Pi:**
```bash
pm2 status                           # tip-pi-scraper: online
pm2 logs tip-pi-scraper --lines 20   # must show "32 queues / workers active"
sudo wg show wg0                     # must show a handshake with Erik
ss -tlnp | grep 1080                 # dante: listening
```
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 4: Route the FS.COM scraper on Erik through the Pi proxy

**File:** `/opt/tip/ecosystem.config.js`

Split the existing `tip-scraper` entry into:
- `tip-scraper-pi`: runs on the Pi (no entry needed here, the Pi handles it itself)
- `tip-scraper-fs`: FS.COM via the Pi's SOCKS5 proxy (new PM2 process on Erik)
- `tip-scraper`: all other Playwright scrapers on Erik

**Add the following entry to `ecosystem.config.js`** (next to the existing `tip-scraper`):

```javascript
{
  name: "tip-scraper-fs",
  script: "./node_modules/.bin/tsx",
  args: "packages/scraper/src/index-fs-only.ts",
  cwd: "/opt/tip",
  interpreter: "none",
  exec_mode: "fork",
  env: {
    NODE_ENV: "production",
    POSTGRES_HOST: "localhost",
    POSTGRES_PORT: "5433",
    POSTGRES_DB: "transceiver_db",
    POSTGRES_USER: "tip",
    POSTGRES_PASSWORD: "<CHANGE_ME>",
    PROXY_URLS: "socks5://10.10.0.6:1080",  // ← Pi SOCKS5
    CRAWLEE_STORAGE_DIR: "/tmp/tip-crawlee-fs",
  },
  max_memory_restart: "800M",
  instances: 1,
  autorestart: true,
},
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 5: Create index-fs-only.ts (Erik side, FS.COM only)

**Create file:** `packages/scraper/src/index-fs-only.ts`
|
|
||||||
|
|
||||||
```typescript
/**
 * TIP FS.COM Dedicated Scraper
 *
 * Runs on ERIK but routes traffic through Pi's SOCKS5 proxy
 * so FS.com sees a residential IP instead of IONOS datacenter IP.
 *
 * PROXY_URLS=socks5://10.10.0.6:1080 must be set in environment.
 */
import { config } from "dotenv";
import { join } from "path";
config({ path: join(__dirname, "..", "..", "..", ".env") });

import PgBoss from "pg-boss";
import { mkdirSync, rmSync } from "fs";

const connectionString = `postgres://${process.env.POSTGRES_USER}:${process.env.POSTGRES_PASSWORD}@${process.env.POSTGRES_HOST}:${process.env.POSTGRES_PORT || "5433"}/${process.env.POSTGRES_DB}`;

// Runs fn with CRAWLEE_STORAGE_DIR pointing at a fresh temp directory,
// then restores the previous value and removes the directory.
async function withIsolatedStorage(name: string, fn: () => Promise<void>): Promise<void> {
  const dir = `/tmp/tip-crawlee-${name}-${Date.now()}`;
  mkdirSync(join(dir, "request_queues", "default"), { recursive: true });
  mkdirSync(join(dir, "datasets", "default"), { recursive: true });
  mkdirSync(join(dir, "key_value_stores", "default"), { recursive: true });
  const prev = process.env.CRAWLEE_STORAGE_DIR;
  process.env.CRAWLEE_STORAGE_DIR = dir;
  try { await fn(); }
  finally {
    // Restore the old value; unset the variable if it was not set before
    // (assigning "" would leave an empty storage path behind).
    if (prev === undefined) delete process.env.CRAWLEE_STORAGE_DIR;
    else process.env.CRAWLEE_STORAGE_DIR = prev;
    try { rmSync(dir, { recursive: true, force: true }); } catch {}
  }
}

async function main() {
  const proxy = process.env.PROXY_URLS;
  console.log(`\n=== TIP FS.COM Scraper (proxy: ${proxy ?? "none"}) ===\n`);
  if (!proxy) {
    console.warn("WARNING: PROXY_URLS not set, FS.com will see the IONOS IP (may be blocked)");
  }

  const boss = new PgBoss({
    connectionString,
    retryLimit: 3,
    retryDelay: 300,           // 5 min retry on failure
    expireInSeconds: 7200,     // 2h timeout for full FS catalog run
    monitorStateIntervalSeconds: 60,
  });

  boss.on("error", (e: Error) => console.error("pg-boss error:", e.message));
  await boss.start();

  await boss.createQueue("scrape:pricing:fs").catch(() => {});
  await boss.createQueue("scrape:pricing:naddod").catch(() => {});

  const { scrapeFs } = await import("./scrapers/fs-com");
  const { scrapeNaddod } = await import("./scrapers/naddod");

  await boss.work("scrape:pricing:fs", async () => {
    console.log(`[${new Date().toISOString()}] FS.COM via ${proxy ?? "direct"}`);
    await withIsolatedStorage("fs", scrapeFs);
  });

  // NADDOD also benefits from residential IP (less aggressive rate limiting)
  await boss.work("scrape:pricing:naddod", async () => {
    console.log(`[${new Date().toISOString()}] NADDOD via ${proxy ?? "direct"}`);
    await scrapeNaddod();
  });

  console.log("FS.COM + NADDOD workers active, waiting for scheduled jobs\n");
  process.on("SIGTERM", async () => { await boss.stop(); process.exit(0); });
  process.on("SIGINT",  async () => { await boss.stop(); process.exit(0); });
}

main().catch((e) => { console.error("Fatal:", e); process.exit(1); });
```
|
|
||||||
|
|
||||||
**Note:** When `PROXY_URLS` is set, the FS.COM PlaywrightCrawler automatically uses
the SOCKS5 proxy via Crawlee's ProxyConfiguration. No further code is needed.
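
How `PROXY_URLS` is turned into a proxy list is not shown in this task. As a minimal sketch, assuming the variable holds a comma-separated list of proxy URLs (an assumption, as is the helper name `parseProxyUrls`, which is not part of the repository), the parsing step could look like this:

```typescript
// Hypothetical helper, not taken from the TIP repository.
// Assumption: PROXY_URLS is a comma-separated list such as
// "socks5://10.10.0.6:1080,socks5://10.10.0.7:1080".
const PROXY_URL_PATTERN = /^(socks5|socks4|https?):\/\/[^\s/]+:\d+$/;

export function parseProxyUrls(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((entry) => entry.trim())
    // Drop empty entries and anything that is not scheme://host:port
    .filter((entry) => PROXY_URL_PATTERN.test(entry));
}
```

The resulting array could then be passed to Crawlee's `ProxyConfiguration` through its `proxyUrls` option. An empty array would mean the crawler connects directly, which is the case the startup warning in `index-fs-only.ts` is meant to flag.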
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 6: Scheduler (remove the NADDOD job from index.ts)

Since NADDOD is now handled by `index-fs-only.ts` (on Erik) and `index-pi.ts` (on the Pi),
`scrape:pricing:naddod` must be registered only ONCE.

**Check:** `packages/scraper/src/scheduler.ts`: search for `naddod` and make sure
that the `scrape:pricing:naddod` job schedule is registered ONLY ONCE.

If it appears twice, remove the second entry in `scheduler.ts`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 7: Restart PM2 (on Erik)

```bash
ssh erik-cf
cd /opt/tip
git pull origin main
pm2 start ecosystem.config.js --only tip-scraper-fs
pm2 save
pm2 status   # tip-scraper-fs must be online
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 8: Verification (automatic, no manual debugging)

The scrapers verify themselves. The following checks run automatically:

**On the Pi itself:**
```bash
# 5 min after setup, on the Pi:
pm2 logs tip-pi-scraper --lines 50 | grep -E "\[(scrape|enrich)\]"
# Expected output: timestamp + queue name every few minutes
```

**FS.COM through the proxy:**
- The FS.COM scraper runs on Erik at 02:00 and 14:00 (scheduled via pg-boss)
- Prices land in the `price_observations` table with `vendor_id = fs-com`
- Automatic diagnosis: if `price` is still NULL → check the scraper's own logs

**Self-healing:**
- pg-boss `retryLimit=3`: failed jobs are retried automatically up to 3 times
- PM2 `autorestart`: the process restarts automatically if it crashes
- WireGuard `PersistentKeepalive=25`: the VPN tunnel stays up during long idle periods
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 9: Second Pi (optional, for more IP diversity)

Same procedure, but with:
```bash
PI_NAME=pi2 WG_ADDR=10.10.0.7 WG_PRIVKEY=<other-key> ...
```

Erik's wg0.conf:
```ini
[Peer]
PublicKey = <PI2_PUBKEY>
AllowedIPs = 10.10.0.7/32
PersistentKeepalive = 25
```

`ecosystem.config.js` on Erik: add a second `tip-scraper-fs` entry (ideally under its own process name) with `PROXY_URLS: "socks5://10.10.0.7:1080"`
for an additional proxy path (round-robin across both Pis).
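
With more than one Pi, the per-proxy PM2 entries differ only in name, proxy URL, and storage directory. The following is a rough sketch of generating them from a list of proxy URLs. The function `buildFsApps` and the `-2`, `-3` name suffixes are hypothetical, and the database variables from Step 4 are left out for brevity:

```typescript
// Hypothetical generator for PM2 app entries. Script, args and cwd mirror the
// tip-scraper-fs entry from Step 4; the numeric suffixes are an assumption.
interface FsAppEntry {
  name: string;
  script: string;
  args: string;
  cwd: string;
  env: { PROXY_URLS: string; CRAWLEE_STORAGE_DIR: string };
}

export function buildFsApps(proxyUrls: string[]): FsAppEntry[] {
  return proxyUrls.map((proxyUrl, i) => {
    // First entry keeps the original name, later ones get "-2", "-3", ...
    const suffix = i === 0 ? "" : `-${i + 1}`;
    return {
      name: `tip-scraper-fs${suffix}`,
      script: "./node_modules/.bin/tsx",
      args: "packages/scraper/src/index-fs-only.ts",
      cwd: "/opt/tip",
      env: {
        PROXY_URLS: proxyUrl,
        // Separate storage dir per process so the crawlers do not share state
        CRAWLEE_STORAGE_DIR: `/tmp/tip-crawlee-fs${suffix}`,
      },
    };
  });
}
```

Distinct process names keep `pm2 start ... --only <name>` unambiguous, and both processes would still consume the same `scrape:pricing:fs` queue.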
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Summary of new and changed files

| File | Status |
|------|--------|
| `packages/scraper/src/index-pi.ts` | ✅ Changed: fetch-only, Playwright removed, NADDOD added |
| `packages/scraper/src/index-fs-only.ts` | 🆕 Create (Step 5) |
| `scripts/pi-scraper-setup.sh` | ✅ Changed: no more inline index-pi.ts override |
| `ecosystem.config.js` (on Erik) | ✅ Change: add the tip-scraper-fs entry |
| Erik `/etc/wireguard/wg0.conf` | ✅ Manual: add the Pi peer |
| Erik `/etc/postgresql/17/main/pg_hba.conf` | ✅ Manual: allow access from 10.10.0.0/24 |
|
|
||||||
|
|
||||||
## Manual steps (cannot be executed by Codex)

1. **Generate the WireGuard keypair** (Step 1): must be done manually on a secure system
2. **Edit Erik's wg0.conf** (Step 2): requires SSH access to Erik
3. **Set up the Pi physically** (Step 3): hardware plus running pi-scraper-setup.sh
4. **Insert the DB password** into the setup command (found in `/opt/tip/.env` on Erik)
|
|
||||||
@ -1,705 +0,0 @@
|
|||||||
# CODEX TASK: Zero Manual Review Queue — Deterministic Equivalence Matching
|
|
||||||
|
|
||||||
## Goal

The manual review queue (`transceiver_equivalences` with `status = 'pending'`) is eliminated.
Instead of probabilistic confidence scores, a deterministic exact-match system is built
that produces a match only when ALL mandatory fields are present and exactly equal.

**Current state (PROBLEM):**
- 13,374 entries in `pending` → manual approval required
- Confidence score 0.0–1.0 → "56%" means uncertainty → review required
- Missing fields (wavelength=?) lead to uncertain matches
- `auto_approved` from 0.73 confidence upward: the threshold is too low

**Target state (SOLUTION):**
- 0 entries in `pending`, permanently
- No confidence score: only MATCH or NO MATCH
- Missing fields → enrichment job → no match until the data is complete
- Only exact matches (with defined tolerances) → 100% reliable data
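
The target state can be read as a three-way decision per product pair. The sketch below is illustrative only: the type and function names are hypothetical, and the tolerances (10% reach, 5 nm wavelength, 0.1 Gbps speed) are the ones this task defines later in its matcher step.

```typescript
// Illustrative sketch, not the scheduler code. Field names are hypothetical.
interface Spec {
  formFactor: string | null;
  speedGbps: number | null;
  fiberType: string | null;
  reachMeters: number | null;
  wavelengthTxNm: number | null;
  connectorType: string | null;
}

interface CompleteSpec {
  formFactor: string;
  speedGbps: number;
  fiberType: string;
  reachMeters: number;
  wavelengthTxNm: number;
  connectorType: string;
}

type MatchResult = "MATCH" | "NO_MATCH" | "NEEDS_ENRICHMENT";

// All six mandatory fields must be present (non-null, non-empty, non-zero).
function isComplete(s: Spec): s is CompleteSpec {
  return Boolean(
    s.formFactor && s.speedGbps && s.fiberType &&
    s.reachMeters && s.wavelengthTxNm && s.connectorType
  );
}

const sameText = (a: string, b: string): boolean =>
  a.trim().toUpperCase() === b.trim().toUpperCase();

export function classifyPair(a: Spec, b: Spec): MatchResult {
  // Incomplete data never produces a match or a pending review entry
  if (!isComplete(a) || !isComplete(b)) return "NEEDS_ENRICHMENT";
  const matches =
    sameText(a.formFactor, b.formFactor) &&
    Math.abs(a.speedGbps - b.speedGbps) < 0.1 &&
    sameText(a.fiberType, b.fiberType) &&
    Math.abs(a.reachMeters - b.reachMeters) / Math.max(a.reachMeters, 1) <= 0.1 &&
    Math.abs(a.wavelengthTxNm - b.wavelengthTxNm) <= 5 &&
    sameText(a.connectorType, b.connectorType);
  return matches ? "MATCH" : "NO_MATCH";
}
```

There is no fourth outcome such as "probably": a pair with a missing field is deferred to enrichment rather than scored.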
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Repository
|
|
||||||
|
|
||||||
```
Path:     /opt/tip  (on the Erik server) OR
          /Users/renefichtmueller/Desktop/Claude Code/github-repos/transceiver-db  (local)

Stack:    TypeScript, Node.js, PostgreSQL 17, pg-boss (job queue)
Packages: packages/scraper/src/scheduler.ts   ← main matching logic
          packages/api/src/routes/review.ts   ← API endpoints
          sql/                                ← DB migrations (numbered 001-104)
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 1: Database migration (new mandatory fields)

**Create file:** `sql/105-wavelength-connector-completeness.sql`
|
|
||||||
|
|
||||||
```sql
-- Migration 105: wavelength (TX/RX separate for BiDi), connector normalization,
--                data completeness score, enrichment flag

-- 1. Split wavelength fields (BiDi has TX ≠ RX)
ALTER TABLE transceivers
  ADD COLUMN IF NOT EXISTS wavelength_tx_nm   INTEGER,       -- TX wavelength in nm (e.g. 1270)
  ADD COLUMN IF NOT EXISTS wavelength_rx_nm   INTEGER,       -- RX wavelength in nm (e.g. 1330)
  ADD COLUMN IF NOT EXISTS connector_type     TEXT,          -- 'LC', 'SC', 'MPO-12', 'MPO-16', 'RJ45', 'CS', 'SN'
  ADD COLUMN IF NOT EXISTS data_completeness  INTEGER DEFAULT 0 CHECK (data_completeness BETWEEN 0 AND 100),
  ADD COLUMN IF NOT EXISTS enrichment_needed  BOOLEAN DEFAULT FALSE,
  ADD COLUMN IF NOT EXISTS enrichment_fields  TEXT[] DEFAULT '{}';  -- which fields are still missing

-- 2. Indexes for performance
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_tx  ON transceivers (wavelength_tx_nm) WHERE wavelength_tx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_wavelength_rx  ON transceivers (wavelength_rx_nm) WHERE wavelength_rx_nm IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_tx_completeness   ON transceivers (data_completeness);
CREATE INDEX IF NOT EXISTS idx_tx_enrichment     ON transceivers (enrichment_needed) WHERE enrichment_needed = TRUE;

-- 3. Completeness calculation function
CREATE OR REPLACE FUNCTION calc_data_completeness(
  p_form_factor TEXT, p_speed_gbps NUMERIC, p_fiber_type TEXT,
  p_reach_meters INTEGER, p_wavelength_tx INTEGER, p_connector TEXT
) RETURNS INTEGER AS $$
DECLARE
  score INTEGER := 0;
BEGIN
  IF p_form_factor   IS NOT NULL AND p_form_factor   != '' THEN score := score + 20; END IF;
  IF p_speed_gbps    IS NOT NULL AND p_speed_gbps    > 0   THEN score := score + 20; END IF;
  IF p_fiber_type    IS NOT NULL AND p_fiber_type    != '' THEN score := score + 20; END IF;
  IF p_reach_meters  IS NOT NULL AND p_reach_meters  > 0   THEN score := score + 20; END IF;
  IF p_wavelength_tx IS NOT NULL AND p_wavelength_tx > 0   THEN score := score + 10; END IF;
  IF p_connector     IS NOT NULL AND p_connector     != '' THEN score := score + 10; END IF;
  RETURN score;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- 4. All existing transceivers: compute the initial completeness
UPDATE transceivers SET
  data_completeness = calc_data_completeness(
    form_factor, speed_gbps, fiber_type,
    reach_meters, wavelength_tx_nm, connector_type
  ),
  enrichment_needed = (
    form_factor IS NULL OR speed_gbps IS NULL OR
    fiber_type IS NULL OR reach_meters IS NULL OR
    wavelength_tx_nm IS NULL OR connector_type IS NULL
  );

-- 5. Trigger: keep completeness up to date automatically
CREATE OR REPLACE FUNCTION trg_update_completeness()
RETURNS TRIGGER AS $$
BEGIN
  NEW.data_completeness := calc_data_completeness(
    NEW.form_factor, NEW.speed_gbps, NEW.fiber_type,
    NEW.reach_meters, NEW.wavelength_tx_nm, NEW.connector_type
  );
  NEW.enrichment_needed := (
    NEW.form_factor IS NULL OR NEW.speed_gbps IS NULL OR
    NEW.fiber_type IS NULL OR NEW.reach_meters IS NULL OR
    NEW.wavelength_tx_nm IS NULL OR NEW.connector_type IS NULL
  );
  -- Record which fields are missing
  NEW.enrichment_fields := ARRAY_REMOVE(ARRAY[
    CASE WHEN NEW.form_factor      IS NULL THEN 'form_factor'      END,
    CASE WHEN NEW.speed_gbps       IS NULL THEN 'speed_gbps'       END,
    CASE WHEN NEW.fiber_type       IS NULL THEN 'fiber_type'       END,
    CASE WHEN NEW.reach_meters     IS NULL THEN 'reach_meters'     END,
    CASE WHEN NEW.wavelength_tx_nm IS NULL THEN 'wavelength_tx_nm' END,
    CASE WHEN NEW.connector_type   IS NULL THEN 'connector_type'   END
  ], NULL);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS trg_completeness ON transceivers;
CREATE TRIGGER trg_completeness
  BEFORE INSERT OR UPDATE ON transceivers
  FOR EACH ROW EXECUTE FUNCTION trg_update_completeness();

COMMENT ON COLUMN transceivers.wavelength_tx_nm  IS 'TX wavelength in nm. For BiDi: TX side. For duplex: both TX=RX.';
COMMENT ON COLUMN transceivers.wavelength_rx_nm  IS 'RX wavelength in nm. Only set for BiDi. NULL = same as TX.';
COMMENT ON COLUMN transceivers.connector_type    IS 'Physical connector: LC, SC, MPO-12, MPO-16, RJ45, CS, SN';
COMMENT ON COLUMN transceivers.data_completeness IS '0-100: percentage of mandatory fields filled (6 fields × weight)';
COMMENT ON COLUMN transceivers.enrichment_needed IS 'TRUE = one or more mandatory fields missing, enrichment job needed';
COMMENT ON COLUMN transceivers.enrichment_fields IS 'Array of field names that still need enrichment';
```
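
To make the weighting concrete, here is a small TypeScript mirror of `calc_data_completeness`. It is only an illustration of the same arithmetic (four fields at 20 points, two at 10); the interface and function names are hypothetical and nothing in the task requires this code to exist.

```typescript
// Illustrative mirror of the SQL function calc_data_completeness.
// Names are hypothetical; the weights are copied from migration 105.
interface CompletenessInput {
  formFactor: string | null;
  speedGbps: number | null;
  fiberType: string | null;
  reachMeters: number | null;
  wavelengthTxNm: number | null;
  connectorType: string | null;
}

const filledText = (v: string | null): boolean => v !== null && v !== "";
const filledNumber = (v: number | null): boolean => v !== null && v > 0;

export function calcDataCompleteness(t: CompletenessInput): number {
  let score = 0;
  if (filledText(t.formFactor)) score += 20;
  if (filledNumber(t.speedGbps)) score += 20;
  if (filledText(t.fiberType)) score += 20;
  if (filledNumber(t.reachMeters)) score += 20;
  if (filledNumber(t.wavelengthTxNm)) score += 10;
  if (filledText(t.connectorType)) score += 10;
  return score;
}
```

One detail worth noticing: the score treats an empty string as missing, while `enrichment_needed` in the migration only checks for NULL, so a row with `form_factor = ''` scores below 100 without being flagged for enrichment.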
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 2: IEEE/MSA standards lookup table (static)

**Create file:** `sql/106-ieee-msa-wavelength-lookup.sql`
|
|
||||||
|
|
||||||
```sql
-- Migration 106: IEEE/MSA Standards Wavelength Lookup
-- Ground truth for wavelength based on the standard specification
-- Source: IEEE 802.3, SFF-8472, SFF-8436, SFF-8661, SFF-8679, MSA specs

CREATE TABLE IF NOT EXISTS ieee_wavelength_lookup (
  id               SERIAL PRIMARY KEY,
  form_factor      TEXT NOT NULL,
  speed_gbps       NUMERIC NOT NULL,
  fiber_type       TEXT NOT NULL,       -- 'SMF', 'MMF', 'DAC', 'AOC', 'Copper'
  reach_min_m      INTEGER NOT NULL,
  reach_max_m      INTEGER NOT NULL,
  wavelength_tx_nm INTEGER,             -- NULL for copper/DAC rows (no optical wavelength)
  wavelength_rx_nm INTEGER,             -- NULL = same as TX (no BiDi)
  connector_type   TEXT NOT NULL,
  ieee_standard    TEXT,                -- e.g. '802.3ae', 'SFF-8431'
  notes            TEXT,
  -- Wavelength and connector are part of the key so that BiDi pairs and
  -- LC/MPO variants sharing a reach range are not dropped by ON CONFLICT.
  UNIQUE NULLS NOT DISTINCT (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, connector_type)
);

INSERT INTO ieee_wavelength_lookup
  (form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m, wavelength_tx_nm, wavelength_rx_nm, connector_type, ieee_standard, notes)
VALUES
|
|
||||||
-- ── SFP (1G) ─────────────────────────────────────────────────────────────────
|
|
||||||
('SFP', 1, 'MMF', 0, 550, 850, NULL, 'LC', '802.3z', '1000BASE-SX'),
|
|
||||||
('SFP', 1, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3z', '1000BASE-LX'),
|
|
||||||
('SFP', 1, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3z', '1000BASE-EX'),
|
|
||||||
('SFP', 1, 'SMF', 0, 70000, 1550, NULL, 'LC', '802.3z', '1000BASE-ZX'),
|
|
||||||
('SFP', 1, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8472', '1000BASE-BX10-U BiDi'),
|
|
||||||
('SFP', 1, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8472', '1000BASE-BX10-D BiDi'),
|
|
||||||
('SFP', 1, 'Copper', 0, 100, NULL, NULL, 'RJ45','802.3ab', '1000BASE-T'),
|
|
||||||
-- ── SFP+ (10G) ───────────────────────────────────────────────────────────────
|
|
||||||
('SFP+', 10, 'MMF', 0, 300, 850, NULL, 'LC', '802.3ae', '10GBASE-SR'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ae', '10GBASE-LR'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ae', '10GBASE-ER'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 80000, 1550, NULL, 'LC', '802.3ae', '10GBASE-ZR'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 10000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX10-U BiDi'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 10000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX10-D BiDi'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 20000, 1270, 1330, 'LC', 'SFF-8431', '10GBASE-BX20-U BiDi'),
|
|
||||||
('SFP+', 10, 'SMF', 0, 20000, 1330, 1270, 'LC', 'SFF-8431', '10GBASE-BX20-D BiDi'),
|
|
||||||
('SFP+', 10, 'DAC', 0, 7, NULL, NULL, 'SFP+','SFF-8431','10G DAC Twinax'),
|
|
||||||
('SFP+', 10, 'AOC', 0, 100, 850, NULL, 'LC', 'SFF-8431', '10G AOC'),
|
|
||||||
-- ── SFP28 (25G) ──────────────────────────────────────────────────────────────
|
|
||||||
('SFP28', 25, 'MMF', 0, 100, 850, NULL, 'LC', '802.3by', '25GBASE-SR'),
|
|
||||||
('SFP28', 25, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cc', '25GBASE-LR'),
|
|
||||||
('SFP28', 25, 'SMF', 0, 2000, 1310, NULL, 'LC', '25GBASE-DR','25GBASE-DR'),
|
|
||||||
('SFP28', 25, 'SMF', 0, 10000, 1270, 1330, 'LC', '802.3cc', '25GBASE-BX10-U BiDi'),
|
|
||||||
('SFP28', 25, 'SMF', 0, 10000, 1330, 1270, 'LC', '802.3cc', '25GBASE-BX10-D BiDi'),
|
|
||||||
('SFP28', 25, 'DAC', 0, 5, NULL, NULL, 'SFP28','802.3by','25G DAC'),
|
|
||||||
-- ── QSFP+ (40G) ──────────────────────────────────────────────────────────────
|
|
||||||
('QSFP+', 40, 'MMF', 0, 150, 850, NULL, 'MPO-12','802.3ba','40GBASE-SR4'),
|
|
||||||
('QSFP+', 40, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3ba', '40GBASE-LR4 CWDM4'),
|
|
||||||
('QSFP+', 40, 'SMF', 0, 2000, 1310, NULL, 'MPO-12','802.3bm','40GBASE-PSM4'),
|
|
||||||
('QSFP+', 40, 'DAC', 0, 7, NULL, NULL, 'QSFP+','802.3ba','40G DAC'),
|
|
||||||
-- ── QSFP28 (100G) ────────────────────────────────────────────────────────────
|
|
||||||
('QSFP28', 100, 'MMF', 0, 100, 850, NULL, 'MPO-12','802.3bm','100GBASE-SR4'),
|
|
||||||
('QSFP28', 100, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3cd', '100GBASE-LR4 CWDM4'),
|
|
||||||
('QSFP28', 100, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bj','100GBASE-DR (PSM4)'),
|
|
||||||
('QSFP28', 100, 'SMF', 0, 40000, 1310, NULL, 'LC', '802.3ba', '100GBASE-ER4'),
|
|
||||||
('QSFP28', 100, 'SMF', 0, 2000, 1310, NULL, 'LC', 'CWDM4-MSA','100G CWDM4 2km'),
|
|
||||||
('QSFP28', 100, 'DAC', 0, 5, NULL, NULL, 'QSFP28','802.3bj','100G DAC'),
|
|
||||||
('QSFP28', 100, 'AOC', 0, 100, 850, NULL, 'MPO-12','802.3bm','100G AOC SR4'),
|
|
||||||
-- ── QSFP56 (200G) ────────────────────────────────────────────────────────────
|
|
||||||
('QSFP56', 200, 'MMF', 0, 100,   850,  NULL, 'MPO-12','802.3cd','200GBASE-SR4'),
('QSFP56', 200, 'SMF', 0, 500,   1310, NULL, 'MPO-12','802.3bs','200GBASE-DR4'),
('QSFP56', 200, 'SMF', 0, 2000,  1310, NULL, 'LC',    '802.3bs','200GBASE-FR4'),
('QSFP56', 200, 'SMF', 0, 10000, 1310, NULL, 'LC',    '802.3bs','200GBASE-LR4'),
|
|
||||||
-- ── QSFP-DD (400G) ───────────────────────────────────────────────────────────
|
|
||||||
('QSFP-DD', 400, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3bs','400GBASE-SR8'),
|
|
||||||
('QSFP-DD', 400, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3bs','400GBASE-DR4'),
|
|
||||||
('QSFP-DD', 400, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3bs', '400GBASE-FR4'),
|
|
||||||
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3bs', '400GBASE-LR4'),
|
|
||||||
('QSFP-DD', 400, 'SMF', 0, 10000, 1310, NULL, 'MPO-12','800G MSA','400GBASE-PSM4'),
|
|
||||||
('QSFP-DD', 400, 'DAC', 0, 5, NULL, NULL, 'QSFP-DD','802.3bs','400G DAC'),
|
|
||||||
-- ── OSFP / QSFP-DD800 (800G) ─────────────────────────────────────────────────
|
|
||||||
('OSFP', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
|
|
||||||
('OSFP', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8'),
|
|
||||||
('OSFP', 800, 'SMF', 0, 2000, 1310, NULL, 'LC', '802.3df', '800GBASE-FR4 2x400G'),
|
|
||||||
('OSFP', 800, 'SMF', 0, 10000, 1310, NULL, 'LC', '802.3df', '800GBASE-LR4'),
|
|
||||||
('QSFP-DD800', 800, 'MMF', 0, 100, 850, NULL, 'MPO-16','802.3df','800GBASE-SR8'),
|
|
||||||
('QSFP-DD800', 800, 'SMF', 0, 500, 1310, NULL, 'MPO-12','802.3df','800GBASE-DR8')
|
|
||||||
ON CONFLICT DO NOTHING;
|
|
||||||
|
|
||||||
CREATE INDEX IF NOT EXISTS idx_ieee_lookup ON ieee_wavelength_lookup
|
|
||||||
(form_factor, speed_gbps, fiber_type, reach_min_m, reach_max_m);
|
|
||||||
```
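
Because every row starts at `reach_min_m = 0`, a given reach usually falls into several ranges at once (a 10 km SFP+ sits inside the LR, ER, ZR and BiDi rows). The sketch below shows one conservative way to resolve that in memory: take the tightest range and refuse to answer when the remaining rows disagree on the TX wavelength. The types and `pickLookupRow` are hypothetical and not part of the task.

```typescript
// Hypothetical in-memory version of the lookup, for illustration only.
interface LookupRow {
  formFactor: string;
  speedGbps: number;
  fiberType: string;
  reachMinM: number;
  reachMaxM: number;
  wavelengthTxNm: number | null;
  notes: string;
}

interface LookupQuery {
  formFactor: string;
  speedGbps: number;
  fiberType: string;
  reachMeters: number;
}

export function pickLookupRow(rows: LookupRow[], q: LookupQuery): LookupRow | null {
  const candidates = rows.filter((r) =>
    r.formFactor === q.formFactor &&
    r.speedGbps === q.speedGbps &&
    r.fiberType === q.fiberType &&
    r.reachMinM <= q.reachMeters &&
    r.reachMaxM >= q.reachMeters
  );
  if (candidates.length === 0) return null;
  // Prefer the tightest range that still covers the requested reach
  const tightest = Math.min(...candidates.map((r) => r.reachMaxM));
  const best = candidates.filter((r) => r.reachMaxM === tightest);
  // If the tightest rows disagree on wavelength, the lookup is ambiguous
  const unanimous = best.every((r) => r.wavelengthTxNm === best[0].wavelengthTxNm);
  return unanimous ? best[0] : null;
}
```

Returning `null` on ambiguity leaves the product in quarantine until another source supplies the wavelength, which fits the goal of never guessing.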
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 3: Enrichment robot (new module)

**Create file:** `packages/scraper/src/robots/wavelength-enricher.ts`

This robot runs as a pg-boss job (`enrich:wavelength`) every 4 hours.
It fills `wavelength_tx_nm`, `wavelength_rx_nm`, and `connector_type` automatically.
|
|
||||||
|
|
||||||
```typescript
/**
 * Wavelength Enricher Robot
 *
 * Fills missing wavelength_tx_nm / wavelength_rx_nm / connector_type
 * from three sources (descending priority):
 *   1. IEEE/MSA lookup table (sql/106): deterministic, no cost
 *   2. Product-name regex (heuristic, pattern-based)
 *   3. Quarantine: the product stays unmatched until data is available
 *
 * No LLM, no scraper, no external calls: purely database-driven.
 */
import { pool } from "../utils/db";
|
|
||||||
|
|
||||||
// ── Regex patterns for wavelength from product names ─────────────────────────

const WAVELENGTH_PATTERNS: Array<{
  pattern: RegExp;
  tx: number;
  rx?: number;
  notes: string;
}> = [
  // Explicit BiDi pairs
  { pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330, notes: "BiDi 1270/1330" },
  { pattern: /\b1330\s*\/\s*1270\b/i, tx: 1330, rx: 1270, notes: "BiDi 1330/1270" },
  { pattern: /\b1310\s*\/\s*1550\b/i, tx: 1310, rx: 1550, notes: "BiDi 1310/1550" },
  { pattern: /\b1550\s*\/\s*1310\b/i, tx: 1550, rx: 1310, notes: "BiDi 1550/1310" },
  { pattern: /\b1295\s*\/\s*1310\b/i, tx: 1295, rx: 1310, notes: "BiDi CWDM" },
  // Direct nm value
  { pattern: /\b850\s*nm\b/i,  tx: 850,  notes: "850nm explicit" },
  { pattern: /\b1310\s*nm\b/i, tx: 1310, notes: "1310nm explicit" },
  { pattern: /\b1550\s*nm\b/i, tx: 1550, notes: "1550nm explicit" },
  { pattern: /\b1270\s*nm\b/i, tx: 1270, notes: "1270nm explicit" },
  { pattern: /\b1330\s*nm\b/i, tx: 1330, notes: "1330nm explicit" },
  // DWDM channels (C-band ~1530-1565nm)
  { pattern: /\bDWDM\b.*\bC\d{2}\b/i, tx: 1550, notes: "DWDM C-Band" },
  { pattern: /\bDWDM\b/i, tx: 1550, notes: "DWDM generic" },
  // Standard abbreviations → implied wavelength
  { pattern: /\bSR4?\b/i, tx: 850,  notes: "SR/SR4 = 850nm MMF" },
  { pattern: /\bLR4?\b/i, tx: 1310, notes: "LR/LR4 = 1310nm SMF" },
  { pattern: /\bER4?\b/i, tx: 1310, notes: "ER/ER4 = 1310nm SMF" },
  { pattern: /\bFR4?\b/i, tx: 1310, notes: "FR/FR4 = 1310nm SMF" },
  { pattern: /\bDR4?\b/i, tx: 1310, notes: "DR/DR4 = 1310nm SMF" },
  { pattern: /\bZR4?\b/i, tx: 1550, notes: "ZR/ZR4 = 1550nm SMF" },
];

const CONNECTOR_PATTERNS: Array<{ pattern: RegExp; connector: string }> = [
  { pattern: /\bMPO.?16\b/i, connector: "MPO-16" },
  { pattern: /\bMPO.?12\b/i, connector: "MPO-12" },
  { pattern: /\bMPO\b/i, connector: "MPO-12" }, // default MPO = MPO-12
  { pattern: /\bMTP\b/i, connector: "MPO-12" },
  { pattern: /\bCS\s*connector\b/i, connector: "CS" },
  { pattern: /\bSN\s*connector\b/i, connector: "SN" },
  { pattern: /\bRJ.?45\b/i, connector: "RJ45" },
  { pattern: /\bbase.?t\b/i, connector: "RJ45" },
  { pattern: /\bSC\b/i, connector: "SC" },
  { pattern: /\bLC\b/i, connector: "LC" }, // LC last (appears often in text)
];

// DAC/AOC have no fiber connector
const DAC_AOC_PATTERN = /\bDAC\b|\bAOC\b|\btwinax\b/i;

// Returns the first pattern that matches, so the order of WAVELENGTH_PATTERNS matters.
function extractWavelengthFromName(name: string): { tx: number; rx?: number; notes: string } | null {
  for (const p of WAVELENGTH_PATTERNS) {
    if (p.pattern.test(name)) {
      return { tx: p.tx, rx: p.rx, notes: p.notes };
    }
  }
  return null;
}

// Returns the sentinel "DAC/AOC" for cable products; callers must not store it as a connector.
function extractConnectorFromName(name: string): string | null {
  if (DAC_AOC_PATTERN.test(name)) return "DAC/AOC";
  for (const p of CONNECTOR_PATTERNS) {
    if (p.pattern.test(name)) return p.connector;
  }
  return null;
}
|
|
||||||
|
|
||||||
export async function runWavelengthEnricher(): Promise<void> {
  console.log("=== Wavelength Enricher Robot ===");

  // All transceivers with missing mandatory fields
  const { rows: transceivers } = await pool.query<{
    id: string;
    standard_name: string;
    part_number: string;
    form_factor: string;
    speed_gbps: number;
    fiber_type: string;
    reach_meters: number;
    wavelength_tx_nm: number | null;
    wavelength_rx_nm: number | null;
    connector_type: string | null;
  }>(`
    SELECT id, standard_name, part_number, form_factor, speed_gbps,
           fiber_type, reach_meters, wavelength_tx_nm, wavelength_rx_nm, connector_type
    FROM transceivers
    WHERE enrichment_needed = TRUE
    ORDER BY data_completeness DESC  -- products with more data first
    LIMIT 5000
  `);

  let fromIeee = 0;
  let fromRegex = 0;
  let stillMissing = 0;

  for (const t of transceivers) {
    let txNm = t.wavelength_tx_nm;
    let rxNm = t.wavelength_rx_nm;
    let connector = t.connector_type;
    let source = "";

    // ── Source 1: IEEE/MSA lookup ───────────────────────────────────────────
    if (txNm === null && t.form_factor && t.speed_gbps && t.fiber_type && t.reach_meters) {
      const { rows: ieee } = await pool.query<{
        wavelength_tx_nm: number | null;
        wavelength_rx_nm: number | null;
        connector_type: string;
      }>(`
        SELECT wavelength_tx_nm, wavelength_rx_nm, connector_type
        FROM ieee_wavelength_lookup
        WHERE form_factor = $1
          AND speed_gbps = $2
          AND fiber_type = $3
          AND reach_min_m <= $4
          AND reach_max_m >= $4
        -- Ranges overlap (all start at 0): take the tightest range and
        -- prefer duplex rows over BiDi so the result is reproducible.
        ORDER BY reach_max_m ASC, wavelength_rx_nm NULLS FIRST
        LIMIT 1
      `, [t.form_factor, t.speed_gbps, t.fiber_type, t.reach_meters]);

      if (ieee.length > 0) {
        txNm = ieee[0].wavelength_tx_nm;
        rxNm = ieee[0].wavelength_rx_nm ?? null;
        if (!connector) connector = ieee[0].connector_type;
        source = "ieee_lookup";
        fromIeee++;
      }
    }

    // ── Source 2: product-name regex ────────────────────────────────────────
    const nameForExtraction = [t.standard_name, t.part_number].filter(Boolean).join(" ");

    if (txNm === null && nameForExtraction) {
      const extracted = extractWavelengthFromName(nameForExtraction);
      if (extracted) {
        txNm = extracted.tx;
        rxNm = extracted.rx ?? null;
        source = `regex:${extracted.notes}`;
        fromRegex++;
      }
    }

    if (connector === null && nameForExtraction) {
      const extractedConn = extractConnectorFromName(nameForExtraction);
      if (extractedConn && extractedConn !== "DAC/AOC") connector = extractedConn;
    }

    // ── Update only if something new was found ──────────────────────────────
    const changed =
      txNm !== t.wavelength_tx_nm ||
      rxNm !== t.wavelength_rx_nm ||
      connector !== t.connector_type;

    if (changed) {
      await pool.query(`
        UPDATE transceivers SET
          wavelength_tx_nm = COALESCE($1, wavelength_tx_nm),
          wavelength_rx_nm = COALESCE($2, wavelength_rx_nm),
          connector_type   = COALESCE($3, connector_type),
          updated_at       = NOW()
        WHERE id = $4
      `, [txNm, rxNm, connector, t.id]);
    }

    // Count every row that still lacks a wavelength or connector after both sources
    if (txNm === null || connector === null) stillMissing++;
  }

  console.log(`  IEEE Lookup:   ${fromIeee} enriched`);
  console.log(`  Regex Extract: ${fromRegex} enriched`);
  console.log(`  Still missing: ${stillMissing} (quarantined until data is available)`);
  console.log("=== Wavelength Enricher Complete ===");
}
```
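
The regex source relies on first-match-wins ordering: explicit BiDi pairs come first, then explicit nm values, then standard abbreviations as a fallback. A reduced, self-contained sketch of that behaviour follows. The list is shortened and the function name `firstWavelengthMatch` is hypothetical.

```typescript
// Shortened pattern list for illustration; not the full list from the robot.
interface WavelengthPattern {
  pattern: RegExp;
  tx: number;
  rx?: number;
}

const PATTERNS: WavelengthPattern[] = [
  { pattern: /\b1270\s*\/\s*1330\b/i, tx: 1270, rx: 1330 }, // explicit BiDi pair first
  { pattern: /\b1310\s*nm\b/i, tx: 1310 },                   // explicit nm value next
  { pattern: /\bSR4?\b/i, tx: 850 },                         // abbreviations last
  { pattern: /\bLR4?\b/i, tx: 1310 },
];

export function firstWavelengthMatch(name: string): { tx: number; rx: number | null } | null {
  for (const p of PATTERNS) {
    // Patterns carry no "g" flag, so test() keeps no state between calls
    if (p.pattern.test(name)) return { tx: p.tx, rx: p.rx ?? null };
  }
  return null;
}
```

A name such as `SFP-10G-BX-U 1270/1330 LR 10km` contains both a BiDi pair and the abbreviation `LR`. Because the BiDi pattern is listed first, it resolves to 1270/1330 rather than to the generic 1310 nm of `LR`.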
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 4: New deterministic matcher

**Change in:** `packages/scraper/src/scheduler.ts`

Find the block `// fiber_type match` (around line 2780) and replace the entire
matching function `maintenance:find-equivalences` with the following logic:
|
|
||||||
|
|
||||||
```typescript
|
|
||||||
// ── NEW deterministic match logic ────────────────────────────────────────────
// No confidence score any more. Only MATCH or NO MATCH.
// Precondition: ALL 6 mandatory fields must be known.

// Check mandatory fields: if even one is missing, there is no match
const fxComplete = fx.form_factor && fx.speed_gbps && fx.fiber_type &&
                   fx.reach_meters && fx.wavelength_tx_nm && fx.connector_type;
const candComplete = cand.form_factor && cand.speed_gbps && cand.fiber_type &&
                     cand.reach_meters && cand.wavelength_tx_nm && cand.connector_type;

if (!fxComplete || !candComplete) {
  // Missing data → enrichment queue, no match attempt
  incompleteCount++;
  continue;
}

// ── Exact match (with defined tolerances) ─────────────────────────────────────
const formFactorMatch = fx.form_factor.trim().toUpperCase() === cand.form_factor.trim().toUpperCase();
if (!formFactorMatch) continue; // Hard: wrong housing = not compatible

const speedMatch = Math.abs(Number(fx.speed_gbps) - Number(cand.speed_gbps)) < 0.1;
if (!speedMatch) continue; // Hard: 10G ≠ 25G

const fiberMatch = fx.fiber_type.trim().toUpperCase() === cand.fiber_type.trim().toUpperCase();
if (!fiberMatch) continue; // Hard: SMF ≠ MMF = a completely different product

// Reach: ±10% tolerance (vendor variance due to cable quality)
const reachRatio = Math.abs(fx.reach_meters - cand.reach_meters) / Math.max(fx.reach_meters, 1);
if (reachRatio > 0.10) continue;

// Wavelength: ±5nm tolerance (vendor variance within spec)
const wlTxDiff = Math.abs((fx.wavelength_tx_nm ?? 0) - (cand.wavelength_tx_nm ?? 0));
if (wlTxDiff > 5) continue;

// Check RX only if at least one side stores an RX wavelength.
// A NULL RX means "same as TX" (see the column comment in migration 105),
// so fall back to the TX value instead of 0.
const fxHasBidi = fx.wavelength_rx_nm != null;
const candHasBidi = cand.wavelength_rx_nm != null;
if (fxHasBidi || candHasBidi) {
  const fxRx = fx.wavelength_rx_nm ?? fx.wavelength_tx_nm;
  const candRx = cand.wavelength_rx_nm ?? cand.wavelength_tx_nm;
  const wlRxDiff = Math.abs(fxRx - candRx);
  if (wlRxDiff > 5) continue;
}

// Connector: exact match (LC ≠ SC ≠ MPO-12)
const connMatch = fx.connector_type.trim().toUpperCase() === cand.connector_type.trim().toUpperCase();
if (!connMatch) continue;

// ── All checks passed → MATCH ────────────────────────────────────────────────
const matchBasis = ['form_factor', 'speed_gbps', 'fiber_type', 'reach', 'wavelength_tx', 'connector'];
const notes = `${fx.part_number} ↔ ${cand.part_number} (${cand.vendor_name}) | ` +
  `basis: ${matchBasis.join(', ')} | DETERMINISTIC MATCH`;

// Insert directly as auto_approved: no pending state any more
|
|
||||||
await pool.query(`
|
|
||||||
INSERT INTO transceiver_equivalences
|
|
||||||
(flexoptix_id, competitor_id, confidence, match_basis, match_notes, status)
|
|
||||||
VALUES ($1, $2, 1.0, $3, $4, 'auto_approved')
|
|
||||||
ON CONFLICT (flexoptix_id, competitor_id) DO UPDATE SET
|
|
||||||
confidence = 1.0,
|
|
||||||
match_basis = EXCLUDED.match_basis,
|
|
||||||
match_notes = EXCLUDED.match_notes,
|
|
||||||
status = 'auto_approved',
|
|
||||||
updated_at = NOW()
|
|
||||||
WHERE transceiver_equivalences.status = 'pending'
|
|
||||||
`, [fx.id, cand.competitor_id, matchBasis, notes]);
|
|
||||||
|
|
||||||
// competitor_verified setzen
|
|
||||||
await pool.query(`
|
|
||||||
UPDATE transceivers
|
|
||||||
SET competitor_verified = true,
|
|
||||||
competitor_verified_at = NOW(),
|
|
||||||
competitor_status = 'matched',
|
|
||||||
competitor_status_updated_at = NOW()
|
|
||||||
WHERE id = $1 AND competitor_verified = false
|
|
||||||
`, [fx.id]);
|
|
||||||
|
|
||||||
matchedCount++;
|
|
||||||
```
|
|
||||||
|
|
||||||
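The checks in the matcher can be pulled into a pure predicate so they can be unit-tested without a database. The following is a minimal sketch under assumptions: the `TransceiverSpec` interface and the function name `isDeterministicMatch` are hypothetical and not part of the existing scheduler, field names follow the matcher snippet, numeric fields are assumed to arrive as numbers, and the sample records are made up.

```typescript
// Hypothetical shape; field names mirror the matcher snippet.
interface TransceiverSpec {
  form_factor: string | null;
  speed_gbps: number | null;
  fiber_type: string | null;
  reach_meters: number | null;
  wavelength_tx_nm: number | null;
  wavelength_rx_nm: number | null;
  connector_type: string | null;
}

const norm = (s: string): string => s.trim().toUpperCase();

// Returns true only if all six mandatory fields are known on both sides and
// agree within the tolerances of the matcher (reach ±10%, wavelength ±5 nm).
function isDeterministicMatch(fx: TransceiverSpec, cand: TransceiverSpec): boolean {
  const complete = (t: TransceiverSpec): boolean =>
    !!(t.form_factor && t.speed_gbps && t.fiber_type &&
       t.reach_meters && t.wavelength_tx_nm && t.connector_type);
  if (!complete(fx) || !complete(cand)) return false;

  if (norm(fx.form_factor!) !== norm(cand.form_factor!)) return false;
  if (Math.abs(fx.speed_gbps! - cand.speed_gbps!) >= 0.1) return false;
  if (norm(fx.fiber_type!) !== norm(cand.fiber_type!)) return false;

  // Reach tolerance is measured relative to the Flexoptix side.
  const reachRatio =
    Math.abs(fx.reach_meters! - cand.reach_meters!) / Math.max(fx.reach_meters!, 1);
  if (reachRatio > 0.10) return false;

  if (Math.abs(fx.wavelength_tx_nm! - cand.wavelength_tx_nm!) > 5) return false;

  // RX is compared only when at least one side reports it (BiDi).
  if (fx.wavelength_rx_nm != null || cand.wavelength_rx_nm != null) {
    const rxDiff = Math.abs((fx.wavelength_rx_nm ?? 0) - (cand.wavelength_rx_nm ?? 0));
    if (rxDiff > 5) return false;
  }

  return norm(fx.connector_type!) === norm(cand.connector_type!);
}

// Made-up sample records for illustration.
const fxLr: TransceiverSpec = {
  form_factor: "SFP+", speed_gbps: 10, fiber_type: "SMF", reach_meters: 10000,
  wavelength_tx_nm: 1310, wavelength_rx_nm: null, connector_type: "LC",
};
const candLr: TransceiverSpec = { ...fxLr, form_factor: " sfp+ ", reach_meters: 10500 };
const candSr: TransceiverSpec = { ...fxLr, fiber_type: "MMF", reach_meters: 300, wavelength_tx_nm: 850 };

console.log(isDeterministicMatch(fxLr, candLr));
console.log(isDeterministicMatch(fxLr, candSr));
```

Keeping the predicate free of I/O means the tolerance rules can be exercised with plain objects, while the loop in `scheduler.ts` would stay responsible for counting incomplete records and writing results.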
---
|
|
||||||
|
|
||||||
## STEP 5: Clean up the pending queue

**Create file:** `sql/107-clear-pending-queue.sql`

```sql
-- Migration 107: clean up the pending queue
-- All 'pending' entries that have NO deterministic match → rejected
-- All 'pending' entries that would now be deterministic matches → are
-- recreated by the new matcher on its next run (as auto_approved)

-- Step 1: Reject all pending entries (stale, uncertain matches)
UPDATE transceiver_equivalences
SET status = 'rejected',
    reject_reason = 'Superseded by deterministic matcher: confidence-based match removed',
    reviewed_at = NOW(),
    reviewed_by = 'system:migration-107'
WHERE status = 'pending';

-- Step 2: Log statistics
DO $$
DECLARE
  pending_count INTEGER;
  approved_count INTEGER;
  rejected_count INTEGER;
BEGIN
  SELECT COUNT(*) INTO pending_count FROM transceiver_equivalences WHERE status = 'pending';
  SELECT COUNT(*) INTO approved_count FROM transceiver_equivalences WHERE status IN ('approved', 'auto_approved');
  SELECT COUNT(*) INTO rejected_count FROM transceiver_equivalences WHERE status = 'rejected';
  RAISE NOTICE 'After migration 107: pending=%, approved=%, rejected=%',
    pending_count, approved_count, rejected_count;
END;
$$;

-- Step 3: Optimize the index for the deterministic matcher
CREATE INDEX IF NOT EXISTS idx_eq_deterministic ON transceiver_equivalences
  (flexoptix_id, competitor_id, status)
  WHERE status = 'auto_approved';
```

|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 6: Register the pg-boss job for the enrichment robot

**Change in:** `packages/scraper/src/scheduler.ts`

In the block where jobs are registered, add:

```typescript
// Wavelength enricher: runs every 4 hours
await boss.schedule('enrich:wavelength', '0 */4 * * *', {}, {});

// Register the handler (work() returns a promise, so await it)
await boss.work('enrich:wavelength', async () => {
  await runWavelengthEnricher();
});
```

|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 7: Disable the manual review UI (API)

**Change in:** `packages/api/src/routes/review.ts`

Add a guard to the POST endpoint `/equivalences/:id/approve`:

```typescript
reviewRouter.post("/equivalences/:id/approve", async (req, res) => {
  // Manual approval is disabled; the deterministic matcher takes over
  res.status(410).json({
    error: "Manual approval disabled",
    message: "The system now uses deterministic matching. No manual review needed.",
    info: "Matches are auto-approved when all 6 mandatory fields match exactly."
  });
});
```

|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 8: Update the transceivers query for the matcher

**Change in:** `packages/scraper/src/scheduler.ts`

Update the SQL query that loads the Flexoptix transceivers for the matcher
so that wavelength_tx_nm and connector_type are loaded as well:

```sql
-- Load Flexoptix transceivers for the matcher
SELECT
  t.id, t.part_number, t.standard_name, t.form_factor,
  t.speed_gbps, t.fiber_type, t.reach_meters, t.wavelengths,
  t.wavelength_tx_nm,   -- NEW
  t.wavelength_rx_nm,   -- NEW
  t.connector_type,     -- NEW (previously 'connector')
  t.data_completeness,  -- NEW
  t.enrichment_needed   -- NEW
FROM transceivers t
JOIN vendors v ON v.id = t.vendor_id
WHERE v.name = 'Flexoptix'
  AND t.enrichment_needed = FALSE  -- match ONLY complete records
  AND t.data_completeness >= 80
ORDER BY t.part_number
```

The same applies to the candidate query (competitor transceivers).

|
|
||||||
---
|
|
||||||
|
|
||||||
## STEP 9: Migrate the existing wavelengths column

Convert the existing column `transceivers.wavelengths` (TEXT, e.g. "1310nm" or "1270/1330nm")
into the new numeric columns:

**Create file:** `sql/108-migrate-wavelengths-text-to-int.sql`

```sql
-- Migration 108: wavelengths TEXT → wavelength_tx_nm / wavelength_rx_nm INTEGER

UPDATE transceivers SET
  wavelength_tx_nm = CASE
    -- TX is the value the string starts with; this also covers BiDi
    -- strings such as '1270/1330' (TX = 1270) and '1330/1270' (TX = 1330)
    WHEN wavelengths ~ '^\s*850'  THEN 850
    WHEN wavelengths ~ '^\s*1270' THEN 1270
    WHEN wavelengths ~ '^\s*1310' THEN 1310
    WHEN wavelengths ~ '^\s*1330' THEN 1330
    WHEN wavelengths ~ '^\s*1490' THEN 1490
    WHEN wavelengths ~ '^\s*1550' THEN 1550
    ELSE NULL
  END,
  wavelength_rx_nm = CASE
    WHEN wavelengths ~ '1270\s*/\s*1330' THEN 1330
    WHEN wavelengths ~ '1330\s*/\s*1270' THEN 1270
    WHEN wavelengths ~ '1310\s*/\s*1550' THEN 1550
    WHEN wavelengths ~ '1550\s*/\s*1310' THEN 1310
    ELSE NULL
  END
WHERE wavelengths IS NOT NULL
  AND wavelength_tx_nm IS NULL;

-- Take over the connector from the old connector column (if present)
UPDATE transceivers SET
  connector_type = CASE connector
    WHEN 'LC'     THEN 'LC'
    WHEN 'SC'     THEN 'SC'
    WHEN 'MPO'    THEN 'MPO-12'
    WHEN 'MPO-12' THEN 'MPO-12'
    WHEN 'MPO-16' THEN 'MPO-16'
    WHEN 'RJ45'   THEN 'RJ45'
    ELSE connector
  END
WHERE connector IS NOT NULL AND connector_type IS NULL;

-- Recalculate completeness after the migration
UPDATE transceivers SET
  data_completeness = calc_data_completeness(
    form_factor, speed_gbps, fiber_type,
    reach_meters, wavelength_tx_nm, connector_type
  ),
  enrichment_needed = (
    form_factor IS NULL OR speed_gbps IS NULL OR
    fiber_type IS NULL OR reach_meters IS NULL OR
    wavelength_tx_nm IS NULL OR connector_type IS NULL
  );
```

|
|
||||||
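Before running migration 108 against real data, it can help to check the parsing rules against sample strings. The sketch below mirrors the two CASE expressions in TypeScript. `parseWavelengths` and `normalizeConnector` are hypothetical helper names, not existing code, and the sample inputs are made up.

```typescript
// Hypothetical helpers mirroring the CASE expressions in migration 108.
const TX_PREFIXES: number[] = [850, 1270, 1310, 1330, 1490, 1550];

// [pattern, rx] pairs; the first matching pattern wins, as in SQL CASE.
const RX_PAIRS: Array<[RegExp, number]> = [
  [/1270\s*\/\s*1330/, 1330],
  [/1330\s*\/\s*1270/, 1270],
  [/1310\s*\/\s*1550/, 1550],
  [/1550\s*\/\s*1310/, 1310],
];

function parseWavelengths(text: string | null): { tx: number | null; rx: number | null } {
  if (text == null) return { tx: null, rx: null };
  // TX: the value the string starts with (leading whitespace allowed).
  const tx = TX_PREFIXES.find((nm) => new RegExp(`^\\s*${nm}`).test(text)) ?? null;
  // RX: only set for the known BiDi pairs.
  const pair = RX_PAIRS.find(([re]) => re.test(text));
  return { tx, rx: pair ? pair[1] : null };
}

function normalizeConnector(connector: string | null): string | null {
  // A bare 'MPO' is mapped to 'MPO-12'; every other value passes through unchanged.
  return connector === "MPO" ? "MPO-12" : connector;
}

console.log(parseWavelengths("1310nm"));
console.log(parseWavelengths("1270/1330nm"));
console.log(parseWavelengths("1271/1291/1311/1331nm"));
console.log(normalizeConnector("MPO"));
```

Strings outside the listed values, such as the CWDM-style sample, come back with `tx: null`. Under the migration they would keep `wavelength_tx_nm` as NULL and therefore stay flagged with `enrichment_needed`.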
---
|
|
||||||
|
|
||||||
## Execution order for Codex

```
1. sql/105-wavelength-connector-completeness.sql → run with psql
2. sql/106-ieee-msa-wavelength-lookup.sql → run with psql
3. sql/108-migrate-wavelengths-text-to-int.sql → run with psql
4. packages/scraper/src/robots/wavelength-enricher.ts → create new file
5. packages/scraper/src/scheduler.ts → replace matcher logic (step 4)
6. packages/scraper/src/scheduler.ts → register job (step 6)
7. packages/api/src/routes/review.ts → disable approve endpoint (step 7)
8. npm run build -w packages/scraper → build
9. npm run build -w packages/api → build
10. sql/107-clear-pending-queue.sql → run with psql (LAST: cleans up the queue)
11. pm2 restart tip-scraper-daemon → restart daemon
12. Trigger the enricher manually: POST /api/review/run-matcher
```

|
|
||||||
---
|
|
||||||
|
|
||||||
## Expected result after deployment

| Metric | Before | After |
|--------|--------|-------|
| Pending queue | 13,374 | 0 |
| Confidence score | 0.0–1.0 (fuzzy) | removed |
| Match type | probabilistic | deterministic |
| Manual approvals/day | ~50-200 | 0 |
| False-positive rate | ~15% | ~0% |
| Transceivers with wavelength | ~30% | >85% (after enricher) |

|
|
||||||
---
|
|
||||||
|
|
||||||
## Do not change (outside the scope of this task)

- `packages/scraper/src/scrapers/*`: scraper logic stays unchanged
- `packages/api/src/routes/` (except review.ts): API endpoints stay as they are
- `sql/001-036`: do not touch existing migrations
- All `packages/core/src/` types: only extend, do not delete
- PM2 configuration: do not touch

13
README.md
@ -127,19 +127,6 @@ All data comes from publicly available sources:
|
|||||||
- Multi-Source Agreements (100G CWDM4 MSA, 100G PSM4 MSA, 100G Lambda MSA, OpenZR+)
|
- Multi-Source Agreements (100G CWDM4 MSA, 100G PSM4 MSA, 100G Lambda MSA, OpenZR+)
|
||||||
- Vendor datasheets and public documentation
|
- Vendor datasheets and public documentation
|
||||||
|
|
||||||
## Flexoptix Catalog Import
|
|
||||||
|
|
||||||
Private TIP deployments can import the normalized Flexoptix shop catalog produced
|
|
||||||
by Magatama/Pulso:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
npm run flexoptix:catalog:import -- --dry-run
|
|
||||||
TIP_DB_PASS=... npm run flexoptix:catalog:import
|
|
||||||
```
|
|
||||||
|
|
||||||
See [docs/FLEXOPTIX_CATALOG_IMPORT.md](docs/FLEXOPTIX_CATALOG_IMPORT.md) for the
|
|
||||||
full producer/import workflow and safety rules.
|
|
||||||
|
|
||||||
## Contributing
|
## Contributing
|
||||||
|
|
||||||
Contributions welcome. To add a new transceiver:
|
Contributions welcome. To add a new transceiver:
|
||||||
|
|||||||
@ -1,53 +0,0 @@
|
|||||||
# BlogLLM Training Data — Flexoptix Reference Articles
|
|
||||||
|
|
||||||
Gold-standard blog posts generated by Claude Sonnet (claude-sonnet-4-20250514) following the strict FO Blog Pipeline rules. These serve as reference examples for fine-tuning and training the BlogLLM.
|
|
||||||
|
|
||||||
## Articles
|
|
||||||
|
|
||||||
| File | Title | Type | Score |
|------|-------|------|-------|
| blog-001-400g-dr4-price-war.md | 400G DR4 Prices Are Moving... | market_alert | 9/10 |
| blog-002-vendor-lock-in-optics.md | The Hidden Tax in Your Transceiver Budget | comparison | 9/10 |
| blog-003-silicon-photonics.md | Silicon Photonics Is Shipping... | technology_deep_dive | 9/10 |
| blog-004-400g-migration-fiber-plant.md | Your 100G Fiber Plant Is Not Ready for 400G | tutorial | 9/10 |
| blog-005-coherent-400zr-reality.md | 400ZR Is Not What the Vendor Presentations Said | technology_deep_dive | 9/10 |
| blog-006-dom-diagnostics.md | Reading DOM Data Correctly | tutorial | 9/10 |
| blog-007-800g-readiness.md | 800G Is Shipping. Your Infrastructure Probably Isn't Ready. | hype_cycle | 9/10 |
| blog-008-oem-vs-compatible-real-numbers.md | OEM vs Compatible Transceivers: The Numbers Nobody Publishes | buying_guide | 9/10 |
| blog-009-100g-to-400g-migration-what-breaks.md | 100G to 400G Migration: What Actually Breaks and Why | migration_guide | 9/10 |
| blog-010-qsfp-dd-vs-osfp-form-factor-reality.md | QSFP-DD vs OSFP: The Form Factor War That Already Ended | technology_deep_dive | 9/10 |
| blog-011-transceiver-procurement-checklist.md | The Transceiver Procurement Checklist Nobody Gave You | tutorial | 9/10 |
| blog-012-coherent-vs-direct-detect-decision.md | Coherent vs. Direct Detect: The Decision Your Network Will Make for the Next Decade | technology_deep_dive | 9/10 |
| blog-013-price-drop-timing-when-to-buy.md | When to Buy: Reading the Transceiver Price Cycle Before It Reads You | market_alert | 9/10 |
| blog-014-800g-new-products-what-ships.md | 800G Is Shipping: What's Actually Available and What You Can Deploy Today | new_product | 9/10 |
| blog-015-compatible-vendor-comparison-who-to-trust.md | Compatible Transceiver Vendors in 2026: Who Does the Testing and Who Just Says They Do | competitor_analysis | 9/10 |
|
|
||||||
|
|
||||||
## Quality Rules Met (per article)
|
|
||||||
|
|
||||||
All articles were generated under strict constraints:
- No markdown headers (##, ###) anywhere in body
- No bullet lists as structural elements
- No LaTeX formulas
- No banned AI phrases ("leverage", "optimize", "game-changer", etc.)
- No spec dumps or comparison tables
- No OEM pricing presented as compatible pricing
- No sales language ("BUY / AVOID", verdict blocks)
- DR4 connector: MPO-12 (never LC)
- DR4 wavelength: 1310nm (never 1550nm)
- 400ZR and DR4 treated as distinct technologies
- No per-port power figures >25W
- No made-up part numbers
- Only CMOS/physics-grounded values
- One core thesis per article
- Flexoptix FINAL OUTCOME TEST: reader finishes ready to validate properly, not defaulting to OEM
|
|
||||||
|
|
||||||
## Usage for BlogLLM Training
|
|
||||||
|
|
||||||
1. Import these as positive examples into the fine-tuning dataset
2. Each article is ~800-1200 words (production blog length)
3. Type field maps to generation template types in `fo-blog-pipeline.ts`
4. These represent the output quality gate: generated articles should be compared to these for scoring
|
|
||||||
|
|
||||||
## Adding More Training Data
|
|
||||||
|
|
||||||
Generate via API: `POST /api/blog/generate` with `use_llm: "fo_pipeline"` + Claude provider, then export from DB as additional training examples.
|
|
||||||
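A request along the following lines could drive the generation step. This is only a sketch: the README names the endpoint and the `use_llm: "fo_pipeline"` field, while the `provider` and `type` fields, the base URL, and the helper name `buildGenerateRequest` are assumptions made for illustration and should be checked against the actual API.

```typescript
// Hypothetical helper; only the path and `use_llm` come from the README text.
interface GenerateRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildGenerateRequest(baseUrl: string, type: string): GenerateRequest {
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/api/blog/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        use_llm: "fo_pipeline",
        provider: "claude", // assumed field name for selecting the Claude provider
        type,               // assumed to match an article type such as "market_alert"
      }),
    },
  };
}

const generateReq = buildGenerateRequest("http://localhost:3000/", "market_alert");
console.log(generateReq.url);
// To send it: await fetch(generateReq.url, generateReq.init);
```

Building the request separately from sending it keeps the payload easy to inspect before anything is written to the database.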
@ -1,41 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400G DR4 Prices Are Moving. Here's What's Actually Happening."
|
|
||||||
type: market_alert
|
|
||||||
audience: network_architects_technical_buyers
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
The quotes landing in inboxes right now look different from six months ago. 400G DR4 pricing has been shifting, and not uniformly — the moves are happening at the vendor level, not the market level, which means what you're seeing depends entirely on who you're asking.
|
|
||||||
|
|
||||||
That distinction matters more than the number itself.
|
|
||||||
|
|
||||||
For the last two years, the dominant pattern in 400G DR4 was simple: OEM pricing from Cisco, Arista, and Juniper stayed high, while the compatible market — Flexoptix, FS.com, ProLabs — followed a different curve. The gap was already significant. What's changed is where that gap has settled, and whether it's likely to hold.
|
|
||||||
|
|
||||||
The driver isn't silicon scarcity anymore. 400G QSFP-DD chipsets are no longer a bottleneck at the fab level. The constraint that kept 400G expensive in 2023 (limited capacity for the 1310nm lasers that drive the four parallel lanes, plus yields on the DSP side) has worked itself out. Production has caught up. The yield curves on DR4 modules are now comparable to what SR4 looked like at the 100G ramp in 2018.
|
|
||||||
|
|
||||||
What that means in practice: compatible 400G DR4 is now manufactured at enough volume that pricing pressure from within that market is real. Vendors aren't cutting margins out of generosity. They're responding to supply that's structurally different from two years ago.
|
|
||||||
|
|
||||||
The OEM side hasn't moved equivalently. It rarely does at this phase. The OEM model doesn't reset pricing based on component costs — it resets based on attach rate to hardware, competitive pressure from specific accounts, and whether the RFP in question has someone who actually checked the compatibility list. That last one is more common than it used to be.
|
|
||||||
|
|
||||||
Where this creates a real operational decision: the window for infrastructure builds where 400G DR4 at OEM pricing makes financial sense is narrowing. Not because compatible quality has improved in some abstract sense — it hasn't changed, the specs haven't changed, the qualification testing hasn't changed. The window is narrowing because the cost delta is now visible enough that procurement teams are asking the question, which they weren't consistently doing eighteen months ago.
|
|
||||||
|
|
||||||
The question is usually the wrong one. "Is compatible 400G DR4 as good as OEM?" misses the actual risk surface. The real question is whether the deployment infrastructure around the module is set up to handle what 400G DR4 actually requires — and that's a question that applies to OEM modules too.
|
|
||||||
|
|
||||||
DR4 is not SR4 at a higher speed. The move from multimode to singlemode changes everything about your margin stack. At 400G, a contaminated end-face on an MPO-12 connector doesn't just degrade performance — it can take a lane offline without triggering a clean link-down event. You get partial link failures, asymmetric BER across lanes, behavior that's genuinely hard to diagnose if you're not looking for it.
|
|
||||||
|
|
||||||
This isn't an argument against compatible optics. It's an argument that the deployment validation process needs to match the technology, and that doesn't change based on vendor or price point. An OEM module in a dirty MPO with a poor mating sleeve behaves identically to a compatible module in the same condition. The fiber plant doesn't know who made the transceiver.
|
|
||||||
|
|
||||||
The shift happening now is that buyers who do have that process in place — clean fiber, verified end-faces, proper OTDR traces on the backbone, structured commissioning — are accelerating their 400G DR4 procurement cycles. Because when you trust your infrastructure, the delta between OEM and compatible is just money.
|
|
||||||
|
|
||||||
The buyers who don't have that process in place are slower to move regardless of pricing. That's not a market timing problem. That's a readiness problem.
|
|
||||||
|
|
||||||
Current pricing levels in the compatible market represent a floor that's likely stable for 12-18 months, not a temporary dip. The conditions that create price floors at a technology maturity level — broad supplier base, no single-vendor component dependencies, well-established qualification processes — are all present for 400G DR4. That's not speculation; it's the same pattern that played out in 10G SFP+, 40G QSFP+, and 100G QSFP28 at equivalent points in their cycles.
|
|
||||||
|
|
||||||
The one variable that can move this: if demand for 800G accelerates faster than expected, some of the manufacturing capacity currently allocated to 400G modules shifts. That would tighten supply briefly and reset pricing upward. Right now that scenario is possible but not the base case — 800G is growing in hyperscale but the enterprise and service provider 400G wave hasn't peaked.
|
|
||||||
|
|
||||||
For anyone sitting on a planned 400G DR4 deployment that's been waiting for budget cycles or vendor qualification timelines: the pricing argument for moving now is as strong as it's been. The infrastructure argument for doing your fiber validation before you deploy is the same as it's always been.
|
|
||||||
|
|
||||||
Those two things aren't in conflict.
|
|
||||||
@ -1,39 +0,0 @@
|
|||||||
---
|
|
||||||
title: "The Hidden Tax in Your Transceiver Budget"
|
|
||||||
type: comparison
|
|
||||||
audience: network_architects_procurement_engineers
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
The line item that looks like a small percentage on a BOM is never small when you multiply it across a data center refresh.
|
|
||||||
|
|
||||||
Most network engineers have seen this math at least once. The switch quote comes in. The hardware is competitively priced. The optics line — whether it's QSFP28 100G LR4, QSFP-DD 400G DR4, or something else — is either buried in the chassis cost or listed separately at a price that reflects something other than the component market.
|
|
||||||
|
|
||||||
That price reflects a business model.
|
|
||||||
|
|
||||||
OEM transceivers are not priced based on what they cost to make. They're priced based on their role in a software-enforced captive market. The module itself — manufactured at the same fabs, in many cases using the same chipsets as third-party alternatives — carries a margin that exists because the router or switch in the rack will check a digital signature before it powers the port on. Remove the signature requirement, and the module is worth a fraction of the OEM list price.
|
|
||||||
|
|
||||||
None of this is new. What's changed is how visible it's become to people who didn't used to notice it.
|
|
||||||
|
|
||||||
For 10G SFP+, the gap was material but manageable — the absolute dollar amount per module was low enough that procurement teams often didn't push back. For 100G QSFP28, the numbers started drawing attention. For 400G and above, the per-unit cost is high enough, and the port counts in a modern leaf-spine build are large enough, that the optics line routinely exceeds the hardware line on refresh cycles. At that point, the TCO conversation is unavoidable.
|
|
||||||
|
|
||||||
The technical argument for OEM transceivers has always rested on one foundation: they're tested and validated by the switch vendor for that specific platform. That's true, as far as it goes. The question is what it costs to achieve the same validation state with a compatible module, and whether the OEM premium is actually paying for anything beyond access to the digital key.
|
|
||||||
|
|
||||||
For a platform with a well-documented compatibility check process — Cisco's unsupported-transceiver warnings, Juniper's optics validation, Arista's QSFP management — the path to deploying compatible optics is a configuration change and a verification run, not an engineering project. The module goes in. The software flag gets acknowledged or suppressed based on policy. The DOM readout looks the same. The link comes up.
|
|
||||||
|
|
||||||
The validation work that actually matters isn't vendor-provided. It's yours. Fiber end-face cleanliness, insertion loss per span, OTDR traces, power budget verification — these determine whether the link performs correctly, and they're equally necessary whether the module cost $400 or $4,000. An OEM module in a dirty MPO connector performs worse than a compatible module in a clean one. The physics doesn't care about the digital signature.
|
|
||||||
|
|
||||||
Where OEM lock-in does have a real cost that's underappreciated: spares and RMA cycles. When you standardize on OEM transceivers, your spares inventory is tied to whatever the hardware vendor decides to make available, at whatever price they decide to charge, on whatever lead time they have at the moment you need it. During supply disruptions — and the last few years have had several — the OEM channel was frequently the bottleneck, not the alternative.
|
|
||||||
|
|
||||||
The argument isn't that OEM is always wrong. In specific contexts — ultra-long-haul DWDM with tight interoperability requirements, early-deployment platforms where compatibility lists are short, environments with vendor SLA requirements that explicitly name the transceiver — OEM makes sense. The argument is that defaulting to OEM across an entire deployment because it feels safer is a choice that costs real money without buying equivalent risk reduction in most cases.
|
|
||||||
|
|
||||||
The lock-in calculation changes with scale. For 10 ports, the discussion barely matters. For a 1,000-port leaf-spine build, the optics delta is a budget line that funds significant infrastructure elsewhere. The teams that have done this math once don't need to be convinced twice.
|
|
||||||
|
|
||||||
What usually takes longer is the process argument: "our NOC doesn't know how to handle compatible optics in the ticketing system." That's a real friction point, and it's worth taking seriously. It's also solvable — labeling conventions, runbook updates, a clear policy on what gets flagged as "unsupported" versus what gets treated as standard ops. The process friction is one-time work. The price delta is recurring, every refresh cycle, for the life of the infrastructure.
|
|
||||||
|
|
||||||
The more interesting version of this conversation isn't OEM versus compatible. It's what it means for a data center architecture to have a transceiver strategy that isn't vendor-defined. That means knowing your compatibility matrix before you write the RFP, not after you've committed to a chassis. It means treating fiber validation as infrastructure work, not an afterthought. It means having a spares policy that reflects actual failure rates rather than what the vendor suggested.
|
|
||||||
|
|
||||||
At that point, the module in the port is a commodity decision. Which is exactly what it should be.
|
|
||||||
@ -1,37 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Silicon Photonics Is Shipping. The Industry Hasn't Caught Up Yet."
|
|
||||||
type: technology_deep_dive
|
|
||||||
audience: network_architects_senior_engineers
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
There's a specific moment in a technology transition where the hardware is ready before the rest of the stack has adjusted. Silicon photonics for optical transceivers is in that moment right now.
|
|
||||||
|
|
||||||
Modules based on silicon photonics are shipping. They're in production deployments. The yields have improved enough that they're not experimental, and the power story — which was the main concern through most of the development cycle — has shifted meaningfully at 400G and above. What hasn't caught up is the mental model most network teams carry about what an optical transceiver is, where it fails, and how to operate it.
|
|
||||||
|
|
||||||
The traditional transceiver is a discrete assembly: laser source (usually an InP or GaAs-based VCSEL or DFB), modulator, photodetector, and DSP, assembled from separate components and connected with precise optical alignment inside the package. That assembly process is expensive, yield-limited, and fundamentally not the same as semiconductor manufacturing. The optical alignment tolerances are sub-micron. Individual components get binned and sorted. The production model is artisanal compared to CMOS.
|
|
||||||
|
|
||||||
Silicon photonics changes the fundamental constraint. The waveguides, the modulators, the photodetectors — all fabricated on silicon using the same process nodes as CMOS logic. Coupled with external light sources (typically III-V lasers bonded to the chip), the platform allows optical components to be manufactured at semiconductor scale. Volume, yield, and cost follow a trajectory that discrete assembly can't match.
|
|
||||||
|
|
||||||
This matters operationally because it changes what failure looks like.
|
|
||||||
|
|
||||||
The failure modes in traditional discrete-component transceivers are well-understood: laser aging (slow Tx power decline over months), electrostatic damage to bond wires, thermal stress on the alignment, contamination on the MPO or LC interface. Field engineers have years of pattern recognition around these. A Tx power reading that drops 2 dB over six months means a specific thing about that specific type of module.
|
|
||||||
|
|
||||||
Silicon photonics-based modules introduce different failure modes — not necessarily worse, but different. The silicon waveguide itself is durable. The coupling between the III-V laser and the silicon waveguide, however, is a junction that behaves differently under thermal cycling than a traditional laser mount. Early-generation silicon photonics modules had higher sensitivity to temperature variation at the coupling point than discrete equivalents. That's been engineered down substantially, but it means that temperature-related DOM anomalies in a silicon photonics module require different diagnostic logic than the same readings in a traditional module.
|
|
||||||
|
|
||||||
The other operational difference: DOM reporting. Digital Optical Monitoring on silicon photonics platforms sometimes reflects the optical properties at a different point in the signal path than traditional modules. The Tx power readout is still the modulated output, but the intermediate values — what the laser diode monitor current represents, how bias current scaling maps to output power — aren't always equivalent to discrete-component baselines. Engineers who use DOM trends as a primary diagnostic tool need to recalibrate what "normal drift" looks like on these platforms. Not by a lot. But enough that a runbook built entirely on historical baseline ranges from InP-based modules will occasionally mislead.
|
|
||||||
|
|
||||||
The power efficiency argument is real and worth separating from marketing. For 400G DR4, silicon photonics-based modules are shipping with power consumption numbers that are competitive with the best discrete implementations. For coherent applications — 400ZR, ZR+ — the DSP power still dominates, so the photonic integration advantage is less visible at the module level. The story becomes clearer at 800G and above, where the parallel fiber count and the modulation complexity combine to make the traditional assembly approach structurally harder.
|
|
||||||
|
|
||||||
What doesn't change: the network still needs clean fiber. The physics of MPO connector end-face contamination is the same whether you're transmitting through a silicon waveguide or an InP laser cavity. Insertion loss per span still has to fit within the power budget. OTDR traces still matter. The shift to silicon photonics doesn't paper over any of the optical infrastructure requirements that have always existed — it just changes what's happening inside the transceiver package.
|
|
||||||
|
|
||||||
The adoption question in enterprise and service provider environments is more about qualification than technology. Switching vendors — even for a module form factor with identical electrical and optical specifications — triggers validation work. The silicon photonics-based 400G DR4 in a QSFP-DD housing passes the same interop tests as a discrete-component equivalent. The MSA specifications don't change. The compatibility check in the NOS doesn't distinguish. But the first time a new module type appears in a production ticket, someone has to decide whether the runbook applies or whether this is a new case.
|
|
||||||
|
|
||||||
The teams that will operationalize silicon photonics earliest are the ones that already have structured commissioning processes — power budget verification at installation, baseline DOM readings captured and retained, fiber infrastructure documented. For those teams, a silicon photonics-based module is a component swap with a short recalibration of baselines. For teams running on tribal knowledge about what good DOM numbers look like, any new module generation introduces more friction.
|
|
||||||
|
|
||||||
The technology is ready. The question is whether the operations model is.
|
|
||||||
|
|
||||||
At the volumes currently shipping from the major silicon photonics suppliers, this is no longer a bleeding-edge choice. It's a production reality that's showing up in competitive bids. Understanding what changed — and more importantly what didn't — is the difference between treating it as a risk and treating it as an engineering problem you already know how to handle.
|
|
||||||
@ -1,39 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Your 100G Fiber Plant Is Not Ready for 400G. Here's How to Find Out Before It Bites You."
|
|
||||||
type: tutorial
|
|
||||||
audience: network_engineers_dc_operators
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
The link won't come up. Or it comes up, holds for three minutes, then drops. Or it's up but BER is drifting and you can't figure out why. You've replaced the optic twice. You've swapped the cable. The switch vendor TAC is asking for logs you've already sent them.
|
|
||||||
|
|
||||||
There's a good chance the problem is your fiber plant.
|
|
||||||
|
|
||||||
Specifically: cabling infrastructure that worked fine for 100G SR4 or even 100G LR4 has a meaningful probability of being marginal for 400G DR4 — not because anything broke, but because the loss budget at 400G is tighter and your plant was never characterized to the margin it now requires.
|
|
||||||
|
|
||||||
Here's what changes at 400G.
|
|
||||||
|
|
||||||
QSFP28 100G SR4 over OM4 has a maximum reach of 100m and a total optical budget of around 7.6 dB. That's generous. A slightly dirty connector, a patch cord with 0.5 dB insertion loss instead of 0.3, a couple of aging splice closures: the budget absorbs it. 400G QSFP-DD DR4 over OS2 singlemode has 500m of reach, which sounds like more room, but the total power budget is approximately 6.5 dB, and that has to cover transmitter and dispersion penalties as well as the fiber, connectors, and splices in the span. There is very little forgiveness. A contaminated MPO-12 end-face that would have been invisible at 100G can cost 1-2 dB, and now you're at margin. Maybe below it.
|
|
||||||
|
|
||||||
The failure mode isn't always dramatic. Sometimes you get no link. More often — and more insidiously — you get a link that functions at BER levels that are just below the FEC correction threshold under normal conditions, but tips over that threshold under thermal load, traffic bursts, or minor physical perturbation (someone brushes the cable tray, a fiber moves by a millimeter). Post-FEC errors start climbing. You get traffic drops that don't correlate to anything visible in syslog. This is the 400G deployment failure pattern that's hardest to debug, because it doesn't fail cleanly.
|
|
||||||
|
|
||||||
The diagnostic path starts at the MPO-12 interface.
|
|
||||||
|
|
||||||
Pull the fiber. Inspect the end-face with a fiber inspection probe — a visual inspection tool, not a power meter. What you're looking for is contamination, scratches, or chips in the core. Every MPO-12 connector has 12 fibers in a single interface. One contaminated fiber in that array degrades one lane. DR4 uses four transmit and four receive lanes. If any of those lanes is compromised, you have a partial link failure that presents as an asymmetric BER condition across the four lanes.
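
To make the fiber-to-lane relationship concrete, here is a minimal Python sketch that maps fiber positions on an MPO-12 to their role on a DR4 link. It assumes the common Base-8 layout (positions 1-4 transmit, 9-12 receive, 5-8 unused); the function name and position sets are illustrative, so check the module datasheet and your trunk polarity before relying on them.

```python
# Assumed Base-8 layout on an MPO-12 (verify against your module datasheet):
# positions 1-4 carry transmit, 9-12 carry receive, 5-8 are unused by DR4.
TX_POSITIONS = {1, 2, 3, 4}
RX_POSITIONS = {9, 10, 11, 12}


def fiber_roles(bad_positions):
    """Return (position, role) pairs for contaminated fiber positions."""
    roles = []
    for pos in sorted(set(bad_positions)):
        if not 1 <= pos <= 12:
            raise ValueError(f"MPO-12 has positions 1-12, got {pos}")
        if pos in TX_POSITIONS:
            roles.append((pos, "tx"))
        elif pos in RX_POSITIONS:
            roles.append((pos, "rx"))
        else:
            roles.append((pos, "unused"))
    return roles


def link_is_affected(bad_positions):
    """True if any contaminated position carries a DR4 lane."""
    return any(role != "unused" for _, role in fiber_roles(bad_positions))
```

Under this assumption, contamination on positions 5-8 alone would not explain a lane-specific BER problem, while a single bad fiber at position 2 or 11 would.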
|
|
||||||
|
|
||||||
Clean it. This matters more than it sounds. A standard MPO cleaning tool (dry cleaning cassette, lint-free swab with IPA, or air clean depending on what you have) removes contamination that genuinely costs 1-2 dB. If you haven't cleaned the connectors recently, do it before you do anything else in the diagnostic chain. The number of 400G failures that resolve with end-face cleaning is high enough that cleaning is step one, every time, no exceptions.
|
|
||||||
|
|
||||||
After inspection and cleaning, take a loss measurement. You need an optical power meter or an OTDR, not the DOM Rx power reading from the switch CLI. The DOM reading tells you what power is arriving at the photodetector — it's useful but it doesn't break down the loss sources. An OTDR trace shows you loss by distance: you can see splice events, connector events, and whether a specific location in the span is introducing unexpected loss. For a new 400G deployment or a troublesome existing one, an OTDR trace on each fiber in the MPO is worth the time it takes.
|
|
||||||
|
|
||||||
The numbers to hold in your head for 400G DR4 on OS2 singlemode: 0.35 dB/km fiber loss at 1310nm (DR4 operates at 1310nm, not 1550nm — this is a common mistake and the loss figures are different), 0.3 dB per connector under clean conditions, 0.1 dB per fusion splice, 3 dB margin minimum. Run the budget with those numbers for your actual span. If the theoretical loss plus margin exceeds 6.5 dB, you have a margin problem that no transceiver replacement will fix.
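
The budget arithmetic above can be sketched in a few lines of Python. The constants are the figures quoted in this article; the `Span` class, the function names, and the example spans are hypothetical and only there for illustration.

```python
from dataclasses import dataclass

# Figures quoted in the text for 400G DR4 on OS2 at 1310 nm.
FIBER_DB_PER_KM = 0.35
CONNECTOR_DB = 0.3      # per connector, clean
SPLICE_DB = 0.1         # per fusion splice
MARGIN_DB = 3.0         # minimum margin
BUDGET_DB = 6.5         # total budget


@dataclass
class Span:
    length_km: float
    connectors: int
    splices: int


def span_loss_db(span: Span) -> float:
    """Theoretical loss of the span, before margin."""
    return (span.length_km * FIBER_DB_PER_KM
            + span.connectors * CONNECTOR_DB
            + span.splices * SPLICE_DB)


def fits_budget(span: Span) -> bool:
    """True if theoretical loss plus margin stays within the budget."""
    return span_loss_db(span) + MARGIN_DB <= BUDGET_DB


if __name__ == "__main__":
    for span in (Span(0.5, 4, 2), Span(0.5, 12, 2)):
        print(span, round(span_loss_db(span), 3), fits_budget(span))
```

The second example span shows how quickly connector count, not fiber length, eats the budget on a 500m link.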
|
|
||||||
|
|
||||||
The fiber type question catches some teams by surprise. If the cabling infrastructure was installed during a 10G or early 100G era, there may be OM3 or OM4 multimode fiber in the plant. DR4 requires OS2 singlemode. SR4 requires multimode. These are not interchangeable. Putting a DR4 transceiver on a multimode cable doesn't give you a link that degrades gracefully; it gives you nothing, or at best an extremely high BER, because launching a singlemode 1310nm source into multimode fiber produces unusable output at the receiver. If you're inheriting an infrastructure build and don't have fiber plant documentation, pull the spec sheet for the installed cable before you spec the optics.
|
|
||||||
|
|
||||||
One pattern that appears repeatedly in 100G-to-400G transitions: the existing plant uses short MPO trunk cables with LC breakouts at the patch panels. That works well for SR4 (which is also MPO and also 8-fiber, but multimode). The same physical layout with OS2 trunk cables should work for DR4, but the breakout loss at the cassette matters more than it did before. Verify the insertion loss specification on the cassette itself, not just the trunk cable. Some cassette designs introduce more connector pairs than others. Every connector pair is another 0.6 dB worst-case.
|
|
||||||
|
|
||||||
The good news: a fiber plant that's causing 400G failures is usually fixable without replacing cable. End-face cleaning, cleaning cassette replacement, occasionally swapping out a bad patch cord: these resolve the majority of cases. What they require is doing the characterization work before deployment rather than after the first outage.
|
|
||||||
|
|
||||||
Running a power budget calculation before installation takes ten minutes. Running it from a production switch while traffic is impacted takes considerably longer and costs considerably more.
|
|
||||||
@ -1,35 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400ZR Is Not What the Vendor Presentations Said It Would Be"
|
|
||||||
type: technology_deep_dive
|
|
||||||
audience: network_architects_isp_engineers
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
The pitch was simple: put coherent transceivers in the router port, eliminate the standalone transponder chassis, cut the power and rack space, and get 400G per lambda on dark fiber. Plug-and-play DWDM.
|
|
||||||
|
|
||||||
That's broadly accurate. The deployment reality has more edges.
|
|
||||||
|
|
||||||
400ZR is a real standard, and the ecosystem has matured enough that the core promise holds: a QSFP-DD 400ZR module from Flexoptix or any standards-compliant vendor will interoperate with 400ZR gear from other vendors over a compatible DWDM system. The OIF 400ZR standard is well-specified. Interop isn't the problem it was at early availability.
|
|
||||||
|
|
||||||
The problems are operational, and most of them weren't in the vendor presentations.
|
|
||||||
|
|
||||||
The first one is power. A 400ZR module draws 15-20 watts. A QSFP-DD 400G DR4 for a datacenter leaf-spine link draws 6-8 watts. Put 32 ZR ports on a spine switch and you have a 480-640 watt thermal load from the optics alone, before the switching ASIC. That's not a hypothetical — it's why several major cloud operators who piloted 400ZR at the ToR ran into airflow problems in racks that weren't designed for it, even though the switch technically supports ZR modules. Thermal headroom per shelf, per rack, and per row matters, and it has to be calculated before the hardware order.
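
The thermal arithmetic is simple enough to script into a planning sheet. This is a minimal sketch using the wattage ranges quoted above; the function name is hypothetical, and real per-module draw should come from the datasheet for the specific part.

```python
def optics_thermal_load_w(ports: int, watts_low: float, watts_high: float):
    """Return the (low, high) thermal load in watts from optics alone."""
    if ports < 0 or watts_low > watts_high:
        raise ValueError("expected ports >= 0 and watts_low <= watts_high")
    return ports * watts_low, ports * watts_high


if __name__ == "__main__":
    # Ranges quoted in the text: 15-20 W for 400ZR, 6-8 W for 400G DR4.
    print("32 x 400ZR:", optics_thermal_load_w(32, 15, 20))
    print("32 x DR4:  ", optics_thermal_load_w(32, 6, 8))
```

Comparing the two results side by side is the quickest way to see why a chassis that was comfortable with DR4 can run out of airflow headroom with ZR in every port.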
|
|
||||||
|
|
||||||
The second problem is that 400ZR was designed for short DCI links — metro, edge interconnect, typically under 100km without amplification, under several hundred kilometers with EDFA amplification on a well-characterized optical system. It was not designed for arbitrary dark fiber spans. "We have dark fiber to the other site" and "400ZR will work on this span" are not the same statement. The actual question is what the optical loss is on that dark fiber, what the chromatic dispersion profile looks like, what the OSNR is at the far end after accounting for amplifier noise if EDFAs are in the path, and whether the fiber has been characterized with an OTDR recently enough that you trust the numbers.
|
|
||||||
|
|
||||||
OSNR is the constraint that catches people. 400ZR requires a minimum OSNR — the standard specifies 23 dB for back-to-back performance, but the effective deployment requirement including margin is typically 26-27 dB. Below that threshold, the DSP can't close the link at spec. You'll get errors, or you won't get a link at all. The only way to know whether your span meets this threshold is to measure it or model it accurately. "The fiber was installed in 2018 and the OTDR looked fine then" is not the same as knowing your current OSNR.
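
A quick way to keep the margin question honest is to compute it explicitly for every span. The sketch below assumes a 26 dB deployment requirement (the lower end of the range quoted above) and a 3 dB caution band; both defaults and the function name are illustrative assumptions, not values taken from the OIF specification.

```python
def osnr_status(measured_db: float,
                required_db: float = 26.0,
                caution_band_db: float = 3.0):
    """Return (margin_db, status) for a measured far-end OSNR.

    required_db and caution_band_db are planning assumptions; replace
    them with the figures from your module vendor and link design.
    """
    margin = measured_db - required_db
    if margin < 0:
        status = "below requirement"
    elif margin < caution_band_db:
        status = "marginal"
    else:
        status = "ok"
    return margin, status


if __name__ == "__main__":
    for measured in (24.5, 27.0, 30.0):
        print(measured, osnr_status(measured))
```

A span that lands in the "marginal" band is the kind that tends to hold a link in the lab and then misbehave under load.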
|
|
||||||
|
|
||||||
This is where 400ZR deployments that skip proper optical layer commissioning create downstream problems that are genuinely difficult to debug. OSNR issues don't present as clean failure — they present as high pre-FEC BER, intermittent post-FEC errors, and occasional link resets under traffic load. The switch CLI reports an optical link. The ZR DSP reports lock. Traffic flows at reduced rates. The root cause is a span that's 2-3 dB marginal on OSNR, and you won't find it by looking at router logs.
|
|
||||||
|
|
||||||
The practical implication: if you're deploying 400ZR on any span longer than 80km, or on a span with existing EDFA amplifiers that haven't been recently characterized, commission the optical layer first. That means OSNR measurement at the far end, optical spectrum analysis if you have DWDM channels already loaded on the fiber, and loss budget verification per span. For dark fiber with unknown history, an OTDR trace is table stakes.
|
|
||||||
|
|
||||||
For the sub-80km case — metro DCI, ring interconnects, campus backbone — 400ZR is considerably more predictable. The spans are short enough that OSNR is rarely the constraint, the dispersion is manageable with the built-in electronic dispersion compensation in the ZR DSP, and the deployment pattern is close to what the original pitch described. On these spans, the module really does simplify the optical layer.
|
|
||||||
|
|
||||||
There's a ZR+ ecosystem that's worth distinguishing from ZR. 400ZR (OIF) is the standardized profile with well-defined interoperability. ZR+ extends the reach to 1200+ km using higher-gain FEC and adjustable line rates, but the label covers two different things. OpenZR+ is a multi-source agreement that defines interoperable modes built on oFEC. Many of the longest-reach ZR+ modes, however, are vendor-specific and require matching implementations on both ends. Don't assume that two modules labeled ZR+ from different vendors will interoperate; confirm that both support the same OpenZR+ mode. If your architecture depends on multi-vendor interop at the optical layer, stay in 400ZR or a validated OpenZR+ mode. If you're single-vendor end-to-end on a specific platform, ZR+ opens reach options that base 400ZR can't achieve.
|
|
||||||
|
|
||||||
The operational model for ZR also requires something that most campus and enterprise teams don't have: someone who can interpret optical performance monitoring data. A ZR module running with chromatic dispersion above the DSP compensation window, or on a span with OSNR variation due to Raman noise from other channels, shows specific DSP state changes that are meaningful if you know what to look for. A pre-FEC BER of 10^-3 on a ZR link is information. Knowing whether it's normal for that span at current traffic conditions, or whether it's trending toward a threshold that will cause a link drop in the next 48 hours, requires baseline data and someone who reads it.
|
|
||||||
|
|
||||||
For teams considering 400ZR: the technology is ready. The operational readiness requirement is higher than DR4. That's not a reason to avoid it. It's a reason to understand what you're committing to before you put it in production and measure success by the first week of operation.
|
|
||||||
@ -1,41 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Reading DOM Data Correctly: What the Numbers Are Actually Telling You"
|
|
||||||
type: tutorial
|
|
||||||
audience: network_engineers_noc_operators
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
The DOM readout is on every transceiver in your network. Most engineers look at it when something's broken. The ones who look at it before something's broken find things earlier and fix them for less money.
|
|
||||||
|
|
||||||
Digital Optical Monitoring gives you five parameters: transmit power, receive power, supply voltage, bias current, and temperature. That's the base set. Coherent modules add more: OSNR, laser frequency, pre-FEC BER. For this article, focus on the base five, because those are what you have on every port, and what most teams systematically underuse.
|
|
||||||
|
|
||||||
The CLI for getting DOM data varies by platform. On Junos, `show interfaces diagnostics optics xe-0/0/0` gives you the full picture including alarm and warning thresholds. On EOS (Arista), `show interfaces transceiver detail` is equivalent. On IOS-XE, use `show interfaces GigabitEthernet1/0/1 transceiver detail`. Every platform has it. The output format is different but the parameters are the same.
|
|
||||||
|
|
||||||
Here's what each one means operationally.
|
|
||||||
|
|
||||||
Transmit power is the output of the laser. It's specified in dBm and it has a valid range that's in the module spec. For an SFP+ SR module, the range is typically -7.3 to -1 dBm. For a QSFP28 LR4, the Tx spec per lane is -4.3 to +4.5 dBm. The absolute values matter less than the trend. A module that read -1.2 dBm at installation eighteen months ago and now reads -4.8 dBm is showing laser degradation. It's slow and it's real. The module may not be failing today, but it's showing you the trajectory.
|
|
||||||
|
|
||||||
Receive power is what's arriving at the photodetector after traveling through the fiber. This is the number that tells you about your fiber plant, not about your transceiver. If Tx looks normal but Rx is low, the problem is between the ports. Dirty connectors. High-loss splice. Wrong fiber type. A cable that was pulled too hard around a tight bend radius. When Rx drops suddenly and Tx hasn't changed, something physical happened.
|
|
||||||
|
|
||||||
Bias current is how hard the laser is being driven to maintain its output. As a laser ages, the control circuit increases bias current to compensate for declining efficiency. A module with Tx power in spec but bias current at 80-90% of the maximum range is a module that's compensating. Tx looks fine, bias tells you it won't last. This is the parameter most teams ignore and the one that gives the earliest warning of laser end-of-life.
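
The bias check described above reduces to a ratio. The following Python sketch flags a module whose bias current has reached a chosen fraction of its maximum; using the module's high-alarm bias threshold as the "maximum" and 0.8 as the default cutoff are assumptions for illustration, and the function names are hypothetical.

```python
def bias_utilization(bias_ma: float, max_bias_ma: float) -> float:
    """Fraction of the maximum bias current currently in use."""
    if max_bias_ma <= 0:
        raise ValueError("max_bias_ma must be positive")
    return bias_ma / max_bias_ma


def laser_is_compensating(bias_ma: float, max_bias_ma: float,
                          threshold: float = 0.8) -> bool:
    """True if bias is at or above the threshold fraction of maximum.

    max_bias_ma is assumed to be the module's high-alarm bias threshold
    as reported in its DOM output.
    """
    return bias_utilization(bias_ma, max_bias_ma) >= threshold


if __name__ == "__main__":
    print(laser_is_compensating(8.5, 10.0))  # bias at 85% of maximum
    print(laser_is_compensating(6.0, 10.0))  # bias at 60% of maximum
```

Run against every port, a check like this surfaces modules whose Tx power still looks healthy but whose laser is already being driven hard to get there.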
|
|
||||||
|
|
||||||
Temperature matters more than most teams account for. Transceivers have operating ranges — COM grade (0-70°C) and Industrial grade (-40 to +85°C) are the main ones. Most data center optics are COM grade. At sustained temperatures above 65°C, you start seeing performance degradation and accelerated aging. The temperature alarm threshold is usually 75°C for COM modules — when you hit an alarm, you're already well into reduced-lifespan territory.
|
|
||||||
|
|
||||||
Voltage is usually boring. Power supply instability causes voltage anomalies, but well-maintained infrastructure rarely shows voltage deviations. If you're seeing voltage alarms, look at the switch power supply first.
|
|
||||||
|
|
||||||
The threshold values in the DOM output — high alarm, high warning, low warning, low alarm — come from the module itself. They're programmed by the manufacturer and they reflect what the module is designed to tolerate. A high alarm on Rx power doesn't mean the link is about to fail; it means the input power is above what the photodetector was calibrated for, which can cause receiver saturation. For LR4 in a short patch context — somebody put an LR4 in a rack-to-rack run that's effectively 3 meters — this is a real scenario. Add an attenuator, don't replace the module.
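
The four-threshold model maps directly onto a small classifier. This is a sketch only: the function name is hypothetical, and the example thresholds in the usage section are made-up placeholders, since real values must be read from each module's own DOM output.

```python
def classify_reading(value: float, low_alarm: float, low_warn: float,
                     high_warn: float, high_alarm: float) -> str:
    """Classify a DOM reading against the module's four thresholds."""
    if not low_alarm <= low_warn <= high_warn <= high_alarm:
        raise ValueError("thresholds must be ordered low to high")
    if value >= high_alarm:
        return "high alarm"
    if value >= high_warn:
        return "high warning"
    if value <= low_alarm:
        return "low alarm"
    if value <= low_warn:
        return "low warning"
    return "normal"


if __name__ == "__main__":
    # Placeholder Rx power thresholds in dBm, for illustration only.
    thresholds = dict(low_alarm=-14.0, low_warn=-10.0,
                      high_warn=3.5, high_alarm=4.5)
    for rx_dbm in (-15.0, -12.0, -3.0, 4.0, 5.0):
        print(rx_dbm, classify_reading(rx_dbm, **thresholds))
```

A "high warning" or "high alarm" on Rx power for a long-reach optic on a very short run is the attenuator case described above, not a failing module.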
|
|
||||||
|
|
||||||
The most useful thing you can do with DOM data isn't checking it reactively. It's baseline logging. Record the DOM values for every module at installation. For Tx power, Rx power, and bias current, record the reading once a month. Three months of data shows you trends. Six months of data shows you which modules in your deployment are degrading faster than others, and it shows you before those modules cause outages.
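
Once monthly readings exist, ranking modules by how fast they drift is a short calculation. The sketch below fits a least-squares slope to each module's Tx power history and sorts the steepest decline first. The data layout (a dict of port name to `(month, dBm)` pairs), the port names, and the function names are all hypothetical.

```python
def slope_per_month(readings):
    """Least-squares slope of (month, value) pairs, in units per month."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(month for month, _ in readings) / n
    mean_y = sum(value for _, value in readings) / n
    sxx = sum((month - mean_x) ** 2 for month, _ in readings)
    if sxx == 0:
        raise ValueError("readings must span more than one month")
    sxy = sum((month - mean_x) * (value - mean_y)
              for month, value in readings)
    return sxy / sxx


def rank_by_degradation(history):
    """Return (port, slope) pairs, steepest decline first."""
    slopes = [(port, slope_per_month(r)) for port, r in history.items()]
    return sorted(slopes, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical Tx power baselines in dBm, one reading per month.
    history = {
        "et-0/0/1": [(0, -1.2), (1, -1.3), (2, -1.4), (3, -1.5)],
        "et-0/0/2": [(0, -1.0), (1, -1.0), (2, -1.1), (3, -1.0)],
    }
    for port, slope in rank_by_degradation(history):
        print(f"{port}: {slope:+.3f} dB/month")
```

The same function works for bias current: feed it milliamp readings and sort in the opposite direction, since a rising bias is the warning sign there.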
|
|
||||||
|
|
||||||
This is routine in carrier and hyperscale environments. In enterprise and service provider environments below a certain size, it's often not done because it requires tooling and someone to look at the output. The tooling options are simpler than they used to be — LibreNMS, Netdisco, and several commercial NMS platforms will poll and graph DOM data automatically if you configure them to. The cost of not doing it is a Tx power alarm at 2 AM that would have been a planned maintenance window if you'd been watching the trend.
|
|
||||||
|
|
||||||
One practical trap: DOM data from a module is only as useful as the calibration of that module's internal sensors. Most well-made transceivers have sensor accuracy within 2-3 dB on power readings and within 3-5°C on temperature. Generic or extremely low-cost modules sometimes have wider tolerance. If you're seeing DOM readings that don't match an external power meter measurement, the module sensor may be the issue — it's a calibration problem with the module itself, not a fiber plant problem.
|
|
||||||
|
|
||||||
When DOM data and physical measurements disagree, trust the power meter on the fiber, not the module readout. The fiber doesn't lie. The module sensor calibration occasionally does.
|
|
||||||
|
|
||||||
For coherent 400ZR modules, pre-FEC BER is the additional parameter that matters most. The threshold depends on the FEC in use: 2.4×10^-4 is the correction limit for the KP4 FEC used on PAM4 client optics such as DR4, while 400ZR's concatenated FEC (CFEC) tolerates a much higher pre-FEC BER, on the order of 1.25×10^-2. Close to the threshold, the FEC is correcting errors that it may not be able to keep up with under degraded conditions. A stable pre-FEC BER of 1×10^-4 is fine. A pre-FEC BER that varies from 10^-5 to 10^-3 depending on traffic load is a span with marginal OSNR. That's a different problem than a dirty connector, and it requires a different fix.
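
Stability matters as much as the absolute value, so it helps to measure how far pre-FEC BER swings over a sampling window. This sketch expresses the swing in decades (orders of magnitude); the one-decade cutoff for calling a link unstable and the function names are assumptions chosen for illustration.

```python
import math


def ber_spread_decades(samples):
    """Orders of magnitude between the best and worst pre-FEC BER sample."""
    if not samples or min(samples) <= 0:
        raise ValueError("need at least one sample, all greater than zero")
    return math.log10(max(samples) / min(samples))


def ber_is_unstable(samples, max_decades: float = 1.0) -> bool:
    """True if BER swings by more than max_decades across the window."""
    return ber_spread_decades(samples) > max_decades


if __name__ == "__main__":
    steady = [1.0e-4, 1.2e-4, 0.9e-4]
    swinging = [1.0e-5, 2.0e-4, 1.0e-3]
    print(ber_spread_decades(steady), ber_is_unstable(steady))
    print(ber_spread_decades(swinging), ber_is_unstable(swinging))
```

A link that swings two decades with load deserves an optical-layer investigation even if every individual sample is below the FEC threshold.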
|
|
||||||
|
|
||||||
DOM data doesn't replace physical inspection and fiber characterization. What it does is tell you where to start.
|
|
||||||
@ -1,37 +0,0 @@
|
|||||||
---
|
|
||||||
title: "800G Is Shipping. Your Infrastructure Probably Isn't Ready."
|
|
||||||
type: hype_cycle
|
|
||||||
audience: network_architects_ctos
|
|
||||||
quality_score: 9
|
|
||||||
generated_by: claude-sonnet-4-20250514
|
|
||||||
generated_at: 2026-04-06
|
|
||||||
training_data: true
|
|
||||||
---
|
|
||||||
|
|
||||||
800G hardware is available. It's in production at hyperscale. The switch ASICs are real, the modules are shipping, and the industry demos are no longer demos. If you're building a greenfield data center in 2026, 800G is the right architecture for spine interconnects in high-performance environments.
|
|
||||||
|
|
||||||
That's the part that's easy to say. Here's the part that gets glossed over.
|
|
||||||
|
|
||||||
The qualification process for 800G is longer than it was for 400G, and the infrastructure requirements are more demanding. Not because the technology is immature — the IEEE 800G specs are solid, the OSFP and QSFP-DD800 form factors are well-defined — but because 800G is operating at a point where several things that were forgiving at lower speeds have become unforgiving.
|
|
||||||
|
|
||||||
The fiber plant is the first constraint. 800G single-lambda operation in coherent configurations is fine on good dark fiber. 800G parallel optics over multimode is the harder case. 800G SR8 needs sixteen fibers per link (eight transmit, eight receive), so existing MPO-12 trunks built for SR4 don't map onto it directly, and reach over OM3 and OM4 at 100G per lane is short enough that existing runs need to be re-verified against the module's rated distance. OM5 is the wideband multimode specification designed for 850nm and SWDM wavelengths, and unless you've been installing it for the last few years, it's not in your building.
|
|
||||||
|
|
||||||
For singlemode at 800G, the OS2 plant that works for 400G DR4 is fine, but the thermal budget is tighter. 800G over singlemode parallel (OSFP 800G-DR8 and similar) uses eight lanes of 100G each, and the aggregate power consumption means you need 15-25 watts per transceiver factored into your thermal model. At 32 ports on a spine switch, the port density you're accustomed to may require different airflow calculations.
|
|
||||||
|
|
||||||
The real constraint for most teams isn't the transceiver itself. It's the switch silicon.
|
|
||||||
|
|
||||||
800G per-port switching requires ASICs that weren't available two years ago. Tomahawk 5, Jericho 3-AI, Trident 5 — the platforms that can support 800G per port at switch scale are relatively new, and they come with higher base power consumption than the previous generation. A 32-port 800G spine switch draws more power than the equivalent 400G platform, not just because of the optics but because the packet forwarding silicon is more power-intensive. Full rack power budgets and cooling capacity need to be recalculated, not scaled.
|
|
||||||
|
|
||||||
Lead times are the practical bottleneck right now. The 800G OSFP and QSFP-DD800 module ecosystem is not yet as commoditized as 400G QSFP-DD. Compatible vendors are shipping 800G modules, but the selection is narrower, the qualification coverage for specific switch platforms is less comprehensive than 400G, and lead times at volume are still longer than you'd expect if you're accustomed to 400G procurement. If you're planning an 800G deployment for a specific quarter, validate the supply chain before you lock the design.
|
|
||||||
|
|
||||||
The right use of 800G in 2026 is targeted. Spine-to-spine interconnects in large-scale Clos fabrics where 400G per port is the actual bottleneck. AI cluster backbones where the compute density demands it. DCI links where 800ZR coherent is becoming cost-effective at metro reach. These are real use cases where 800G is the correct answer.
|
|
||||||
|
|
||||||
Deploying 800G at the access layer because it's available — because the switch supports it or because a vendor pitched it — is a mistake. The leaf layer in most enterprise and service provider environments is nowhere near saturating 400G links. 800G at the access tier adds cost, complexity, and thermal load without the bandwidth demand to justify it. The upgrade clock on leaf switches runs faster than the traffic growth that would require 800G per-port access.
|
|
||||||
|
|
||||||
The transition from 100G to 400G took longer than forecasts suggested because the full ecosystem — silicon, optics, cabling, software — had to mature together. 800G is following the same pattern, with the cabling constraint being the sharpest edge. The fiber plant is the long lead item. If your next refresh involves significant new cabling, the choice of fiber type matters.
|
|
||||||
|
|
||||||
For brownfield environments with existing cabling, 400G is the mature, well-supplied, fully-qualified choice for the next 3-5 years. The economics are as good as they're going to get, the ecosystem is broad, and the operational learning curve is behind most teams that have been running mixed 100G/400G environments for the last two years.
|
|
||||||
|
|
||||||
800G is the right answer. For some builds, starting now, it's the right answer today. For most enterprise and mid-market service provider environments, 2027-2028 is a more realistic timeline for it to be the obvious choice rather than an advanced deployment.
|
|
||||||
|
|
||||||
Know which situation you're in before you commit either way.
|
|
||||||
@ -1,38 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OEM vs Compatible Transceivers: The Numbers Nobody Publishes"
|
|
||||||
type: buying_guide
|
|
||||||
target_audience: customer
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
You're building out a new pod. Forty racks, top-of-rack 100G switches, dual-homed uplinks. The BOM lands on your desk. OEM transceivers: $320 each. Compatible from a reputable vendor: $28 each. That's $11,520 vs $1,008 for the same 36 ports. The finance team isn't asking questions. You are.
|
|
||||||
|
|
||||||
The questions are real. The OEM argument isn't crazy — you've seen compatibility issues, you've dealt with lock-in, you've had vendors refuse to troubleshoot when a third-party optic shows up in the DOM output. The compatible argument is also real: those 10x markups are funding someone's yacht, not your infrastructure.
|
|
||||||
|
|
||||||
So here are the actual numbers. Not vendor-provided case studies. Not analyst predictions. The operational data from networks that made both choices.
|
|
||||||
|
|
||||||
The first thing to understand is that "compatible" is not a category. It's a spectrum. At one end you have grey-market no-brand optics that fell off a truck in Shenzhen. At the other end you have optics that were manufactured in the same factories as OEM modules, programmed with the correct vendor-specific EEPROM data, and tested to the same MSA specs. They're not the same product. Treating them as interchangeable is where the "compatible optics cause problems" narrative comes from — it's based on the grey-market end of the spectrum, not the reputable end.
|
|
||||||
|
|
||||||
The EEPROM question is where most OEM FUD focuses. The argument: your switch vendor reads the transceiver's EEPROM data, doesn't recognize the OEM identifier, throws a warning in the syslog, and may refuse to enable the port in some cases. This is true for some combinations — certain Cisco IOS versions on specific platforms will warn or block unrecognized optics unless you configure "service unsupported-transceiver" or the equivalent. Juniper, Arista, Nokia: generally more open by default, though it varies by platform and software version. What nobody tells you is that every reputable compatible vendor programs their optics with the correct OEM-compatible EEPROM identifiers. The switch can't tell the difference. It reads "Cisco Compatible" or the correct Cisco vendor byte and proceeds normally. The argument is a decade old and based on grey-market optics that didn't bother with correct EEPROM programming.
|
|
||||||
|
|
||||||
The warranty conversation is real but not the show-stopper it's presented as. Yes, if a link is down and you have a compatible optic in the port, some vendors will use that as a starting point for the argument that the issue is the optic. This happens. The counter to it is DOM data: if your optic shows healthy TX power, RX within spec, temperature normal, and the vendor's optic on the other end of the same fiber also shows healthy readings, you have the data to push back. Without DOM monitoring, you have no counter-argument. With it, you do. This isn't a compatible-optic problem. It's a monitoring problem.
|
|
||||||
|
|
||||||
The real TCO comparison breaks down like this. Take 100G SR4 QSFP28 as the test case because the numbers are stable and well-documented.
|
|
||||||
|
|
||||||
OEM list price: $280-$400 depending on platform and relationship. Real pricing with volume discount: $180-$220. Compatible from a tier-1 vendor: $22-$35. Compatible from unknown source: $8-$15.
|
|
||||||
|
|
||||||
In a 500-port deployment, the difference between OEM at $200 average and compatible at $28 average is $86,000. That's not the full picture. Add: 40 hours of EEPROM compatibility testing at $200/hour = $8,000. Add: spare parts pool — typically 2% of ports as hot spares, so 10 spares at either price point. Add: the cost of a link failure (which is infrastructure-dependent but averages around $5,600/hour for a tier-2 data center event according to Uptime Institute). The question isn't "are compatible optics reliable?" It's "what's the failure rate difference?"
|
|
||||||
|
|
||||||
Optic failure rates from field data: quality compatible transceivers run at 0.1-0.3% DOA rate from reputable vendors with proper testing. OEM modules run at 0.05-0.15% DOA rate. The gap is real but small. For a 500-port deployment, the difference is 0.25-0.75 additional DOA optics. At $28 each, that's $7-21 in replacement cost. The $86,000 price difference doesn't get consumed by additional failures.
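
The DOA comparison can be checked with a few lines of Python. This sketch pairs the low ends and the high ends of the two quoted rate ranges, which is one reasonable way to compare them but still an assumption; the function names are hypothetical.

```python
def extra_doa_units(ports: int, compatible_rate: float, oem_rate: float) -> float:
    """Expected additional dead-on-arrival optics when buying compatible."""
    return ports * (compatible_rate - oem_rate)


def extra_doa_cost(ports: int, compatible_rate: float, oem_rate: float,
                   unit_price: float) -> float:
    """Replacement cost of those additional DOA optics."""
    return extra_doa_units(ports, compatible_rate, oem_rate) * unit_price


if __name__ == "__main__":
    # Rates quoted in the text, as fractions: 0.1-0.3% compatible, 0.05-0.15% OEM.
    for compatible_rate, oem_rate in ((0.001, 0.0005), (0.003, 0.0015)):
        units = extra_doa_units(500, compatible_rate, oem_rate)
        cost = extra_doa_cost(500, compatible_rate, oem_rate, 28)
        print(f"extra DOA units: {units:.2f}, replacement cost: ${cost:.2f}")
```

Whatever pairing you choose, the result stays several orders of magnitude below the purchase-price difference, which is the point of the comparison.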
|
|
||||||
|
|
||||||
The failure mode that actually costs money is early-life failure, not DOA. An optic that passes initial testing but fails at six months into production. Here, OEM data is better — they've been tracking it longer and have broader field data. Compatible vendors have improved dramatically in the past five years because the major compatible manufacturers are now the same contract manufacturers that build OEM modules. The factories didn't change. The programming and EEPROM customization is what changed.
|
|
||||||
|
|
||||||
Where compatible optics genuinely struggle: coherent optics and anything at 400G that's not a commodity MSA standard. 400G DR4 is commodity — every decent compatible vendor has it right. 400G LR4 is getting there. 400G ZR is a different story. Coherent optics with DSP-dependent performance characteristics, software interaction with the line card, and performance optimization loops — these are not at the "drop-in compatible" level yet for most platforms. The complexity isn't in the optic hardware. It's in the firmware interaction. Ask specifically: does your compatible vendor have lab-validated performance data on your specific platform and software version for 400G ZR? Not "it works in our lab" — specifically tested on your platform family. If the answer is unclear, buy OEM for ZR until that validation exists.
|
|
||||||
|
|
||||||
For everything else — 100G SR4, LR4, CWDM4, 25G SR, 10G SR/LR, SFP28, QSFP28 — the compatibility argument is settled. These are commodity MSA standards. The interop has been tested thousands of times. The failure rates are comparable to OEM. The EEPROM programming is solved. The only remaining variable is vendor quality and warranty support.
|
|
||||||
|
|
||||||
The buying decision tree is simple. For a new deployment: use compatible for commodity speeds and form factors, OEM for coherent and platform-specific high-performance optics. For an existing OEM-only deployment: replace failed optics with compatible, don't do a wholesale replacement project — the TCO benefit comes at scale and over time, not from a forced migration. For mixed environments: test your specific compatible vendor on your specific platform before committing at scale. This takes four hours, not four weeks.
|
|
||||||
|
|
||||||
What nobody tells you is that most large cloud operators already made this decision. AWS, Azure, Google, Meta: they run a mix, with compatible optics making up a significant percentage of their optical inventory. They didn't publish a press release about it. They published it in their infrastructure cost structures and in the fact that they're not paying OEM list price for 500,000 ports.
|
|
||||||
|
|
||||||
The numbers support compatible optics for commodity standards. The nuance is in coherent, in 400G platform-specific performance, and in choosing your vendor carefully. Buy from manufacturers who will give you test data, EEPROM compatibility documentation, and RMA support. That's a very different product from what's available on the grey market, and treating them as the same is exactly the mistake that keeps OEM markups where they are.
|
|
||||||
@ -1,32 +0,0 @@
|
|||||||
---
|
|
||||||
title: "100G to 400G Migration: What Actually Breaks and Why"
|
|
||||||
type: migration_guide
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Every 100G to 400G migration story starts the same way. The planning phase looks clean. The vendor presentations are reassuring. The lab tests pass. Then you push the first production links and something unexpected happens. Not catastrophic — just wrong. The errors you didn't plan for, the connectors that "worked fine" at 100G but don't at 400G, the fiber path you've run for six years that's suddenly marginal.
|
|
||||||
|
|
||||||
This is not a guide about what to buy. It's about what breaks, why it breaks at 400G when it was fine at 100G, and what to verify before you're doing it at 2 AM with traffic on it.
|
|
||||||
|
|
||||||
The first thing that breaks is your assumptions about fiber. 100G SR4 over OM4 has 150 meters of reach. 400G SR8 over OM4 has 100 meters. Your 120-meter cross-connect that's been solid for four years is now out of spec. You didn't change the fiber. You didn't change the topology. You changed the optic and the speed and suddenly the link is marginal. This affects more deployments than people admit. Before migrating any fabric, pull your cable plant documentation and verify every run against the new reach specs. If your cable plant documentation is "we think it's around X meters," measure it.
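
Checking a cable plant against a new reach limit is a simple filter once the run lengths are measured. A minimal sketch, assuming a plain mapping of run name to measured length; the run names are made up, and the 150 m and 100 m limits are the figures quoted in the text, so confirm the rated reach on the datasheet of the actual module before relying on them.

```python
def runs_out_of_spec(runs_m, reach_limit_m):
    """Return the names of runs longer than the given reach limit, sorted."""
    return sorted(name for name, length in runs_m.items() if length > reach_limit_m)


# Hypothetical measured cross-connect lengths in meters.
runs_m = {"xc-a01": 85, "xc-a02": 120, "xc-b07": 98}

print("over the old limit:", runs_out_of_spec(runs_m, 150))
print("over the new limit:", runs_out_of_spec(runs_m, 100))
```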
|
|
||||||
|
|
||||||
The second thing that changes is MPO polarity. This one has ended careers. 100G SR4 uses MPO-12 connectors. So does 400G DR4. But 400G SR8 uses MPO-16. If your migration path goes through an intermediate step — 100G SR4 to 400G SR4 to 400G SR8 — you're changing the connector type. And if you're using breakout cables to connect to servers or legacy switches, the polarity matters. Method B and Method C MPO polarity wiring work differently. An MPO trunk that was working fine with your 100G SR4 might work or might not with your 400G modules depending on the polarity. Test with the actual module before deploying. Don't assume the previous polarity map is valid.
|
|
||||||
|
|
||||||
The loss budget changes significantly at 400G, and this is where most marginal fiber plants get exposed. At 100G LR4 (1310nm, single-mode), you have 6.3 dB of loss budget. The typical link: 2 LC connectors at 0.3 dB each = 0.6 dB, plus 10km of single-mode fiber at 0.35 dB/km = 3.5 dB, leaving 2.2 dB of margin. That's fine. At 400G FR4 (same 1310nm window, same fiber), you have only 4.0 dB of loss budget, and FR4 covers 2km, not 10km. If you're doing 400G FR4 over campus fiber with multiple patch panels, you might be at 3-4 dB of connector loss alone plus the fiber run. You don't have the same margin as your old 100G LR4.
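
The margin calculation is worth scripting so it can be run per path. A minimal sketch, assuming loss is just connectors plus fiber attenuation (splices and aging margin are ignored); the function name is illustrative and the inputs are the 100G LR4 figures from the paragraph.

```python
def link_margin_db(budget_db, connectors, loss_per_connector_db, fiber_km, fiber_db_per_km):
    """Remaining margin in dB after connector and fiber loss are subtracted."""
    used_db = connectors * loss_per_connector_db + fiber_km * fiber_db_per_km
    return budget_db - used_db


margin = link_margin_db(
    budget_db=6.3, connectors=2, loss_per_connector_db=0.3,
    fiber_km=10, fiber_db_per_km=0.35,
)
print(f"remaining margin: {margin:.1f} dB")
```

A negative result means the path is over budget on paper before any dirt or aging is considered.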
|
|
||||||
|
|
||||||
Clean the connectors. I mean actually clean them, not "we cleaned them a few years ago." Dirty fiber connectors account for a disproportionate share of 400G link issues because the power budget margin is tighter. At 100G you were getting away with connectors that lose 0.5-1 dB each instead of the spec 0.3 dB. At 400G, a connector at 1 dB is 0.7 dB over spec, and 0.7 dB times 4 connectors in a path (4 patch panel connections) is 2.8 dB of unexpected loss. On a path with little more than 2 dB of margin, you're over budget before the fiber even enters the picture.
|
|
||||||
|
|
||||||
The fiber type question comes up on every migration and the answer is the same: single-mode for anything beyond multi-mode reach (about 100 meters at 400G), with DR4 for runs up to 500 meters and FR4/LR4 for longer runs. Multi-mode works fine for short-reach 400G within a data center. What doesn't work is trying to push 400G LR4 modules over multi-mode fiber. Not because the optic will fail: it'll launch light just fine. Because the modal dispersion in multi-mode fiber will destroy the signal quality at 400G speeds. Mixing SMF and MMF was forgiven at 10G, barely workable at 100G in some cases, and not workable at 400G.
|
|
||||||
|
|
||||||
The switch configuration side of 400G migrations has its own landmines. The most common: auto-negotiation behavior changes. At 100G, auto-neg is either on or off and usually works either way. At 400G QSFP-DD, the link training and auto-negotiation process is more complex. Some platforms default to different settings. When you migrate from a 100G switch to a 400G switch at the top-of-rack level, the server NICs that are now receiving 400G signals may not train properly if auto-neg is configured inconsistently. Test the actual NIC firmware on the actual server against the 400G switch you're deploying, not against the vendor's interoperability matrix. That matrix was built in a lab with specific firmware versions that may not match what you're running.
|
|
||||||
|
|
||||||
The breakout question also changes at 400G. 100G switches commonly offered 4x25G breakout from QSFP28 ports. 400G switches offer 4x100G breakout from QSFP-DD ports. If you're connecting legacy 100G servers to a 400G spine, you're either running them on dedicated 100G ToR switches (wasteful) or using breakout cables from the 400G switch. The 4x100G breakout works well when supported. The 2x200G breakout from some platforms is less universally supported in the ecosystem. Know your breakout requirements before committing to a platform.
|
|
||||||
|
|
||||||
The coherent optics question for 400G only applies to specific topologies — DCI, long-haul backbone, anything with DWDM. For data center fabric, the answer is non-coherent: DR4 for intra-DC, FR4 for campus, LR4 for longer campus runs. Coherent 400ZR is for WAN extension over DWDM infrastructure, not for general fabric. If someone is suggesting 400ZR for your data center spine, they're either wrong about the use case or your network has unusual topology requirements.
|
|
||||||
|
|
||||||
The DOM monitoring gap in many networks becomes visible during 400G migration. At 100G, you might have been running without per-port power monitoring because the margins were comfortable. At 400G, you need to know your TX power, RX power, pre-FEC BER, and temperature for every port. Not weekly in a report. In your monitoring system, with alerts. The first 400G link degradation you catch proactively through monitoring will justify the setup time. Without it, you find out from users reporting slow transfers or packet loss.
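
What "in your monitoring system, with alerts" means in practice is a per-port threshold check on each DOM sample. The sketch below is one possible shape, not a vendor API: the `DomReading` type, the field names, and every limit value are assumptions for illustration, and real limits should come from the module datasheet and your own baseline.

```python
from dataclasses import dataclass


@dataclass
class DomReading:
    port: str
    tx_dbm: float
    rx_dbm: float
    pre_fec_ber: float
    temp_c: float


# Hypothetical alert limits (low, high); replace with datasheet values.
LIMITS = {"tx_dbm": (-3.0, 4.0), "rx_dbm": (-6.0, 4.0), "temp_c": (0.0, 70.0)}
MAX_PRE_FEC_BER = 2.4e-4  # assumed alert threshold for this sketch


def dom_alerts(reading):
    """Return a list of alert strings for one DOM sample (empty if healthy)."""
    alerts = []
    for name, (low, high) in LIMITS.items():
        value = getattr(reading, name)
        if not low <= value <= high:
            alerts.append(f"{reading.port}: {name}={value} outside [{low}, {high}]")
    if reading.pre_fec_ber > MAX_PRE_FEC_BER:
        alerts.append(f"{reading.port}: pre_fec_ber={reading.pre_fec_ber:.1e} above limit")
    return alerts


healthy = DomReading("Eth1/1", tx_dbm=1.2, rx_dbm=-2.5, pre_fec_ber=3e-7, temp_c=48.0)
degraded = DomReading("Eth1/2", tx_dbm=1.0, rx_dbm=-7.4, pre_fec_ber=5e-4, temp_c=51.0)
for reading in (healthy, degraded):
    print(reading.port, dom_alerts(reading) or "ok")
```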
|
|
||||||
|
|
||||||
The migration sequence that works: start with a single spine-leaf pair, run at full line rate for two weeks before migrating the rest, collect baseline DOM data during that period, identify any outliers in fiber paths or connector quality, fix them before scaling. The migration sequence that creates problems: migrate the whole fabric in a weekend because the maintenance window is approved. You don't know which paths are marginal until the traffic is on them, and "marginal" at 400G can mean intermittent errors instead of clean failures.
|
|
||||||
|
|
||||||
None of this is reason to delay a 400G migration. The technology is mature, the compatible optics are available, the switch ecosystem is solid. What causes trouble is rushing it. The fiber plant surprises are real. The connector cleaning is necessary. The DOM monitoring is non-optional. Do those three things and most 400G migrations are unremarkable. Skip them and you'll remember the migration for years, for the wrong reasons.
|
|
||||||
@ -1,38 +0,0 @@
|
|||||||
---
|
|
||||||
title: "QSFP-DD vs OSFP: The Form Factor War That Already Ended"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Two years ago you couldn't attend a networking conference without someone asking which form factor would win: QSFP-DD or OSFP. The industry had a genuine split. Arista went one direction, Cisco another, the white-box vendors picked sides, and everyone wrote think-pieces about which would dominate.
|
|
||||||
|
|
||||||
That war is over. The answer is both, and they're not really competing with each other. Understanding why requires understanding what problem each form factor was actually solving.
|
|
||||||
|
|
||||||
QSFP-DD was designed as a backwards-compatible evolution of the QSFP28 form factor. The DD stands for Double Density: it adds a second row of electrical contacts to the existing QSFP connector, enabling 8 electrical lanes instead of 4, while keeping a cage that still accepts legacy QSFP28 modules. The backwards compatibility was the whole point. A QSFP28 100G module fits in a QSFP-DD cage. A QSFP56 200G module fits. A QSFP-DD 400G module fits. This allows a phased migration path on platforms that support it.
|
|
||||||
|
|
||||||
OSFP — Octal Small Form Factor Pluggable — is a clean-sheet design. It's physically larger than QSFP-DD, it has 8 electrical lanes like QSFP-DD, but it does not offer QSFP backwards compatibility. What it offers instead: more thermal headroom and a larger footprint that makes it easier to fit transceivers at higher power levels. An OSFP module can dissipate up to 15W versus QSFP-DD's current practical ceiling of around 12W. For coherent optics and high-performance pluggables that run hot, that headroom matters.
|
|
||||||
|
|
||||||
The practical consequence: QSFP-DD won the data center switching market for 400G. When you look at port counts in modern data center switches (Arista 7060X4, Cisco Nexus 9300-GX, Juniper QFX5220), the dominant form factor for 400G is QSFP-DD. The reason isn't technical superiority. It's port density, backwards compatibility, and price. At $25-35 for a compatible 400G DR4 QSFP-DD versus $35-55 for the equivalent in OSFP, and with QSFP-DD-equipped switches available at lower cost-per-port, the economics drove adoption.
|
|
||||||
|
|
||||||
OSFP won a different segment: 800G. When transceivers need to move 800G over a single pluggable module, the thermal envelope of QSFP-DD becomes a limitation. The DSPs and laser arrays in an 800G optic generate more heat than current QSFP-DD thermal management can handle reliably. OSFP's larger form factor and better thermal design make it the natural home for 800G. The NVIDIA/Spectrum-4 platform and several merchant silicon-based switches targeting AI/ML workloads use OSFP for 800G. This wasn't a surprise. It was designed into the specification.
|
|
||||||
|
|
||||||
At 400G, OSFP exists primarily for coherent optics and for specific platforms where the thermal headroom is necessary. If you're running QSFP-DD 400G ZR+ (coherent, higher power) and your thermal situation is tight, OSFP offers more headroom. For the vast majority of 400G data center deployments — DR4, FR4, SR4, LR4 — QSFP-DD is the right answer, it's available from more vendors, and compatible optics for it are mature.
|
|
||||||
|
|
||||||
The MSA standards work ran in parallel with the form factor adoption, which created confusion. 400G QSFP-DD and 400G OSFP both support the same optical standards: the IEEE 802.3 family for 400GBASE-DR4 (500m, SMF, MPO-12), 400GBASE-LR4 (10km, SMF, LC duplex), and 400GBASE-SR8 (100m, MMF, MPO-16), plus the OIF agreement for 400ZR (coherent, DWDM). The electrical interface to the host doesn't differ either: 8x50G PAM4 for both form factors at 400G. The optics themselves are largely the same. This means the choice of form factor is almost entirely a platform decision, not an optical technology decision.
|
|
||||||
|
|
||||||
What will actually matter for 800G in the next two years: OSFP and QSFP-DD800 are both targeting 800G. QSFP-DD800 is the higher-density option — same physical form factor as QSFP-DD, now running 8x100G PAM4 to get to 800G. OSFP also supports 800G. The competition isn't resolved at 800G the way it is at 400G, and the thermal issue is more pressing. For now, 800G deployments are predominantly OSFP-based on the platforms that support it.
|
|
||||||
|
|
||||||
The multi-rate question is where QSFP-DD's backwards compatibility actually pays off operationally. If you're migrating spine switches from 100G to 400G and your current investment is in QSFP28 optics, a platform with QSFP-DD cages lets you run your existing optics in the same ports during the transition. You don't swap everything at once. You migrate one layer at a time. OSFP doesn't give you that. If you pick an OSFP-based platform and you have QSFP28 infrastructure, you're either adding adapters (expensive, lossy) or doing a hard cut.
|
|
||||||
|
|
||||||
The "which form factor for new greenfield" question now has a clear answer. If you're building a new data center fabric and everything is new:
|
|
||||||
For 400G fabric: QSFP-DD. Better price, more vendor options, more compatible optic choices, similar density to OSFP in practice.
|
|
||||||
For 800G fabric: OSFP is the current dominant choice, though QSFP-DD800 platforms are coming.
|
|
||||||
For DCI and coherent optics at 400G and above: OSFP if thermal is a concern, QSFP-DD if it isn't.
|
|
||||||
For mixed-speed environments where backwards compatibility to 100G matters: QSFP-DD is the only sensible choice.
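
The four greenfield rules above can be written as a small lookup. This is only a restatement of the list as code, with hypothetical parameter names, and it says nothing about platform availability.

```python
def pick_form_factor(speed_g, coherent=False, thermal_constrained=False,
                     needs_qsfp28_compat=False):
    """Map the greenfield rules from the list above to a form factor."""
    if needs_qsfp28_compat:
        return "QSFP-DD"   # the only option that accepts legacy QSFP28 modules
    if coherent:
        return "OSFP" if thermal_constrained else "QSFP-DD"
    if speed_g >= 800:
        return "OSFP"      # current dominant choice at 800G
    return "QSFP-DD"       # 400G fabric


print(pick_form_factor(400))
print(pick_form_factor(800))
print(pick_form_factor(400, coherent=True, thermal_constrained=True))
```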
|
|
||||||
|
|
||||||
The form factor decision is now a procurement and lifecycle question, not a technology bet. The argument that you needed to pick a side because one would become obsolete hasn't materialized — both are shipping in volume, both have healthy ecosystems, and the use cases are clearly differentiated. What matters now is vendor support for your specific platform, DOM monitoring compatibility, and compatible optic availability for the standards you're deploying.
|
|
||||||
|
|
||||||
Compatible optic availability is worth checking specifically. 400G DR4 QSFP-DD: broadly available from all major compatible vendors, prices mature and stable. 400G SR8 QSFP-DD: available, slightly less common than DR4 but well-covered. 400G OSFP DR4: available from fewer vendors, prices still at a premium. 800G OSFP: early market, mostly OEM or premium-priced compatible. This availability gap will close in 18-24 months as volume increases, but right now it's a real procurement consideration.
|
|
||||||
|
|
||||||
The war is over. QSFP-DD for data center 400G, OSFP for 800G and coherent high-power applications. Pick your platform based on your use case and your upgrade path, not based on which side you were rooting for two years ago.
|
|
||||||
@ -1,32 +0,0 @@
|
|||||||
---
|
|
||||||
title: "The Transceiver Procurement Checklist Nobody Gave You"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: customer
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Buying transceivers at scale is one of those processes where the failure mode is invisible until it isn't. The batch that arrived dead. The optics that work fine in the lab but trip a syslog warning in production. The compatible modules that are "400G SR4" but weren't tested on your specific platform. The price that looked right until you factored in shipping, DOA replacement, and the hours spent in TAC calls. This is the checklist that prevents those problems.
|
|
||||||
|
|
||||||
Start before you talk to a vendor. Document your exact requirements in terms that a datasheet can answer. Form factor (QSFP-DD, QSFP28, SFP28, SFP+, OSFP). Speed (10G, 25G, 100G, 400G). Standard (SR4, LR4, DR4, CWDM4, etc.). Switch platform and software version. Required operating temperature range (commercial 0-70°C or extended/industrial). Target fiber type (OM3, OM4, OS2 single-mode). Required reach. Any DWDM requirements including channel plan and center frequencies. This list seems obvious. In practice, half the procurement errors happen because the requirements weren't written down and someone ordered LR4 instead of SR4 or got the wrong fiber type.
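
One way to make "written down" enforceable is a record that refuses to look complete while fields are blank. The sketch below is a hypothetical structure, not a standard schema; the field names simply mirror the list in the paragraph, and the DWDM field is treated as optional.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OpticRequirement:
    form_factor: Optional[str] = None        # e.g. "QSFP28"
    speed: Optional[str] = None              # e.g. "100G"
    standard: Optional[str] = None           # e.g. "LR4"
    platform: Optional[str] = None
    software_version: Optional[str] = None
    temp_range: Optional[str] = None         # e.g. "0-70C"
    fiber_type: Optional[str] = None         # e.g. "OS2"
    reach_m: Optional[int] = None
    dwdm_channel_plan: Optional[str] = None  # only needed for DWDM orders


def missing_fields(req):
    """List the required fields that are still unset."""
    optional = {"dwdm_channel_plan"}
    return [f.name for f in fields(req)
            if f.name not in optional and getattr(req, f.name) is None]


draft = OpticRequirement(form_factor="QSFP28", speed="100G", standard="LR4")
print("still missing:", missing_fields(draft))
```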
|
|
||||||
|
|
||||||
The switch platform and software version matter more than people realize. Compatibility isn't just "QSFP28 100G module in a QSFP28 port." It's "this specific module, in this specific software version, with or without the 'service unsupported-transceiver' configuration flag, on this specific ASIC-based platform." Some platforms check the EEPROM identifier and log warnings. Some block traffic until explicitly authorized. Some have firmware dependencies where a specific software release introduced or fixed a compatibility issue. The answer to "is this compatible?" should come with a tested software version, not just a yes/no. If a vendor can't tell you the specific platforms and software versions they've validated against, that's a red flag.
|
|
||||||
|
|
||||||
The DOA rate question is one you should ask every compatible vendor directly. The answer should be a number, not "very low" or "industry-leading." A credible answer looks like: "Our DOA rate for 100G QSFP28 SR4 over the past 12 months is 0.18%, tracked across batch shipments with our certified test report." If they can't give you a number, their quality control tracking is either nonexistent or they don't want you to know what it is. Good compatible vendors know their DOA rates by SKU because they track it to identify manufacturing batch issues. Expect 0.1-0.4% DOA for quality compatible optics. Anything below 0.1% should prompt the follow-up question of "how many units is that based on?"
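
The follow-up question about sample size can be made concrete with a confidence interval on the quoted rate. The sketch below uses the Wilson score interval, which is a swapped-in statistical technique and not something the vendors in question necessarily report; the failure and unit counts are invented for illustration.

```python
import math


def wilson_interval(failures, units, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for a failure proportion."""
    p = failures / units
    denom = 1 + z**2 / units
    center = (p + z**2 / (2 * units)) / denom
    half = z * math.sqrt(p * (1 - p) / units + z**2 / (4 * units**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Same 0.1% headline rate, very different evidence behind it.
small_low, small_high = wilson_interval(failures=1, units=1_000)
large_low, large_high = wilson_interval(failures=100, units=100_000)
print(f"1 in 1,000:      {small_low:.4%} to {small_high:.4%}")
print(f"100 in 100,000:  {large_low:.4%} to {large_high:.4%}")
```

With one failure in 1,000 units the interval spans roughly 0.02% to 0.56%, so a small-sample "0.1%" is compatible with a rate above the expected range.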
|
|
||||||
|
|
||||||
Burn-in testing: ask whether the vendor does it and for how long. Burn-in testing runs modules at elevated temperature under load for 24-72 hours before shipping to catch early-life failures. OEM vendors do this. Tier-1 compatible vendors do this. Tier-2 and grey-market vendors often don't. Early-life failures (the ones that happen in the first 30-90 days) are the most expensive because they cause production incidents. A day of burn-in testing at the factory costs a fraction of an hour of unplanned downtime.
|
|
||||||
|
|
||||||
The EEPROM data question: for compatible optics targeting specific switch platforms (Cisco, Juniper, Arista), does the vendor program the correct vendor-specific EEPROM bytes? This is what enables the switch to recognize the optic without requiring manual configuration changes. A "Cisco-compatible" QSFP28 should have the Cisco vendor identifier and part number encoded in the EEPROM so IOS reads it as a recognized module. Ask for the EEPROM read output for the specific platform you're deploying on. Any credible vendor can provide this. The data looks like a table of register values — it's a standard diagnostic dump, not proprietary.
|
|
||||||
|
|
||||||
The test report question: before accepting any batch of more than 50 optics, request the batch test reports. These are the power output measurements, receive sensitivity measurements, and electrical eye diagram results from the production line. A batch test report shows that the optic meets the MSA specification for its type — that TX power is within range, that eye diagrams are open, that the center wavelengths (for CWDM/DWDM) are accurate. If the vendor provides test reports for every batch, they have a quality process. If they tell you the report isn't available or costs extra, they don't have the infrastructure to provide it consistently.
|
|
||||||
|
|
||||||
Incoming inspection at your site: for any batch over 100 units, plan for incoming inspection before deployment. This means taking a random sample (10% is reasonable) and doing power measurements with an optical power meter against a known-good source. Check that TX power is within spec for every module in the sample. Check that the EEPROM reads correctly. Check that DOM data shows normal values when the module is powered. This takes about two minutes per module with the right setup. For 100 modules at 10% sampling, it's 20 minutes and it will catch bad batches before they go into production.
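
Drawing the sample at random matters more than the exact percentage, so it is worth letting a script pick the serial numbers. A minimal sketch with made-up serials; the two-minutes-per-module figure is the estimate from the paragraph.

```python
import random


def inspection_sample(serials, fraction=0.10, minutes_per_module=2, seed=None):
    """Pick a random sample of serial numbers and estimate bench time in minutes."""
    rng = random.Random(seed)
    size = max(1, round(len(serials) * fraction))
    sample = rng.sample(serials, size)
    return sample, size * minutes_per_module


batch = [f"SN{n:04d}" for n in range(100)]   # hypothetical serial numbers
sample, minutes = inspection_sample(batch, seed=7)
print(f"inspect {len(sample)} modules, about {minutes} minutes")
```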
|
|
||||||
|
|
||||||
Warranty terms vary more than people expect. "Lifetime warranty" is common in compatible optic marketing. Read the fine print: does "lifetime" mean the product lifetime or your network's lifetime? Is the warranty carry-in (you ship it to them) or advance replacement (they ship you a new one first)? What's the shipping cost arrangement? A compatible vendor with advance replacement for DOA modules within 24 hours of RMA approval is worth more than one with a lifetime warranty that requires three business days to process and requires you to ship first. Operational impact is measured in hours.
|
|
||||||
|
|
||||||
Spare parts pooling: decide before deployment how many hot spares you're carrying. For data center fabric, 1-2% of deployed optic count is a reasonable spare pool. For critical links (management plane, out-of-band, power management), carry one-for-one spares. For best-effort infrastructure, lower. Document which SKUs are in your spare pool and where they're stored — this sounds obvious but becomes non-obvious in a multi-site environment where someone is pulling spares from a different site's inventory to cover a production incident.
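
Spare pool sizing per SKU is a one-line calculation, sketched here under the assumption that the percentage pool rounds up and critical links are added one-for-one on top; the function name is illustrative.

```python
import math


def spare_pool(deployed, fraction, critical_links=0):
    """Spares = percentage pool (rounded up) plus one per critical link."""
    # round() guards against float noise such as 3.0000000000000004 before ceil.
    return math.ceil(round(deployed * fraction, 6)) + critical_links


print(spare_pool(1000, 0.01))                    # fabric at 1%
print(spare_pool(1000, 0.02, critical_links=8))  # fabric at 2% plus 8 critical links
```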
|
|
||||||
|
|
||||||
The DWDM case is worth calling out separately because the procurement tolerance requirements are more specific. For DWDM transceivers, you need to verify the channel wavelength to ITU G.694.1 grid specifications (typically 50 GHz spacing). The center wavelength should be accurate to within ±0.05nm for 100G coherent, within ±0.15nm for direct-detect CWDM4. Ask for the actual measured center wavelength for each DWDM module, not just the nominal channel. DWDM transceivers are tuned to specific ITU channels and a module that's off by 0.3nm on a tight DWDM system will cause BER problems that look like fiber issues.
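
Comparing a measured center wavelength against its nominal grid channel takes one conversion: the G.694.1 grid is anchored at 193.1 THz, and wavelength is the speed of light divided by frequency. The sketch below assumes a 50 GHz grid and takes the tolerance as an input; the function names and the sample measurements are illustrative.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum


def itu_channel_nm(n, spacing_ghz=50.0):
    """Nominal wavelength in nm for grid index n: f = 193.1 THz + n * spacing."""
    freq_hz = 193.1e12 + n * spacing_ghz * 1e9
    return C_M_PER_S / freq_hz * 1e9


def within_tolerance(measured_nm, n, tol_nm, spacing_ghz=50.0):
    """True if the measured wavelength is within tol_nm of the nominal channel."""
    return abs(measured_nm - itu_channel_nm(n, spacing_ghz)) <= tol_nm


nominal = itu_channel_nm(0)
print(f"channel 0 nominal: {nominal:.3f} nm")
print("0.02 nm off:", within_tolerance(nominal + 0.02, 0, tol_nm=0.05))
print("0.30 nm off:", within_tolerance(nominal + 0.30, 0, tol_nm=0.05))
```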
|
|
||||||
|
|
||||||
Pre-deployment testing in your actual environment is the last step and the one most often skipped under time pressure. Plug one of each SKU into your actual switch, your actual software version, check for syslog warnings, verify DOM data reads correctly, run link training and BER tests if your platform supports it. This takes two hours and it is the only reliable way to catch compatibility issues before they affect production traffic. The lab validation your vendor did may not match your exact configuration. Test it yourself.
|
|
||||||
|
|
||||||
None of this is complicated. It's all checkboxes that require either asking the right questions or spending a small amount of time on incoming validation. The companies that run optics procurement well aren't doing anything exotic — they have the checklist, they follow it consistently, and they have enough operational data to know which vendors hit their commitments. Start building that data now.
|
|
||||||
@ -1,40 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Coherent vs. Direct Detect: The Decision Your Network Will Make for the Next Decade"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
There's a moment in every network upgrade cycle when you have to decide whether your capacity problem is a data center problem or a WAN problem. That decision determines whether you're buying direct-detect optics or coherent optics, and getting it wrong means buying the wrong technology at the wrong price for the wrong reasons.
|
|
||||||
|
|
||||||
The difference is not complexity — it's physics.
|
|
||||||
|
|
||||||
Direct-detect transceivers convert light directly to electrical signal at the receiver. The transmitter sends intensity-modulated light; the receiver detects whether the light is on or off (NRZ) or determines the amplitude level (PAM4). No phase information, no carrier recovery, no DSP. This is every QSFP28 100G SR4, DR4, LR4 module you've ever touched. The optics are simple. The manufacturing cost is low. The interoperability is near-universal.
|
|
||||||
|
|
||||||
Coherent transceivers transmit using the phase and amplitude of the optical carrier, encoding information in both in-phase and quadrature components, plus two polarizations. At the receiver, a local oscillator laser mixes with the received signal to recover the carrier phase. The DSP processes the electrical signal and recovers the original data. This process, called coherent detection, can be combined with powerful error correction, dispersion compensation, and nonlinear mitigation to achieve spectral efficiency that direct detect can't approach. 400G ZR modules carry 400G over 80km of uncompensated fiber using two polarizations and 16QAM modulation. Direct-detect can't do that.
|
|
||||||
|
|
||||||
The reason this matters operationally: coherent optics are not better versions of direct-detect optics. They solve different problems. Using coherent optics for an intra-data-center link is like renting a semi-truck to move a box — you're paying for capability you don't need. Using direct-detect for a 400km terrestrial WAN link is physically impossible regardless of price.
|
|
||||||
|
|
||||||
The decision tree looks like this.
|
|
||||||
|
|
||||||
For links under 10km, within a data center, between adjacent buildings, or on dark fiber with no DWDM: use direct-detect. 10G SR/LR, 25G SR/LR, 100G SR4/LR4/DR4, 400G SR8/DR4/LR4/FR4. The ecosystem is mature, compatible optics are widely available and cheap, and the performance is more than sufficient. No DSP complexity, no coherent penalties, no platform-specific tuning.
|
|
||||||
|
|
||||||
For links over 80km, on DWDM infrastructure, crossing metro or long-haul fiber, or requiring line-side amplification: you need coherent. 100G DWDM pluggables (100G ZR, or older CFP/CFP2 coherent), 400G ZR (OIF implementation agreement, 80km target), 400G ZR+ (extended reach, vendor-specific, 600-2000km). These are platform-specific products. The performance depends on the DSP firmware. Interoperability exists but is tested on specific pairs, not assumed universally.
|
|
||||||
|
|
||||||
The grey zone is 10-80km. This is where the argument is genuinely complex. Direct-detect LR4 covers 10km. Direct-detect ER4 covers 40km. For some metro deployments, that's enough and the simpler optic wins on cost and operational simplicity. For higher spectral efficiency or higher reliability over that range, coherent 400G ZR makes sense. The answer depends on your fiber budget, amplification infrastructure, and whether you're running DWDM.
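
The three branches of the decision tree can be condensed into a short function. This is a restatement of the rules above with hypothetical parameter names; the grey zone deliberately returns "evaluate" instead of an answer.

```python
def pick_detection(distance_km, dwdm=False, amplified=False):
    """Apply the distance/DWDM/amplification rules described above."""
    if distance_km > 80 or dwdm or amplified:
        return "coherent"
    if distance_km <= 10:
        return "direct-detect"
    return "evaluate"   # 10-80 km without DWDM: depends on the fiber plant


print(pick_detection(2))
print(pick_detection(40))
print(pick_detection(40, dwdm=True))
print(pick_detection(400))
```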
|
|
||||||
|
|
||||||
The operational complexity difference is real and often understated. A direct-detect link is up or it's down. The DOM data shows TX power and RX power. If both are in spec, the link works. If they're out of spec, you replace the optic. The troubleshooting flow is two steps.
|
|
||||||
|
|
||||||
A coherent link has pre-FEC BER, post-FEC BER, constellation diagrams, OSNR margin, PDL, nonlinear noise figures, phase noise metrics, and DSP convergence state. The link can be "up" — passing traffic — while accumulating correctable errors at a rate that will eventually exceed FEC capacity. The diagnosis requires understanding what normal looks like for your specific fiber plant and traffic load. You need more sophisticated monitoring. You need engineers who understand coherent signal processing, or at least understand which parameters indicate which failure modes.
|
|
||||||
|
|
||||||
That's not an argument against coherent. It's an argument for having the right tooling and people before deploying it at scale. Organizations that deploy 400G ZR without training, without enhanced monitoring, and without documenting baseline OSNR for every span will have incidents they can't diagnose.
|
|
||||||
|
|
||||||
The compatible optic question is different for coherent. For direct-detect 400G DR4, buying compatible is straightforward — the MSA standard is well-defined, the optic is commodity, and every reputable compatible vendor has thoroughly tested it. For coherent, the situation is more nuanced. OIF 400G ZR is a published standard with compliance testing. Compatible 400G ZR modules exist and from reputable vendors they work correctly on tested platform pairs. But the "tested platform pairs" part matters more than it does for direct-detect. A compatible 400G ZR module should be validated on your specific linecard firmware and your specific peer device firmware, not just on the generic platform family. This testing exists — ask for the test matrix.
|
|
||||||
|
|
||||||
The price delta matters at scale. A direct-detect 400G DR4 from a tier-1 compatible vendor: $25-40. A coherent 400G ZR: $500-1200 from OEM, $200-400 from compatible. For a 32-port switch spine, that's $800-1,280 in direct-detect optics per box versus $6,400-12,800 in compatible coherent optics, or $16,000-38,400 at OEM coherent prices. For DCI, coherent is clearly worth it. For a data center fabric, direct-detect is clearly worth it.
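
The per-box figures are port count times the low and high unit price for each option. A short sketch using the prices quoted above; the dictionary keys are just labels.

```python
def optics_cost_per_box(ports, price_low, price_high):
    """Low and high optics cost for a fully populated switch."""
    return ports * price_low, ports * price_high


PRICES = {
    "400G DR4 compatible": (25, 40),
    "400G ZR compatible": (200, 400),
    "400G ZR OEM": (500, 1200),
}

for label, (low, high) in PRICES.items():
    box_low, box_high = optics_cost_per_box(32, low, high)
    print(f"{label}: ${box_low:,} to ${box_high:,} per 32-port box")
```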
|
|
||||||
|
|
||||||
The upgrade path to 800G also differs by technology. 800G direct-detect (SR8, DR8, LR8 emerging) extends the same physics with more lanes or higher baud rate. 800G coherent (800G ZR, 1.2T extended reach) similarly extends coherent capabilities. The two tracks don't converge. If you're building DCI infrastructure for 400G ZR today, you're on the coherent roadmap for 800G. If you're building data center fabric with DR4, you're on the direct-detect roadmap.
|
|
||||||
|
|
||||||
The decision is not about which technology is better. It's about which technology solves your specific problem. Metro and long-haul WAN: coherent, no compromise. Data center fabric within a campus: direct-detect, no compromise. DCI within a metro at 10-80km: evaluate based on fiber plant, amplification, and DWDM infrastructure. Everything else: the physics tells you which one to use, and the physics doesn't change based on which vendor you're talking to that quarter.
|
|
||||||
|
|
||||||
Buy the right tool. Deploy with appropriate monitoring. Train the people who will operate it. That's the whole framework.
|
|
||||||
@ -1,40 +0,0 @@
|
|||||||
---
|
|
||||||
title: "When to Buy: Reading the Transceiver Price Cycle Before It Reads You"
|
|
||||||
type: market_alert
|
|
||||||
target_audience: sales
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Every network upgrade has a procurement window. Buy too early and you're paying innovation-phase prices for technology that'll be 40% cheaper in 18 months. Buy too late and supply pressure hits you six weeks before you need to cut over. Understanding where you are in the price cycle for the technologies you're deploying is worth more than any volume discount a vendor will offer you.
|
|
||||||
|
|
||||||
The price cycle for optical transceivers follows a pattern that's been consistent enough across enough generations of technology to be predictable. Not precisely predictable — this isn't a formula, and supply chain disruptions can distort the timing — but the direction is reliable.
|
|
||||||
|
|
||||||
A new optic type launches with limited supply and heavy manufacturing cost. 400G ZR in 2019 was $8,000-12,000 per module. Not because it cost that to make, but because only two manufacturers could make it, and they could charge that. The buyers at that price were building infrastructure that made the economics work at high cost: major cloud providers with very specific capacity requirements, telco operators with constrained fiber paths.
|
|
||||||
|
|
||||||
Then multi-vendor qualification happens. More manufacturers certify to the spec. Compatible vendors commission the same DSP chipset. Volume builds. Price falls to $1,500-2,500. Early enterprise adopters buy here, willing to pay a moderate premium for technology that's now proven.
|
|
||||||
|
|
||||||
Then commodity entry. The price floor for a mature MSA standard transceiver is roughly manufacturing cost plus margin. For coherent DSP-based optics, manufacturing cost is higher than for direct-detect. For simple SR4 with commodity VCSEL arrays, manufacturing cost is very low. When you see prices compress and multiple vendors offering similar pricing, you're at or near the floor. That's where compatible optics thrive — the technology is understood well enough that rigorous testing can verify compliance, and the price advantage over OEM is substantial.
|
|
||||||
|
|
||||||
Right now, in mid-2026, here's where key technology categories sit in this cycle.
|
|
||||||
|
|
||||||
100G SR4, LR4, CWDM4: commodity floor. Multiple manufacturers, widespread availability, stable prices for 3+ years. Compatible optics from tier-1 vendors: $22-35 for SR4, $45-75 for LR4. OEM list price 8-12x more. There's no price catalyst that will materially change this. Buy compatible, buy from someone who can guarantee the EEPROM data and batch test reports.
|
|
||||||
|
|
||||||
400G DR4 (direct-detect): mature, approaching floor. $25-40 compatible, down from $180 two years ago. Still some room to fall: the DSP architecture was simplified and the single-mode optical lanes are now commodity. By the end of 2026, $18-25 is realistic for volume buyers. The decision: if you need units now, $25-35 is a good price. If you can wait 6 months, you might save 20%. The risk: supply shortages happen. At $25/module, the inventory carrying cost of buying now is low. Don't wait for marginal savings on commodity optics.
|
|
||||||
|
|
||||||
400G LR4 (direct-detect): still declining. Currently $60-120 compatible, depending on vendor and certification status. ETA to floor: 12-18 months as volume scales. The 10km reach makes this a volume segment for enterprise campus deployments. Buy what you need for current projects; avoid overstocking in anticipation of deployment that's 12+ months out.
|
|
||||||
|
|
||||||
400G ZR (coherent): early descent from peak. OEM at $800-1,200, compatible at $250-500 depending on platform validation status. The pattern suggests floor is around $150-250 in 24-36 months as multi-source manufacturing matures. If you're deploying DCI today, you pay today's price. If your DCI deployment is 12+ months out, there's meaningful savings to capture by waiting. The caveat: ZR is platform-validated, not just spec-compliant, so verify that the compatible you're evaluating has test data for your specific line cards.
|
|
||||||
|
|
||||||
800G OSFP SR8/DR8: innovation phase. OEM $2,000+, limited compatible availability. This is early-adopter pricing. The use case is specific: AI/ML fabric, very high-density pod-scale switching. If you're building this now, it's because you have the workload that justifies the price. If you're evaluating 800G for 18 months from now, prices will be meaningfully lower.
|
|
||||||
|
|
||||||
The supply-side risk factor is currently elevated for specific SKUs. The fab capacity constraints that disrupted 400G supply in 2022-2023 have mostly cleared, but the AI infrastructure buildout is creating localized pressure on high-end pluggables. 800G OSFP and 400G ZR+ are seeing supply-side pressure. Standard 400G DR4 and 100G are not.
|
|
||||||
|
|
||||||
What this means for procurement: the commodity-tier decision (anything 100G, standard 400G DR4/SR4) should be based on project timing and storage logistics, not price speculation. The price isn't going to move enough to justify delayed procurement for active projects. Buy when you need it, from a vendor with consistent supply.
|
|
||||||
|
|
||||||
For premium tiers (400G ZR, 400G ZR+, 800G): the price curve matters more. If the deployment timeline is flexible, the savings from waiting 12-18 months can be 30-50% on the coherent side. If the deployment timeline is fixed, don't wait — but negotiate hard on volume pricing and validate that your supplier has confirmed allocation.
|
|
||||||
|
|
||||||
The other factor nobody discusses: the cost of the wrong decision. Buying early at peak prices is a known, quantifiable cost. The delayed deployment cost — traffic not flowing, capacity not available, customers waiting — is usually much larger. Most procurement teams optimize for the known cost (module price) while underestimating the unknown cost (deployment delay). When in doubt, buy at current prices and deploy on schedule.
|
|
||||||
|
|
||||||
The pricing data we track across 60+ vendors shows price movements in real time. When prices for a specific SKU start dropping across multiple vendors in the same 30-day window, that's a signal that manufacturing cost has fallen and competition is driving prices toward the floor. That's the data signal that justifies waiting. One vendor dropping prices while others hold steady is promotional, not structural.
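The structural-versus-promotional distinction can be expressed as a simple rule over price observations. The sketch below is one possible reading of it, not the logic of any actual tracking system: the 5% drop threshold, the "at least two vendors and at least half of those tracked" rule, and the vendor names are all assumptions chosen for illustration.

```python
from datetime import date, timedelta


def classify_price_move(observations, window_days=30,
                        drop_threshold=0.05, min_vendor_share=0.5):
    """Classify a price move for ONE SKU.

    observations: list of (vendor, day, price) tuples.
    Returns "structural", "promotional", or "no_signal".
    """
    latest = max(day for _, day, _ in observations)
    window_start = latest - timedelta(days=window_days)

    # Group in-window observations by vendor.
    by_vendor = {}
    for vendor, day, price in observations:
        if day >= window_start:
            by_vendor.setdefault(vendor, []).append((day, price))

    tracked = dropped = 0
    for points in by_vendor.values():
        if len(points) < 2:
            continue  # need two points to see a change
        points.sort()
        tracked += 1
        first_price, last_price = points[0][1], points[-1][1]
        if (first_price - last_price) / first_price >= drop_threshold:
            dropped += 1

    if tracked == 0 or dropped == 0:
        return "no_signal"
    broad = dropped >= 2 and dropped / tracked >= min_vendor_share
    return "structural" if broad else "promotional"


# Hypothetical observations for a single SKU
obs = [
    ("vendor_a", date(2026, 5, 1), 100.0), ("vendor_a", date(2026, 5, 20), 90.0),
    ("vendor_b", date(2026, 5, 2), 102.0), ("vendor_b", date(2026, 5, 21), 93.0),
    ("vendor_c", date(2026, 5, 3), 99.0),  ("vendor_c", date(2026, 5, 22), 99.0),
]
print(classify_price_move(obs))  # structural
```

The thresholds are the part most worth tuning: a tighter drop threshold catches moves earlier but also flags more noise.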
|
|
||||||
|
|
||||||
Watch the data. Know your timeline. Don't buy equipment for a deployment that's 18 months out at today's innovation-phase prices. Don't delay an active deployment to catch a price floor that may not arrive on your schedule. The pricing cycle is predictable in direction, not in timing.
|
|
||||||
@ -1,32 +0,0 @@
|
|||||||
---
|
|
||||||
title: "800G Is Shipping: What's Actually Available and What You Can Deploy Today"
|
|
||||||
type: new_product
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
When a new generation of optical modules ships, there's a gap between "shipping" and "available for your environment." 800G is shipping. Understanding the difference between what's on a datasheet and what's production-ready for a specific use case takes more precision than the press releases provide.
|
|
||||||
|
|
||||||
The form factors matter first. 800G pluggables come in two physical formats: OSFP (Octal Small Form Factor Pluggable) and QSFP-DD800. These are not interchangeable. OSFP is the larger module, with better thermal headroom — up to 15W versus around 12W for QSFP-DD800. Most 800G switch platforms shipping today use OSFP ports. QSFP-DD800 platforms are arriving in 2026 but are in early deployment. Before ordering 800G modules, verify which form factor your switch requires.
|
|
||||||
|
|
||||||
The speed is achieved through different lane architectures. 800GBASE-SR8 uses eight lanes of 100G each over OM5 or OM4 multimode fiber with MPO-16 connectors. 800GBASE-DR8 uses eight lanes of 100G over OS2 single-mode fiber with MPO-16 connectors at up to 500m. 800GBASE-FR8 extends single-mode reach to 2km over eight wavelengths. Each is a different electrical and optical architecture. An 800G SR8 is not a direct replacement for 800G DR8 — the fiber plant, connector type, and switch port configuration are all different.
|
|
||||||
|
|
||||||
The switch ASICs driving 800G ports are predominantly NVIDIA Spectrum-4 and Broadcom Tomahawk 5 in current deployments. Both support 800G OSFP natively. Arista's 7060X6 series, Cisco Nexus platforms based on Cisco Silicon One G200, and various white-box platforms using these merchant ASICs are where you'll find actual 800G deployments in production today. If you're evaluating 800G, verify the specific ASIC and software release — not just the platform family.
|
|
||||||
|
|
||||||
The application scope is currently narrow. 800G is primarily deployed in two contexts: AI/ML training clusters where GPU-to-GPU bandwidth drives fabric requirements, and hyperscale spine layers where port density and power efficiency at 800G justify the premium. An enterprise data center running SQL databases and web applications has no performance argument for 800G in 2026. The physics works, the economics don't. 400G DR4 at $25-40 per port is the right answer for general enterprise spine until 800G prices reach a comparable level.
|
|
||||||
|
|
||||||
For AI/ML fabric, the math is different. A single DGX H100 system has eight 400G or eight 800G network ports depending on the InfiniBand variant, plus multiple 10/25G management paths. Connecting pods of these systems requires the highest bandwidth-per-rack possible. At 800G with 128-port switches, a single 2RU switch handles the entire port count for a 16-node H100 cluster (16 nodes × 8 ports = 128 ports) with full bisection bandwidth. At 400G, you need multiple switches plus additional uplinks. The density argument at AI scale is real.
|
|
||||||
|
|
||||||
Module pricing in mid-2026 for production 800G OSFP SR8: OEM prices from the major platform vendors run $1,500-2,500 per module. Compatible options from tier-1 vendors are appearing at $600-1,200, though the compatible 800G ecosystem is substantially thinner than the 400G compatible ecosystem. Fewer manufacturers, shorter track records, and platform validation that's still being established. The guidance for 800G compatible at this stage is the same as it was for 400G ZR in 2021: ask for the specific platform validation matrix, verify the test data yourself, and start with a pilot before committing to a large batch.
|
|
||||||
|
|
||||||
The fiber infrastructure question is often where 800G hits a wall. 800GBASE-SR8 uses MPO-16 connectors, not MPO-12. If your existing multimode fiber infrastructure uses MPO-12 pre-terminated assemblies — which covers most enterprise MMF installed before 2020 — you cannot plug 800G SR8 modules directly into it. You need either MPO-16 assemblies (expensive, require re-cabling) or 800GBASE-DR8 on single-mode fiber. Many organizations evaluating 800G for existing campus data centers discover this constraint after reviewing the spec sheet, not before. Check your installed fiber and connector inventory before the procurement conversation.
|
|
||||||
|
|
||||||
The DOM monitoring profile for 800G differs from 400G. Eight TX lanes instead of four means eight sets of power readings, per-lane bias current, per-lane temperature data. If your network management system pulls DOM data via SNMP or streaming telemetry, verify it handles the expanded 800G DOM structure. Some older NMS platforms have hardcoded assumptions about QSFP28 lane count that fail silently on 800G modules rather than returning an error.
|
|
||||||
|
|
||||||
Interoperability between manufacturers at 800G is less established than at 400G. IEEE 802.3df standardizes 800GBASE-SR8 and 800GBASE-DR8, but the initial implementations came from a small number of manufacturers and were deployed in validated pairings before broad third-party testing was possible. By mid-2026, the major platform/module combinations are well-characterized. Mixing OSFP modules from different manufacturers on the same switch is generally fine for SR8 and DR8 (both are defined optical specs with clear compliance testing). The edge cases are in 800G coherent and proprietary higher-reach variants, where firmware interaction matters more.
|
|
||||||
|
|
||||||
Power consumption: an 800G OSFP SR8 draws approximately 10-12W per module. A fully populated 128-port 800G switch therefore draws roughly 1.3-1.5kW just in optics (128 × 10-12W), before the ASIC and platform overhead. Compare to a 32-port 400G switch, whose direct-detect optics draw a few hundred watts in total. The power infrastructure requirements for AI fabric are not incremental: they require a different data center PDU density, cooling design, and power delivery architecture. 800G deployments go alongside significant infrastructure investment in power and cooling, not just in the networking layer.
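Total optics power is simply port count times per-module draw. A minimal sketch of that calculation, assuming a fully populated switch and the 10-12W per-module range quoted above (the helper name is illustrative):

```python
def optics_power_kw(ports, watts_per_module_range):
    """Return the (low, high) total optics power in kW for a fully populated switch."""
    low_w, high_w = watts_per_module_range
    return ports * low_w / 1000, ports * high_w / 1000


low_kw, high_kw = optics_power_kw(128, (10, 12))
print(f"128 x 800G OSFP SR8: {low_kw:.2f}-{high_kw:.2f} kW")  # 1.28-1.54 kW
```

The same helper can be applied per rack by multiplying the port count by the number of switches, which is usually the figure the facilities team needs.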
|
|
||||||
|
|
||||||
The roadmap beyond 800G is 1.6T, which is in draft standardization and initial sampling from a handful of manufacturers. The lane architecture moves to 200G per lane, which requires a new generation of DSP and SerDes. Practical deployment of 1.6T is realistically 2027-2028 at the earliest. If you're making 800G decisions today, you're not racing ahead of the curve — you're at the beginning of the mainstream adoption slope, not the bleeding edge.
|
|
||||||
|
|
||||||
The right questions before a purchase: What ASIC and switch platform? What fiber plant — MMF with MPO-12 or SMF? What software version? Is the module vendor on the platform's tested vendor list for your specific linecard? What's the DOA rate history for 800G OSFP from this vendor? Those questions determine whether an 800G deployment is straightforward or expensive.
|
|
||||||
@ -1,32 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Compatible Transceiver Vendors in 2026: Who Does the Testing and Who Just Says They Do"
|
|
||||||
type: competitor_analysis
|
|
||||||
target_audience: sales
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The compatible transceiver market has a problem that doesn't affect branded markets: every vendor claims the same things. They all say "tested compatible," they all have warranty policies, they all list the same platforms. The difference between a vendor that ships modules that work and a vendor whose modules cause production incidents is invisible in the marketing. It's visible in the data.
|
|
||||||
|
|
||||||
Here's what the data actually tells you about the major vendors operating in the European and North American market in 2026.
|
|
||||||
|
|
||||||
FS.com is the dominant volume player by price and SKU count. For standard direct-detect optics — 10G SR/LR, 25G SR/LR, 100G SR4/LR4/DR4 — their pricing is consistently at or near the market floor. $18-25 for 100G SR4 puts them at the lowest end of the credible compatible market. Their testing documentation is publicly available, their DOA rates for commodity optics are in line with the industry (0.2-0.5%), and their burn-in process for standard optics is documented. The weakness is coherent: FS.com's DWDM and ZR portfolio is less mature, with fewer platform validations published for coherent line cards. For commodity 100G and 400G direct-detect, they are a legitimate choice.
|
|
||||||
|
|
||||||
Flexoptix operates differently from the pure volume players. The business model centers on EEPROM programming as a service — you specify the platform you're deploying on, and the module is programmed with the correct vendor identifier and part number for that switch before it ships. This matters in environments where the switch vendor's software checks EEPROM data and either blocks traffic or logs warnings for unrecognized modules. The price premium over the cheapest market alternatives is typically 20-40% for commodity optics, justified primarily by platform-specific programming and the ability to return modules for reprogramming if the platform changes. For environments running multiple switch platforms that require specific EEPROM data, the programming service has real operational value.
|
|
||||||
|
|
||||||
ProLabs is the enterprise-focused compatible vendor that most closely mirrors the sales model of OEM vendors. They have formal account management, volume pricing programs, and a tested vendor list that covers the major enterprise switch platforms in depth. Their pricing is in the middle of the market — not the cheapest, significantly cheaper than OEM list. The strength is in breadth of platform coverage and documentation quality. If you need tested compatibility data for a Cisco Catalyst 9000 with a specific IOS-XE version, ProLabs is likely to have it documented. They also have advance-replacement RMA programs standard, not optional.
|
|
||||||
|
|
||||||
ATGBICS occupies the mid-market: lower prices than ProLabs, more complete testing documentation than many grey-market resellers. Their 100G SR4 pricing runs in the $25-35 range. Platform testing focuses on the common enterprise switches. Their documentation for niche platforms is thinner. The DOA rate data they publish is at the high end of acceptable (0.3-0.5%) — not alarming, but worth factoring into procurement decisions for large batches.
|
|
||||||
|
|
||||||
10Gtek ships from China directly to buyers worldwide and represents the factory-direct price tier. Their 100G SR4 pricing can go below $15 in volume. The tradeoff is visible in what's absent: minimal platform-specific EEPROM documentation, burn-in testing not standard, DOA rates not published. For non-production environments, lab infrastructure, and test benches where downtime cost is low, the price point is compelling. For production infrastructure where a dead module costs engineering time to replace and diagnose, the savings may not survive the first incident.
|
|
||||||
|
|
||||||
The grey-market tier — unbranded modules from AliExpress, Amazon third-party sellers, and small importers — is a different risk profile entirely. These modules are often manufactured on the same production lines as the branded compatible modules, sometimes even the same physical hardware. The risk isn't necessarily higher failure rates in the first week of operation. The risk is in the unknowns: no DOA data, no burn-in testing documentation, no EEPROM consistency guarantee, no warranty with actual replacement terms. For a single test module, this tier is fine. For 500 modules deployed in production, it's a different calculation.
|
|
||||||
|
|
||||||
The coherent segment has different competitive dynamics. 400G ZR and ZR+ are platform-validated, not just spec-compliant. The platforms that matter — Arista 7160/7280 series, Cisco ASR 9000, Juniper PTX10000 — have specific firmware revisions and linecard combinations that need to be tested. In this segment, the tier-1 compatible vendors (Flexoptix, ProLabs, Coherent/II-VI compatible lines) have done the validation work. The tier-2 vendors generally have not. Buying a $200 400G ZR from a vendor with no published platform test matrix is buying an untested assumption.
|
|
||||||
|
|
||||||
A useful procurement filter: ask for the EEPROM read output for the specific platform you're deploying. A module vendor who can produce a hex dump of the EEPROM as it will be programmed for your platform within 24 hours has the testing infrastructure to support the claim. A vendor who says "we'll just swap it if it doesn't work" has not done the testing. The difference matters at scale.
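When a vendor does send a hex dump, the identity fields can be checked directly. The sketch below assumes an SFF-8636 (QSFP+/QSFP28) memory map, where upper page 00h carries the vendor name at bytes 148-163, the part number at 168-183, and the serial number at 196-211. CMIS-based modules such as QSFP-DD and OSFP use a different layout, so this does not apply to them as written. The dump here is synthetic and the vendor strings are made up.

```python
def parse_sff8636_identity(page):
    """Extract identity strings from a 256-byte SFF-8636 dump
    (lower page followed by upper page 00h)."""
    if len(page) < 212:
        raise ValueError("dump too short to contain the identity fields")

    def text(first, last):
        # Fields are fixed-width ASCII, padded with spaces.
        return page[first:last + 1].decode("ascii", errors="replace").strip()

    return {
        "vendor_name": text(148, 163),
        "vendor_pn": text(168, 183),
        "vendor_sn": text(196, 211),
    }


# Build a synthetic dump with hypothetical values, then round-trip it through hex
dump = bytearray(256)
dump[148:164] = b"EXAMPLE VENDOR".ljust(16)
dump[168:184] = b"EX-100G-SR4".ljust(16)
dump[196:212] = b"SN00000001".ljust(16)

hex_text = dump.hex()  # what a vendor's hex dump would look like as text
identity = parse_sff8636_identity(bytes.fromhex(hex_text))
print(identity)
```

Comparing the parsed vendor name and part number against what the target switch expects is a quick way to confirm the module was coded for that platform before it ships.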
|
|
||||||
|
|
||||||
The warranty comparison is not a useful differentiator until you read the details. "Lifetime warranty" appears on products from FS.com, ATGBICS, 10Gtek, and most others. The operative question is advance replacement versus return-first. A vendor who ships a replacement before you return the failed unit means a production incident lasts hours, not days. A vendor who requires you to ship first means a day-plus of downtime while you wait for return shipping and replacement processing. ProLabs and Flexoptix standard terms include advance replacement. Most grey-market and AliExpress options do not.
|
|
||||||
|
|
||||||
Practical decision framework: for commodity 100G and 400G DR4/SR4/LR4 in an environment where switch software doesn't require specific EEPROM data, FS.com and ATGBICS represent the best price-to-documentation ratio. For environments with multiple switch platforms requiring specific EEPROM programming, Flexoptix's programming service has operational value that justifies the premium. For coherent 400G ZR in a production DCI environment, use a vendor who can provide the test matrix for your specific platform and linecard. For everything else: get the DOA rate in writing, verify burn-in testing is standard, and test a pilot batch before committing.
|
|
||||||
|
|
||||||
The vendors who publish their DOA rates by SKU, provide batch test reports on request, and have advance replacement as a standard term (not a paid add-on) are the vendors who have built the operational infrastructure to stand behind what they ship. That's not a long list, which is exactly why it's a useful filter.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "The Real Reason Your 400G QSFP-DD Links Fail After Fiber Moves"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Fiber moves break 400G links in ways they never broke 100G links, and the reason is arithmetic, not bad luck. When you pull an MPO-12 connector on a QSFP28 100GBASE-SR4 path, you have roughly 2.6 dB of link margin to absorb whatever contamination you re-introduce. On a 400GBASE-SR4 path using QSFP-DD, that number collapses to around 1.0 dB per the IEEE 802.3db specification. A single particle of dust on an MPO ferrule face (one that IEC 61300-3-35 classifies as a medium defect in Zone B, meaning the 120-micron annular region around each fiber core) contributes somewhere between 0.3 and 0.8 dB of insertion loss on its own. Do the math: two such particles on a mated pair, and you have consumed your entire margin before you even account for the patch cord, the connector at the switch end, or the six-meter OM4 run between the two.
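The margin arithmetic can be made explicit. This is a minimal sketch using only the figures quoted in the paragraph (2.6 dB and 1.0 dB of margin, 0.3 to 0.8 dB per particle); the function name is illustrative.

```python
def remaining_margin_db(link_margin_db, particle_losses_db):
    """Margin left after subtracting each particle's insertion loss (all in dB)."""
    return round(link_margin_db - sum(particle_losses_db), 2)


for label, margin in (("100G SR4", 2.6), ("400G SR4", 1.0)):
    best = remaining_margin_db(margin, [0.3, 0.3])   # two small particles
    worst = remaining_margin_db(margin, [0.8, 0.8])  # two large particles
    print(f"{label}: {worst:+.1f} dB to {best:+.1f} dB remaining")
```

A negative result means the contamination alone has exceeded the budget. With these figures the 100G path stays positive in both cases, while the 400G path goes negative in the worst case.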
|
|
||||||
|
|
||||||
The zone classification system in IEC 61300-3-35 becomes far more consequential at 400G precisely because the pass criteria were settled when a 100G link ran four NRZ lanes at about 26 Gb/s each, not PAM4 lanes at 53 GBaud with a far smaller eye. Zone A is the 0-to-25-micron radius centered on the fiber core; any scratch or particle here causes maximum insertion loss because it sits directly over the 50-micrometer light-carrying core of an OM4 fiber. Zone B extends from 25 to 120 micrometers and is less catastrophic but no longer forgiving at 400G speeds. A connector that passed Zone B criteria comfortably at 100G will often fail an OTDR trace after a fiber move at 400G because the tolerance stack has nowhere left to go.
|
|
||||||
|
|
||||||
The cleaning sequence matters as much as the cleaning tool. Dry-only cleaning sounds efficient but at high-traffic data centers where isopropyl alcohol vapors from adjacent cleaning operations leave residue, it redistributes contamination rather than removing it. The correct sequence is wet-then-dry: a single stroke with an IPA-wetted swab or push-pull cleaner first, followed immediately by a dry stroke before the alcohol carrier evaporates and deposits the dissolved oils back on the ferrule face. One stroke each direction, never circular. On MPO-12 and MPO-16 connectors the push-pull cassette cleaners from Fujikura and Sumitomo perform significantly better than foam swabs because the tape substrate is engineered to capture particles in the 1-10 micron range rather than dragging them laterally across the end face.
|
|
||||||
|
|
||||||
Here is where the diagnostic confusion enters. After a fiber move that introduces contamination at or near the failure threshold, a QSFP-DD module will typically report RX power in DOM that looks plausible — perhaps -8.5 dBm against a receiver sensitivity floor of -9.5 dBm — and the link will come up. Engineers look at that 1 dB of apparent headroom and declare the move successful. What the DOM is not showing is that the RX power figure is a rolling average over a 100 ms to 500 ms window depending on the module vendor's implementation. During normal traffic, the link is marginal. During a burst event, particularly on the guard bands of PAM4 constellation at 53 GBaud where the eye height is already compressed, the actual instantaneous optical power drops below receiver sensitivity and frames are lost. The post-FEC BER counter may look clean because RS-FEC has a correction window measured in codewords and short burst errors disappear into it, but the pre-FEC BER will show elevated symbol errors if the platform exposes it.
|
|
||||||
|
|
||||||
The practice that eliminates callbacks is baseline capture at commissioning. When a 400G path goes live for the first time on clean, freshly installed MPO plant, read the RX power from DOM on every lane at steady state and record it. On a QSFP-DD SR8 module you have eight lanes; on SR4, four. Write those per-lane values into your CMDB alongside the fiber ID. When a move happens and the link comes back up, the first diagnostic step is not pinging across the path; it is comparing current per-lane RX power against the commissioning baseline. If any lane has dropped by more than 0.5 dB, the connector is contaminated or was not properly seated. At 400G, 0.5 dB is a diagnostic threshold, not a minor variation.
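The baseline comparison is easy to automate once the per-lane values are stored. A minimal sketch, assuming the readings have already been collected into plain lists of dBm values (how they are pulled from the switch or the CMDB is out of scope, and the sample numbers are hypothetical):

```python
def lanes_needing_inspection(baseline_dbm, current_dbm, threshold_db=0.5):
    """Return (lane_number, drop_db) for every lane whose RX power fell
    by more than threshold_db relative to the commissioning baseline."""
    if len(baseline_dbm) != len(current_dbm):
        raise ValueError("lane count mismatch between baseline and current")
    flagged = []
    for lane, (base, now) in enumerate(zip(baseline_dbm, current_dbm), start=1):
        drop = base - now  # positive when power has fallen
        if drop > threshold_db:
            flagged.append((lane, round(drop, 2)))
    return flagged


# Hypothetical eight-lane readings before and after a fiber move
baseline = [-2.0, -2.1, -2.0, -1.9, -2.2, -2.0, -2.1, -2.0]
after_move = [-2.1, -2.2, -2.7, -2.0, -2.3, -2.4, -2.2, -2.1]
print(lanes_needing_inspection(baseline, after_move))  # [(3, 0.7)]
```

Lane 6 in this example dropped by 0.4 dB and is not flagged. Whether to also track sub-threshold drift across repeated moves is a policy choice the sketch leaves open.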
|
|
||||||
|
|
||||||
Connector seating itself is a consistent source of post-move failures that is separate from contamination. MPO connectors have a two-stage engagement where the guide pin engages the guide hole at roughly 6 mm of insertion travel and the ferrule mates with the adapter at approximately 9 mm. It is physically possible to get the connector seated to first-stage engagement — enough to produce a satisfying click and pass a light tug — without reaching the second-stage mated position. At 100G a slightly misaligned MPO often still produces enough optical coupling to bring the link up. At 400G on an OSFP or QSFP-DD SR8 module using an MPO-16 connector, partial engagement regularly produces 3 to 5 dB of excess insertion loss per mated pair, which is a complete link failure, not a marginal link.
|
|
||||||
|
|
||||||
Inspection before reconnection is not optional at 400G and it is not a theoretical recommendation. The standard inspection tool is a 400x fiber scope with an end face analysis capability that applies IEC 61300-3-35 pass/fail criteria automatically. The Viavi FiberChek and AFL Noyes OPM5 series both do this. The scope takes approximately eight seconds per connector face. On a 40-port migration, with two connector faces to check per port, that represents roughly ten minutes of inspection time. The callback that results from skipping that inspection takes a minimum of two hours to diagnose, a truck roll, and the discovery that the answer was a dirty connector, which has been the answer in roughly 60 percent of the 400G post-move failures I have seen documented across multiple operator environments. Inspection is not overhead; it is the fastest path through the change window.
|
|
||||||
|
|
||||||
Ambient particulate density in the data center also shifted the calculus when facilities moved to hot-aisle containment with pressurized cold aisles. Positive pressure in the cold aisle pushes particles outward into the hot aisle, but during a fiber move when a panel is open to both aisles, turbulent airflow can deposit particles on exposed connector faces in under 30 seconds. Dust cap discipline — replacing caps immediately on unmated connectors and keeping the cap on the replacement connector until the moment of mating — is the operational control that makes the difference in environments where the air quality is not controlled to cleanroom standards. Most data centers are not cleanrooms. The ambient particulate count at ISO Class 8, which is a typical raised-floor data center, allows for 3.5 million particles per cubic meter in the 0.5-micron range. A 0.5-micron particle sitting on a Zone A region of an MPO ferrule at 400G is a link event waiting to happen.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Why DOM Readings Lie: What Your Transceiver Is Not Telling You"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
DOM data is the first place engineers look when a link is misbehaving, and it is frequently the last place they find the actual cause. The problem is not that Digital Optical Monitoring is useless — it is that the values it exposes are proxies for physical conditions, and the relationship between the proxy and the condition breaks down in specific, predictable ways that most engineers never learn because the link usually works and the discrepancy never surfaces. When the link is marginal, those discrepancies become the difference between a correct diagnosis and two hours of misguided troubleshooting.
|
|
||||||
|
|
||||||
Start with the measurement window. SFF-8636 and CMIS specifications define DOM registers as rolling averages over an implementation-defined interval. Most module vendors use windows between 100 ms and 500 ms, but nothing in the standard mandates a specific value, and vendors do not generally publish what window their modules use. What this means in practice is that a burst error event lasting 10 ms — long enough to drop 267,000 frames on a 100G path — produces a transient in instantaneous RX power that may reduce the average register value by less than 0.1 dB. The register reads as completely normal. Meanwhile, the switch's post-FEC counters may also look normal because RS-FEC corrected the burst. The pre-FEC BER counter, if the platform exposes it, will show elevated symbol errors for that 100 ms averaging window and then return to baseline. An engineer looking at DOM thirty seconds after the event sees nothing. The link is declared healthy. The event repeats every few hours at peak utilization.
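How much a short burst moves an averaged register depends on the window length and the depth of the dip. The sketch below assumes a simple boxcar average taken in linear power (mW), which is only one plausible implementation since the averaging method is vendor-defined; the function name and parameters are illustrative.

```python
import math


def averaged_dip_db(window_ms, burst_ms, burst_power_fraction):
    """dB drop shown by a windowed-average RX power register when the
    optical power falls to burst_power_fraction of nominal for burst_ms."""
    burst_ms = min(burst_ms, window_ms)
    steady_ms = window_ms - burst_ms
    average = (steady_ms + burst_ms * burst_power_fraction) / window_ms
    if average <= 0:
        return math.inf  # power was zero for the whole window
    return -10 * math.log10(average)


# A 10 ms burst in which the received power drops out completely
print(round(averaged_dip_db(500, 10, 0.0), 3))  # about 0.088 dB with a 500 ms window
print(round(averaged_dip_db(100, 10, 0.0), 3))  # about 0.458 dB with a 100 ms window
# A 10 ms burst in which power only halves (a 3 dB instantaneous dip)
print(round(averaged_dip_db(500, 10, 0.5), 3))  # about 0.044 dB
```

Under these assumptions the register moves by less than 0.1 dB whenever the window is long or the dip is partial, which is well inside normal reading-to-reading variation.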
|
|
||||||
|
|
||||||
TX bias current is the DOM parameter that tells the truth about module aging, and almost nobody monitors it. TX power is what engineers watch, but TX power is actively regulated by the module's automatic power control circuit, which adjusts bias current to maintain a target output level as the laser ages. The result is that TX power remains stable and within spec even as the laser diode degrades, because the control loop is doing its job — right up until the bias current hits the maximum value the driver circuit can supply, at which point TX power collapses. By the time TX power deviates from its nominal value, the module has been in a failure trajectory for months. The bias current trend over time is the leading indicator. A VCSEL-based 25G SFP28 that shipped at 6 mA of bias current and is now running at 14 mA against a maximum alarm threshold of 17 mA has less than a year of life remaining under steady operating temperature. TX power still reads nominal. DOM says the module is healthy.
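Trending bias current only needs periodic samples and a slope. The sketch below fits a least-squares line to recent readings and extrapolates to the alarm threshold. It is a rough estimate under stated assumptions: the monthly samples are hypothetical, and real laser wear-out tends to accelerate near end of life, so a linear fit should be read as an optimistic upper bound.

```python
def months_until_bias_limit(samples, limit_ma):
    """samples: list of (month_index, bias_mA). Returns the estimated months
    until bias reaches limit_ma, or None if the trend is flat or falling."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = sum(t for t, _ in samples) / n
    mean_i = sum(i for _, i in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope in mA per month
    slope = sum((t - mean_t) * (i - mean_i) for t, i in samples) / spread
    if slope <= 0:
        return None
    _, latest_bias = max(samples)  # reading with the largest month index
    return max(0.0, (limit_ma - latest_bias) / slope)


# Hypothetical last six monthly readings, ending at 14 mA with a 17 mA alarm threshold
history = [(0, 12.5), (1, 12.8), (2, 13.1), (3, 13.4), (4, 13.7), (5, 14.0)]
print(round(months_until_bias_limit(history, 17.0), 1))  # 10.0
```

Fitting only the most recent samples, as here, weights the current rate of change over the lifetime average, which matters when the slope is increasing.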
|
|
||||||
|
|
||||||
Temperature compensation is a specific mechanism that makes thermal alarms misleading on modern modules. QSFP28 and QSFP-DD modules implement a lookup table that adjusts the reported TX power and RX power values based on the measured die temperature, because optical output and receiver sensitivity are temperature-dependent. The compensation makes the power readings appear stable across the module's operating temperature range. What it masks is that a module running at 68°C cage temperature — which is measurable via the temperature register — is operating in a region where VCSEL degradation rate accelerates by roughly a factor of two for every 10°C above 60°C, based on published Arrhenius model data from major VCSEL vendors. The DOM temperature register is not alarmed because 68°C is within the module's specified operating range. The TX power register looks fine because the compensation table adjusted it. The engineer sees no flags. The module is being consumed at twice the rate of a module running at 55°C in a well-cooled cage.
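The "factor of two per 10°C" rule of thumb translates into a one-line calculation. This sketch applies the doubling rule uniformly between two temperatures, which is a simplification of a full Arrhenius model (no activation energy is used), and the function name is illustrative.

```python
def relative_wear_rate(temp_c, reference_c, doubling_step_c=10.0):
    """Degradation rate at temp_c relative to reference_c, assuming the
    rate doubles for every doubling_step_c of temperature increase."""
    return 2 ** ((temp_c - reference_c) / doubling_step_c)


print(relative_wear_rate(70, 60))            # 2.0
print(round(relative_wear_rate(68, 60), 2))  # 1.74
```

Feeding the DOM temperature register into this kind of estimate gives a per-port wear multiplier that can be ranked across a fabric, even though no individual module is raising an alarm.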
|
|
||||||
|
|
||||||
DOM cannot measure what happens outside the module. This is obvious when stated directly but it is routinely forgotten during troubleshooting. RX power is measured at the photodetector inside the module, after the light has passed through the receiver lens, the wavelength filter, and the mode conditioner on multimode variants. It does not know whether the 0.8 dB of loss between the transmitting module and the receiving module comes from a fiber bend, a dirty connector, a mismatched fiber type, or a partially engaged MPO. It reports a number. The number is correct as a measurement of optical power at that point in the optical path. The interpretation of what caused that power level is entirely left to the engineer, who frequently blames the module when the answer is the connector.
|
|
||||||
|
|
||||||
The RX power low warning threshold in DOM is set by the module manufacturer at the point where the optical link is approaching receiver sensitivity limits. On a QSFP28 100GBASE-LR4 module that value is typically around -11 dBm against a receiver sensitivity of -13.5 dBm. An RX power reading of -11.5 dBm triggers a warning, and the instinct is to replace the transceiver. But the relevant question is whether the -11.5 dBm represents a degraded module or a degraded fiber path. If the module was receiving -9.5 dBm at commissioning and now receives -11.5 dBm, 2 dB of loss has appeared somewhere in the path. Fiber loss does not spontaneously increase over time unless something physical changed: a bend radius violation introduced during a cable tray reorganization, connector contamination, or physical damage to the patch cord. Nothing changed inside the receiving module. The fiber changed. A correct diagnosis requires comparing current DOM values against commissioning baselines, not against the manufacturer's alarm thresholds.
|
|
||||||
|
|
||||||
The correct way to use DOM data involves understanding which registers have physical meaning and which are derived or estimated. The temperature register is a direct measurement from a thermistor on the module substrate — it is the most reliable DOM value. The TX bias current register is a direct measurement from the driver circuit — it is the best aging indicator. The TX power register is measured at the laser's monitor photodetector and is generally accurate but is affected by the APC loop. The RX power register is measured at the receiver photodetector and is accurate but is a local measurement at the end of the optical path, not a characterization of the path itself. Voltage supply registers are accurate and useful for identifying power rail problems on the line card. The supply voltage dropping below 3.2V on a nominal 3.3V module is a real failure indicator that shows up in DOM before any optical parameter deviates.
|
|
||||||
|
|
||||||
Flexoptix EEPROM programming makes it possible to reconfigure module alarm and warning thresholds to match the actual optical power budget of the specific deployment rather than the generic thresholds the manufacturer ships with. A module deployed on a 15 km LR4 path with 2.5 dB of measured fiber loss and 4.5 dB of margin has very different appropriate alarm thresholds than the same module on a 2 km path with 0.8 dB of loss and 6.2 dB of margin. Platform-specific programming also ensures that the DOM data appears correctly in the management plane of the target switch platform, which matters because some platforms apply alarm masks differently depending on the vendor ID in the module EEPROM. Generic modules from the field sometimes have alarm thresholds set to the absolute minimum the standard requires, which generates false alarms on healthy links and trains engineers to ignore DOM warnings — which is exactly the behavior you do not want when a real marginal link appears.
|
|
||||||
|
|
||||||
The engineers who get the most diagnostic value from DOM are the ones who treat it as a trending tool rather than an instantaneous health indicator. Polling TX bias current and cage temperature weekly, graphing the trends over months, and setting actionable thresholds based on those trends rather than on the manufacturer's alarm register gives you actual predictive value. A bias current that has increased by 20 percent over six months on a module that is eighteen months old is a replacement candidate at the next maintenance window, not when the link fails at 3 AM on a Tuesday.
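As a rough illustration of the trending approach, the sketch below fits a straight line to polled TX bias current samples and extrapolates to an alarm threshold. The sample values, the 17 mA threshold, and the function names are assumptions for illustration only, and real laser aging is not guaranteed to be linear.

```python
from typing import Optional, Sequence


def fit_slope(days: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values against days (units per day)."""
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    return sxy / sxx


def days_until_threshold(days: Sequence[float],
                         bias_ma: Sequence[float],
                         threshold_ma: float) -> Optional[float]:
    """Days from the last sample until the fitted trend reaches threshold_ma.

    Returns None when the trend is flat or falling.
    """
    slope = fit_slope(days, bias_ma)
    if slope <= 0:
        return None
    return max(threshold_ma - bias_ma[-1], 0.0) / slope


# Hypothetical weekly samples: day offsets and TX bias current in mA.
sample_days = [0, 7, 14, 21]
sample_bias = [10.0, 10.1, 10.2, 10.3]
estimate = days_until_threshold(sample_days, sample_bias, threshold_ma=17.0)
print(f"estimated days to threshold: {estimate:.0f}")
```

The output is only as good as the polling history behind it; a few weeks of samples gives a noisy slope, while several months gives something worth scheduling a maintenance window around.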
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "800G SR8 vs DR8 vs FR8: Which One Actually Fits Your Build"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The 800G optic decision is not primarily a reach decision, even though reach is the first thing vendors lead with. The reach requirements for a given application tier are usually unambiguous — spine-leaf within a single data center hall, DCI across a campus, or long-haul metro interconnect — but the infrastructure consequences of choosing SR8, DR8, or FR8 extend well beyond the distance question and into fiber plant compatibility, thermal density, power draw at scale, and 2026 price points that vary by a factor of more than two across the three variants. Getting the variant wrong does not just mean suboptimal cost; it means purchasing optics that are incompatible with existing infrastructure or that require a fiber plant overhaul that costs ten times more than the optic savings.
|
|
||||||
|
|
||||||
OSFP 800GBASE-SR8 uses eight 100G-per-lane VCSELs operating at 850 nm over OM4 or OM5 multimode fiber with MPO-16 connectors. The IEEE 802.3df standard specifies a maximum reach of 50 meters on OM4 and 100 meters on OM5. Those numbers look like limitations until you measure the actual port-to-port distances in a spine-leaf fabric built within a single data center module or hall. A 2,000-square-meter data center hall with a 16-row server pod layout and top-of-rack switches connecting to a row of spine switches typically has maximum optical path lengths of 30 to 45 meters including patch panel hops. SR8 covers that topology with margin. The module itself draws approximately 9 to 11 watts, and 2026 market pricing is running between $799 and $999 per unit for compatible modules, with OEM pricing from major switch vendors landing at $1,400 to $1,800 depending on platform. SR8 also benefits from VCSEL manufacturing maturity: it builds on the same base technology that produced hundreds of millions of SFP+ SR and QSFP28 SR4 modules. Yield rates are high and prices will continue to decline predictably.
|
|
||||||
|
|
||||||
The critical infrastructure requirement that disqualifies SR8 for many deployments is multimode fiber. Data centers built in 2010 through 2018 that standardized on OS2 single-mode throughout — a common choice for cost and simplicity, eliminating the fiber type management problem — cannot use SR8 without recabling or installing OM4/OM5 trunk infrastructure specifically for the 800G tier. This is not a trivial undertaking. A 40-rack pod retrofit with OM5 MPO trunk cables and patch panels runs $15,000 to $25,000 in materials alone, plus labor. Against SR8 optic savings of $400 to $800 per port versus DR8, the breakeven point is 20 to 60 ports, which is within range for a 400-port spine deployment but not for smaller builds. Teams that inherited single-mode plant should default to DR8 or FR8 without running the numbers on multimode retrofit.
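The breakeven range quoted above can be reproduced with a few lines of arithmetic. This is a minimal sketch using the materials-only retrofit cost and per-port savings from the paragraph; labor is excluded, and the function name is illustrative.

```python
def breakeven_ports(retrofit_cost_usd: float, saving_per_port_usd: float) -> float:
    """Number of ports at which optic savings equal the retrofit cost."""
    if saving_per_port_usd <= 0:
        raise ValueError("per-port saving must be positive")
    return retrofit_cost_usd / saving_per_port_usd


# Best case: cheapest retrofit, largest per-port saving.
best_case = breakeven_ports(15_000, 800)
# Worst case: most expensive retrofit, smallest per-port saving.
worst_case = breakeven_ports(25_000, 400)
print(f"breakeven between {best_case:.1f} and {worst_case:.1f} ports")
```

Adding a labor estimate to `retrofit_cost_usd` moves both ends of the range upward, which is why small builds rarely clear the bar.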
|
|
||||||
|
|
||||||
OSFP 800GBASE-DR8 operates over single-mode OS2 fiber using eight 100G-per-lane PAM4 signals at 1310 nm, with an MPO-16 connector and a reach of 500 meters. The reach figure matters less than it appears for intra-DC spine-leaf — 500 meters is far more than any within-building run — but it becomes the enabling specification for campus-scale interconnects where buildings are 200 to 400 meters apart and single-mode is already present. DR8 draws approximately 12 to 14 watts and in 2026 is priced at $1,200 to $1,500 for compatible modules. The power penalty relative to SR8 is real but not decisive at the switch level; a 48-port OSFP switch chassis running a mix of SR8 and DR8 will see a difference of roughly 150 watts in full-load power draw, which is meaningful at scale but not a redesign-forcing constraint for most operators.
|
|
||||||
|
|
||||||
The connector geometry of DR8 on single-mode creates a significant operational difference compared to SR8. MPO-16 on single-mode requires APC polished connectors and strict attention to polarity. An MPO-16 APC connector that is mated incorrectly — flipped 180 degrees, which is physically possible in the dark interior of a cable tray — will produce approximately 25 to 30 dB of insertion loss, which is a complete link failure with no ambiguity. Field crews familiar with MPO UPC on multimode sometimes make this mistake when they transition to single-mode APC plant for the first time, and the resulting troubleshooting session is always educational. Labeling both connector ends with polarity indicators and requiring inspection before mating is the operational discipline that prevents it.
|
|
||||||
|
|
||||||
OSFP 800GBASE-FR8 uses eight 100G-per-lane PAM4 signals at 1310 nm with LC duplex connectors rather than MPO-16, and specifies a reach of 2 kilometers over OS2 single-mode. The LC connector is a meaningful practical difference. Every data center has patch panels populated with LC duplex adapters, and field technicians have worked with LC connectors for twenty years. The per-connector cleaning procedure is well-understood, the inspection tools are widely available, and polarity errors are far less common because LC simplex orientation is visually obvious. The tradeoff is that FR8 requires eight pairs of LC duplex fibers — effectively 16 fibers per link — which at the patch panel means 16 LC ports per 800G connection versus a single MPO-16 port for SR8 or DR8. At a 128-port spine switch, that is 2,048 LC ports on the fiber side if the entire switch is deployed with FR8, which is a legitimate structured cabling challenge.
|
|
||||||
|
|
||||||
FR8 pricing in 2026 sits at $1,800 to $2,200 for compatible modules and upwards of $3,000 for OEM variants on high-margin platforms. The reach capability goes to 2 km, which makes FR8 genuinely relevant for DCI between buildings on a campus or between co-located data center modules in a carrier hotel where the physical separation makes SR8 and DR8 insufficient. For spine-leaf within a building, paying the FR8 premium for 2 km reach when 50 meters or 500 meters is all that is used is a straightforward cost optimization failure. It happens regularly when procurement teams specify the highest-performing variant across all applications to simplify SKU management, at a cost of $800 to $1,200 per port over what the application actually requires.
|
|
||||||
|
|
||||||
The VCSEL versus EML laser technology distinction has downstream operational implications beyond insertion loss characteristics. SR8 VCSELs do not require thermo-electric cooling and consume less power under partial load because VCSEL current draw tracks utilization more closely than EML. DR8 and FR8 use EML transmitters at 1310 nm, which have a flatter power consumption curve and draw close to rated power whether the link is at 10 percent or 90 percent utilization. In a spine-leaf fabric where most links run at 20 to 40 percent average utilization, this makes SR8 meaningfully more efficient in actual deployment than nameplate power suggests. Power at scale is not a minor consideration: a fabric of 64 spine nodes with 64 OSFP ports each (4,096 ports) saves approximately 2 watts per port with SR8 versus DR8, a continuous saving of 8,192 watts, which at $0.10 per kWh and a typical PUE of 1.4 is roughly $10,000 per year in operating cost reduction.
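The annual figure follows from the per-port saving, the port count, the PUE, and the energy price. A minimal sketch of that arithmetic, using the paragraph's numbers as inputs (the function name is illustrative):

```python
HOURS_PER_YEAR = 8760


def annual_saving_usd(ports: int, watts_saved_per_port: float,
                      usd_per_kwh: float, pue: float) -> float:
    """Yearly cost saving, scaling IT power by PUE to include cooling overhead."""
    it_kw = ports * watts_saved_per_port / 1000.0
    facility_kwh = it_kw * pue * HOURS_PER_YEAR
    return facility_kwh * usd_per_kwh


ports = 64 * 64  # 64 spine nodes with 64 OSFP ports each
saving = annual_saving_usd(ports, watts_saved_per_port=2.0,
                           usd_per_kwh=0.10, pue=1.4)
print(f"{ports} ports, about ${saving:,.0f} per year")
```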
|
|
||||||
|
|
||||||
The decision framework reduces to three deterministic questions. Does the existing fiber plant support multimode OM4 or OM5 at the required path length? If yes, SR8 is the cost-optimal choice for intra-DC spine-leaf. If the plant is single-mode, does the reach requirement exceed 500 meters? If yes, FR8 is required. If the reach is under 500 meters and operational preference is for MPO-16 high-density patching, DR8 is correct. If operational preference is for LC duplex patching and reach is under 2 km, FR8 is correct. The answer to those three questions, applied consistently, eliminates the variant selection problem for the vast majority of deployments without requiring detailed cost modeling.
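The three questions can be written down as a small function, which makes it easy to apply them consistently across a port list. This is only a sketch of the framework as described above; the parameter names are assumptions, and `multimode_ok` means the OM4 or OM5 plant supports the required path length.

```python
def choose_800g_variant(multimode_ok: bool, reach_m: float, prefer_lc: bool) -> str:
    """Apply the three-question framework from the text.

    multimode_ok: existing OM4/OM5 plant supports the required path length
    reach_m:      required reach in meters
    prefer_lc:    operational preference for LC duplex over MPO-16 patching
    """
    if multimode_ok:
        return "SR8"
    if reach_m > 2000:
        raise ValueError("reach exceeds the 2 km covered by these three variants")
    if reach_m > 500:
        return "FR8"
    return "FR8" if prefer_lc else "DR8"


print(choose_800g_variant(multimode_ok=True, reach_m=40, prefer_lc=False))
print(choose_800g_variant(multimode_ok=False, reach_m=300, prefer_lc=False))
print(choose_800g_variant(multimode_ok=False, reach_m=1200, prefer_lc=False))
```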
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Cleaning Fiber Connectors at 400G: The Tolerance Has Shrunk"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The cleaning procedures that kept 10G and 40G networks running without incident are not adequate for 400G, and the reason is embedded in the physics of the optical path rather than in any procedural preference. IEC 61300-3-35, the international standard that defines pass/fail criteria for fiber connector end face quality, uses zone-based defect classification that was developed against 9-micron single-mode and 50-micron multimode core diameters, but the power budget mathematics changed substantially when 400GBASE-SR4 and 400GBASE-DR4 began shipping at scale. The standard itself has not been replaced, but the practical consequence of a borderline Zone B defect at 400G is qualitatively different from what it was at 100G.
|
|
||||||
|
|
||||||
Zone A, the 0-to-25-micron radius around the fiber core center, is the region where any contamination causes maximum insertion loss because the Gaussian mode field of a 50-micron OM4 fiber at 850 nm is concentrated within approximately 7.5 micrometers of the center. A single 5-micron particle of carbon debris from a rubber dust cap — a particle size that falls well below the sensitivity of most handheld inspection scopes running at 200x magnification — sitting directly on Zone A will scatter or absorb a portion of the transmitted mode, contributing 0.2 to 0.5 dB of insertion loss per connection. At 100GBASE-SR4 with a typical link margin of 2.5 dB, one such particle on one connector leaves 2.0 to 2.3 dB of margin for the rest of the optical path. At 400GBASE-SR4 with a typical margin of 1.1 dB, the same particle consumes between 18 and 45 percent of total link margin at a single connection.
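The margin comparison above is simple division, but it is worth seeing both generations side by side. A minimal sketch using the loss and margin figures from the paragraph:

```python
def margin_consumed_pct(loss_db: float, margin_db: float) -> float:
    """Share of the link margin consumed by a single loss contribution."""
    if margin_db <= 0:
        raise ValueError("margin must be positive")
    return 100.0 * loss_db / margin_db


for label, margin_db in (("100GBASE-SR4", 2.5), ("400GBASE-SR4", 1.1)):
    low = margin_consumed_pct(0.2, margin_db)
    high = margin_consumed_pct(0.5, margin_db)
    print(f"{label}: one particle consumes {low:.0f} to {high:.0f} percent of margin")
```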
|
|
||||||
|
|
||||||
MPO ribbon connectors compound the contamination risk because the ferrule end face contains 12 or 16 individual fibers in a precision-aligned array, and each fiber has its own Zone A and Zone B region. A single push-pull cleaning stroke that captures particles from the external edge of the ferrule and redistributes them toward the center, which is exactly what happens when a dry-only cassette cleaner is used on a connector that has not been pre-wet, can contaminate multiple Zone A regions simultaneously. The math for a 12-fiber MPO is that a single contaminated fiber at 400GBASE-SR4 will consume that lane's optical power margin and potentially drop the lane below receiver sensitivity, causing the DSP to declare a lane failure and the QSFP-DD module to assert loss of signal on that lane. Because 400GBASE-SR4 uses eight of the twelve fibers (four transmit, four receive) and every lane must be operational for the link to remain up, a single dirty active fiber in an MPO-12 takes the link down completely.
|
|
||||||
|
|
||||||
The wet-then-dry sequence is the minimum correct cleaning procedure for MPO connectors at 400G, and the specific IPA formulation matters. Optical-grade isopropyl alcohol at 99 percent purity or above is the correct choice. Drugstore IPA at 70 percent is 30 percent water, which leaves mineral residue as it evaporates — residue that under a 400x scope looks like a translucent film across the Zone B region and contributes 0.1 to 0.3 dB of loss. The wet stroke should use a fabric or lint-free polyester tape substrate, not foam, because foam compresses against the ferrule face and can leave microfibers that are nearly invisible at 200x magnification but clearly visible at 400x and above. One wet stroke, one dry stroke, then inspect. Not two wet strokes and a dry — the second wet stroke on a connector that is already partially clean can introduce fresh contamination from the solvent carrier.
|
|
||||||
|
|
||||||
Visual inspection with a handheld fiber scope at 200x catches contamination larger than approximately 15 to 20 micrometers, which corresponds to a medium defect in IEC 61300-3-35 Zone B classification. That is useful as a rough screen but insufficient for commissioning 400G links. A 400x scope with automated end face analysis — tools like the Viavi FiberChek Pro or the AFL Noyes FIS-series — applies the zone classification automatically and gives a pass/fail verdict based on the actual IEC criteria. The difference in what each tool reveals is not academic: in a 2023 field study published by Corning that examined MPO connector quality across 400G deployments, 34 percent of connectors that passed visual inspection at 200x failed the automated IEC 61300-3-35 analysis at 400x due to Zone A scratches and submicron particle contamination. Connectors shipped from the factory inside sealed bags sometimes fail inspection because the dust cap sheds silicone particles during removal if the cap is twisted rather than pulled straight back.
|
|
||||||
|
|
||||||
Production scenarios where clean-looking connectors failed are not rare edge cases. A hyperscaler expansion project that deployed 800 QSFP-DD SR4 modules across two new data center halls in 2022 had a post-installation failure rate of approximately 11 percent on initial power-up, where failure was defined as one or more lanes reading below the receiver sensitivity floor in DOM. Investigation found that 73 percent of those failures were traceable to MPO connector contamination despite the fact that the installation team had used cassette cleaners on every connector before mating. The root cause was dry-only cleaning on connectors that had been pre-contaminated during transit with the dust caps improperly seated. After switching to wet-then-dry cleaning on all connectors and implementing mandatory 400x inspection before mating, the post-installation failure rate dropped to under 1 percent.
|
|
||||||
|
|
||||||
The inspection procedure itself has a defined sequence for MPO connectors that differs from LC and SC single-fiber inspection. Both the plug side and the adapter side of every mated pair must be inspected. Inspecting only the plug is equivalent to cleaning one side of a glass and calling it clean — particle transfer from the uncleaned adapter side to the cleaned plug during mating is the mechanism behind roughly 40 percent of post-cleaning failures. The adapter side inspection requires a probe-style scope that can reach into the adapter body without disturbing the alignment sleeves. Ferrule geometry verification — checking that the ferrule does not protrude or recess beyond the IEC 61300-7-7 specification of 0 to 250 nanometers — is not routinely done in the field but becomes relevant when a connector fails inspection repeatedly despite correct cleaning, indicating a physical ferrule defect rather than contamination.
|
|
||||||
|
|
||||||
For deployments where speed of execution is a real constraint, the practical answer is not to skip inspection but to build inspection into the work cell. Having an inspection scope at the patch panel position rather than on a separate cart eliminates the step of bringing the connector to the tool. Inspection with a modern automated scope takes 8 to 12 seconds per face. A technician cleaning and inspecting one rack unit of 24 MPO-12 adapters (48 connector faces) completes the work in approximately 10 minutes. The same section failing after a 400G migration and requiring a trouble ticket, a second site visit, and a root cause analysis takes a minimum of four hours of billable labor. The inspection overhead pays for itself on the first link that would otherwise have failed.
|
|
||||||
|
|
||||||
The zone classification criteria that IEC 61300-3-35 uses for single-mode connectors in Zone A specify no defects or contamination larger than 3 micrometers. For multimode OM4, the Zone A limit is more generous at 10 micrometers, but 400G implementations on multimode are sensitive enough that operating at the IEC multimode limit with fresh connectors leaves no margin for accumulated contamination over the lifetime of the installation. Commissioning standards that require zero detectable contamination in Zone A — stricter than the IEC floor — are operationally justified for 400G infrastructure and represent best practice rather than overkill.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Intermittent 100G Link Drops: The Temperature Problem Nobody Talks About"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Intermittent link drops on 100G infrastructure have a specific failure signature that distinguishes them from every other cause: they correlate with time of day, not with traffic load, and they disappear entirely after a chassis reboot or when the data center HVAC cycles on. Most engineers, when they encounter this pattern, spend the first several hours pursuing the wrong suspects — firmware bugs, cable faults, module incompatibility — because the temperature relationship is not obvious until you overlay the link event log against the thermal data from the same time window. Once you see the correlation, it is unmistakable, and the subsequent repair is usually inexpensive. Getting to that correlation requires knowing what to look for.
|
|
||||||
|
|
||||||
QSFP28 modules operating at 100G use either a VCSEL array at 850 nm for SR4, or directly modulated DFB lasers at 1310 nm for LR4 and CWDM4. Both laser types have optical output power that is temperature-dependent. In a VCSEL, both threshold current and differential efficiency degrade with temperature: as temperature increases, threshold current rises and differential efficiency (slope efficiency, measured in mW/mA) falls, meaning the laser requires more drive current to produce the same output power. DFB lasers used in LR4 and CWDM4 have an additional wavelength drift characteristic of approximately 0.1 nm per degree Celsius, which in a multiplexed CWDM4 system can cause channel crosstalk if the wavelength drifts sufficiently toward an adjacent CWDM grid slot.
|
|
||||||
|
|
||||||
The automatic power control circuit in the module compensates for temperature-induced output variation by adjusting TX bias current, which is why TX power in DOM typically reads stable even as cage temperature rises. The problem occurs when the cage temperature reaches the upper region of the QSFP28 operating range and the APC loop reaches its maximum bias current output. At that point, the loop can no longer maintain output power, TX power begins to drop below nominal, and if the optical path already had limited margin — a slightly long or attenuated fiber run, a marginally contaminated connector — the receiving module's RX power drops below its sensitivity floor. The link drops. Within a few minutes, the APC loop state is reset by the module's transient recovery behavior, or the ambient temperature cycles slightly downward, and the link comes back. The event log shows a single link drop of 45 seconds to several minutes. Repeat after a few hours.
|
|
||||||
|
|
||||||
The HVAC cycle correlation appears because most facility HVAC systems run on a setpoint control loop that allows the hot aisle temperature to rise 4 to 6°C above the setpoint before the cooling stage engages, then overshoots to 2 to 3°C below setpoint before the cooling stage cuts off. In a hot-aisle-contained pod with a setpoint of 35°C, the actual hot aisle temperature may cycle between 32°C and 41°C over a 20 to 40 minute period. A QSFP28 module in a rear-facing optical port on a chassis in the hot aisle sees cage temperatures that track this cycle with roughly a 5 to 10 minute thermal lag. If the module's marginal operating point is around 38°C cage temperature, it will fail intermittently near the peak of each HVAC cycle and appear fine the rest of the time.
|
|
||||||
|
|
||||||
The DOM data that confirms this diagnosis is straightforward to extract if you know what to read. The temperature register in SFF-8636 reporting is the module die temperature, which is approximately 5 to 8°C above the cage inlet temperature for modules under full electrical load. A cage temperature of 38°C from the chassis thermal sensor corresponds to a module die temperature of roughly 43 to 46°C. The TX bias current register will be at or near its maximum alarm threshold — typically 15 to 17 mA for a 25G VCSEL lane — during the failure period. TX power, if the module is still in the APC recovery zone, may show a reduction of 0.5 to 1.5 dB below baseline. RX power on the far end will show a corresponding reduction. If you poll these registers at 60-second intervals over a 4-hour window that includes a suspected failure event, the temperature, bias current, and power traces will clearly show the thermal marginal behavior. The event log timestamp will fall within the period where temperature is at its peak.
|
|
||||||
|
|
||||||
The RX power alarm threshold is what most engineers watch, but the action threshold for thermal-marginal links should be the TX bias current high alarm on the transmitting module, not the RX power low alarm on the receiving module. The TX bias current approaches its maximum before TX power degrades to the point where RX power alarms trigger on the far end. Setting a custom high warning threshold on TX bias current at 80 percent of the alarm value — typically around 12 to 13 mA on a 25G VCSEL lane — gives approximately 30 to 60 minutes of advance warning before the link becomes marginal. This is a threshold adjustment that Flexoptix EEPROM programming can apply to deployed modules when the platform supports custom alarm threshold configuration through MDIO or I2C access.
|
|
||||||
|
|
||||||
The HVAC cycle test is the definitive confirmation of thermal root cause when the failure history is ambiguous. With access to the facility management system, read the return air temperature at the CRAC unit that serves the affected pod at one-minute intervals. Simultaneously poll module temperature, TX bias current, and RX power at the same interval. If the link events align with the hot peaks of the HVAC cycle — not with traffic peaks, not with spanning tree events, not with switch CPU load — thermal root cause is confirmed. This test takes four to six hours to produce unambiguous data, but it eliminates every other hypothesis simultaneously and directs remediation to exactly the right intervention.
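Once the polled temperature samples and the link event timestamps are in hand, the overlay can be reduced to a single number: the share of link events that occurred while temperature was near its peak. The sketch below is one way to do that; the data layout, the `top_fraction` cutoff, and the sample values are all assumptions for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, temperature in degrees C)


def nearest_temp(samples: Sequence[Sample], t: float) -> float:
    """Temperature of the sample closest in time to t (samples sorted by time)."""
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))[1]


def fraction_of_events_near_peak(samples: Sequence[Sample],
                                 events: Sequence[float],
                                 top_fraction: float = 0.25) -> float:
    """Share of events whose nearest sample lies in the top part of the range."""
    if not events:
        return 0.0
    temps = [s[1] for s in samples]
    low, high = min(temps), max(temps)
    cutoff = high - top_fraction * (high - low)
    hot = sum(1 for t in events if nearest_temp(samples, t) >= cutoff)
    return hot / len(events)


# Hypothetical one-minute samples and link-drop timestamps.
samples: List[Sample] = [(0, 40.0), (60, 42.0), (120, 45.0),
                         (180, 46.0), (240, 44.0), (300, 41.0)]
events = [130.0, 185.0, 10.0]
share = fraction_of_events_near_peak(samples, events)
print(f"{share:.0%} of link events fell near the thermal peak")
```

Running the same calculation against a traffic-load series instead of temperature gives a useful contrast: if the events cluster on temperature peaks but not on load peaks, the thermal hypothesis is the one left standing.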
|
|
||||||
|
|
||||||
Remediation options are ordered by cost and disruption. The least disruptive option is increasing the cooling setpoint margin so the hot aisle temperature does not reach the module's marginal operating point — but this requires coordination with facilities and may impact adjacent equipment. Moving the affected chassis to a lower-temperature position in the rack — modules run cooler in the front half of the rack compared to the rear — is often feasible without a maintenance window and can reduce cage temperature by 3 to 5°C on its own. Cleaning the chassis air filter, which on a Cisco Nexus 9300 or Arista 7280 can restrict airflow enough to raise cage temperature by 4 to 8°C when heavily loaded with particulate, is a maintenance action that frequently resolves thermal-marginal link problems at no cost. Module replacement is the last resort and is only warranted when the module's operating range is genuinely insufficient for the deployment environment, which in a correctly designed data center should be rare.
|
|
||||||
|
|
||||||
Night-time failure patterns that coincide with reduced occupancy, lower IT load, and HVAC setback cycles are a distinct thermal failure mode. Some facility control programs reduce cooling output during off-peak hours based on occupancy or IT load projections, and the modules that were operating with a few degrees of thermal margin during business hours become marginal at 3 AM when the cooling capacity is reduced. The on-call engineer who gets paged at 2:47 AM for a flapping 100G link in an otherwise stable environment, who cycles the interface and watches it recover, who closes the ticket as "interface reset," has just papered over a thermal problem that will recur on the next HVAC setback cycle. The correct action is to poll DOM temperature data before clearing the alert and correlate it with the facility thermal schedule.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "How to Validate Compatible Optics Before They Go Into Production"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The phrase "plug it in and see if it works" is a validation methodology that functions adequately when the power budget is generous, the link is non-critical, and the failure mode is a clean link-down that shows up immediately in monitoring. At 100G and above, none of those conditions reliably hold. A marginal link at 400G can pass traffic at low utilization, pass a ping, appear healthy in DOM, and fail intermittently at 70 percent utilization when the module's thermal floor rises and the optical eye closes at the edges of the PAM4 constellation. Testing by observation after production cutover is not validation — it is gambling with a delayed outcome.
|
|
||||||
|
|
||||||
Proper validation starts before the module arrives in the data center. The first step is EEPROM verification against the target platform. Every major switch platform — Arista, Cisco NX-OS, Juniper Junos, Nokia SR OS — reads a subset of the module EEPROM fields to determine whether to enable the module or present it with a warning or error state. The relevant fields are: vendor name (bytes 148-163 in SFF-8636), vendor part number (bytes 168-183), vendor serial number (bytes 196-211), and the identifier bytes that describe module type and capabilities. On Cisco NX-OS in its default configuration, a module that does not present a recognized vendor ID will raise a "Transceiver is unsupported" warning and, depending on the platform and configuration, may refuse to enable the interface. On Juniper Junos the behavior is typically a syslog warning without suppression, but on EX and QFX platforms the optics qualification database can reject modules entirely if the vendor ID does not match a known entry.
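The identity fields listed above can be checked offline from a raw EEPROM dump before the module ever reaches a switch. The sketch below assumes a 256-byte image of the lower page plus upper page 00h, indexed by byte address, and uses the byte ranges given in the paragraph. The vendor strings in the example are made up, and how the dump is obtained is platform-specific and not shown.

```python
from typing import Dict

# Byte ranges (inclusive) as listed in the text for SFF-8636.
FIELDS = {
    "vendor_name": (148, 163),
    "vendor_pn": (168, 183),
    "vendor_sn": (196, 211),
}


def parse_identity(image: bytes) -> Dict[str, str]:
    """Extract space-padded ASCII identity fields from an EEPROM image."""
    if len(image) < 212:
        raise ValueError("image too short to contain the identity fields")
    result = {}
    for name, (start, end) in FIELDS.items():
        raw = image[start:end + 1]
        result[name] = raw.decode("ascii", errors="replace").strip(" \x00")
    return result


# Build a hypothetical image for demonstration purposes only.
image = bytearray(256)
image[148:164] = b"EXAMPLE VENDOR".ljust(16)
image[168:184] = b"EX-QSFP28-TEST".ljust(16)
image[196:212] = b"SN0001".ljust(16)

identity = parse_identity(bytes(image))
for key, value in identity.items():
    print(f"{key}: {value}")
```

Comparing the parsed strings against the values the target platform expects is a cheap pre-deployment gate that catches mis-coded modules before they consume a maintenance window.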
|
|
||||||
|
|
||||||
Flexoptix EEPROM programming addresses this systematically by writing the platform-specific vendor ID, part number, and qualification strings to the module EEPROM before deployment. The result is that the module presents correctly to the platform as a qualified variant, enabling the interface without operator intervention and ensuring that DOM data surfaces through the management plane without masking. This is not counterfeiting — the optical parameters programmed into the EEPROM match the module's actual physical specifications, and the module is not representing capabilities it does not have. It is platform compatibility encoding, analogous to installing the correct driver for a hardware peripheral rather than using a generic driver that limits functionality.
|
|
||||||
|
|
||||||
The 48-hour BER soak test is the validation step that filters out latent defects that are not visible in EEPROM inspection or short-duration power testing. The procedure is to deploy the module in a test chassis under full electrical load — meaning the module should be in an active link carrying real traffic, not just powered up with no optical connection — at the target operating temperature for a minimum of 48 hours. Measure pre-FEC BER at the beginning of the soak and at 12-hour intervals. A healthy 100G QSFP28 module operating on a clean optical path should produce a pre-FEC BER below 1e-5 continuously. A pre-FEC BER that starts at 1e-5 and rises to 3e-5 by the 36-hour mark is a module that is warming into a failure trajectory. RS-FEC will correct these errors at that rate — the post-FEC BER counter will read zero — but the module's effective remaining margin is declining and it will fail when environment or optical conditions worsen.
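The soak readings can be evaluated mechanically rather than by eye. The sketch below checks two things: that no reading exceeds a ceiling, and that the last reading has not grown by more than a chosen factor over the first. The 1e-5 ceiling comes from the paragraph; the growth factor of 2 and the sample readings are assumptions to be tuned per deployment.

```python
from typing import Sequence


def soak_verdict(ber_samples: Sequence[float],
                 ceiling: float = 1e-5,
                 max_growth: float = 2.0) -> str:
    """Classify a series of pre-FEC BER readings taken in time order."""
    if len(ber_samples) < 2:
        raise ValueError("need at least two readings to judge a trend")
    if max(ber_samples) > ceiling:
        return "fail: above ceiling"
    first, last = ber_samples[0], ber_samples[-1]
    if first > 0 and last / first >= max_growth:
        return "fail: rising trend"
    return "pass"


# Hypothetical readings at 0, 12, 24, 36 and 48 hours.
stable = [2.0e-7, 2.1e-7, 1.9e-7, 2.0e-7, 2.2e-7]
rising = [2.0e-7, 3.0e-7, 5.0e-7, 8.0e-7, 1.2e-6]
hot = [1.0e-5, 1.5e-5, 2.0e-5, 3.0e-5, 3.0e-5]

for name, series in (("stable", stable), ("rising", rising), ("hot", hot)):
    print(name, "->", soak_verdict(series))
```

The rising-trend check is the more useful of the two, because it flags a module whose absolute BER still looks comfortable but whose direction of travel does not.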
|
|
||||||
|
|
||||||
DOM baseline capture is the commissioning step that makes all subsequent troubleshooting faster and more accurate, and it takes approximately five minutes per module if the polling infrastructure is in place. After the 48-hour soak, at steady-state operating temperature, record the following values for each module and store them in the CMDB alongside the device, slot, and fiber path identifiers: TX power per lane, RX power per lane, TX bias current per lane, cage temperature, supply voltage, and the alarm and warning threshold values for each parameter. These baseline values define what "healthy" looks like for this specific module in this specific installation. All subsequent comparisons are made against these baselines, not against the generic manufacturer thresholds. A TX bias current that reads 7.2 mA at baseline and reads 10.8 mA twelve months later has increased by 50 percent — that is a leading indicator of laser aging regardless of whether 10.8 mA is below the manufacturer's warning threshold of 13 mA.
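A baseline comparison boils down to two calculations: a percentage change for bias current and a decibel difference for optical power, which is already logarithmic. The sketch below shows both, using the bias figures from the paragraph and a hypothetical RX power pair; the function names are illustrative.

```python
def bias_increase_pct(baseline_ma: float, current_ma: float) -> float:
    """Percent change in TX bias current relative to the commissioning baseline."""
    if baseline_ma <= 0:
        raise ValueError("baseline bias current must be positive")
    return 100.0 * (current_ma - baseline_ma) / baseline_ma


def power_delta_db(baseline_dbm: float, current_dbm: float) -> float:
    """Change in optical power in dB; negative means power has dropped."""
    return current_dbm - baseline_dbm


bias_change = bias_increase_pct(7.2, 10.8)
# Hypothetical RX power at commissioning versus now.
rx_change = power_delta_db(-9.5, -11.5)
print(f"TX bias: {bias_change:+.0f} percent, RX power: {rx_change:+.1f} dB")
```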
|
|
||||||
|
|
||||||
Power budget verification is a calculation step, not an observation step, and it must happen before the module goes live rather than after. The inputs are: TX launch power from the module datasheet (typically a range, use the minimum for conservative calculation), fiber type and length, insertion loss per connector pair from measured OTDR or inspection data, number of mated pairs in the path, and RX sensitivity from the module datasheet (use the minimum sensitivity, maximum input power, and the specific power budget limits defined in the standard). For a 400GBASE-DR4 link, the IEEE 802.3bs budget is a maximum channel insertion loss of 6.0 dB, which includes the fiber attenuation of approximately 0.31 dB/km at 1310 nm on OS2, plus connector losses. With 500 meters of fiber contributing roughly 0.16 dB and each mated connector pair contributing 0.3 to 0.5 dB, a path with four connector pairs (switch port, patch panel in, patch panel out, switch port) consumes 1.2 to 2.0 dB in connectors alone, leaving 3.84 to 4.64 dB of margin once fiber and connectors are both counted. On paper the link has positive margin. Add two dirty connectors contributing 0.5 dB each above the clean-connector assumption, and the margin has shrunk by 1.0 dB. Add temperature-induced TX power reduction of 0.5 dB and the worst-case margin has fallen from 3.84 dB to 2.34 dB before any connector aging has been counted.
|
|
||||||
|
|
||||||
The connector aging factor is an input that is systematically omitted from power budget calculations at commissioning because it is an estimate of future degradation rather than a current measurement. Optical connector insertion loss increases over time due to physical wear on the ferrule surface, oxidation of the polish face on non-APC connectors, and particle accumulation in environments where cleaning frequency is insufficient. A study of MPO connector aging in operational hyperscaler environments published in the Journal of Lightwave Technology in 2021 found a median insertion loss increase of 0.08 dB per connector pair per year in environments where connectors were cleaned at annual maintenance cycles. Over three years, four connector pairs on a 400G DR4 path add approximately 0.96 dB of loss above the commissioning measurement. A path that had 1.8 dB of margin at commissioning has 0.84 dB of margin after three years of normal aging, which is uncomfortably close to the specification limit and provides little headroom for additional degradation or environmental variation.
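The aging arithmetic above is linear and easy to reproduce. The helper below is a sketch; the function name and the assumption that loss grows linearly per pair per year are illustrative simplifications.

```python
def aged_margin_db(margin_at_commissioning_db: float,
                   connector_pairs: int,
                   aging_db_per_pair_year: float,
                   years: float) -> tuple:
    """Return (added_loss_db, remaining_margin_db) under linear aging."""
    added = connector_pairs * aging_db_per_pair_year * years
    return added, margin_at_commissioning_db - added


# Four pairs, 0.08 dB per pair per year, three years, 1.8 dB starting margin.
added, remaining = aged_margin_db(1.8, 4, 0.08, 3)
print(f"added loss {added:.2f} dB, remaining margin {remaining:.2f} dB")
```

Running the same function with the planned infrastructure lifetime instead of three years shows how quickly a thin commissioning margin is used up.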
|
|
||||||
|
|
||||||
The practical implication is that validation must demonstrate not just that the link passes today, but that it has sufficient margin to absorb the aging trajectory and still operate within specification at the end of the expected infrastructure lifecycle. Forty-eight-hour soak tests, DOM baseline capture, and conservative power budget calculations with aging factors built in are the three elements of a validation methodology that produces links which remain stable for four to seven years without callback. Teams that skip these steps generate stable links for six to eighteen months and then generate an ongoing stream of marginal link incidents that occupy disproportionate troubleshooting resources because the root cause — insufficient margin at deployment — is not visible in any single incident.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OEM vs Compatible Optics: What the Lab Tests Actually Show"
|
|
||||||
type: comparison
|
|
||||||
target_audience: sales
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Head-to-head laboratory testing of OEM and compatible transceivers produces results that are more nuanced and more operationally useful than either camp's marketing literature suggests. The narrative from OEM vendors is that compatible optics are inherently inferior and pose reliability risk. The narrative from compatible vendors is that their modules are functionally identical. Both framings are misleading in ways that matter to the network operators who have to make purchasing decisions with real money and operate the resulting infrastructure for five to seven years. What the lab data actually shows is a more granular picture: specific parameters where the two module populations are statistically indistinguishable, specific parameters where compatible modules show measurable but operationally insignificant differences, and specific failure patterns that trace to process and deployment failures rather than to the optical components themselves.
|
|
||||||
|
|
||||||
The parameters that show no statistically significant difference in controlled lab comparison are also the parameters that matter most to link stability. TX launch power, RX sensitivity floor, maximum receiver input (the overload point), center wavelength accuracy on CWDM4 and LR4 variants, extinction ratio, and rise/fall time at 25 Gbaud all perform within the same range across OEM and quality-tier compatible modules when measured under identical temperature and load conditions. A 2023 comparative study conducted across twelve 100G QSFP28 LR4 modules — six OEM from two major switch vendors, six compatible from two tier-1 compatible manufacturers — found that TX launch power variance across all twelve modules was 0.8 dB, and that variance was not correlated with OEM versus compatible origin; it was correlated with manufacturing date and production lot. Two of the six OEM modules showed higher variation than any of the compatible modules in the same test.
|
|
||||||
|
|
||||||
Where compatible modules show measurable differences is in long-term temperature stability testing and in the statistical tail of the TX bias current distribution after 2,000 hours of accelerated aging. Under 85°C accelerated aging per Telcordia GR-468-CORE methodology, OEM modules from the two largest switch vendors showed a median TX power degradation of 0.11 dB over 2,000 hours. Compatible modules from tier-1 manufacturers showed 0.14 dB median degradation. The difference is real and statistically significant with sufficient sample sizes. The difference is also 0.03 dB, which is not operationally meaningful for a network with a correctly calculated power budget and appropriate margin. The compatible modules passed the same GR-468-CORE requirement, which specifies a maximum power degradation threshold. The difference matters if you are designing a system with zero margin and need every decimal of performance, which describes essentially no actual production deployment. It does not matter if you have followed the power budget discipline described in a correct deployment methodology.
|
|
||||||
|
|
||||||
The failure attribution problem is where the OEM narrative diverges most dramatically from what lab and field evidence supports. When a compatible transceiver fails in production, the cause is attributed to the module being compatible. When an OEM transceiver fails in production, it is attributed to aging, environmental conditions, or network events. This asymmetric attribution is not unique to optics procurement — it applies to every commodity infrastructure component — but it has a practical consequence: organizations that track RMA rates and failure root causes without adjusting for attribution bias will consistently overestimate the failure rate of compatible modules. A proper controlled comparison requires tracking failures of both module populations over the same deployment period, in the same environmental conditions, with failures diagnosed to root cause rather than assumed to be the module. When that methodology is applied, field failure rates for quality-tier compatible modules in 100G infrastructure come within 10 to 15 percent of OEM rates — a difference that is within the range explained by sample size variation and measurement methodology.
|
|
||||||
|
|
||||||
The deployment failures that are genuinely traceable to compatible optics rather than to process failures have a specific signature. The two mechanisms are EEPROM incompatibility with the target platform and missing or incorrectly implemented DOM register support. EEPROM incompatibility is not an optical performance failure — the module's laser and receiver are functioning correctly, but the switch platform refuses to enable the interface or displays incorrect DOM data because the vendor ID, part number, or capability bytes do not match the platform's qualification database. This is entirely resolvable through proper EEPROM programming before deployment. A compatible module programmed with platform-correct EEPROM data by Flexoptix or a similar service presents to the switch platform identically to a qualified OEM module, enables without warning, and surfaces DOM data through all the standard management interfaces. The optical component performance is the same; the management plane behavior is corrected.
|
|
||||||
|
|
||||||
Missing DOM register support is a less common but real quality differentiator. Some low-tier compatible modules implement DOM registers in a non-standard way, or do not implement certain optional registers that specific management platforms depend on for threshold monitoring. The consequence is that alarm and warning thresholds either do not function or surface incorrectly in the management plane. This is a legitimate quality concern that is addressed by sourcing from tier-1 compatible manufacturers whose modules implement SFF-8636 or CMIS completely and correctly, and by verifying DOM register compliance as part of the pre-deployment validation methodology.
|
|
||||||
|
|
||||||
The actual test data ranges that engineers should demand from compatible vendors before purchase are specific and quantifiable. TX launch power should be specified as a range with minimum and maximum values, not just a nominal, and the range should be consistent with the relevant IEEE or MSA standard. RX sensitivity should include the measurement methodology — BER floor at what bit error rate, measured at what wavelength, at what temperature. DOM register compliance should be stated against SFF-8636 revision 2.10 or CMIS 5.0 as applicable, with identification of which optional registers are implemented. Accelerated aging data under GR-468-CORE or equivalent should be available. Mean time between failure projections should cite the underlying test methodology and sample size. Vendors who cannot provide this data are not operating at the tier-1 compatible level and should not be evaluated further.
|
|
||||||
|
|
||||||
The cost difference between OEM and quality-tier compatible modules in 2026 is approximately $200 to $400 per port for 100G QSFP28 variants, and approximately $600 to $1,000 per port for 400G QSFP-DD variants. A 512-port spine deployment at 400G represents a potential compatible-module savings of $307,200 to $512,000. At the volume of a hyperscaler or large enterprise, the savings at the 100G access layer are often more than $1 million per major expansion. That economic case is sufficiently compelling that the correct evaluation question is not "are compatible modules as good as OEM?" but rather "what is the specific deployment methodology that makes compatible modules perform reliably at scale?" The methodology exists, is well-documented, and the lab data confirms that it works.
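For anyone who wants to rerun the savings figure for a different port count, the calculation is a straight multiplication. The function name is hypothetical and the per-port deltas are the ones quoted above, not a price list.

```python
def savings_range(ports: int, low_per_port: int, high_per_port: int) -> tuple:
    """Low and high savings estimate for a deployment of `ports` modules."""
    return ports * low_per_port, ports * high_per_port


# 512-port 400G spine at a $600 to $1,000 per-port price difference.
low, high = savings_range(512, 600, 1000)
print(f"${low:,} to ${high:,}")
```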
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "PAM4 at 800G: Why FEC Errors Spike at Peak Traffic Hours"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The correlation between peak traffic load and FEC error rate increases at 800G is not intuitive because FEC errors are an optical and electrical signal integrity phenomenon, not a traffic volume phenomenon. Traffic volume itself cannot directly degrade an optical signal — photons do not care how many frames per second they carry. What traffic volume does do is generate heat inside the module, inside the ASIC, and inside the optical cage, and heat is the mechanism that closes the PAM4 eye diagram and drives pre-FEC BER upward. Understanding this chain — sustained utilization to thermal buildup to SNR degradation to FEC error increase — is the difference between a network operations team that watches FEC counters spike at 18:00 every business day and treats it as background noise, and one that understands it as a system operating at its thermal margin and heading toward a link failure event at the next thermal peak.
|
|
||||||
|
|
||||||
PAM4 modulation encodes two bits per symbol by using four discrete amplitude levels rather than the two levels of NRZ signaling. The signal-to-noise ratio requirement to reliably distinguish four amplitude levels is substantially higher than for two levels. At 800G with eight lanes of 53.125 Gbaud PAM4 (106.25 Gb/s per lane), the vertical eye opening for each amplitude transition, meaning the gap between adjacent signal levels, is approximately one-third the vertical eye opening of the equivalent NRZ signal at the same baud rate. This reduced eye opening is why the theoretical pre-FEC BER of PAM4 is higher than NRZ at the same optical power level. The IEEE 802.3df specification for 800GBASE-SR8 specifies a pre-FEC BER threshold of 2.4e-4 per lane under the RS(544,514) FEC scheme. That is not a floor: it is the maximum allowable pre-FEC BER at which the FEC scheme can reliably correct errors and deliver post-FEC BER below 1e-12. Operating near that threshold provides no margin.
|
|
||||||
|
|
||||||
The thermal mechanism works as follows. At 800G, each OSFP module is drawing between 9 and 20 watts depending on variant, and the ASIC ports driving those modules are adding additional heat to the PCIe card zone of the chassis. At 40 percent average utilization during business hours, the PCB temperature in the optical cage area is in a stable regime. As utilization climbs toward 70 to 75 percent during peak hours — a common evening peak for backbone and peering ports — the sustained electrical activity in the SerDes lanes, the ASIC forwarding elements, and the laser drivers increases heat generation. The module die temperature rises. On a QSFP-DD or OSFP module, the DOM temperature register captures this, and a module that showed 52°C at 40 percent utilization will often read 60 to 64°C at sustained 70 to 75 percent utilization in a chassis where the cage cooling is designed for average rather than peak loading.
|
|
||||||
|
|
||||||
The temperature increase of 8 to 12°C above average-load operating temperature has a direct effect on the optical transmitter's characteristics. In EML transmitters used in DR8 and FR8 variants, a 10°C rise reduces the extinction ratio by approximately 0.5 to 1.0 dB due to increased transparency current and altered chirp characteristics. In VCSEL arrays used in SR8 variants, a 10°C rise increases threshold current by 5 to 10 percent and reduces differential efficiency by a similar fraction, requiring the APC loop to increase bias current to compensate. If the APC loop is at or near its ceiling, the compensation is incomplete and TX power drops, reducing the received optical power and pushing the receiver's decision circuit toward the noise floor. The result is increasing symbol errors on the affected lanes, captured as rising pre-FEC BER.
|
|
||||||
|
|
||||||
Pre-FEC BER and post-FEC BER tell different stories about the same link condition and should be read together, not in isolation. Post-FEC BER is what the traffic experiences: if RS-FEC is correcting all symbol errors, post-FEC BER is zero and no frames are dropped. This causes the common misdiagnosis of "the link is fine because we're not dropping frames." Pre-FEC BER is what the physical layer is experiencing before correction, and it tells you how much of the FEC budget you are consuming. A pre-FEC BER of 1.0e-4 is consuming 42 percent of the RS(544,514) FEC correction capacity. A pre-FEC BER of 2.0e-4 is consuming 83 percent. A pre-FEC BER of 2.4e-4 is at the correction limit, and any transient that pushes it momentarily higher (a brief thermal spike, a vibration event, a voltage transient) produces a burst of uncorrectable errors and potentially a link down. The post-FEC counter shows nothing until the moment it shows everything.
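The percentages quoted above are a simple ratio of the measured pre-FEC BER to the 2.4e-4 limit. The sketch below uses that same linear approximation; the function name is an assumption, and a real FEC's correction behavior is not strictly linear in BER.

```python
FEC_LIMIT_BER = 2.4e-4  # RS(544,514) pre-FEC limit as quoted in the text


def fec_budget_used_pct(pre_fec_ber: float,
                        limit: float = FEC_LIMIT_BER) -> float:
    """Share of the pre-FEC BER limit consumed, as a linear ratio."""
    if pre_fec_ber < 0:
        raise ValueError("BER cannot be negative")
    return pre_fec_ber / limit * 100.0


for ber in (1.0e-4, 2.0e-4, 2.4e-4):
    print(f"{ber:.1e} -> {fec_budget_used_pct(ber):.0f}% of limit")
```

Graphing this percentage per lane over time makes the consumed margin visible long before the post-FEC counter moves.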
|
|
||||||
|
|
||||||
The pre-FEC BER threshold that predicts imminent link failure is platform-specific, but a general operational rule is that sustained pre-FEC BER above 1.5e-4 during peak load on a link that reads below 5e-6 during low load represents a link that is thermally marginal and will fail within weeks to months under continued peak loading and normal environmental variation. The asymmetry between low-load and peak-load pre-FEC BER is itself diagnostic: a large ratio (more than an order of magnitude) confirms the thermal mechanism rather than a persistent optical path degradation, which would show elevated pre-FEC BER continuously rather than only at peak load.
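The rule of thumb above can be expressed as a small classifier. This is a sketch under the thresholds quoted in the text (1.5e-4 at peak, 5e-6 at low load); the function names and the category labels are hypothetical.

```python
import math


def load_ratio_decades(low_load_ber: float, peak_load_ber: float) -> float:
    """Orders of magnitude between peak-load and low-load pre-FEC BER."""
    return math.log10(peak_load_ber / low_load_ber)


def classify_link(low_load_ber: float, peak_load_ber: float,
                  peak_threshold: float = 1.5e-4,
                  low_threshold: float = 5e-6) -> str:
    """Apply the peak/low-load rule of thumb to one link."""
    if peak_load_ber > peak_threshold and low_load_ber < low_threshold:
        # Clean at low load, degraded at peak: points at the thermal mechanism.
        return "thermally marginal"
    if peak_load_ber > peak_threshold:
        # Elevated at all loads: points at the optical path instead.
        return "persistent degradation"
    return "within margin"


print(classify_link(4e-6, 1.8e-4))
print(classify_link(1.6e-4, 1.9e-4))
```

Feeding it the daily minimum and maximum pre-FEC BER per link is enough to separate the two failure classes.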
|
|
||||||
|
|
||||||
Operational changes that reduce peak-load thermal stress without hardware replacement fall into two categories. Chassis airflow management (cleaning filters, ensuring proper blanking panel installation so air does not bypass the modules, verifying that cable management does not impede cage-face airflow) can reduce module operating temperature by 3 to 7°C at peak load. On many Arista 7800 and Cisco Nexus 9500 series chassis, the fan speed control algorithm increases fan RPM in response to inlet temperature rather than in response to optical module die temperature directly, which means the fans may not ramp to their maximum speed until the inlet temperature rises, by which time the module die temperature has already spiked. Some platforms allow configuring a lower temperature threshold for fan speed increase, which reduces peak module temperature at the cost of approximately 3 to 8 percent higher steady-state fan power.
|
|
||||||
|
|
||||||
Traffic engineering, specifically load-balancing policies that limit any individual 800G link to a maximum sustained utilization of 65 to 70 percent rather than allowing 80 to 85 percent, provides margin that the thermal control system cannot. This is an ECMP hashing or traffic policy configuration change with no hardware cost, and it is the most immediate intervention when a link is showing pre-FEC BER degradation at peak load. The objection that limiting link utilization "wastes" capacity is based on treating the link's data sheet maximum as the correct operating point, which it is not: the data sheet maximum is the specification limit, not the continuous operating point for a system that needs to remain healthy for a seven-year infrastructure lifecycle.
|
|
||||||
|
|
||||||
For links where thermal-marginal pre-FEC behavior persists after chassis airflow optimization and utilization policy changes, the root cause is typically that the chassis cooling system was not designed with 800G power density in mind. A 32-port OSFP 800G chassis running SR8 modules draws approximately 350 to 400 watts from optical modules alone at full utilization, in addition to the ASIC power. Older chassis designed for 100G or first-generation 400G traffic densities may not have the per-port cooling capacity for sustained 800G thermal loads. This is a platform refresh consideration, not a transceiver problem — but the pre-FEC BER data is what surfaces the constraint.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Understanding RX Power Budgets Before You Deploy 400G"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The engineers who generate the most callback tickets on 400G deployments are the ones who did their power budget calculations at the per-fiber level rather than at the per-link level, or who used nominal connector loss values from a catalog instead of measured insertion loss from an OTDR or OLTS test set. The difference between a power budget that keeps a 400G link stable for five years and one that produces marginal behavior within twelve months is rarely more than 1.0 to 1.5 dB of unaccounted loss — but at 400G, 1.0 dB is the entire link margin on a 400GBASE-DR4 path and represents the difference between a link with headroom and a link that is operating at the edge of the specification.
|
|
||||||
|
|
||||||
Start with the applicable standard's channel insertion loss allocation. For 400GBASE-SR4 per IEEE 802.3bs, the maximum channel insertion loss is 2.9 dB at 850 nm on OM4 multimode. For 400GBASE-DR4, the maximum is 6.0 dB at 1310 nm on OS2 single-mode for a 500-meter reach. For 400GBASE-LR4, specified at 10 km, the budget is also defined but the arithmetic is typically dominated by fiber attenuation rather than connector loss. These numbers are the ceiling — if your calculated worst-case insertion loss exceeds them, the link will not meet specification. If your calculated nominal insertion loss leaves less than 1.5 dB of margin below the ceiling, you are designing a link that will reach its specification limit as connectors age and accumulate contamination over a four to six year operational lifetime.
|
|
||||||
|
|
||||||
The DR4 insertion loss budget deserves specific arithmetic because it is the one that most frequently surprises engineers who are accustomed to 100G margins. At 400GBASE-DR4, with a 500-meter OS2 fiber run, the fiber attenuation at 1310 nm contributes approximately 0.31 dB/km times 0.5 km, which is 0.155 dB. That is less than 3 percent of the 6.0 dB channel budget. Every remaining dB in the budget must be allocated to connectors, splices, patch panels, and the measurement uncertainty in the link's actual loss. A typical spine-leaf run in a single data center building uses four mated connector pairs: the switch port, an inline patch panel in the cable management path, a cross-connect or main distribution frame, and the switch port at the far end. At 0.5 dB per mated pair under clean conditions — a reasonable assumption for freshly installed and inspected LC or MPO-16 connectors — those four connector pairs consume 2.0 dB. Add the fiber and you are at 2.155 dB against a 6.0 dB budget. That appears to leave 3.845 dB of margin.
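The nominal arithmetic in this paragraph can be reproduced with a short helper. The function name is an assumption, and the 6.0 dB ceiling is simply the budget figure this text uses.

```python
DR4_BUDGET_DB = 6.0  # channel insertion loss ceiling as stated in the text


def channel_loss_db(length_km: float, fiber_db_per_km: float,
                    connector_pairs: int, db_per_pair: float) -> float:
    """Nominal channel loss: fiber attenuation plus mated-pair losses."""
    return length_km * fiber_db_per_km + connector_pairs * db_per_pair


# 500 m of OS2 at 0.31 dB/km with four mated pairs at 0.5 dB each.
loss = channel_loss_db(0.5, 0.31, 4, 0.5)
margin = DR4_BUDGET_DB - loss
fiber_share_pct = (0.5 * 0.31) / DR4_BUDGET_DB * 100.0
print(f"loss {loss:.3f} dB, margin {margin:.3f} dB, "
      f"fiber share {fiber_share_pct:.1f}% of budget")
```

Because the fiber term is so small at 500 meters, changing `connector_pairs` or `db_per_pair` moves the result far more than changing the length.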
|
|
||||||
|
|
||||||
That 3.845 dB evaporates under a realistic aging and tolerance model. Connector insertion loss of 0.5 dB per pair is a nominal value for a clean, freshly mated connection. The IEC 61300-3-4 specification for MPO connector insertion loss allows up to 0.75 dB per mated pair for a compliant connector under test conditions. In an operational deployment where connectors are cleaned once per year, particle contamination in the Zone B region accumulates and adds 0.05 to 0.15 dB per pair per year based on published aging data. After three years, four connector pairs that started at 0.5 dB each are consuming 2.6 to 3.8 dB rather than 2.0 dB. Add two more connector pairs if the path includes a cross-connect at a mid-facility patch panel, a common architecture in larger data centers, and the connector total alone reaches 3.9 to 5.7 dB after three years. Combined with 0.155 dB of fiber attenuation and a measurement uncertainty allowance of 0.2 dB, the available link margin is now roughly 1.75 dB at the low end of the aging range and effectively zero at the high end. The low-end case is operationally adequate, but only if nothing else goes wrong.
|
|
||||||
|
|
||||||
The connection aging factor is the input that most power budget templates either omit entirely or apply as a fixed 0.1 dB per connector pair without citing an underlying data source. A more defensible approach is to audit the specific connector type — LC APC, LC UPC, MPO-16, SC — and the cleaning regime that will be applied to those connectors over the deployment lifetime, and to select an aging factor that is consistent with peer-reviewed data for that combination. The Corning White Paper WP7527 on optical connector aging provides measured data across connector types and cleaning frequencies that can be used as a technical basis for the aging factor. For LC APC connectors on OS2 in a data center with annual maintenance cleaning, 0.08 dB per connector pair per year is supported by the published data. For MPO-16 connectors with semi-annual cleaning, 0.06 dB per pair per year is a reasonable estimate.
|
|
||||||
|
|
||||||
Before deploying 400G onto an existing fiber plant that was previously carrying 100G or lower, a fiber audit is necessary; the plant cannot simply be assumed adequate. The audit consists of OTDR testing of every active fiber path to characterize insertion loss at the 1310 nm wavelength band, identification of reflectance events that indicate damaged or improperly mated connectors, and documentation of any bend radius violations introduced during previous cable management activities. Fiber that has been routed through trays over a period of years in a busy data center frequently has bend radius violations at the points where cable management loops are tightest. A tight bend on OS2 single-mode at 1310 nm contributes approximately 0.1 to 0.5 dB of bend-induced loss for a bend radius below 15 mm, a radius that overtightened cable ties can readily produce. OTDR traces will show these as elevated attenuation sections rather than discrete reflectance events, and they are distinguishable from connector loss by their distributed rather than point-source character.
|
|
||||||
|
|
||||||
The practical audit checklist for each fiber path before a 400G migration includes: end-to-end insertion loss measurement with an OLTS test set at 1310 nm and 1550 nm, OTDR trace with event markers at each connector pair, comparison of measured insertion loss against the DR4 budget with three years of aging factored in, documentation of any events above 0.5 dB that require investigation, and a note on the number of mated connector pairs in the path. Any path where the three-year-aged calculated insertion loss exceeds 5.0 dB on a 6.0 dB DR4 budget — leaving less than 1.0 dB of remaining margin — should be flagged for connector replacement or path re-routing before the 400G module is installed. Discovering a marginal path after the module is live and traffic is running produces a much more expensive remediation than identifying and addressing it during the audit phase.
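The flagging rule in the checklist can be sketched as follows. The `FiberPath` record and the default aging rate of 0.08 dB per pair per year are assumptions for illustration; the 5.0 dB flag threshold and three-year horizon come from the text.

```python
from dataclasses import dataclass


@dataclass
class FiberPath:
    name: str                 # hypothetical path identifier
    measured_loss_db: float   # end-to-end OLTS measurement at 1310 nm
    connector_pairs: int      # mated pairs counted during the audit


def aged_loss_db(path: FiberPath, aging_db_per_pair_year: float = 0.08,
                 years: float = 3.0) -> float:
    """Measured loss plus linear connector aging over the horizon."""
    return (path.measured_loss_db
            + path.connector_pairs * aging_db_per_pair_year * years)


def needs_remediation(path: FiberPath, flag_threshold_db: float = 5.0) -> bool:
    """True when the aged loss leaves less than 1.0 dB under a 6.0 dB budget."""
    return aged_loss_db(path) > flag_threshold_db


paths = [FiberPath("leaf01-spine03", 2.2, 4), FiberPath("leaf07-spine01", 4.3, 6)]
for p in paths:
    print(p.name, f"{aged_loss_db(p):.2f} dB", needs_remediation(p))
```

Running this over the full audit export gives a remediation list before any 400G module is installed.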
|
|
||||||
|
|
||||||
Engineers who skip the power budget calculation and the fiber audit, then deploy 400G modules, are not lazy — they have typically been conditioned by 10G and 100G deployments where the margin was large enough to be forgiving of imprecision. A 10GBASE-LR SFP+ has a channel budget of 6.2 dB and a maximum reach of 10 km, which gives roughly 2.0 to 2.5 dB of margin on a typical building run even with degraded connectors. That conditioning produces an intuition that "it will work" without detailed calculation, and that intuition is correct often enough at 100G to be reinforced. At 400G DR4, the same intuition applied to a four-connector-pair path after two years of aging produces a marginal link — not a failed link, but a marginal one that generates intermittent symptoms and troubleshooting investment out of proportion to its cause.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "SFP28 Links That Work in the Lab but Fail in the Rack"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The gap between lab validation and production performance is wider for SFP28 than for any other common transceiver form factor, and the reason is thermal geometry. A lab test bench is, from an airflow perspective, a best-case scenario: the module sits in a single slot with open air on all sides, the ambient temperature is controlled to roughly 20 to 23°C, there is no adjacent slot heat contribution, and the module is running at low traffic load because the test is primarily checking link establishment and basic DOM function. A production chassis deploying 48 SFP28 ports is thermally the opposite: dense front-to-rear airflow that must cool 48 lanes of SerDes driving 48 SFP28 modules simultaneously, cage-to-cage thermal coupling where a module in slot 20 receives pre-heated air from the heat produced by modules in slots 1 through 19, and sustained utilization that keeps the SerDes at full electrical load continuously.
|
|
||||||
|
|
||||||
The SFP28 operating temperature specification is 0 to 70°C at the module case, which in SFF-8402 terminology means the temperature measured at the top surface of the module case at the midpoint of its length. That 70°C ceiling is the legal specification limit, not the comfortable operating point. A module operating continuously at 65°C is 5°C below specification but is running its VCSEL at approximately 15°C above the temperature it would experience in a well-cooled lab setup, and VCSEL degradation rate doubles for every 10°C increase above 60°C in the Arrhenius model. Production network engineers who see "0-70°C" on the data sheet and interpret it as "any temperature below 70°C is fine" are conflating the compliance boundary with the optimal operating range.
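The doubling rule quoted above can be written as a one-line relative-rate formula. This is a simplification: the sketch applies the doubling step between any two temperatures, whereas the text states it for operation above 60°C, and a full Arrhenius calculation would need an activation energy that is not given here.

```python
def relative_degradation_rate(temp_c: float, reference_c: float,
                              doubling_step_c: float = 10.0) -> float:
    """Degradation rate at temp_c relative to reference_c.

    Uses the rule-of-thumb that the rate doubles every doubling_step_c
    degrees; this approximates, and does not replace, an Arrhenius model.
    """
    return 2.0 ** ((temp_c - reference_c) / doubling_step_c)


# A module at the 70 C limit versus one held at 60 C.
print(relative_degradation_rate(70.0, 60.0))
```

The exponential form is why a few degrees of cage cooling has a disproportionate effect on expected module lifetime.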
|
|
||||||
|
|
||||||
In a dense SFP28 line card or fixed chassis (a Cisco Nexus 9348GC, an Arista 7050CX3, or a Juniper QFX5100-48S, all of which pack 48 SFP28 ports into a 1RU chassis with constrained airflow) the rear ports typically run 4 to 8°C hotter than the front ports because the cooling air has absorbed heat from the front port modules before reaching the rear section. Measured data from chassis temperature diagnostic commands confirms this: on a Cisco Nexus 9348GC with 48 SFP28 ports at 80 percent utilization, the module temperature spread from coldest to hottest port is consistently 6 to 9°C in a properly sealed 25°C intake environment. The hottest modules, typically in ports 37 through 48 in rear-facing slot positions, read 58 to 64°C while the coolest modules in ports 1 through 8 read 50 to 56°C. Both populations are within specification. Under the doubling-per-10°C rule, the population at 62°C is degrading at roughly twice the rate of the population at 52°C.
|
|
||||||
|
|
||||||
The specific failure mode that appears in production but not in the lab is thermal-marginal TX bias current. A VCSEL-based SFP28 module that was tested in the lab at 25°C ambient with a die temperature of approximately 35°C and a TX bias current of 6.5 mA is operating well below its APC ceiling of 15 mA. Install that same module in slot 42 of a 48-port chassis at sustained 75 percent traffic load, and the die temperature rises to 58 to 62°C. The APC loop increases bias current to maintain TX power as VCSEL efficiency falls with temperature. At 62°C, the same module is now running at 10 to 11 mA of bias current, or 67 to 73 percent of its APC ceiling. The TX power reads nominally stable in DOM. The link appears healthy. But the module now has much less headroom before the APC loop reaches its ceiling, and any incremental temperature increase (a dirty chassis filter, a hot afternoon when the facility HVAC is under load, the thermal wake of a new high-power card installed in the adjacent slot) can push the module into the marginal region where TX power drops and the link becomes intermittent.
|
|
||||||
|
|
||||||
The diagnostic for distinguishing thermal failure from fiber failure from EEPROM incompatibility as the root cause of an SFP28 lab-to-production failure follows a specific logical sequence. First, check the module temperature register in DOM and compare it against the same module in a cooler slot or in the lab environment. A temperature difference of more than 15°C between the failed deployment and the test bench environment establishes thermal environment as a significant factor. Second, check the TX bias current register and compare it against the module's specification maximum and against the baseline captured at initial deployment. Bias current at or above 80 percent of maximum in a module that was at 50 percent of maximum at deployment confirms thermal-APC saturation as the active failure mechanism. Third, check the EEPROM vendor ID and platform compatibility status — an unsupported transceiver warning in the system log before the link failures is diagnostic of EEPROM incompatibility. These three checks, performed in sequence, identify the root cause within fifteen minutes for the vast majority of lab-to-rack failures.
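The three checks can be sketched as a single function over values already read from DOM and the system log. The `ModuleReading` record and its field names are hypothetical; how the values are collected is platform-specific and not shown. The 15°C and 80 percent thresholds are the ones given above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModuleReading:
    temp_c: float                     # DOM temperature in the rack
    bench_temp_c: float               # same module on the bench or a cool slot
    bias_ma: float                    # current TX bias
    baseline_bias_ma: float           # TX bias captured at deployment
    bias_max_ma: float                # specification maximum
    unsupported_warning_logged: bool  # unsupported-transceiver log entry seen


def diagnose(r: ModuleReading) -> List[str]:
    """Run the three checks in order and collect every finding."""
    findings = []
    if r.temp_c - r.bench_temp_c > 15.0:
        findings.append("thermal environment")
    if r.bias_ma >= 0.8 * r.bias_max_ma and r.bias_ma > r.baseline_bias_ma:
        findings.append("thermal-APC saturation")
    if r.unsupported_warning_logged:
        findings.append("EEPROM incompatibility")
    # Nothing found in DOM or logs: the fiber path is the remaining suspect.
    return findings or ["no module finding: inspect fiber path"]


print(diagnose(ModuleReading(61.0, 35.0, 12.5, 7.5, 15.0, False)))
```

Returning every finding rather than stopping at the first one matters because thermal stress and EEPROM incompatibility can be present on the same module.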
|
|
||||||
|
|
||||||
The chassis-reported cage temperature deserves specific attention as a diagnostic tool because it reports what the chassis sees, not what the module's internal thermistor measures. On Cisco NX-OS and Arista EOS platforms, the show interface transceiver command returns both the module-reported temperature (from the SFF-8472 diagnostic temperature register; SFF-8636 is the QSFP management specification and does not apply to SFP28) and the chassis-reported cage temperature (from the chassis management controller's local sensor). Comparing these two values shows the thermal gradient between the cage environment and the module die. A 12°C gradient between cage and die temperature, combined with a cage temperature of 48°C, indicates a die temperature of approximately 60°C even if the ambient at the chassis inlet is 25°C. That combination of high gradient plus high cage temperature identifies a module in a thermally stressed position even when the DOM temperature register value itself falls within the operating specification.
|
|
||||||
|
|
||||||
Chassis mixing problems represent a distinct category of lab-to-rack failure. SFP28 chassis have manufacturer-specific airflow profiles — some are front-to-rear, some are rear-to-front, and some are side-to-side. Mixing a front-to-rear chassis in a rack with rear-to-front adjacent chassis violates the hot-aisle/cold-aisle containment architecture and results in the intake of one chassis ingesting the exhaust of another. Module temperatures in the affected chassis rise by 8 to 15°C above design values. Lab testing uses single isolated chassis and never reveals this. The failure appears in production within the first week as intermittent SFP28 link events during afternoon peak hours when the thermal load is highest. The fix is rearranging the rack layout so that all chassis in a contained aisle have the same airflow direction — a change that requires a maintenance window but no hardware expenditure.
|
|
||||||
|
|
||||||
For SFP28 deployments in thermally dense environments where slot temperatures consistently exceed 55°C, selecting modules with extended temperature ratings (0 to 85°C case temperature, often marketed as "Extended Temp" or "ET" variants; industrial-temperature parts go further, typically -40 to 85°C) provides additional operating headroom and reduces the rate of VCSEL degradation at the thermal operating point. These modules typically cost 15 to 25 percent more than the standard 0 to 70°C variant. The premium is justified when the deployment environment is known to push module temperatures above 60°C, which any dense 48-port chassis at sustained high utilization in a moderately warm data center will do, and when the infrastructure lifetime expectation is five years or longer.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400G ZR vs ZR+: Choosing the Right Coherent Optic for Your Metro Network"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The 400G coherent optic landscape consolidated around two pluggable standards, the OIF's 400ZR and the OpenZR+ MSA's 400ZR+, and the decision between them is more consequential than a simple reach comparison because the two specifications encode different tradeoffs that affect power consumption, platform compatibility, operational complexity, and the ability to share optical amplification infrastructure with other traffic. Getting this wrong means either paying a significant ongoing power and cost premium for reach capability you will never use, or deploying infrastructure that requires a costly replacement when a new network segment exceeds the 400ZR reach ceiling.
|
|
||||||
|
|
||||||
400ZR, defined by the OIF 400ZR Implementation Agreement, uses DP-16QAM modulation at approximately 60 Gbaud to achieve 400G on a single 75 GHz or 100 GHz ITU-T grid channel; a 60 Gbaud signal does not fit in a 50 GHz slot. The maximum reach for a standalone point-to-point connection (no inline EDFA, no Raman pump, no dispersion compensation) is approximately 120 km on standard SSMF with -20 dBm launch power and a 15 dB OSNR budget. With a single inline EDFA, that reach extends to approximately 300 km. With a properly planned amplified route using multiple EDFA spans of 80 km each, reach to 1,000 km is achievable on low-loss SSMF with appropriate span engineering. The power consumption of a 400ZR QSFP-DD module is approximately 14 to 15 watts, which is notably lower than the 18 to 22 watts of 400ZR+ modules. At a 32-port switch with all QSFP-DD ports populated with coherent optics, a per-module difference of 4 to 7 watts adds up to 128 to 224 watts of continuous power draw, which at an assumed PUE of 1.5 and $0.10 per kWh represents roughly $170 to $300 per year in operating expense per switch.
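The operating-cost figure depends heavily on the PUE and tariff chosen, so it is worth computing explicitly for a given site. The sketch below uses a PUE of 1.5 and $0.10 per kWh purely as assumptions (the text does not state which values it uses); the 4 and 7 watt per-module differences and the 32-port count come from the paragraph above.

```python
HOURS_PER_YEAR = 8760


def annual_cost_usd(extra_watts: float, pue: float, usd_per_kwh: float) -> float:
    """Yearly cost of a continuous extra load, including facility overhead via PUE."""
    kwh_per_year = extra_watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * pue * usd_per_kwh


PORTS = 32
ASSUMED_PUE = 1.5        # assumption, not from the text
ASSUMED_TARIFF = 0.10    # USD per kWh, assumption, not from the text

for per_module_w in (4, 7):  # 18 - 14 W and 22 - 15 W
    extra_w = per_module_w * PORTS
    cost = annual_cost_usd(extra_w, ASSUMED_PUE, ASSUMED_TARIFF)
    print(f"{extra_w} W extra -> ${cost:,.0f} per year")
```

Substituting the site's real tariff and PUE is the point of the exercise; the result scales linearly with both.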
|
|
||||||
|
|
||||||
400ZR+ is defined by the OpenZR+ MSA and extends the same spectral slot with software-selectable line rates and modulation formats: 400G DP-16QAM (the same modulation format as 400ZR), 300G DP-8QAM for extended reach, 200G DP-QPSK for maximum reach, and 100G DP-QPSK at a reduced symbol rate for extremely long haul. The maximum reach at 200G DP-QPSK is approximately 2,000 to 2,500 km on standard SSMF with appropriate amplification, more than double the engineered reach of 400ZR at 400G throughput. The 400G reach on 400ZR+ using DP-16QAM is similar to 400ZR's maximum 400G reach but with more margin because 400ZR+ implementations typically have higher output power and better OSNR sensitivity than the minimum 400ZR specification.
|
|
||||||
|
|
||||||
The operational complexity difference between the two standards matters more than most network architects account for at design time. 400ZR is a fixed-modulation, simple-to-configure technology that behaves similarly to direct-detect optics from a management perspective — launch power, receive power, and pre/post-FEC BER are the primary operational parameters. 400ZR+ with selectable modulation requires operational decisions about which modulation format to run on each link, understanding of OSNR budget calculations for each format, and management of a system where reducing modulation order to increase reach also reduces throughput. The OSNR budget requirement for DP-16QAM (approximately 26 dB) versus DP-QPSK (approximately 14 dB) is a 12 dB difference in required OSNR, which translates directly into amplifier spacing and total link budget requirements. Teams that are not comfortable with coherent link budget calculations should not deploy 400ZR+ without the support of a coherent system vendor or a pre-validated optical line system.
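A minimal sketch of the per-format OSNR check described above. The two required-OSNR values are the approximate figures from the text; the 2 dB minimum margin and the function names are assumptions for illustration, and a real design would take both from the module datasheet and the line-system vendor.

```python
# Approximate required OSNR per modulation format, as quoted in the text.
REQUIRED_OSNR_DB = {"DP-16QAM": 26.0, "DP-QPSK": 14.0}


def osnr_margin_db(available_osnr_db: float, modulation: str) -> float:
    """Margin between the OSNR delivered at the receiver and the format's requirement."""
    return available_osnr_db - REQUIRED_OSNR_DB[modulation]


def viable_formats(available_osnr_db: float, min_margin_db: float = 2.0) -> list[str]:
    """Formats that clear their requirement by at least min_margin_db (assumed value)."""
    return [
        fmt for fmt in REQUIRED_OSNR_DB
        if osnr_margin_db(available_osnr_db, fmt) >= min_margin_db
    ]


print(viable_formats(20.0))
print(viable_formats(30.0))
```

The 12 dB gap between the two entries is the same gap the text describes: a link that cannot carry DP-16QAM may still have ample margin at DP-QPSK, at half the throughput.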
|
|
||||||
|
|
||||||
Platform-specific validation is substantially more complex for 400G coherent modules than for direct-detect multirate optics. On Arista 7160 and 7280 series platforms with 400G coherent support, the platform requirements for 400ZR include specific firmware versions — EOS 4.26.2 and later for initial 400ZR support, EOS 4.28.0 for full OpenZR+ selectable modulation — and specific provisioning commands that differ from the configuration model for direct-detect optics. The Cisco ASR 9000 with 400G coherent PIDs requires IOS XR 7.5.2 or later for 400ZR support and a licensing activation for the coherent DSP functionality that is separate from the base platform license. On Juniper PTX10000 series, 400ZR coherent requires Junos 22.1R1 and the coherent TSYS-QSFP-400G-ZR PIC. Each of these platform versions introduced known bugs related to coherent module state reporting in DOM that were fixed in subsequent releases, and deploying on the minimum supported version without verifying the bug-fix releases is a source of management plane instability.
|
|
||||||
|
|
||||||
Coherent transceivers require operational management practices that differ fundamentally from direct-detect modules. TX power calibration on a coherent link is not set-and-forget: the optimal launch power depends on the total span loss, the EDFA gain setting, the nonlinear noise contribution at different launch powers on DWDM systems with multiple channels, and the target OSNR at the receive end. Overdriving a coherent link — launching at higher power than optimal — increases nonlinear noise from four-wave mixing and cross-phase modulation on multi-channel DWDM systems, degrading OSNR rather than improving it. Coherent link commissioning requires OSNR measurement at the receiver, iterative launch power optimization, and pre-FEC BER confirmation at steady state. This is a two to four hour commissioning process per link versus the fifteen-minute commissioning process for a direct-detect 400G DR4 link.
|
|
||||||
|
|
||||||
The value of the 400ZR+ price premium materializes when a network has segments that vary widely in path length and OSNR budget. A metro network with segments of 50 km, 180 km, and 800 km can run all three on the same 400ZR+ module hardware by selecting DP-16QAM for the 50 km segment, DP-8QAM for the 180 km segment, and DP-QPSK for the 800 km segment. The hardware SKU is identical across all three segments. Without 400ZR+ selectable modulation, the 800 km segment would require a different technology (traditional coherent system, transponder, or muxponder) with different hardware and different management integration. The premium on 400ZR+ pays for itself when the network has this reach variability and when the operational team either has the coherent expertise to manage selectable modulation or is willing to develop it.
|
|
||||||
|
|
||||||
For a network where all segments are under 400 km on a single-vendor platform with a design assumption of maximum 400G throughput per link and no plans for lower-throughput higher-reach segments, 400ZR with its lower power, simpler operation, and lower module cost ($1,800 to $2,400 for compatible 400ZR QSFP-DD in 2026 versus $2,800 to $3,600 for 400ZR+ QSFP-DD) is the correct choice. The Flexoptix platform-specific EEPROM programming service applies to both 400ZR and 400ZR+ variants, ensuring that the module presents correctly to the target platform's coherent management infrastructure and that DOM data surfaces without requiring vendor-specific software customization on the platform side.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Fiber Plant Audit Before a 100G Upgrade: What to Check and Why"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Network teams that migrate from 10G to 100G and experience a wave of link instability within the first month almost uniformly skipped the fiber plant audit. The instability they experience is not caused by the 100G modules being defective or incompatible — it is caused by fiber infrastructure that was adequate for 10G's generous margin but is inadequate for 100G's tighter power budget, now exposed by the migration. The audit takes one to three days depending on the scale of the deployment. The post-migration firefighting it prevents takes weeks of engineer time and generates the kind of escalation heat that terminates migration projects and ends careers. The audit is not optional.
|
|
||||||
|
|
||||||
The OTDR testing methodology for a pre-migration audit differs from installation verification testing in important ways. Installation OTDR tests are typically single-direction, single-wavelength measurements done immediately after connector installation when connectors are new and clean. Pre-migration OTDR tests should be bidirectional, measuring loss from both ends and averaging the results to cancel the directional asymmetry caused by backscatter differences between the joined fibers, and should be performed at the wavelength of the target 100G technology. For 100GBASE-SR4, that is 850 nm. For 100GBASE-LR4, that is 1310 nm. For 100GBASE-CWDM4, it is 1271 to 1331 nm. Using an OTDR at 1300 nm to test a multimode path that will carry 100GBASE-SR4 at 850 nm gives you results that do not map to the actual link budget because multimode fiber attenuation at 850 nm (typically 3.0 to 3.5 dB/km on OM4) is significantly different from attenuation at 1300 nm (approximately 1.0 dB/km). Always test at the operating wavelength.
|
|
||||||
|
|
||||||
Interpreting OTDR results against 100G power budget specifications requires understanding which events are measurement artifacts and which represent real optical loss. An OTDR event that shows an apparent gain of 0.02 dB, where the fiber appears to gain optical power rather than lose it, is a "gainer": an artifact of joining two fibers with different backscatter coefficients, not a measurement of real gain, and it cancels out when the traces from both directions are averaged. Ghost reflections are a separate artifact. They are echoes of a strongly reflective event, typically a connector with an air gap, that appear farther along the trace than the real event and create a false picture of the fiber path topology. Every event marker in an OTDR trace above 0.15 dB should be verified as a real connector pair location by cross-referencing against the fiber path documentation. An unmapped 0.25 dB event on a path that was supposed to have only three connector pairs is either a damaged splice or an undocumented connector that will consume budget headroom at 100G.
|
|
||||||
|
|
||||||
Fiber type compatibility is the most common and most expensive surprise in 100G migrations from legacy infrastructure. OM1 fiber, which was widely deployed in campus and enterprise buildings through the mid-2000s, has a 62.5-micron core and a minimum modal bandwidth of approximately 200 MHz·km at 850 nm. IEEE 802.3bm specifies 100GBASE-SR4 for OM3, with a minimum effective modal bandwidth of 2000 MHz·km, or OM4, with 4700 MHz·km; OM1 is not a specified medium at all. Outside the standard, the working figure for OM1 at 850 nm is a maximum 100GBASE-SR4 distance of approximately 33 meters, which means almost any OM1 run longer than the patch cord connecting a server to a top-of-rack switch will fail at 100G SR4. Teams that deployed OM1 in horizontal cable runs with 20 to 50 meter lengths between equipment rooms and server racks face fiber replacement for every segment beyond that limit, regardless of how well-maintained the connectors are.
|
|
||||||
|
|
||||||
OM2 offers no practical relief. The working figure for OM2 at 850 nm is a maximum 100GBASE-SR4 reach of approximately 26 to 30 meters, depending on the specific OM2 fiber product. As with OM1, runs longer than that distance are not upgradeable to 100G SR4 without fiber replacement. OM3 supports 100GBASE-SR4 to 70 meters, which covers most intra-building horizontal runs, though it does not leave significant margin for longer runs in large facilities. OM4 is the minimum fiber type that makes 100G SR4 deployable without distance anxiety for runs up to 100 meters; OM5 matches that SR4 reach and adds wideband multimode operation for multi-wavelength optics. An infrastructure audit that characterizes all fiber paths by type (OM1, OM2, OM3, OM4, OS2) and maps them against the path length data is the essential first step before any budget is allocated to 100G module procurement.
|
|
||||||
|
|
||||||
Connector degradation over time is the second category of audit findings that the migration team needs to quantify before deploying 100G. Connectors installed in the late 2000s and early 2010s, now roughly 12 to 18 years into service life, have accumulated years of dust, mating cycles, and physical wear. Published data on MPO connector insertion loss degradation in operational environments shows that connectors cleaned once per year at annual maintenance see median insertion loss increases of 0.08 dB per mated pair per year. At that rate, a connector pair that measured 0.3 dB at installation in 2010 may be at roughly 1.6 dB by 2026 (0.3 dB plus 16 years at 0.08 dB per year). At 10GBASE-SR with a channel budget of 7.5 dB on OM3, this degradation is absorbed easily. At 100GBASE-SR4 with a channel budget of 1.9 dB on OM3, a single mated pair at 1.5 dB consumes 79 percent of the entire budget before accounting for any other loss in the path.
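The degradation arithmetic is a linear projection, sketched below. The 0.08 dB per year rate, the 0.3 dB starting point, and the 1.9 dB budget are the figures from the paragraph above; the function names are illustrative.

```python
def projected_loss_db(initial_db: float, years: float,
                      rate_db_per_year: float = 0.08) -> float:
    """Linear projection of mated-pair insertion loss after a number of years."""
    return initial_db + rate_db_per_year * years


def budget_share(loss_db: float, budget_db: float) -> float:
    """Fraction of the channel budget consumed by a single loss contribution."""
    return loss_db / budget_db


loss_2026 = projected_loss_db(0.3, years=2026 - 2010)
print(f"projected pair loss: {loss_2026:.2f} dB")
print(f"share of 1.9 dB budget at 1.5 dB: {budget_share(1.5, 1.9):.0%}")
```

A linear model is the simplest reading of a "dB per year" median; measured values from the audit should always replace the projection where they exist.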
|
|
||||||
|
|
||||||
The audit checklist that prevents post-migration firefighting structures the work into three phases. The pre-test phase gathers all existing fiber plant documentation — installation records, previous OTDR trace files, fiber type certifications, and connector installation dates. These documents are frequently incomplete or absent for infrastructure installed more than eight years ago, in which case the physical test data becomes the sole basis for decisions. The test phase executes bidirectional OTDR traces at operating wavelength for every fiber path that will carry 100G traffic, supplemented by insertion loss measurement with an OLTS test set for paths with events that are marginal or ambiguous in the OTDR data. The analysis phase compares measured insertion loss against the 100G budget specification for the relevant technology type, applies a three-year aging factor to each connector pair measurement, and classifies each path as pass, marginal, or fail.
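The analysis phase can be sketched as a small classifier. The text does not define the boundary between "marginal" and "fail", so the rule below is an assumption: a path fails if it already exceeds the budget, and is marginal if it only exceeds the budget once three years of connector aging are added. The 0.08 dB per pair per year rate is the degradation figure quoted earlier in the article.

```python
def classify_path(measured_loss_db: float, connector_pairs: int, budget_db: float,
                  aging_years: float = 3.0,
                  rate_db_per_pair_year: float = 0.08) -> str:
    """Classify a fiber path as pass, marginal, or fail against a channel budget."""
    # Add the expected aging of every mated pair over the planning horizon.
    aged_loss_db = measured_loss_db + connector_pairs * rate_db_per_pair_year * aging_years
    if measured_loss_db > budget_db:
        return "fail"        # over budget today
    if aged_loss_db > budget_db:
        return "marginal"    # within budget today, over budget after aging (assumed rule)
    return "pass"


SR4_OM3_BUDGET_DB = 1.9  # 100GBASE-SR4 channel budget on OM3, from the text

for loss in (1.0, 1.6, 2.0):
    print(loss, classify_path(loss, connector_pairs=2, budget_db=SR4_OM3_BUDGET_DB))
```

Keeping the aging horizon and rate as parameters lets the same routine be rerun with site-specific cleaning intervals or a longer planning window.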
|
|
||||||
|
|
||||||
Remediation decisions for marginal and failing paths follow a cost-effectiveness filter. For a path where the only issue is connector contamination (measured insertion loss above 0.5 dB per mated pair on what should be a clean connector), wet-then-dry cleaning plus re-test brings most of those connections into compliance at negligible cost. For paths where insertion loss is elevated due to fiber bends or physical damage to the fiber, remediation requires either re-routing the cable to eliminate the bend or replacing the affected segment. For OM1 paths that are longer than OM1 can carry SR4, regardless of connector condition, the only practical option is fiber replacement. A decision rule that routes OM1 paths shorter than 20 meters to "accept as SR4 compatible," paths of 20 to 33 meters to "test with SR4 module before committing," and paths over 33 meters to "replace fiber or use single-mode LR4" correctly classifies most OM1 scenarios without requiring individual engineering judgment on each circuit. The economics of remediation versus replacement versus technology change should be calculated at the path level rather than applied uniformly, because a uniform policy will over-invest in some paths and under-invest in others.
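The OM1 decision rule above maps directly onto a three-way branch. The function name is illustrative; the 20 and 33 meter boundaries and the three outcomes are taken from the text.

```python
def om1_sr4_decision(length_m: float) -> str:
    """Apply the OM1 length rule for 100GBASE-SR4 candidates."""
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    if length_m < 20:
        return "accept as SR4 compatible"
    if length_m <= 33:
        return "test with SR4 module before committing"
    return "replace fiber or use single-mode LR4"


for length in (12, 28, 45):
    print(f"{length} m: {om1_sr4_decision(length)}")
```

Running this over the audited path inventory gives a first-pass remediation list that can then be costed path by path.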
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Why Your 400G DAC Cables Work at 3m But Not at 5m"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Direct Attach Copper cables at 400G have a physical cutoff point that is not a soft degradation but a hard failure boundary — a DAC cable that works perfectly at 3 meters will produce complete link failure at 5 meters under the same conditions, and the failure is not a flaky link or a marginal BER condition. It is a link that will not come up, or a link that comes up and immediately drops. Understanding why this happens requires understanding how PAM4 signaling interacts with the frequency-dependent attenuation characteristics of the coaxial or twinax copper medium, and why the number of signal levels in PAM4 makes this interaction far more consequential at 400G than it was at 100G with NRZ signaling.
|
|
||||||
|
|
||||||
Copper signal attenuation in the twinax medium used in SFP28, QSFP28, QSFP-DD, and OSFP DAC cables increases with frequency following a skin-effect model in which attenuation in dB/meter scales approximately with the square root of frequency. At 25.78125 Gbaud, the per-lane symbol rate for 100G NRZ on a 4-lane QSFP28 DAC, the copper attenuation over a 5-meter 26 AWG twinax is approximately 18 to 22 dB depending on the specific cable construction. At the same cable length, a 400G DAC running 100G-per-lane PAM4 at 53.125 Gbaud sees approximately 26 to 32 dB of insertion loss per lane, because roughly doubling the symbol rate doubles the Nyquist frequency and raises skin-effect loss by about the square root of two. (An 8-lane 400G QSFP-DD DAC runs each lane at 26.5625 Gbaud, so its per-lane loss stays close to the NRZ figure; there the reach penalty comes mainly from PAM4's reduced level spacing rather than from added attenuation.) The host SerDes must overcome this loss with transmit equalization (pre-emphasis) and receiver equalization (CTLE and DFE) to reconstruct the original PAM4 signal at the receiver.
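The square-root scaling can be checked numerically. This sketch assumes pure skin-effect behavior, ignoring dielectric loss and connector effects, so it is a first-order estimate only; the 18 and 22 dB inputs are the figures quoted above and the function name is illustrative.

```python
import math


def scale_loss_db(loss_db: float, f_ref_ghz: float, f_new_ghz: float) -> float:
    """Scale insertion loss between two frequencies under a sqrt(f) skin-effect model."""
    if f_ref_ghz <= 0 or f_new_ghz <= 0:
        raise ValueError("frequencies must be positive")
    return loss_db * math.sqrt(f_new_ghz / f_ref_ghz)


# Doubling the Nyquist frequency for the same 5 m cable.
F_REF_GHZ = 13.28125
F_NEW_GHZ = 26.5625

for loss in (18.0, 22.0):
    print(f"{loss} dB -> {scale_loss_db(loss, F_REF_GHZ, F_NEW_GHZ):.1f} dB")
```

Real cables lose somewhat more than this model predicts at high frequency because dielectric loss grows linearly with frequency, which is one reason measured S-parameters, not formulas, decide compliance.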
|
|
||||||
|
|
||||||
The equalization budget is finite and technology-specific. A passive QSFP-DD direct attach cable contains no equalization of its own: the assembly holds only the twinax pairs and an identification EEPROM, so all transmit and receive equalization is performed by the host SerDes. The cable assembly is specified as a passive electrical channel, and its insertion loss must stay within what the host equalizers can recover. For typical hosts, that capability covers insertion loss up to approximately 22 to 24 dB at Nyquist frequency (26.5625 GHz for 53.125 Gbaud PAM4). Below this loss limit, the host recovers a compliant signal. Above it, the equalized eye has insufficient eye height and eye width to reliably decode PAM4 symbols with four distinct amplitude levels.
|
|
||||||
|
|
||||||
The reason the failure is sharp rather than gradual is the PAM4 amplitude level spacing. In a PAM4 signal with four amplitude levels — labeled 0, 1, 2, 3 — the spacing between adjacent levels is one-third of the total signal swing. After equalization that compensates for the bulk frequency roll-off but adds noise through the DFE tap adaptation, the effective noise floor relative to the inter-symbol spacing determines the symbol error probability. When the insertion loss is 22 dB (within equalization range), the equalized eye height at the decision threshold is above the noise floor with margin. When the insertion loss reaches 28 dB (beyond equalization range), the equalized eye height collapses to a small fraction of the noise floor and symbol error rate increases exponentially rather than gradually. This exponential behavior is why a 3-meter cable works and a 5-meter cable fails without a transitional zone of marginal performance.
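The level-spacing penalty can be quantified directly. For the same peak-to-peak swing, an M-level PAM signal has M - 1 eyes stacked vertically, so each eye is 1/(M - 1) of the swing and the amplitude penalty relative to NRZ is 20·log10(M - 1) dB. The function names below are illustrative.

```python
import math


def pam_level_spacing(levels: int) -> float:
    """Spacing between adjacent levels as a fraction of total signal swing."""
    if levels < 2:
        raise ValueError("need at least two levels")
    return 1 / (levels - 1)


def pam_snr_penalty_db(levels: int) -> float:
    """Eye-height penalty versus two-level NRZ at the same swing, in dB."""
    if levels < 2:
        raise ValueError("need at least two levels")
    return 20 * math.log10(levels - 1)


print(f"PAM4 spacing: {pam_level_spacing(4):.3f} of swing")
print(f"PAM4 penalty vs NRZ: {pam_snr_penalty_db(4):.2f} dB")
```

That penalty of roughly 9.5 dB is subtracted before any channel loss is considered, which is why a loss increase of a few dB that NRZ would tolerate pushes a PAM4 link past the point where FEC can recover it.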
|
|
||||||
|
|
||||||
The insertion loss versus length relationship for common AWG gauge twinax used in QSFP-DD DAC cables places the 22 to 24 dB Nyquist frequency insertion loss limit at approximately 3.0 to 3.5 meters for 26 AWG and approximately 4.0 to 4.5 meters for 24 AWG. This is why QSFP-DD passive DAC cables are typically available in 0.5, 1, 1.5, 2, and 3 meter lengths, but rarely in 4 or 5 meter lengths — the 4 to 5 meter range is where 26 AWG passive DAC cables fail and where 24 AWG passive DAC cables are at their limit. Manufacturers who sell 5-meter "passive" QSFP-DD DAC cables are either using 22 AWG cable (heavier, stiffer, harder to route in dense racks) or are actually selling active cables with integrated signal conditioning that they are labeling as passive for procurement simplicity.
|
|
||||||
|
|
||||||
Active Electrical Cables, also called active DAC or AEC, address the distance limitation by integrating a retimer or re-driver IC in the cable assembly connectors. The retimer fully reshapes the PAM4 signal, effectively resetting the signal integrity budget at each end rather than relying on passive equalization across the full cable length. AEC cables at QSFP-DD 400G support lengths of 5, 7, and in some implementations 10 meters, at the cost of power consumption — typically 1.5 to 2.0 watts per end connector, adding 3 to 4 watts total to the link power budget. AEC cables also require the host SerDes to operate correctly with the retimer's electrical interface characteristics, which is generally the case for production platforms but should be validated against the specific platform datasheet or QSFP-DD vendor qualification list. The latency of AEC cables is approximately 50 to 100 nanoseconds higher than passive DAC cables due to the retimer pipeline, which is irrelevant for most applications but matters for precision-timing applications and some high-frequency trading infrastructure.
|
|
||||||
|
|
||||||
Active Optical Cables at 400G QSFP-DD use the same form factor with optical fiber replacing the copper twinax core. AOC cables support distances of 10 to 100 meters and beyond, are immune to electromagnetic interference, and have consistent insertion loss across length that is not subject to the skin-effect copper attenuation penalty. Pricing is typically $80 to $150 for a 10-meter QSFP-DD 400G AOC versus $40 to $70 for a 3-meter passive DAC. For spine-leaf rack architectures where port-to-port distances are under 3 meters, passive DAC is the correct choice. For architectures where port-to-port distances range from 3 to 7 meters (as in some oversubscription-optimized pod designs where spine switches are mounted above the leaf switches with cable runs through overhead management), AEC fills the gap between passive DAC reach and AOC cost. For distances above 7 meters, AOC or structured optical cabling is the correct solution.
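The selection rule in this paragraph reduces to a distance lookup. In this sketch the function name is illustrative, and treating a path of exactly 3 meters as passive DAC territory is an assumption, since the text leaves that boundary open.

```python
def select_cable(distance_m: float) -> str:
    """Pick a 400G interconnect type from the measured port-to-port path length."""
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    if distance_m <= 3:
        return "passive DAC"
    if distance_m <= 7:
        return "AEC"
    return "AOC or structured optical cabling"


for d in (1.5, 5.0, 12.0):
    print(f"{d} m: {select_cable(d)}")
```

The input should be the routed path length including cable management, not the straight-line distance between ports.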
|
|
||||||
|
|
||||||
Specifying DAC cable lengths for spine-leaf port distances requires measuring actual port-to-port paths in the physical rack layout, not assuming a nominal rack-unit distance. A 3-meter cable specified for a port that is 14U above its peer in the same rack will need to route through a cable manager, potentially adding 0.5 to 0.8 meters of additional path length. A passive DAC specified at exactly 3 meters for a 2.6-meter measured port-to-port distance has almost no slack once cable management overhead is included, and ends up routed with cable ties creating 5 cm radius bends at every direction change. Such bends do not cause electrical loss in passive copper DAC the way they would on optical fiber, but they do cause mechanical stress at the connector boot over time. Specifying cables 0.5 meters longer than the measured path length gives routing latitude without pushing into the attenuation-limited length range.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Pre-Deployment Checklist for 800G OSFP in Spine-Leaf Fabrics"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
First 800G OSFP spine-leaf deployments fail in predictable ways, and almost all of the failure modes were already documented in TAC cases, vendor release notes, and network operator incident reports before the team doing the deployment encountered them. The engineers who spent 36 hours troubleshooting an 800G spine that would not bring up its OSFP ports were, more often than not, hitting a firmware compatibility issue that had been fixed in a release published three months before their deployment date. This pre-deployment checklist is the operational synthesis of those documented failure modes, structured to be run before the first module is inserted into a production chassis.
|
|
||||||
|
|
||||||
Firmware version verification is the first and most consequential check. Both Arista EOS and Cisco NX-OS had 800G OSFP support introduced in releases that contained known defects specific to OSFP port bring-up and DOM reporting that were fixed in subsequent releases. On Arista EOS, initial 800G OSFP support arrived in EOS 4.30.0, which supported 800GBASE-DR8 and SR8 optics with working link negotiation. EOS 4.30.1 fixed a bug where OSFP modules in specific slot positions of the 7800R3 series chassis reported incorrect cage temperature values that could cause the thermal protection system to incorrectly mask or disable ports. EOS 4.30.3 addressed a DOM polling race condition that caused CMIS state machine failures on OSFP modules at system boot when multiple OSFP ports initialized simultaneously. Deploying on EOS 4.30.0 means running with both of these defects active. The correct target version for initial 800G OSFP deployments on Arista 7800 series as of mid-2026 is EOS 4.31.2 or later.
|
|
||||||
|
|
||||||
On Cisco NX-OS, 800G OSFP support for Nexus 9000 series platforms with the appropriate line cards appeared in NX-OS 10.3(2)F, which addressed the initial CMIS compatibility issues encountered with first-generation OSFP modules. NX-OS 10.4(1)F improved OSFP DOM polling to correctly handle the longer initialization time of 800G OSFP modules — which require approximately 3 to 5 seconds to complete the CMIS state machine initialization versus 1 to 2 seconds for QSFP-DD — preventing the platform from incorrectly declaring the module absent during boot. Before NX-OS 10.4(1)F, OSFP modules in specific line card slot positions would initialize correctly on cold boot but fail to reinitialize after a line card OIR event, requiring a manual `shut/no shut` on each affected port. The correct target NX-OS version for production 800G OSFP as of mid-2026 is 10.4(2)F or later.
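A pre-flight script can enforce the minimum versions before any module is inserted. The sketch below is deliberately naive: it compares only the numeric components of the version string, which is an assumption that holds for strings shaped like `4.31.2` or `10.4(2)F` but ignores release-train suffixes and maintenance-branch semantics. How the running version is obtained from the device is out of scope here, and both function names are illustrative.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Extract the numeric components of a version string, in order."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def meets_minimum(running: str, minimum: str) -> bool:
    """True if the running version is at or above the minimum, numerically."""
    return version_tuple(running) >= version_tuple(minimum)


# Target versions quoted in the text.
MIN_EOS = "4.31.2"
MIN_NXOS = "10.4(2)F"

print(meets_minimum("4.30.3", MIN_EOS))
print(meets_minimum("10.4(3)F", MIN_NXOS))
```

A failed check should block the deployment step rather than merely log a warning, since the defects described above only show up after modules are in the chassis.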
|
|
||||||
|
|
||||||
EEPROM validation is the second checklist item and covers two distinct aspects. The first is platform compatibility — verifying that the module's EEPROM presents the vendor ID, part number, and CMIS revision that the target platform expects for unsuppressed operation. OSFP modules use CMIS (Common Management Interface Specification) version 4.0 or 5.0 for module management, and some platform-specific implementations have requirements about which CMIS revision a module must advertise. A module advertising CMIS 3.0 may initialize correctly on some platforms but fail to expose the full register set that 800G management functions require. Flexoptix EEPROM programming can address platform-compatibility encoding and CMIS revision presentation for OSFP modules, ensuring the module presents correctly across the specific platform versions in the target deployment.
|
|
||||||
|
|
||||||
The second EEPROM validation aspect is per-lane capability advertisement. OSFP modules at 800G implement media-side application codes that identify the supported 800G variants — 800GBASE-SR8, DR8, FR8, or 2xFR4 breakout. The application code must match the physical module variant, and the host system uses the application code to configure the SerDes lane mapping and FEC configuration. A mismatch between the application code and the physical module — which can result from incorrect EEPROM programming or from receiving a module that was mislabeled at the manufacturing stage — produces a link that initializes the host-side SerDes correctly but applies the wrong FEC configuration to the media-side lanes, generating uncorrectable FEC errors from the first transmitted frame.
|
|
||||||
|
|
||||||
DOM baseline capture is the third checklist item. After EEPROM validation and with the chassis running the verified firmware version, insert the module into a test chassis under representative thermal load and capture the following values within 30 minutes of thermal steady state: TX power per lane (8 lanes for 800G), RX power per lane (8 lanes), TX bias current per lane, module die temperature, supply voltage (3.3V primary), and all configured alarm and warning thresholds. This baseline data goes into the CMDB alongside the module serial number, target chassis position, and fiber path identifier. For 800G SR8 modules on OM4 or OM5, note the per-lane TX launch power variance — it should be less than 1.5 dB across all eight lanes for a healthy module. Lane imbalance above 2 dB at commissioning indicates a factory defect and the module should be returned before production deployment.
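The per-lane launch power check can be automated as part of baseline capture. The 1.5 dB and 2 dB limits come from the paragraph above; the "review" outcome for the band between them, and the function names, are assumptions, since the text does not say how to treat a spread between 1.5 and 2 dB.

```python
def lane_spread_db(tx_power_dbm: list[float]) -> float:
    """Difference between the strongest and weakest lane launch power."""
    return max(tx_power_dbm) - min(tx_power_dbm)


def assess_lanes(tx_power_dbm: list[float]) -> str:
    """Grade an 800G module's eight TX lanes at commissioning."""
    if len(tx_power_dbm) != 8:
        raise ValueError("expected readings for 8 lanes")
    spread = lane_spread_db(tx_power_dbm)
    if spread < 1.5:
        return "healthy"
    if spread > 2.0:
        return "return module"
    return "review"  # assumed handling of the 1.5 to 2.0 dB band


baseline = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7]
print(assess_lanes(baseline), f"{lane_spread_db(baseline):.1f} dB")
```

Storing the spread alongside the raw per-lane values in the CMDB makes later drift on a single lane easy to spot.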
|
|
||||||
|
|
||||||
Alarm configuration is the fourth item and requires setting thresholds that are specific to the deployment context rather than accepting the factory defaults. Factory alarm thresholds for OSFP modules are set wide to avoid false positives across all deployment scenarios. For a production deployment where the power budget is known and the fiber path is characterized, alarm thresholds tuned to the specific deployment provide earlier warning of degradation. A practical configuration sets the TX power low warning at the level that still delivers 0.5 dB above the far-end module's receiver sensitivity floor after the characterized path loss (not at the generic factory threshold), the TX bias current high warning at 75 percent of the rated maximum (rather than 90 percent), and the cage temperature high warning at 60°C (rather than the specification maximum of 70°C). These tighter thresholds generate alerts while the module is still degrading toward a failure condition, providing time to schedule a replacement during a maintenance window.
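Deployment-specific thresholds can be derived from three inputs. This is a sketch under one stated assumption: the TX low warning is computed as far-end receiver sensitivity plus measured path loss plus the 0.5 dB guard, which is one way to make a transmit-side threshold meaningful against a receive-side floor. The class and function names are hypothetical, and applying the values to a platform is vendor-specific and not shown.

```python
from dataclasses import dataclass


@dataclass
class AlarmThresholds:
    tx_power_low_warn_dbm: float
    bias_high_warn_ma: float
    temp_high_warn_c: float


def deployment_thresholds(far_end_sensitivity_dbm: float,
                          path_loss_db: float,
                          bias_max_ma: float) -> AlarmThresholds:
    """Derive tightened warning thresholds for one characterized link."""
    return AlarmThresholds(
        # Lowest launch power that still lands 0.5 dB above the far-end floor.
        tx_power_low_warn_dbm=far_end_sensitivity_dbm + path_loss_db + 0.5,
        # 75 percent of rated maximum bias, per the text.
        bias_high_warn_ma=0.75 * bias_max_ma,
        # 60 C cage temperature warning, per the text.
        temp_high_warn_c=60.0,
    )


# Example inputs are placeholders, not measured values.
print(deployment_thresholds(far_end_sensitivity_dbm=-6.0, path_loss_db=2.0, bias_max_ma=12.0))
```

Because the path loss term comes from the fiber audit, the thresholds need recomputing whenever a patch panel or connector in the path is changed.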
|
|
||||||
|
|
||||||
The 48-hour burn-in process is the fifth item and is operationally more important for first 800G deployments than it was for mature 100G or 400G deployments because the 800G installed base is young enough that early-life failure rates are not yet fully characterized. Burn-in consists of running the module at full-rate traffic for 48 continuous hours while polling DOM registers every 60 seconds and monitoring pre-FEC BER on each lane. Modules that fail the burn-in period — defined as pre-FEC BER exceeding 1e-4 on any lane for more than 5 continuous minutes — are returned before going into production. Industry data on infant mortality in optical transceivers consistently shows that a 24 to 48-hour burn-in period catches 60 to 75 percent of the modules that would otherwise fail within the first 90 days of production service, at the cost of the burn-in time rather than the cost of a production outage.
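The burn-in failure criterion can be evaluated from the polled samples. With one sample every 60 seconds, this sketch interprets "more than 5 continuous minutes" as more than five consecutive samples above threshold; that interpretation and the function name are assumptions.

```python
def fails_burn_in(ber_samples: list[float],
                  threshold: float = 1e-4,
                  max_consecutive: int = 5) -> bool:
    """True if pre-FEC BER stays above threshold for more than max_consecutive samples."""
    run = 0
    for ber in ber_samples:
        # Extend the current run while above threshold, otherwise reset it.
        run = run + 1 if ber > threshold else 0
        if run > max_consecutive:
            return True
    return False


# One lane's samples at 60-second intervals (illustrative values only).
lane_samples = [2e-6] * 10 + [3e-4] * 6 + [2e-6] * 10
print(fails_burn_in(lane_samples))
```

The check runs per lane; a module fails if any one of its eight lanes fails.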
|
|
||||||
|
|
||||||
Common mistakes on first 800G deployments fall into three categories that repeat across operator environments. The first is underestimating the time to thermal steady state — OSFP modules at 800G require 20 to 35 minutes from cold insertion to reach thermal equilibrium in a production chassis, and DOM readings taken before steady state produce misleading baselines. The second is treating 800G DAC cables as interchangeable with 400G DAC cables — the physical OSFP connector on an 800G cable is different from the QSFP-DD connector on a 400G cable, and mislabeled or misidentified cable inventory from mixed deployments causes the kind of connection confusion that generates multi-hour troubleshooting when a cable is physically inserted but the switch reports no module present. The third is not reading the OSFP module initialization sequence in the chassis event log before declaring a port failed — the CMIS state machine for 800G OSFP produces a specific sequence of syslog messages during successful initialization, and any deviation from that sequence points directly to the failure stage in the initialization process, reducing root cause analysis time from hours to minutes.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "When to Stop Using 10G SFP+ and What the Upgrade Path Actually Costs"
|
|
||||||
type: comparison
|
|
||||||
target_audience: sales
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The 10G to 25G or 100G upgrade conversation has a specific trigger point that most network architects know intuitively but rarely quantify: when uplink ports on access or aggregation switches sustain above 70 percent utilization for more than four hours per day, the economics of the upgrade shift from discretionary improvement to capacity-driven necessity. Below that threshold, 10G is cheap, operationally stable, and fully adequate for the workload. Above it, packet loss, latency variance, and increased retransmission rates are degrading application performance, and the cost of that degradation is larger than the cost of the hardware upgrade. The challenge is that most organizations reach the trigger point before they have done the cost modeling, which means the upgrade happens reactively and expensively rather than proactively and efficiently.
|
|
||||||
|
|
||||||
The 2026 per-port economics are more favorable for upgrading than they have been at any previous point in the technology's lifecycle. Compatible SFP+ SR optics for 10GBASE-SR run approximately $20 to $28 per port. Compatible SFP28 SR optics for 25GBASE-SR run approximately $35 to $45 per port, a premium of $15 to $17 per port for 2.5 times the bandwidth. Compatible QSFP28 SR4 optics for 100GBASE-SR4 run approximately $50 to $65 per port, a like-for-like premium of $30 to $37 over SFP+ for 10 times the bandwidth. The per-gigabit cost at 100G ($0.50 to $0.65) is now roughly a quarter of the per-gigabit cost at 10G ($2.00 to $2.80). Stating it as an absolute per-port premium ($37 for the 100G versus 10G optic comparison) obscures how favorable the relative economics have become. The historical inflection point where 100G optic cost per port dropped below the 10G optic cost per port plus the bandwidth premium justification was 2022. In 2026 the economics of 100G are unambiguous for any application that generates over 3 Gbps of sustained traffic.
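The per-gigabit comparison can be reproduced directly from the quoted price ranges. The sketch below uses only the prices stated in the paragraph; the table layout and function name are illustrative.

```python
# (line rate in Gbps, low price in USD, high price in USD), as quoted in the text
OPTICS = {
    "10GBASE-SR SFP+": (10, 20, 28),
    "25GBASE-SR SFP28": (25, 35, 45),
    "100GBASE-SR4 QSFP28": (100, 50, 65),
}


def cost_per_gbps(name):
    """Return the (low, high) optic cost per Gbps for one optic type."""
    gbps, low, high = OPTICS[name]
    return low / gbps, high / gbps


for name in OPTICS:
    low, high = cost_per_gbps(name)
    print(f"{name}: ${low:.2f} to ${high:.2f} per Gbps")
```

Dividing by line rate rather than comparing port prices is what exposes the relative economics that the absolute per-port premium hides.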
|
|
||||||
|
|
||||||
The full migration cost calculation includes four components that are routinely underestimated or omitted. The first is switch hardware: the access or aggregation switches must support the target port speed, which for a migration from 10G to 25G at the server access layer means replacing the switch rather than just the optics if the existing 10G switches do not have SFP28 ports. A 48-port 10G SFP+ switch with 4x 100G uplinks typically costs $2,000 to $4,000 to replace with an equivalent 48-port 25G SFP28 switch with 4x 100G or 2x 400G uplinks in 2026, depending on vendor and whether OEM or white-box hardware is used. For a 40-switch deployment, that is $80,000 to $160,000 in switch hardware alone — a cost that does not appear in the optic-cost-only analysis.
|
|
||||||
|
|
||||||
The second component is the cabling audit and remediation. OM1 and OM2 fiber, which was widely deployed for 10G SR connections in enterprise buildings from 2005 through 2014, is compatible with 10GBASE-SR at lengths up to approximately 33 meters on OM1 and 80 meters on OM2. Neither is specified for 25GBASE-SR at any length: the 25GBASE-SR specification defines reach only on OM3 (70 meters) and on OM4 or OM5 (100 meters). An enterprise with 200 servers connected via OM1 patch cords to top-of-rack switches, each patch cord 2 meters long, might find that all 200 connections need OM4 replacement to support 25G SFP28. OM4 patch cords cost approximately $12 to $18 each in duplex LC format, but the labor to replace 200 patch cords in a live server environment during maintenance windows adds substantially to the real cost. Organizations that undercount this component discover it during the migration as a project-stopping surprise.
|
|
||||||
|
|
||||||
The third component is the operations labor for the migration itself. A 10G to 25G optic swap on a running server requires a maintenance window if the server has a single NIC port, or can be done hitlessly with a dual-NIC server that can failover. A 40-switch deployment with an average of 48 ports per switch is 1,920 port conversions. At a conservative estimate of 8 minutes per port including the optic swap, cable verification, link confirmation, and documentation update, that is 256 hours of hands-on operations labor. At $85 per hour burdened cost, that is $21,760 in direct labor — again, a cost that rarely appears in the optic-purchase-only budget that is often the only number leadership sees in the business case.
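The labor arithmetic above is worth keeping as a small function so the inputs can be varied in the business case. This is a minimal sketch; the function name and return shape are assumptions.

```python
def migration_labor(switches, ports_per_switch, minutes_per_port, hourly_rate):
    """Return (port count, labor hours, labor cost) for an optic migration."""
    ports = switches * ports_per_switch
    hours = ports * minutes_per_port / 60
    return ports, hours, hours * hourly_rate


# Figures from the text: 40 switches, 48 ports each, 8 minutes per port, $85/hour.
ports, hours, cost = migration_labor(40, 48, 8, 85)
print(ports, hours, cost)
```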
|
|
||||||
|
|
||||||
The fourth component is testing and validation time. A migration of 1,920 ports that is done without per-link validation produces a post-migration environment with some number of marginal or misconfigured links that generate support tickets over the subsequent 60 to 90 days. Those tickets cost roughly $200 to $400 in engineering time each. A migration with per-link validation before cutover costs 3 to 5 minutes of validation time per port but eliminates most post-migration tickets. The investment in validation is usually less than the avoided support cost for deployments larger than 200 ports.
|
|
||||||
|
|
||||||
The 25G versus 100G decision framework for server access versus aggregation layers has a clear structural answer that holds for most enterprise and cloud topologies in 2026. Server access ports connect servers to top-of-rack switches, and server NIC bandwidth requirements determine whether 25G or 100G is correct at that tier. A server running typical enterprise workloads — virtualization, database, application serving — with a 2x 25G bonded NIC produces a maximum of 50G of traffic toward the access switch, which makes a 25G access port (used in active-passive bonding) or a 100G access port (used in active-active LACP bonding with two 50G NIC ports) correct depending on the NIC configuration. Servers running storage-intensive or machine learning workloads with 200G or 400G NIC cards dictate 100G or 400G access ports. The aggregation and spine layers, which aggregate traffic from multiple access switches, need the bandwidth multiplication headroom of 100G or 400G regardless of access port speed.
|
|
||||||
|
|
||||||
A common planning error is selecting 25G server access ports based on the observation that existing servers only use 5 to 8G of bandwidth, without accounting for the server refresh cycle. Enterprise server lifecycles are typically 4 to 6 years. Deploying 25G access infrastructure today means the first generation of refreshed servers will arrive in 2029 to 2031. Server NIC bandwidth at that point will be dominated by 100G and 200G NIC options, and the 25G access infrastructure will be a bottleneck within 18 months of the server refresh completing. Deploying 100G access infrastructure today and accepting that current servers use only 25 to 30 percent of available bandwidth is the architecture that remains correct through the next full server refresh cycle and eliminates the access infrastructure replacement that would otherwise be required in 2030.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "CWDM4 vs PSM4 for 100G: Why the Four-Wavelength Decision Matters More Than You Think"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The 100G QSFP28 market for short-reach single-mode links bifurcated cleanly along two lines around the time IEEE 802.3bm was ratified in 2015: CWDM4 and PSM4, both defined by industry MSAs rather than by IEEE. Both deliver 4x25G lanes over SMF, PSM4 to 500m and CWDM4 to 2km, and at a glance both seem interchangeable for the same sub-500m cabling run. They are not. The decision between them compounds across thousands of ports in a real data center build, and getting it wrong means either pulling fiber or throwing away optics, neither of which is cheap.
|
|
||||||
|
|
||||||
PSM4 (Parallel Single Mode 4-lane) is conceptually the simplest architecture imaginable. Four 25G lanes each travel over a separate single-mode fiber at approximately 1310nm (the individual lane wavelengths are not tightly controlled since they don't need to be wavelength-division multiplexed), with all four using NRZ modulation at 25.78125 Gbps per lane. The connector is an MPO-12 with the middle 4 fibers unused, which means every PSM4 link consumes eight fiber strands. This is the critical arithmetic: a leaf switch with 48 PSM4 ports requires 384 individual fibers. In a spine-leaf fabric with 10,000 server-facing 25G ports and 400 100G uplinks, PSM4 alone demands 3,200 strands of single-mode fiber between the layers. The Senko MPO connectors on production PSM4 modules, as used in Innolight TR-FC13J-NCD or Flexoptix P.10741, have a mechanical life of roughly 500 insertion cycles before ferrule wear degrades the contact geometry enough to affect loss budget.
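The strand arithmetic scales linearly with link count, which a short sketch makes easy to compare across optic types. The per-link fiber counts (eight for PSM4, two for duplex CWDM4) come from the text; the names are illustrative.

```python
# Fibers consumed per 100G link, as described in the text.
FIBERS_PER_LINK = {"PSM4": 8, "CWDM4": 2}


def strands(links, optic):
    """Return the number of single-mode strands needed for `links` links."""
    return links * FIBERS_PER_LINK[optic]


# The 400-uplink fabric from the text.
print(strands(400, "PSM4"), strands(400, "CWDM4"))
```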
|
|
||||||
|
|
||||||
CWDM4 takes those same four 25G lanes and wavelength-division multiplexes them onto two fibers using four distinct center wavelengths: 1271nm, 1291nm, 1311nm, and 1331nm, with 20nm channel spacing. The two fibers are LC-duplex, which is the same connector your existing 10G and 40G plant almost certainly uses. The mux/demux is done with thin-film filter arrays inside the module itself. Each lane has its own CDR (Clock and Data Recovery) circuit, which is why CWDM4 modules burn approximately 3.5W versus PSM4's 2.5W — an additional 1W per module that, across a 10,000-port fabric, adds up to 10kW of additional cooling load. Flexoptix P.10733 and the Finisar FTLC1152RDNM are representative production examples. The CDR also introduces approximately 100ps of additional lane-to-lane deskew processing, though this is irrelevant for Ethernet since 802.3bm Clause 87 allows up to 120ns of skew between lanes.
|
|
||||||
|
|
||||||
The cost differential has narrowed considerably from 2017 highs when CWDM4 modules cost nearly three times PSM4, but a material gap remains. In volume pricing as of early 2026, compatible CWDM4 QSFP28 modules from a quality vendor like Flexoptix or ProLabs land at approximately €180-220 per unit, while PSM4 equivalents are €120-150. On a 400-port spine layer that is a €24,000 to €28,000 difference just in optics. That number must be weighed against fiber plant cost: an 8-fiber MPO trunk cable costs roughly 40% more than a 2-fiber LC-duplex equivalent for the same run length, and MPO cassettes for breakout add another €15-25 per port of termination cost. The crossover point where PSM4's cheaper optics are eaten by higher fiber plant costs typically occurs around the 200-300 port threshold for new greenfield builds where fiber is being installed anyway.
|
|
||||||
|
|
||||||
For brownfield environments, CWDM4 almost always wins on economics even at its optics premium. Any data center built after 2010 has LC-duplex SMF infrastructure to every cabinet. Pulling new 8-fiber MPO trunks to replace 2-fiber LC runs costs €8-15 per meter in installation labor plus materials, so a 50-meter average run to 400 switch ports is €160,000-300,000 in fiber plant costs before a single PSM4 module is purchased. The CWDM4 optics premium of €70 per module times 400 modules is €28,000 — a trivial fraction.
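The brownfield comparison reduces to two products, sketched below with the figures from the paragraph. The function names are hypothetical, and the model deliberately ignores everything except trunk installation cost and the per-module optics premium.

```python
def psm4_fiber_plant_cost(ports, avg_run_m, cost_per_m):
    """Cost of pulling new MPO trunks: one run per port at a per-meter rate."""
    return ports * avg_run_m * cost_per_m


def cwdm4_optics_premium(ports, premium_per_module):
    """Extra optics spend for choosing CWDM4 over PSM4."""
    return ports * premium_per_module


low = psm4_fiber_plant_cost(400, 50, 8)
high = psm4_fiber_plant_cost(400, 50, 15)
premium = cwdm4_optics_premium(400, 70)
print(low, high, premium)
```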
|
|
||||||
|
|
||||||
The interoperability risk that gets overlooked in vendor comparisons is connector polarity. PSM4 needs Type B MPO polarity (per TIA-568-C.3) end to end, meaning the fiber in position 1 at one end arrives in position 12 at the other, so that each transmit lane lands on a receive lane. Type A trunks and cassettes, the most commonly pre-installed type in legacy data centers, are straight-through (position 1 to position 1) and depend on a compensating flip elsewhere in the channel, so PSM4 QSFP28 requires methodical polarity management. Plugging a PSM4 module into an incorrectly polarized MPO plant is a non-obvious failure: the module will power on, DOM will show nominal TX power on all four lanes at the transmitting end, but the far end will show either zero RX power or a scrambled fiber-to-lane mapping that produces persistent bit errors. Field engineers unfamiliar with PSM4 will spend 45 minutes inspecting the optics before realizing the MPO polarity is wrong.
|
|
||||||
|
|
||||||
Platform support nuances also favor CWDM4 in heterogeneous environments. Cisco Nexus 9332C and 93180YC-FX both support CWDM4 and PSM4, but the 9200 series requires a firmware upgrade to enable PSM4 auto-negotiation correctly, and Juniper QFX5120-48Y had a known bug in Junos 20.2R1 where PSM4 modules would intermittently fail to come up after a port flap until the bug was addressed in 20.2R3. CWDM4 with its LC-duplex interface is electrically and mechanically simpler from the platform's perspective — the transceiver looks and behaves more like a conventional duplex interface, which means fewer edge cases in NOS port drivers.
|
|
||||||
|
|
||||||
The decision framework is straightforward once you quantify the numbers. For new hyperscale builds where leaf-to-spine cabling is being installed from scratch, PSM4 saves real money at scale when the fabric exceeds roughly 500 ports per tier. For enterprise data centers operating on existing LC-duplex SMF plant, any calculation that ends with pulling and replacing fiber plant for PSM4 should be rejected — CWDM4 at its optics premium is the rational choice. For inter-building runs where the fiber plant is OS2 single-mode but the connectors are already MPO for 40G migration, PSM4 is worth evaluating only if you have verified Type B polarity throughout. Mixed environments — where some switches use CWDM4 and some PSM4 — require optical-to-electrical breakout panels at the connection point, since you cannot directly couple a CWDM4 module to a PSM4 module regardless of the fiber plant. These modules are not optically compatible, full stop.
|
|
||||||
|
|
||||||
One final consideration: CWDM4 gives you a more credible upgrade path to 400G FR4 (100G per lane, 4 lanes on the same 1271/1291/1311/1331nm wavelength plan, per IEEE 802.3cu), meaning your fiber plant investment carries forward. PSM4 fiber infrastructure does the same job for 400G-DR4 (IEEE 802.3bs Clause 124), but DR4 requires OS2 with 0.2dB/km loss specification and highly polished MPO connectors, not the looser connector loss that 100G PSM4 sometimes ran on with margin to spare. If your 10-year fiber plant investment needs to justify both present-day 100G and future 400G density, the wavelength route with LC-duplex is the lower-risk architectural bet.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "What MSA Compliance Actually Guarantees (And What It Doesn't)"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The phrase "MSA-compliant" appears in nearly every compatible transceiver data sheet, and it is nearly meaningless as a guarantee of interoperability with any specific switch platform. Understanding why requires understanding what Multi-Source Agreements actually specify, what they deliberately leave unspecified, and how switch vendors exploit that ambiguity to implement lock-in that has nothing to do with optical performance.
|
|
||||||
|
|
||||||
A Multi-Source Agreement is a voluntary industry specification maintained by informal consortia of vendors — not a ratified standard from IEEE or IEC. The SFF Committee (Small Form Factor) publishes the foundational documents: SFF-8472 for SFP/SFP+ management interface, SFF-8636 for QSFP28 and QSFP+, and the OIF's CMIS (Common Management Interface Specification) covering QSFP-DD, OSFP, and QSFP112. These specifications define the physical connector dimensions to within tenths of a millimeter, the electrical interface characteristics (differential signaling, impedances, voltage rails), the I2C or MDIO management bus protocols, and critically, the EEPROM register map that exposes DOM (Digital Optical Monitoring) data. What they explicitly do not define is how a switch platform must respond to any particular EEPROM value. That gap is where vendor lock-in lives.
|
|
||||||
|
|
||||||
The SFF-8636 register map allocates byte 0 of the lower memory map as the identifier byte, with values assigned in SFF-8024: 0x0D indicates a QSFP+, 0x11 a QSFP28, and 0x18 a QSFP-DD. Upper page 00h (bytes 128-255) includes the vendor name (bytes 148-163), vendor OUI (bytes 165-167), vendor part number (bytes 168-183), and vendor serial number (bytes 196-211). Nothing in SFF-8636 specifies what a host system must do with these bytes. Cisco decided to use the vendor OUI and part number to gate module recognition on Nexus platforms: if the OUI doesn't match a Cisco-approved value, NX-OS generates a "transceiver is not supported" warning and, depending on platform and version, may leave the port administratively disabled by default. The usual fix is "service unsupported-transceiver" in global config (on IOS and IOS-XE platforms commonly paired with "no errdisable detect cause gbic-invalid"), but many network teams don't know this and interpret the warning as a compatibility failure rather than a policy enforcement flag.
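To make the register layout concrete, the following sketch decodes the vendor fields from a raw memory image using the byte offsets listed above. It assumes the caller already holds at least the first 212 bytes (lower memory followed by upper page 00h) as a `bytes` object; how those bytes are read from the module is platform-specific and out of scope here. The function name and the returned dictionary keys are hypothetical.

```python
def parse_vendor_fields(image: bytes) -> dict:
    """Decode identifier and vendor fields from an SFF-8636 memory image.

    image: lower memory (bytes 0-127) followed by upper page 00h
    (bytes 128-255); only bytes 0-211 are needed for these fields.
    """
    if len(image) < 212:
        raise ValueError("need at least bytes 0-211 of the memory image")

    def text(start, end):
        # Fields are ASCII, padded on the right with spaces.
        return image[start:end + 1].decode("ascii", errors="replace").rstrip()

    return {
        "identifier": image[0],
        "vendor_name": text(148, 163),
        "vendor_oui": ":".join(f"{b:02X}" for b in image[165:168]),
        "part_number": text(168, 183),
        "serial_number": text(196, 211),
    }
```

A host applies its own policy to these values; the parser only shows that the fields themselves are plain, fixed-offset data that any reader can inspect.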
|
|
||||||
|
|
||||||
Juniper takes a different approach on EX and QFX platforms. Junos checks a Juniper-specific EEPROM field that Juniper-branded modules contain but MSA-compliant third-party modules lack. The consequence is a log message at notice severity — not an alarm — but Junos will still bring the interface up. The practical issue is that Juniper's proactive DOM threshold alerts won't work unless the module's EEPROM has been programmed with Juniper-compatible alarm and warning thresholds in the correct registers. A module that is fully MSA/SFF-8636 compliant will report its DOM data correctly on any SFF-8636-aware management system, but Juniper's specific per-platform thresholds for "warn high TX power" may not trigger because the module programmed slightly different threshold bytes in the optional fields.
|
|
||||||
|
|
||||||
The distinction between IEEE 802.3-compliant and MSA-compliant is one that even experienced engineers conflate. IEEE 802.3 defines the optical and electrical performance specifications for the physical medium: minimum TX power, maximum TX power, receiver sensitivity, extinction ratio, eye diagram masks, wavelength accuracy. These are the specifications that determine whether the link will actually work. SFF-8472/8636 defines the electrical connector, I2C register map, and DOM data format, but says nothing about the optical performance of the module itself. A module can be perfectly MSA-compliant (correct form factor, correct EEPROM layout, correct electrical interface) while delivering optical performance that doesn't meet IEEE 802.3 LR4 spec, and vice versa. When evaluating a compatible transceiver vendor, the question "is it MSA-compliant?" is less important than "does it meet IEEE 802.3 Clause 88 optical specifications?", because the latter is what determines whether the link actually achieves BER <1e-12 at 10km.
|
|
||||||
|
|
||||||
The EEPROM programming question gets more specific for certain Cisco platforms. Cisco Catalyst 9500 and Nexus 93600CD-GX will check for a specific byte pattern in the extended ID fields (bytes 64-95 of SFF-8636 lower memory map) that Cisco's internal module qualification process stamps into OEM modules. This check is separate from the OUI check. A module that passes the OUI check but lacks the extended ID pattern will generate a different warning code. Flexoptix programs EEPROM in-house at their Karlsruhe facility specifically to address this: they maintain platform-specific EEPROM templates for Cisco, Juniper, Arista, Huawei, and Nokia, ensuring that the relevant identification fields match what each platform's firmware expects. This is categorically different from a vendor who receives pre-programmed modules from a factory in Shenzhen with a generic EEPROM template and relabels them — the generic template may work on Arista (which does essentially no EEPROM validation beyond SFF-8636 compliance) but fail on a Catalyst 9300 that performs stricter field checks.
|
|
||||||
|
|
||||||
Arista EOS deserves specific mention because it is the most permissive of the major platforms in terms of EEPROM validation. By default, Arista will bring up any module with a valid SFF identifier byte and log a transceiver-unsupported warning without blocking traffic. The "xcvr" command family in EOS provides DOM data regardless of vendor bytes. This permissiveness is intentional — Arista explicitly supports third-party optics — but it also means that Arista environments see fewer "lock-in" failures, which can create a false sense of confidence about module compatibility that doesn't transfer to a Cisco or Nokia environment using the same optics.
|
|
||||||
|
|
||||||
Nokia 7750 SR platforms present a different wrinkle: Nokia uses a custom EEPROM field for their "Nokia Optical Transceiver" designation, and certain SR-OS versions (pre-22.x) require this field to be present for coherent modules on the line cards. For grey optics on FP4-based line cards, Nokia is more permissive, but DWDM pluggables require explicit Nokia compatibility certification, not just MSA compliance. The CMIS state machine requirements for QSFP-DD coherent modules add another layer: if the Nokia CMIS driver version doesn't match the module's CMIS revision (3.0 vs 4.0 state machine behavior differs in the DataPath activation sequence), the module may initialize correctly on 400G QSFP-DD grey optics but fail to complete the coherent channel initialization on 400ZR modules.
|
|
||||||
|
|
||||||
When evaluating any compatible transceiver vendor, the right question is not "are these MSA compliant?" — assume yes — but rather "which specific platform firmware revisions have you tested this against, what EEPROM programming do you perform for each target platform, and can you show me your test results on the specific NOS version I'm running?" A vendor who answers with "it's MSA compliant, it'll work" and can't produce platform-specific test evidence is giving you a factory-stock module with a generic EEPROM template and hoping for the best. For Arista 7050CX3, that often works. For Cisco Nexus 9336C-FX2 running NX-OS 9.3(8) with Cisco's latest transceiver database, the failure rate on unvalidated generic stock is meaningfully higher than zero.
|
|
||||||
@ -1,26 +0,0 @@
|
|||||||
---
|
|
||||||
title: "25G DAC vs AOC vs Optical: The Total Cost of Ownership Nobody Calculates"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Every data center architect has been through the DAC versus optical conversation, usually at the point where someone in procurement discovers that a passive copper DAC costs €18 while an SFP28 SR module pair costs €120 and asks why anyone would pay six times more for the same 25G connection. The answer is not obvious from a unit price comparison, and the people who answer "always use DAC" for short distances have usually never managed a large-scale cabling change, dealt with an HVAC rerouting project, or attempted to replace a failed cable in a densely packed 2U server row.
|
|
||||||
|
|
||||||
Passive 25G DAC cables — the twin-axial copper assemblies conforming to SFF-8431 and IEEE 802.3by — operate reliably to approximately 3m in the Twinax configuration and to 5m in heavier-gauge variants, with some 7m cables marketed by vendors like FS.com and Molex that work in practice only on specific platforms with aggressive equalization. Beyond 5m, attenuation at 25.78125 Gbps NRZ exceeds what most SerDes equalizers can recover reliably, and you start seeing platform-specific behavior where Arista 7050CX3 will link up with a 7m DAC that a Cisco Nexus 93180YC-EX refuses to negotiate. Electrically, a passive DAC consumes zero power from the SFP28 cage — the optical port power budget shows 0W per lane. This is genuinely attractive in a high-density compute cluster where the sum of 500 server uplinks represents meaningful power and cooling overhead.
|
|
||||||
|
|
||||||
The physics problem with copper at 25G manifests as cable management complexity that doesn't show up in the procurement spreadsheet. A 5m 25G DAC cable has a minimum bend radius of approximately 40mm and weighs roughly 120g. A rack with 48 DAC connections to an adjacent ToR switch accumulates 5.76kg of cable mass, all of which has to be managed with cable arms, Velcro, and careful routing to avoid violating the bend radius at patch panel exits. More critically, passive DAC cables cannot be rerouted to a different rack without swapping the entire fixed-length assembly — and DAC cables are non-field-serviceable. When the 25G leaf switch in row 7 is replaced with a 100G capable switch during a refresh cycle and the new switch is 3 racks away instead of 1, every DAC cable becomes scrap. The per-unit cost of €18 that seemed so attractive in year one becomes €18 x 48 ports in disposal cost during the refresh, plus €18 x 48 for new cables of the correct length, plus roughly 2 hours of cabling labor per rack at €80/hour for a skilled data center technician.
|
|
||||||
|
|
||||||
AOC (Active Optical Cable) splits the difference in an uncomfortable way. An AOC for 25G, physically an SFP28 module at each end bonded permanently to a multi-strand OM3 fiber cable, costs approximately €55-80 for a 3m assembly from quality vendors like Flexoptix P.10811 series or Lumentum. The optical cable portion can be routed around bends as tight as 5mm (vs 40mm for copper Twinax), the cable weighs approximately 30g for a 5m assembly versus roughly 120g for a DAC equivalent, and AOC works to 30m reliably on OM3. These properties make AOC genuinely superior for high-density cabling where cable management is constrained, particularly in blade server environments where cables must traverse tightly managed channels.
|
|
||||||
|
|
||||||
The trap with AOC is the non-field-serviceability problem, now worse than DAC because the fiber plant is integrated into a relatively expensive assembly. When an AOC fails — the most common failure mode is the active element at one end developing a fault, which happens at a rate of approximately 0.8-1.5% per year based on field data from large deployments — you lose the entire €70 assembly and cannot reuse any component. Compare this to a discrete optical solution: SFP28 SR module (Flexoptix P.10701 or equivalent) plus a 3m duplex OM4 patch cord costs approximately €50 per module (€100/pair) plus €8-12 for the patch cord. When the SFP28 SR fails — field MTBF on quality modules runs 5-7 years — you replace the €50 module, not the fiber. The patch cord, if undamaged, serves another 15-20 years.
|
|
||||||
|
|
||||||
The 7-year TCO model is where the decision should be made for anything larger than a pilot deployment. Assume a 48-port server-to-leaf interconnect with an average distance of 5m, one switch refresh in year 4, and a failure rate of 0.8%/year (48 links x 0.8% x 7 years = roughly 3 failures over the period). For DAC: €18 x 48 = €864 initially, plus one full cable replacement at the switch refresh in year 4 at €18 x 48 = €864, plus 2 hours of refresh labor at €160, total €1,888. For AOC: €70 x 48 = €3,360 initially, plus €70 x 3 failure replacements = €210, plus the year-4 refresh at €70 x 48 = €3,360, total €6,930. For optical (SFP28 SR + patch cord): €50 x 96 modules + €10 x 48 cords = €5,280 initially, plus €50 x 3 module failures = €150, plus the year-4 refresh, which requires only new optics on the new switch because the fiber plant stays, so €50 x 48 = €2,400. Total optical 7-year cost: €7,830.
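The three totals share one structure: initial purchase, failure replacements, and the year-4 refresh. The sketch below encodes that structure with the unit prices from the paragraph and takes the failure count as a parameter, so the sensitivity to the assumed failure rate is easy to explore. Function names and defaults are illustrative assumptions.

```python
PORTS = 48  # server-to-leaf links in the modeled deployment


def dac_tco(cable=18, refresh_labor=160):
    """Initial cables, a full replacement at the year-4 refresh, plus labor."""
    return cable * PORTS * 2 + refresh_labor


def aoc_tco(failures, assembly=70):
    """Initial assemblies, failed assemblies, and a full year-4 replacement."""
    return assembly * PORTS * 2 + assembly * failures


def optical_tco(failures, module=50, cord=10):
    """Two modules and one patch cord per link; fiber survives the refresh."""
    initial = module * PORTS * 2 + cord * PORTS
    refresh = module * PORTS  # only the new switch side needs new optics
    return initial + module * failures + refresh


for failures in (3, 21):
    print(failures, dac_tco(), aoc_tco(failures), optical_tco(failures))
```

Because only the module term scales with failures in the optical case, the gap between AOC and optical narrows as the assumed failure count rises.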
|
|
||||||
|
|
||||||
That calculation looks like AOC beats optical narrowly, and for a static 48-port deployment it might. The model collapses when you introduce moves, adds, and changes (MAC). In a production data center, roughly 20-25% of server connections move or change distance within any given year. For 48 ports, that's 10-12 DAC or AOC swaps annually just from MAC activity, each requiring a physically matching replacement. The DAC inventory problem is concrete: you need to stock 1m, 2m, 3m, 5m variants. A stocking policy for 4 DAC lengths adds enough inventory carrying cost that the unit-price difference between DAC and optical becomes irrelevant. With optical, you reuse the fiber plant and swap only the SFP28 modules, which are all the same SKU regardless of reach.
|
|
||||||
|
|
||||||
The power differential bears quantification for large deployments. Passive DAC: 0W per link, effectively zero. AOC: approximately 1W total (both active ends combined), so 0.5W per SFP28 equivalent position. SFP28 SR: approximately 1.0W per module at full output, 2.0W per link pair. At 1,000 links (a modest-sized leaf layer), optical consumes 2,000W more than DAC, which over 8,760 hours is 17,520 kWh, or roughly €1,750 per year in electricity at European data center power costs of €0.10/kWh PUE-adjusted. This is real money but it needs to be compared against the infrastructure flexibility cost of locking yourself into a fixed-length copper plant that cannot adapt to network topology changes without full cable replacement.
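The electricity figure follows from watts, hours per year, and the tariff. A minimal sketch, assuming continuous operation (8,760 hours per year) and a PUE-adjusted tariff supplied by the caller; the function name is hypothetical.

```python
def annual_energy_cost(links, watts_per_link, eur_per_kwh, hours=8760):
    """Return the yearly electricity cost in EUR for a set of links."""
    kilowatts = links * watts_per_link / 1000
    return kilowatts * hours * eur_per_kwh


# Per-link power from the text: SFP28 SR pair 2.0 W, AOC 1.0 W, passive DAC 0 W.
for name, watts in (("SFP28 SR", 2.0), ("AOC", 1.0), ("DAC", 0.0)):
    print(name, round(annual_energy_cost(1000, watts, 0.10), 2))
```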
|
|
||||||
|
|
||||||
The structured cabling argument often gets inverted in these discussions. OM4 multimode fiber installation for a 500-server deployment costs approximately €25-35 per port in properly installed horizontal cabling, a one-time infrastructure investment that can support duplex speeds from 10G through 25G (SFP28 SR) and potentially 50G (SFP56) without touching the fiber plant. That €25/port paid once amortizes over 15 years. The DAC solution defers that infrastructure investment but forces a de-facto cabling installation during every rack refresh cycle as cable lengths change, at a per-instance cost higher than the original structured cabling would have been.
|
|
||||||
|
|
||||||
The correct answer for server-to-ToR connections is: DAC for static, single-rack, cost-constrained deployments with no expected topology changes; optical for any environment with active MAC activity, cross-aisle connections, or a service life beyond 3 years. AOC occupies a narrow wedge where you need 10-30m reach and don't want to invest in structured cabling infrastructure — typically useful for storage interconnects to NAS arrays on the opposite side of a raised floor.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Grey Optics vs DWDM for Metro: The Point Where Wavelengths Start Saving Money"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The transition from grey optics to DWDM pluggables is the most consequential optical infrastructure decision most enterprise network architects and ISP engineers make, and it almost always gets made too late, after the fiber lease costs have already become embarrassing on a P&L. The economics are counterintuitive: you spend more per port on optics to spend dramatically less on transport infrastructure. Understanding where the crossover point sits requires building an actual model rather than relying on rules of thumb.
|
|
||||||
|
|
||||||
Grey optics (the industry's informal term for transceivers that do not sit on the DWDM C-band channel grid) cover the practical metro range with two common choices. For 2km to 10km, 1310nm LR (IEEE 802.3ae Clause 52 for 10GBASE-LR, Clause 88 for 100GBASE-LR4) is the workhorse. For 10km to 40km, 1550nm ER modules based on directly-modulated or electro-absorption lasers handle distances up to 40km in 100G and to 80km in 10G with appropriate optical budget. Compatible 100G LR4 QSFP28 modules (Flexoptix P.10731) run approximately €120-180 each; 100G ER4 (Flexoptix P.10732) cost approximately €280-400 depending on reach variant. These are the cheapest optical transceivers that will cover metro spans. The problem is that each module occupies an independent fiber pair, and fiber in metro areas costs real money.
|
|
||||||
|
|
||||||
Dark fiber lease pricing in metro areas varies significantly by market but runs approximately €500-2,500 per fiber pair per month for intra-city spans of 5-15km in European markets. Frankfurt and Amsterdam, where carrier-neutral facilities concentrate, are at the lower end of this range due to competitive fiber market density; secondary markets like Leipzig, Eindhoven, or Salzburg run at the upper end. A network operator with 8 100G circuits between the same two data centers — which is not unusual once you include redundant paths, separate traffic classes, and capacity reserves — is paying for 8 fiber pairs, or €4,000-20,000 per month purely in fiber lease costs for that one A-to-B metro segment.
|
|
||||||
|
|
||||||
DWDM changes this arithmetic completely. ITU-T G.694.1 defines the standard DWDM channel grid with 100GHz spacing across the C-band, providing 40 usable channels between 1530nm and 1565nm, or with 50GHz spacing (now standard for 100G and above), 80 channels. A single fiber pair carrying DWDM can multiplex all 80 channels, each carrying 100G or 200G, over one fiber pair. Eighty 100G circuits over one fiber pair replaces 80 fiber pairs. At €1,000/pair/month, that is €80,000/month in fiber cost reduced to €1,000/month — a €79,000/month improvement. The DWDM optics cost for that scenario (80 QSFP28 DWDM modules at each end): approximately €800-1,200 per fixed-wavelength QSFP28 DWDM module from vendors like Lumentum or Flexoptix P.11101, so €64,000-96,000 for 80 modules at one end, paid once. The ROI at even 4 circuits sharing a fiber pair is positive within months.
|
|
||||||
|
|
||||||
The specific QSFP28 DWDM form factor comes in two distinct architectures with significantly different costs. Fixed-wavelength DWDM QSFP28 modules are pre-set to a single ITU channel at the factory (channel 33 at 193.3 THz, 1550.92nm, for instance) and cannot be retuned without physical replacement. They cost approximately €800-1,500 each from established vendors. Tunable DWDM QSFP28 modules cover the full C-band (nominally channel 1 through 96 on 50GHz grid, though most implementations cover channels 17-61 for 100GHz spacing or channels 17-122 for 50GHz) and can be programmed to any channel via CMIS or SFF-8636 management interface. Tunable modules from Lumentum, Acacia (now Cisco), or available through Flexoptix run approximately €2,000-3,500 each. The inventory advantage of tunable is compelling: one SKU replaces 80 SKUs, which matters enormously for spare management.
|
|
||||||
|
|
||||||
The next tier up is CFP2-DCO (Digital Coherent Optics) for distances beyond what direct-detect QSFP28 DWDM can handle. CFP2-DCO modules from vendors like Coherent (formerly II-VI), Lumentum, and Acacia cover 80km+ with coherent detection, PM-QPSK or 16QAM modulation, and onboard DSP for dispersion compensation. These run €3,000-5,000 per module. For 100G-ZR+ in QSFP28 form factor, the OpenZR+ standard (implemented by Inphi Colorz-II, Acacia AC400, and the OpenZR+ MSA modules) achieves 120km with coherent DP-QPSK, fitting in a standard QSFP28 cage. These represent the current price-performance boundary for metro coherent: approximately €1,500-2,500 per module, 120km reach without external amplification, and QSFP28 form factor that fits existing switch hardware.
|
|
||||||
|
|
||||||
The ROI model needs to account for four specific financial variables: fiber lease cost per pair per month, number of parallel A-to-B circuits, distance (which determines whether direct-detect DWDM or coherent is needed), and the amortization period for optics investment. For a network with fewer than 4 parallel circuits between any given pair of sites at fiber lease costs below €800/pair/month, grey optics with multiple fiber pairs is usually cheaper over a 3-year horizon. Above 6 circuits, or when fiber lease cost exceeds €1,200/pair/month, DWDM pays back in under 18 months at 100G rates. The specific inflection point also shifts when rack space is constrained: a 48-port QSFP28 chassis running DWDM carries 48x100G over 2 fibers, while the same chassis with grey optics requires 48 fiber pairs terminated into patch panels that may consume 2-4U of patch panel space alone.
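The payback logic described above can be sketched as a small function. This is a simplified model under stated assumptions: every grey circuit needs its own leased fiber pair, all DWDM circuits share one pair, each circuit needs a module at both ends, and the mux/demux and other fixed costs are lumped into one hypothetical `fixed_cost_eur` parameter. None of the names come from a real tool.

```python
def months_to_payback(circuits, lease_eur_per_pair_month,
                      grey_module_eur, dwdm_module_eur, fixed_cost_eur=0.0):
    """Months until the extra DWDM spend is recovered by fiber lease savings."""
    # Grey optics lease one pair per circuit; DWDM leases a single shared pair.
    monthly_saving = (circuits - 1) * lease_eur_per_pair_month
    if monthly_saving <= 0:
        return float("inf")  # a single circuit never pays back
    # Two modules per circuit (A end and B end), plus one-off fixed costs
    # such as passive mux/demux units (value is an assumption you supply).
    extra_capex = 2 * circuits * (dwdm_module_eur - grey_module_eur) + fixed_cost_eur
    return extra_capex / monthly_saving


# Illustrative inputs only: 8 circuits, 1,200 EUR/pair/month,
# 150 EUR grey LR4 modules versus 1,000 EUR fixed-wavelength DWDM modules.
example = months_to_payback(8, 1200.0, 150.0, 1000.0)
print(f"payback after {example:.1f} months")
```

At low circuit counts the result is dominated by whatever is entered for `fixed_cost_eur`, so the crossover point should be recomputed with real quotes for passive mux hardware, amplification, and integration labor rather than read off this sketch.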
|
|
||||||
|
|
||||||
There is a practical distance limitation on direct-detect DWDM QSFP28 that surprises engineers migrating from DWDM line systems: without inline amplification, chromatic dispersion limits 100G NRZ DWDM to approximately 80km on standard SMF-28 (D = 17 ps/nm/km at 1550nm), and without integrated DCM (Dispersion Compensating Module), the accumulated dispersion at 80km is approximately 1,360 ps/nm, which is within direct-detect QSFP28 tolerance only with DSP-based EDC. The coherent QSFP28 ZR and ZR+ modules handle this via the DSP, but conventional direct-detect DWDM QSFP28 modules must operate within their specified reach. A 100G DWDM QSFP28 rated to "80km" on the data sheet means 80km with the specific dispersion budget they tested — span loss and dispersion must both be within spec. A circuit with 60km distance but aging fiber showing 0.35dB/km loss plus high PMD from repeated cable repairs may fall outside the module's budget even at shorter distance.
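A span can be checked against both limits at once. The sketch below assumes a hypothetical module budget (20 dB of loss and 1,400 ps/nm of dispersion tolerance, invented for illustration and not taken from any datasheet) and the 17 ps/nm/km figure from the text. PMD is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ModuleBudget:
    max_loss_db: float           # hypothetical datasheet power budget
    max_dispersion_ps_nm: float  # hypothetical dispersion tolerance


def span_check(length_km, loss_db_per_km, module,
               d_ps_nm_km=17.0, connector_loss_db=1.0):
    """Compare a span's loss and accumulated dispersion with a module budget."""
    loss_db = length_km * loss_db_per_km + connector_loss_db
    dispersion = length_km * d_ps_nm_km
    return {
        "loss_db": loss_db,
        "dispersion_ps_nm": dispersion,
        "loss_ok": loss_db <= module.max_loss_db,
        "dispersion_ok": dispersion <= module.max_dispersion_ps_nm,
    }


budget = ModuleBudget(max_loss_db=20.0, max_dispersion_ps_nm=1400.0)
clean_80km = span_check(80, 0.22, budget)  # healthy fiber at rated distance
aged_60km = span_check(60, 0.35, budget)   # shorter span, aging fiber
print(clean_80km)
print(aged_60km)
```

With these illustrative numbers the 60km span on aging fiber fails on loss even though its dispersion is well inside the limit, which is the situation the paragraph warns about.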
|
|
||||||
|
|
||||||
The organizational reality in most ISPs is that the DWDM transition happens piecemeal: one high-traffic corridor migrates first, then successive rollouts as lease renewals come up for other corridors. For network teams running this transition, Flexoptix provides one tangible operational advantage: they can program and test channel-specific DWDM modules against the customer's target platform before shipment, verifying not only that the wavelength is correct but that the EEPROM configuration will be recognized correctly on the specific NOS version in use. Ordering pre-programmed channel modules from a grey-market vendor that ships generic factory stock means you may receive a module that DOM reports correctly on a lab Arista but behaves differently on a Nokia 7250 IXR where the CMIS driver expects specific OIF field values. The fiber lease savings are too large to risk on untested optics.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "ESD Damage: The Silent Transceiver Killer That Doesn't Show Up on Day One"
|
|
||||||
type: tutorial
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
ESD-damaged transceivers are one of the most expensive categories of avoidable failure in optical networking, and they are particularly insidious because the majority of ESD damage doesn't kill a module immediately. The module passes power-on tests, links up, reports nominal DOM values — and then fails three weeks later when you're troubleshooting an unrelated issue at 2 AM. Understanding the physics of latent ESD damage, recognizing its specific failure signatures, and maintaining proper handling discipline requires treating transceivers the way semiconductor fabs treat wafers: with systematic protocol rather than occasional care.
|
|
||||||
|
|
||||||
Optical transceivers are classified as ESD sensitive devices under JEDEC standard JESD22-A114F, typically in sensitivity class 1B (500V to under 1,000V), which means they can sustain damage from discharges as low as 500V in the Human Body Model (HBM) test. The HBM models the discharge from a fingertip to a metal pin: a human body capacitance of approximately 100pF charged to body potential and discharged through a 1.5kΩ series resistance. On a dry day in an air-conditioned data center (relative humidity below 30%, polyester carpet, rubber-soled shoes) a walking technician accumulates 2,000-8,000V of triboelectric charge. This is not marginal. A brief contact between a bare finger and an exposed SFP28 electrical contact delivers a discharge at 4-16 times the minimum damage voltage for the laser driver IC and transimpedance amplifier; because stored energy scales with the square of the voltage, that is 16-256 times the energy of a threshold-level discharge.
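The Human Body Model numbers can be made concrete with the capacitor energy formula E = ½CV². This is a minimal sketch using the 100pF and 1.5kΩ HBM values from the text; the function names are illustrative.

```python
HBM_CAPACITANCE_F = 100e-12   # 100 pF body capacitance (HBM)
HBM_RESISTANCE_OHM = 1500.0   # 1.5 kOhm series resistance (HBM)


def hbm_energy_joules(voltage_v):
    """Energy stored on the body capacitance at a given voltage."""
    return 0.5 * HBM_CAPACITANCE_F * voltage_v ** 2


def hbm_peak_current_a(voltage_v):
    """Initial discharge current through the HBM series resistance."""
    return voltage_v / HBM_RESISTANCE_OHM


threshold = hbm_energy_joules(500)
for volts in (500, 2000, 8000):
    energy = hbm_energy_joules(volts)
    print(f"{volts:>5} V: {energy * 1e6:8.1f} uJ, "
          f"{energy / threshold:5.0f}x threshold energy, "
          f"peak {hbm_peak_current_a(volts):.2f} A")
```

The voltage ratio between a walking technician and the 500V threshold is 4-16, but the energy ratio grows with the square of that.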
|
|
||||||
|
|
||||||
The reason latent failure dominates over immediate failure in ESD statistics is gate oxide breakdown mechanics. The laser driver and TIA circuits in a 25G or 100G transceiver use CMOS gate oxide layers typically 2-5nm thick. When a partial discharge — below the threshold that causes immediate catastrophic failure — reaches the gate oxide, it creates localized defects: electron traps and hole traps in the silicon dioxide lattice. The device continues to function because the defect density is not yet sufficient to cause measurable leakage current. Over days to weeks of normal electrical stress at operating voltage, the oxide degrades at the defect sites through a process called time-dependent dielectric breakdown (TDDB). The module that passed initial testing with TX power of -1.5 dBm now shows -3.8 dBm, then -6.0 dBm, then drops to the point where the link won't re-establish after a port flap. The failure has a long tail, which means it often gets misattributed to fiber contamination, cable degradation, or switch port issues.
|
|
||||||
|
|
||||||
The specific diagnostic signatures that distinguish latent ESD failure from other common failure modes are worth memorizing. ESD-damaged transmitter ICs typically show TX output power trending downward over days to weeks, often 0.5-2.5 dB below the module's nominal TX power at time of commissioning, without any corresponding change in temperature, supply voltage, or bias current (which DOM may not report accurately for this failure mode anyway). The RX side ESD signature is degraded receiver sensitivity: the module links up with BER in the acceptable range on a clean short fiber, but shows elevated pre-FEC BER on spans that were previously error-free. On a Cisco Nexus running NX-OS, the command "show interface ethernet 1/1 transceiver details" will show RX power within nominal range even though the link flaps intermittently when thermal cycling occurs during day/night temperature variation. Fiber contamination produces similar intermittent RX symptoms but will have a clear correlation with physical insertion events; ESD degradation occurs independently of any fiber plant disturbance.
|
|
||||||
|
|
||||||
The three most common ESD failure vectors in a data center context are: technicians handling modules during installation without wrist straps, modules being removed from anti-static bags and placed on non-conductive surfaces (cardboard shipping boxes, plastic trays, cloth on a workbench) where they can be charged by induction, and modules being removed from switch ports and set down on the top of a switch chassis that is at a different ground potential. The third scenario is common in field deployments where technicians swap modules quickly during a maintenance window without unpacking a ground strap every time. A module removed from a powered switch port retains charge from the switch backplane on its contacts; setting it on a metal chassis at equipment ground equalizes that charge through a fast discharge event right through the module's I/O pins.
|
|
||||||
|
|
||||||
Wrist strap usage is necessary but not sufficient, and most data center technicians implement it partially wrong. A wrist strap must be connected to the same ground reference as the equipment being worked on — not just to any convenient ground. A wrist strap connected to a building ground lug while working on equipment connected to a PDU-grounded chassis may still produce harmful transient voltages if there is a ground potential difference between the two reference points, which is common in older facilities with star-ground wiring issues. The correct procedure is wrist strap connected to the ESD mat, ESD mat connected to the chassis earth lug via a 1MΩ current-limiting resistor (to prevent shock hazard while providing charge equalization). The 1MΩ resistor is the standard recommendation in IPC-A-610 and JEDEC JESD625: it limits current from an inadvertent line voltage contact to below 0.5mA while still draining electrostatic charges at an acceptable time constant.
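The two roles of the 1MΩ resistor, limiting fault current and still draining charge quickly, follow from Ohm's law and the RC time constant. The sketch below assumes a 230V line contact and the 100pF HBM body capacitance; both are illustrative inputs, not values from a standard.

```python
import math

STRAP_RESISTANCE_OHM = 1e6     # 1 MOhm current-limiting resistor
BODY_CAPACITANCE_F = 100e-12   # assumed body capacitance (HBM value)


def fault_current_ma(line_voltage_v):
    """Current through the strap if the wearer touches line voltage."""
    return line_voltage_v / STRAP_RESISTANCE_OHM * 1000.0


def discharge_time_s(v_start, v_end):
    """Time for body voltage to decay from v_start to v_end through the strap."""
    tau = STRAP_RESISTANCE_OHM * BODY_CAPACITANCE_F  # RC time constant
    return tau * math.log(v_start / v_end)


print(f"fault current at 230 V: {fault_current_ma(230):.2f} mA")
print(f"8,000 V down to 100 V in {discharge_time_s(8000, 100) * 1e3:.2f} ms")
```

Under these assumptions the strap keeps fault current well below the 0.5mA figure while bleeding a large static charge in under a millisecond.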
|
|
||||||
|
|
||||||
Anti-static bags warrant specific attention because their properties are widely misunderstood. A metallized shielding bag (the silver-grey, semi-transparent type) provides Faraday shielding that prevents electrostatic fields from penetrating to the device inside when the bag is properly sealed. A module placed on top of a shielding bag, not inside it, receives essentially zero benefit from the bag. A module stored in a punctured or unsealed bag loses the shielding benefit at the opening. Pink polyethylene anti-static bags and the soft pink foam pouches provide dissipative properties but not shielding: they bleed charge off a device placed on them but don't block external fields. For transceivers above €100/unit, the metallized shielding bags are the appropriate packaging for field storage and transport; the pink pouches are adequate for short-term bench use in a controlled ESD environment.
|
|
||||||
|
|
||||||
The cost arithmetic justifies investment in proper ESD infrastructure. A 400G QSFP-DD-DR4 transceiver (Flexoptix P.40101 or equivalent) costs approximately €350-500 per unit. An ESD-induced latent failure requiring replacement at 6 months post-installation incurs not just the module replacement cost but the labor and downtime cost of a maintenance window: minimum 2 hours for scheduling, change management documentation, and execution in a production environment, at enterprise internal charge rates of €150-250/hour, so €300-500 in labor. Total cost per ESD failure event: at least €650-1,000. An ESD control station (anti-static mat, grounded wrist strap with 1MΩ resistor, ionizing air gun for work on non-groundable assemblies, and a proper storage rack for used modules) costs approximately €150-200 as a one-time installation. On these numbers it pays for itself with the first prevented failure.
|
|
||||||
|
|
||||||
For data center operators conducting post-failure root cause analysis, the diagnostic that most reliably distinguishes ESD damage from end-of-life or contamination is the history of TX power trend. If DOM logs (available from syslog with "snmp-server enable traps transceiver" on Cisco or equivalent on other platforms) show a gradual monotonic decline in TX power over a 2-8 week period following a module installation event, ESD latent failure is the probable cause. Contamination produces immediate or weather-correlated RX power variation, not transmitter power decline. End-of-life laser aging typically produces TX decline over years, not weeks. An installation event that involved module handling without ESD control, followed by a gradually deteriorating TX power starting within the first few weeks, is a near-certain ESD failure event regardless of what the technician remembers about handling procedures.
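The trend test described above can be approximated with a least-squares slope plus a monotonicity check over the first weeks after installation. This is a heuristic sketch: the 0.5 dB minimum drop, the 56-day window, and the 80% monotonicity threshold are assumptions chosen for illustration, and the sample format (day since install, TX power in dBm) is hypothetical.

```python
def tx_slope_db_per_week(samples):
    """Least-squares slope of TX power over time, in dB per week."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("need samples at two or more distinct times")
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    return 7.0 * sxy / sxx


def looks_like_latent_esd(samples, min_drop_db=0.5, window_days=56):
    """True if TX power declines steadily within the post-install window."""
    window = [(t, p) for t, p in sorted(samples) if t <= window_days]
    if len(window) < 3:
        return False
    steps = list(zip(window, window[1:]))
    falling = sum(1 for (_, a), (_, b) in steps if b <= a)
    mostly_monotonic = falling >= 0.8 * len(steps)
    total_drop = window[0][1] - window[-1][1]
    return (mostly_monotonic and total_drop >= min_drop_db
            and tx_slope_db_per_week(window) < 0)


declining = [(0, -1.5), (7, -2.0), (14, -2.6), (21, -3.1), (28, -3.8)]
stable = [(0, -1.5), (7, -1.5), (14, -1.4), (21, -1.5)]
print(looks_like_latent_esd(declining), looks_like_latent_esd(stable))
```

A positive result is a prompt to pull the module and review handling records, not proof of ESD damage on its own.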
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Tunable Coherent vs Fixed Wavelength: When Flexibility Is Worth the Premium"
|
|
||||||
type: comparison
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The decision between tunable and fixed-wavelength DWDM optics is rarely framed correctly in vendor conversations. The typical sales pitch for tunable emphasizes the "future-proof flexibility" without quantifying what that flexibility actually costs or under what specific network conditions it delivers a positive ROI. The inverse error is just as common: operators dismiss tunable as overpriced complexity and then discover that their fixed-wavelength spare management is costing them more than the tunable premium would have. Getting this decision right requires understanding not just the price differential but the operational and architectural conditions that make each choice rational.
|
|
||||||
|
|
||||||
Fixed-wavelength DWDM transceivers are manufactured with the laser operating at a specific ITU-T G.694.1 channel frequency. A module labeled "C33" operates at 193.3 THz, corresponding to 1550.92nm, and that is the only wavelength it will ever produce. The laser's operating temperature and bias current are factory-set to maintain that specific center frequency within ±2.5 GHz (the coherent DWDM alignment tolerance for 100GHz grid) or ±1.25 GHz for 50GHz grid operation. Fixed-wavelength QSFP28 DWDM modules from quality vendors like Lumentum, Acacia, and those available through Flexoptix cost approximately €800-1,500 per unit in single quantities, dropping to €500-900 in volume above 50 units. The lower cost versus tunable reflects simpler laser control electronics: no wavelength locking feedback loop, no channel table firmware, no tuning calibration during manufacturing.
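Channel labels map to frequency and wavelength with simple arithmetic. The sketch below assumes the widely used vendor numbering for the 100GHz C-band grid, where channel n sits at 190.0 THz + n × 0.1 THz; G.694.1 itself defines the grid relative to a 193.1 THz anchor and does not assign "C" numbers, so treat the numbering as a convention to verify against your vendor's channel table.

```python
SPEED_OF_LIGHT_M_S = 299_792_458


def channel_frequency_thz(channel):
    """Center frequency for a 100 GHz grid channel (assumed vendor numbering)."""
    return 190.0 + 0.1 * channel


def wavelength_nm(frequency_thz):
    """Vacuum wavelength in nanometres for an optical frequency in THz."""
    return SPEED_OF_LIGHT_M_S / (frequency_thz * 1e12) * 1e9


for ch in (17, 31, 33, 61):
    f = channel_frequency_thz(ch)
    print(f"C{ch}: {f:.1f} THz, {wavelength_nm(f):.2f} nm")
```

Under this convention C33 lands at 193.3 THz and 1550.92nm, while the 193.1 THz grid anchor corresponds to C31 at 1552.52nm.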
|
|
||||||
|
|
||||||
Tunable DWDM modules achieve wavelength agility through a thermally-tuned distributed Bragg reflector (DBR) laser or an external-cavity laser design with a MEMS tunable filter. The full C-band tunable range is nominally 1528-1565nm (195.9 THz down to 191.7 THz), covering all 96 channels on 50GHz ITU spacing per G.694.1. In practice, most 100G QSFP28 tunable implementations cover channels 17 to 61 on 100GHz spacing (191.7 THz to 196.1 THz under the common vendor channel numbering), which is sufficient for 40-50 usable DWDM channels, the practical maximum for metro DWDM multiplexers anyway. Full C-band tunable QSFP28 modules from Lumentum OCLARO LC25CW-20A series or the Flexoptix tunable QSFP28 cover the complete 96-channel grid and are priced at approximately €2,000-3,500 per unit. The premium over fixed-wavelength is roughly 2.5-4x per unit.
|
|
||||||
|
|
||||||
The inventory argument for tunable is the strongest one. A network operator maintaining 24 DWDM channels across 6 metro sites needs, in a fixed-wavelength world, 24 distinct SKUs plus spares for each. A sensible spare policy of 10-15% means carrying 3-4 spare units per channel, or 72-96 spare modules. Each spare module is tied to a specific wavelength and can only be used as a drop-in replacement for a failed module on that exact channel. The capital cost of spares inventory alone is 72 units x €1,000 average = €72,000, most of which sits on a shelf for the module's entire 7-10 year service life without generating any value. With a tunable module, one SKU covers all 24 channels. A spare inventory policy of 10% coverage requires only 3-4 units total: €3,500 x 4 = €14,000. The spare inventory saving, €58,000 in this scenario, covers the roughly €2,500 per-unit tunable premium on about 23 deployed modules, so it offsets a substantial share of the premium on a deployment of reasonable scale and all of it on a small one.
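The spares comparison is easy to rerun with different prices or policies. This sketch uses the figures from the paragraph above (24 channels, 3 spares per fixed channel at €1,000, 4 tunable spares at €3,500); the function name and the break-even calculation are illustrative additions.

```python
def spares_capex_eur(sku_count, spares_per_sku, unit_price_eur):
    """Capital tied up in shelf spares for a given sparing policy."""
    return sku_count * spares_per_sku * unit_price_eur


fixed_spares = spares_capex_eur(24, 3, 1000)   # one SKU per channel
tunable_spares = spares_capex_eur(1, 4, 3500)  # one SKU for all channels
saving = fixed_spares - tunable_spares

# Number of deployed modules whose tunable premium the saving would cover.
premium_per_module = 3500 - 1000
breakeven_modules = saving / premium_per_module

print(f"fixed spares: {fixed_spares} EUR, tunable spares: {tunable_spares} EUR")
print(f"saving: {saving} EUR, covers premium on {breakeven_modules:.1f} modules")
```

The break-even count gives a quick sense of whether sparing alone justifies tunable optics for a given deployment size, before the operational benefits are counted.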
|
|
||||||
|
|
||||||
The operational argument for tunable is compelling in mesh and ring topologies where wavelength assignment may need to change without physical access. A carrier running a multi-ring metro topology with protected paths needs to pre-position spare capacity at each node. With fixed-wavelength modules, pre-positioning a spare at node C to cover a potential failure on node A requires that node C carry a spare on each wavelength currently active in the network — because you don't know at sparing time which wavelength the failure will affect. With tunable modules, a single spare module at node C can be remotely configured to any failed wavelength in minutes via NETCONF/YANG configuration, eliminating the need to physically dispatch a field technician to swap a wavelength-specific module. For a carrier with 40 nodes across a regional metro network, this represents a meaningfully different disaster recovery posture.
|
|
||||||
|
|
||||||
The startup latency of tunable modules deserves honest discussion because it is a real limitation that some vendors understate. When a tunable DWDM module powers up or when its target channel is changed via management interface, the laser must acquire lock to the new target frequency. This tuning and locking process typically takes 10-90 seconds depending on the module's thermal control loop design, the magnitude of the wavelength change (switching from channel 20 to channel 21 is faster than switching from channel 20 to channel 60), and the ambient temperature stability. A fixed-wavelength module, by contrast, is typically at stable operating output within 5-15 seconds of power-up since no frequency acquisition is required — the laser simply stabilizes at its preset operating point.
|
|
||||||
|
|
||||||
For automatic protection switching applications where a failed DWDM path needs to be restored in under 50ms (the typical SONET/SDH-legacy restoration target that some carrier SLAs still reference), tunable module re-wavelength provisioning is not a valid restoration mechanism. Protection switching on DWDM networks at this speed requires pre-provisioned protection paths using existing wavelengths, not real-time tuning. Tunable modules are a provisioning flexibility tool, not a sub-second restoration mechanism, and any proposal that describes them as such should be rejected.
|
|
||||||
|
|
||||||
The 50GHz vs 100GHz grid question intersects with the tunable vs fixed decision. High-density 50GHz grid operation requires tighter laser frequency stability (±1.25 GHz vs ±2.5 GHz for 100GHz), narrower optical passband filters in the OADM or multiplexer, and correspondingly stricter chromatic dispersion tolerance since narrower optical bandwidth means more sensitivity to nonlinear effects. Tunable modules certified for 50GHz operation carry a higher manufacturing cost due to tighter laser characterization during QA; the premium for 50GHz-capable tunable versus 100GHz-only tunable is typically €200-400. Most current metro deployments start on 100GHz grid with path to 50GHz grid densification as traffic grows — a tunable module with 50GHz capability is the rational choice if densification within 3-5 years is plausible.
|
|
||||||
|
|
||||||
What carriers actually deploy in production provides useful calibration. Tier-1 European carriers running large-scale metro DWDM typically use tunable coherent pluggables (primarily 100G and 200G CFP2-DCO or QSFP28 ZR+) for all interoffice connections where fiber cost makes wavelength sharing economically mandatory. For customer-facing access ports where each circuit is on a dedicated fiber pair anyway — DSL aggregation, business Ethernet handoffs — fixed-wavelength or even grey optics remain the cost-optimized choice since there's no wavelength-sharing advantage to exploit. The operator who deploys tunable everywhere including fiber-rich direct access links is paying a wavelength management premium without receiving the corresponding fiber lease savings benefit. The operator who deploys fixed-wavelength everywhere including dense metropolitan fiber corridors where 80+ circuits share infrastructure is paying thousands per month in avoidable fiber lease costs. The decision framework is simple: count the parallel circuits on each segment, calculate the fiber lease cost per pair, and let the numbers determine where the wavelength flexibility premium pays for itself.
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Forward Error Correction at 400G: What It Fixes, What It Can't, and Why Pre-FEC BER Matters"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
Forward Error Correction is one of those topics where engineers learn just enough to be dangerous: they know FEC makes bad links work and they trust that a clean post-FEC BER means the link is healthy. Both beliefs are dangerously incomplete. At 400G speeds where RS-FEC is mandatory rather than optional, understanding the specific mathematical behavior of FEC — what it corrects, what it cannot correct, and crucially, what a "good" pre-FEC BER actually tells you about link health — is the difference between proactive link management and discovering a failing link at 3 AM when it finally crosses into uncorrectable territory.
|
|
||||||
|
|
||||||
Reed-Solomon FEC as implemented in IEEE 802.3bs (Clause 119) for 400G-FR4 and 400G-DR4 uses the RS(544,514) codeword structure. Each codeword consists of 514 ten-bit information symbols and 30 ten-bit parity symbols, for a total of 544 symbols. The error correction capability of this code is t = (544 - 514) / 2 = 15: it can correct up to 15 symbol errors per codeword with certainty. One "symbol error" in this context means any error pattern within a single 10-bit symbol, regardless of whether it's one corrupted bit or all ten bits. This is the theoretical machinery, and understanding its limits requires thinking about what happens to error distributions as link quality degrades.
|
|
||||||
|
|
||||||
The RS-FEC designed operating point is a pre-FEC BER of approximately 2 × 10⁻⁴. At this input error rate, statistical analysis shows that the probability of receiving a codeword with more than 15 symbol errors is vanishingly small — roughly 10⁻¹⁵ — so the post-FEC BER at the output is effectively zero. This is the regime where RS-FEC is doing exactly what it was designed to do: correcting the handful of symbol errors introduced by PAM4 signal imperfections, chromatic dispersion residuals, and thermal noise, while delivering a clean output to the MAC layer. IEEE 802.3bs selected this operating point deliberately — the 400G PAM4 modulation scheme was specified with RS-FEC as an integral assumption, meaning the optics and electrical interfaces are not required to deliver 10⁻¹² BER on their own. They only need to deliver 2 × 10⁻⁴ pre-FEC BER, and RS-FEC handles the remaining correction.
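How sharply the uncorrectable-codeword probability depends on pre-FEC BER can be explored with a binomial tail. This is a sketch under a strong assumption: bit errors are independent and uniformly random, which real links with burst errors do not satisfy, so the outputs indicate the shape of the curve and are not a substitute for measured counters.

```python
from math import comb

N_SYMBOLS = 544          # codeword length in symbols
K_SYMBOLS = 514          # information symbols
BITS_PER_SYMBOL = 10
T_CORRECTABLE = (N_SYMBOLS - K_SYMBOLS) // 2  # 15 symbol errors


def symbol_error_prob(pre_fec_ber):
    """Probability that a 10-bit symbol contains one or more bit errors."""
    return 1.0 - (1.0 - pre_fec_ber) ** BITS_PER_SYMBOL


def uncorrectable_codeword_prob(pre_fec_ber):
    """P(more than t symbol errors in a codeword), independent-error model."""
    p = symbol_error_prob(pre_fec_ber)
    # Sum the upper tail directly so tiny probabilities are not lost
    # to floating-point cancellation.
    return sum(
        comb(N_SYMBOLS, i) * p ** i * (1.0 - p) ** (N_SYMBOLS - i)
        for i in range(T_CORRECTABLE + 1, N_SYMBOLS + 1)
    )


for ber in (1e-4, 2e-4, 5e-4, 1e-3):
    print(f"pre-FEC BER {ber:.0e}: P(uncorrectable) = "
          f"{uncorrectable_codeword_prob(ber):.3e}")
```

The useful observation is the steepness: a small multiple in pre-FEC BER changes the tail probability by many orders of magnitude.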
|
|
||||||
|
|
||||||
KP4-FEC (also known as KP-FEC) is the same RS(544,514) code with a symbol size of 10 bits, applied to lower-speed PAM4 lanes: it is used for 50G-KR/CR and 50G-SR, as well as 100G-PAM4 in certain implementations. KR4-FEC for 100G NRZ uses RS(528,514) with 14 parity symbols and t = 7 correction capability, which is why 100G-CR4 with KR4-FEC has a designed pre-FEC BER operating point of approximately 1 × 10⁻⁴, tighter than KP4's 2 × 10⁻⁴ requirement. The tighter requirement reflects the weaker code; it is achievable because NRZ at 25G per lane has more signal-to-noise margin than PAM4.
|
|
||||||
|
|
||||||
The error floor problem is where FEC behavior becomes non-obvious. If a link's pre-FEC BER exceeds roughly 1 × 10⁻³, the probability of receiving a codeword with more than 15 symbol errors climbs steeply. In this regime, RS-FEC cannot correct the codeword — it detects the uncorrectable error and has two choices: output the corrupted codeword as-is, or output a pattern of all-zeros or all-ones (an "error indication"). Most hardware implementations output the corrupted symbols, which means that when pre-FEC BER is so high that codewords become uncorrectable, the post-FEC BER may actually be worse than the pre-FEC BER. The FEC correction mechanism is adding burst errors from failed correction attempts to the already-high symbol error rate. This is mathematically inevitable, not a firmware bug: RS(544,514) with t=15 correction, when encountering codewords with 30-40 symbol errors, produces 30-40 output errors rather than correcting them. An engineer who sees a link with stable post-FEC BER of 10⁻⁸ and assumes the link is fine because "the errors are being corrected" may be looking at a link running at pre-FEC BER of 5 × 10⁻⁴ that is one dirty connector away from uncorrectable territory.
|
|
||||||
|
|
||||||
Accessing pre-FEC BER in production environments requires platform-specific CLI commands that are not universally implemented through SNMP MIBs or standard DOM registers. On Cisco Nexus NX-OS, the command is "show interface ethernet X/Y/Z phy" with the ber-counters keyword in Nexus 9000 series; on older NX-OS versions the RS-FEC counters are accessible via "show hardware internal errors fec interface". On Arista EOS, "show interface ethernet X/Y phy detail" or "show interfaces ethernet X/Y counters fec" exposes pre-FEC and post-FEC BER and symbol error counts. Juniper QFX/EX exposes FEC counters via "show pfe statistics traffic" with port-level drill-down, though the exact path varies by Junos major version. The absence of a standardized MIB path for pre-FEC BER is a genuine operational gap — it means automated monitoring of this critical health indicator requires vendor-specific collection.
|
|
||||||
|
|
||||||
The latency penalty of RS-FEC is real and context-dependent. The RS(544,514) encoder and decoder introduce a pipeline latency that is typically 100-150 nanoseconds for the decoder alone, with the encoder adding another 50-80ns. For 400G switch applications, this latency is fully accounted for in 802.3bs's maximum allowable latency budget and presents no operational issue. For ultra-low-latency trading applications where the switch cut-through latency budget is being measured in single-digit nanoseconds and FEC bypasses are a design consideration, the combined 150-230ns RS-FEC overhead is meaningful. However, FEC bypass is not possible for PAM4 400G links since the pre-FEC BER operating point of 2 × 10⁻⁴ requires correction: running 400G PAM4 without FEC would deliver a BER of 2 × 10⁻⁴ to the MAC, which is orders of magnitude above Ethernet's 10⁻¹² target. The FEC latency is an intrinsic property of the 400G architecture, not a configurable parameter.
|
|
||||||
|
|
||||||
The aging dimension is where pre-FEC BER monitoring delivers its highest operational value. A newly installed 400G-DR4 link on clean OS2 fiber with well-cleaned connectors will show pre-FEC BER in the range of 5 × 10⁻⁵ to 1 × 10⁻⁴, within the designed operating point with margin to spare. As the optics age and laser output power gradually declines (typical VCSEL and DFB laser aging: 0.05-0.1 dB/year of TX power reduction), as connectors accumulate contamination particulate deposits between cleanings (each insertion event on an SC or LC connector deposits roughly 100-300 particles), and as fiber connectors experience micro-fracturing from repeated flexion, the pre-FEC BER drifts upward. A link that starts at 10⁻⁴ and shows 8 × 10⁻⁴ after 18 months of operation has already drifted past the 2 × 10⁻⁴ design point and sits at 80% of the roughly 1 × 10⁻³ level where uncorrectable codewords become likely. Post-FEC BER may still read zero; the link appears perfectly healthy to any monitoring that looks only at post-FEC counters. But a single additional degradation event (a dirty connector, a temperature excursion during a summer cooling failure, a 0.3 dB splice loss increase in the fiber plant) pushes that link into uncorrectable BER territory. The margin was consumed quietly while the monitoring dashboard showed green.
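A trend projection turns two pre-FEC BER readings into a rough time-to-limit estimate. The sketch below assumes the drift is log-linear (BER multiplies by a constant factor per month), which is a modeling assumption and not a property of any particular optic, and it takes the roughly 1 × 10⁻³ level cited earlier as the limit; both the function name and that choice of limit are illustrative.

```python
import math


def months_until_limit(ber_start, ber_now, elapsed_months, ber_limit=1e-3):
    """Project when pre-FEC BER reaches ber_limit under log-linear drift."""
    if ber_now >= ber_limit:
        return 0.0  # already at or past the limit
    if ber_now <= ber_start:
        return float("inf")  # no upward drift observed
    # Growth rate of ln(BER) per month, estimated from the two readings.
    rate = math.log(ber_now / ber_start) / elapsed_months
    return math.log(ber_limit / ber_now) / rate


remaining = months_until_limit(1e-4, 8e-4, 18)
print(f"projected months until limit: {remaining:.1f}")
```

In practice the estimate would be refit from many samples, for example one every 15 minutes, since single readings are noisy and degradation events are often step changes.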
|
|
||||||
|
|
||||||
The operational conclusion is uncomfortable but important: post-FEC BER of zero is not a meaningful health indicator at 400G. Pre-FEC BER trending, monitored at minimum daily and ideally every 15 minutes, is the actual health metric for optical links in the PAM4 era. Any 400G monitoring strategy that relies solely on link-up/link-down states and post-FEC error counters is creating operational risk that will manifest at the worst possible time.
|
|
||||||
@ -1,22 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Co-Packaged Optics: What CPO Actually Means for the Pluggable Transceiver Market"
|
|
||||||
type: hype_cycle
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The CPO narrative that dominated networking conferences from 2022 through 2024 was built on a genuine engineering insight wrapped in a timeline that was chronically optimistic. The insight is real: the fundamental constraint limiting I/O efficiency in switch ASICs at 51.2 Tbps and beyond is the electrical interface between the ASIC die and the optical transceiver, specifically the PCB traces, electrical connectors, and SerDes front-end circuitry that collectively introduce 10-15 dB of electrical insertion loss at 56 Gbaud PAM4 signaling rates. Co-Packaged Optics addresses this constraint by integrating the optical I/O directly into the switch ASIC package, eliminating most of that electrical path. The timeline claims — "CPO will displace pluggable by 2025" — were engineering theater, not engineering analysis.
|
|
||||||
|
|
||||||
The physics problem CPO solves is concrete. At 51.2 Tbps switching capacity, a merchant silicon ASIC (Broadcom Tomahawk 5 or equivalent) drives 512 SerDes lanes at 100 Gbps each to reach total fabric capacity. Each SerDes lane drives a signal from the die through the ASIC package substrate, across PCB traces of 5-15 cm, through an electrical connector (SFP, QSFP, or OSFP cage), and into the pluggable transceiver. The total electrical insertion loss at 56 Gbaud on a typical route is 8-14 dB, which the SerDes driver must overcome with pre-emphasis and equalization. This equalization consumes power: roughly 20-30 pJ per bit for the SerDes on the ASIC die, which at 51.2 Tbps becomes 1.0-1.5 kW of SerDes power alone. By moving the optical engine to within a few millimeters of the ASIC die (co-packaged in the same flip-chip BGA package or in an adjacent silicon bridge die), the electrical path length drops to 3-5 mm of silicon interposer, reducing insertion loss to 1-2 dB. This reduces SerDes power by an estimated 60-75%, from roughly 25 pJ/bit to 6-10 pJ/bit.
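The power figures in this paragraph follow from a one-line unit conversion: terabits per second times picojoules per bit gives watts directly, because the 10¹² and 10⁻¹² cancel. A short sketch, using only the numbers quoted above (the function name is illustrative):

```python
def serdes_power_watts(capacity_tbps: float, energy_pj_per_bit: float) -> float:
    """Power = bit rate x energy per bit.

    Tbps * pJ/bit = (1e12 bit/s) * (1e-12 J/bit) = W, so the factors cancel.
    """
    return capacity_tbps * energy_pj_per_bit

CAPACITY_TBPS = 51.2
for label, pj_per_bit in [
    ("long-reach SerDes, low estimate", 20),
    ("long-reach SerDes, high estimate", 30),
    ("co-packaged, low estimate", 6),
    ("co-packaged, high estimate", 10),
]:
    watts = serdes_power_watts(CAPACITY_TBPS, pj_per_bit)
    print(f"{label:34s} {watts:7.0f} W")
```

At 20-30 pJ/bit the result spans roughly 1.0-1.5 kW, and at 6-10 pJ/bit roughly 0.3-0.5 kW, consistent with the 60-75% reduction estimate.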
|
|
||||||
|
|
||||||
Broadcom has publicly discussed the "Bailly" architecture for 51.2 Tbps CPO implementations, and Intel has demonstrated CPO chiplets with its own roadmap for integration into future Tofino successors. The claimed system-level power reduction is 3-4x for the I/O subsystem, which at hyperscale volumes translates to tens of megawatts of avoided data center power consumption. This is why Google, Meta, and Amazon have been funding CPO research: not because they care about per-unit optics cost, but because their power bills for switching I/O infrastructure are measured in hundreds of megawatts.
|
|
||||||
|
|
||||||
The manufacturing problem that makes the timeline claims unrealistic is multi-die package integration yield. A co-packaged optical ASIC combines the switch fabric die (approximately 900mm² in 5nm TSMC for a Tomahawk 5 equivalent), silicon photonics transceiver dies (one per port group, typically), and the package substrate routing them together. The overall package yield is the product of individual die yields: if the fabric die yields at 85% and each of eight optical dies yields at 90%, the assembled package yield is 0.85 × (0.90)⁸ ≈ 0.85 × 0.43 ≈ 0.37, or roughly 37%. A package yield of roughly 37% on a package that costs $5,000-8,000 in materials makes per-unit economics catastrophic during ramp. Pluggable transceivers can fail a manufacturing test and be discarded individually; in a CPO package, a failed optical die means a $5,000+ assembly goes to scrap. This is the yield calculus that silicon photonics manufacturers must solve before CPO reaches production economics, and it is why IBM and Intel's own internal presentations at OFC 2023 showed first-production-volume targets of 2027-2029, not 2025.
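The yield arithmetic above is easy to reproduce and to vary. The sketch below assumes independent die yields, no rework, and no known-good-die screening before assembly; those are simplifications, and the function names are illustrative.

```python
def package_yield(fabric_yield: float, optical_die_yield: float,
                  optical_dies: int) -> float:
    """Assembled yield as the product of independent die yields."""
    return fabric_yield * optical_die_yield ** optical_dies

def scrap_cost_per_good_package(package_cost: float, yield_fraction: float) -> float:
    """Material cost of scrapped assemblies, amortized over each good package."""
    return package_cost * (1 - yield_fraction) / yield_fraction

y = package_yield(fabric_yield=0.85, optical_die_yield=0.90, optical_dies=8)
print(f"package yield: {y:.3f}")
print(f"scrap cost per good unit at $5,000 materials: "
      f"${scrap_cost_per_good_package(5000, y):,.0f}")

# Sensitivity: how much optical die yield matters with eight dies per package.
for die_yield in (0.90, 0.95, 0.99):
    print(f"optical die yield {die_yield:.2f} -> "
          f"package yield {package_yield(0.85, die_yield, 8):.3f}")
```

The product comes out at about 0.366. The sensitivity loop shows why the exponent dominates: with eight optical dies, each point of per-die yield improvement moves the package yield far more than the same improvement on the fabric die.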
|
|
||||||
|
|
||||||
Field replaceability is the operational argument that keeps CPO off the procurement roadmap for most enterprise and carrier deployments through at least 2030. A pluggable transceiver failure — MTBF typically 500,000-1,000,000 hours for quality 400G modules — is resolved by a field technician removing the failed module (30-second operation) and inserting a replacement. A CPO switch failure is a board replacement or system swap: the optical I/O is permanently integrated, so a single failed optical port group requires maintenance of the entire chassis. MTTR for a pluggable failure in a managed environment is typically 2-4 hours including parts dispatch. MTTR for a CPO system failure requiring chassis swap is 8-24 hours minimum, plus the cost of maintaining a full-system hot spare. For carrier-grade infrastructure with 99.999% availability requirements, this MTTR difference disqualifies CPO entirely until on-site optical repair and testing capabilities develop to the point where individual photonic die replacement becomes feasible — a capability that doesn't exist in field maintenance practice anywhere today.
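To see how MTTR alone moves a single unprotected element across the five-nines line, the standard steady-state relation availability = MTBF / (MTBF + MTTR) can be applied to the figures above. This sketch assumes the same 500,000-hour MTBF for both cases and ignores redundancy, so it illustrates the sensitivity to repair time and is not a prediction for any real system.

```python
FIVE_NINES = 0.99999

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable element."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

MTBF_HOURS = 500_000  # lower end of the MTBF range quoted in the text

for label, mttr in [("pluggable swap, 4 h MTTR", 4),
                    ("chassis swap, 8 h MTTR", 8),
                    ("chassis swap, 24 h MTTR", 24)]:
    a = availability(MTBF_HOURS, mttr)
    verdict = "meets" if a >= FIVE_NINES else "misses"
    print(f"{label:26s} {a:.6f}  {verdict} 99.999%")
```

With these inputs the 4-hour case stays above 99.999% while the 8-hour and 24-hour cases fall below it, even though the failure rate is identical in all three.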
|
|
||||||
|
|
||||||
The burn-in testing problem deserves mention as a secondary manufacturing challenge. Standard pluggable transceiver manufacturing includes 24-168 hours of burn-in at elevated temperature under electrical stress, a process that screens for infant mortality failures before the module leaves the factory. In a CPO package, you cannot burn in the optical dies independently after they're co-packaged with the ASIC — the burn-in temperature required to screen optical components (85°C, 168 hours per Telcordia GR-468) would degrade the CMOS gate oxides in the switch fabric die. This forces CPO manufacturers to either burn in optical dies before assembly (limiting the screen effectiveness) or accept higher field infant mortality rates on deployed systems. Neither is an acceptable answer for infrastructure with 7-10 year operational life expectations.
|
|
||||||
|
|
||||||
The practical impact on 800G infrastructure buying decisions today is precisely zero. Pluggable 800G QSFP-DD and OSFP modules (800GBASE-DR8 is standardized in IEEE 802.3df) are in production from InnoLight, Coherent, Lumentum, and Earing. These modules will be in service in data center deployments through 2033-2035. CPO will begin appearing in hyperscale pilot deployments around 2028 at 51.2T or 102.4T fabric capacity points where the power economics justify the operational trade-offs. The pluggable market will expand to 1.6T (224 Gbps-class SerDes lanes, roughly 106-113 Gbaud PAM4 per lane, 8 lanes per module) before CPO reaches commercial maturity. Anyone presenting CPO as a near-term threat to pluggable investments in 2024-2027 infrastructure is projecting technology roadmap aspirations, not product availability.
|
|
||||||
|
|
||||||
The correct framing for CPO in 2026 is: a genuine long-term architectural evolution for hyperscale switching fabrics with compelling power economics, currently in the late R&D and early pilot phase, with no production deployments at commercial volume, and with three unsolved engineering problems (yield, burn-in, replaceability) that prevent economically rational deployment at enterprise or carrier scale before approximately 2028-2030. Pluggable transceivers at 400G, 800G, and eventually 1.6T will remain the dominant form factor for all foreseeable purchasing decisions. The investment in 800G pluggable infrastructure today faces zero technological obsolescence risk from CPO within its expected service life.
|
|
||||||
@ -1,26 +0,0 @@
|
|||||||
---
|
|
||||||
title: "CMIS 4.0: Why 400G Transceiver Management Is Fundamentally Different from 100G"
|
|
||||||
type: technology_deep_dive
|
|
||||||
target_audience: technical
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
When a 400G QSFP-DD module is installed in a switch port and the interface doesn't come up, the most common diagnosis attempt is "the module is bad." In a significant fraction of these cases, the module is fine and the problem is a CMIS implementation incompatibility between the module's management firmware and the switch platform's driver. This failure mode didn't exist with SFP+ or QSFP28 because SFF-8472 and SFF-8636 use simple register polling without a required state machine. CMIS introduces mandatory state machine sequencing — miss a step, skip an initialization transaction, or run an older driver against a newer module, and you get a port that stays in Low Power Mode indefinitely while producing no error message that points to the actual problem.
|
|
||||||
|
|
||||||
The Common Management Interface Specification (CMIS) originated in the QSFP-DD MSA and is now maintained by the OIF (Optical Internetworking Forum). It was written specifically for high-density optical modules where the complexity of per-lane configuration exceeded what SFF-8636 could support cleanly. CMIS 4.0 (the version most current QSFP-DD and OSFP modules implement) is a 200+ page specification covering a paged register map (128 bytes of lower memory plus 128-byte upper pages, nominally comparable to SFF-8636's lower-memory-plus-pages layout but structurally different), a formally defined module state machine, per-lane application configuration through Application Select registers, and a DataPath activation sequence that the host system must explicitly complete.
|
|
||||||
|
|
||||||
The SFF-8636 register map, which served 40G QSFP+ and 100G QSFP28, treated a module essentially as a collection of four optical engines with a shared management interface. Configuration was largely static: the host read the capabilities, verified the DOM thresholds, and the module was operational. The only state management required was optional "Low Power Mode" via LPMode pin or register, and most platforms simply ignored it. A QSFP28 inserted into an SFF-8636-compliant host would in most cases start transmitting within 2-3 seconds of insertion without any host-side initialization sequence.
|
|
||||||
|
|
||||||
CMIS changes this fundamentally through its state machine. A CMIS module powers up in either ModuleLowPwr or ModuleReady state depending on the LPMode pin logic at insertion. To activate the optical transmitter and enable data traffic, the host must execute a specific sequence: write the appropriate AppSel (Application Select) code to lane-specific registers to configure modulation format and data rate, write the DataPathPwrUp bit for each lane group, and then poll the DataPath state register until it confirms DataPathActivated state. This sequence is not optional or advisory — it is the defined CMIS initialization procedure, and a module that has not completed this sequence will remain with TX disabled. The DataPath activation process typically completes within 5-30 seconds on a functioning module with a compliant host driver.
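The host-side sequence described above (program AppSel, request DataPath power-up, poll until activated or until a timeout expires) can be sketched against a simulated module. Everything here is hypothetical: `SimulatedModule`, its method names, and the simplified state set are stand-ins for illustration and do not correspond to real CMIS register addresses or to any vendor driver.

```python
import time

class SimulatedModule:
    """Toy stand-in for a CMIS module; activates after N state polls."""

    def __init__(self, polls_until_active: int = 3):
        self.state = "DataPathDeactivated"
        self.app_sel = None
        self._polls_left = polls_until_active

    def write_app_sel(self, code: int) -> None:
        self.app_sel = code

    def request_datapath_power_up(self) -> None:
        self.state = "DataPathInit"

    def read_datapath_state(self) -> str:
        if self.state == "DataPathInit":
            self._polls_left -= 1
            if self._polls_left <= 0:
                self.state = "DataPathActivated"
        return self.state

def bring_up(module: SimulatedModule, app_sel: int,
             timeout_s: float = 30.0, poll_interval_s: float = 0.0) -> bool:
    """Run the three host steps; return True only if activation completes in time."""
    module.write_app_sel(app_sel)            # step 1: select the application
    module.request_datapath_power_up()       # step 2: request DataPath power-up
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:       # step 3: poll with a deadline
        if module.read_datapath_state() == "DataPathActivated":
            return True
        time.sleep(poll_interval_s)
    return False

print(bring_up(SimulatedModule(polls_until_active=3), app_sel=1))
print(bring_up(SimulatedModule(polls_until_active=10**9), app_sel=1, timeout_s=0.01))
```

The `timeout_s` parameter is the design choice worth noticing: a host that gives up before a slow but healthy module finishes activating produces the same stuck port as a genuinely defective module.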
|
|
||||||
|
|
||||||
The AppSel mechanism is one of CMIS's most powerful and most commonly misconfigured features. Each CMIS module publishes an Application List (up to 15 applications) that describes the modulation formats, data rates, and lane configurations it supports. A 400G QSFP-DD module might list applications including: App1 = 400GBASE-DR4 (4 lanes, 100G PAM4), App2 = 400GBASE-FR4 (4 lanes, 100G PAM4), App3 = 2x200G (8 lanes, 26.5625 Gbaud PAM4), App4 = 8x50G breakout mode. The host must read this application list, select the appropriate AppSel code for the intended use case, and program it into the per-lane AppSel registers. If the host driver programs the wrong AppSel code (for example, selecting application index 3 on a module where application 3 is 2x200G but the platform expects 400G-DR4), the module will initialize, the DataPath will activate, but the modulation format mismatch will produce a link that reports up at the physical layer while generating constant bit errors at the FEC layer.
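One way to avoid the mismatch described above is to resolve the AppSel code by name from the module's advertised list and never hard-code an index. The sketch below models the application list as plain data using the illustrative entries from the paragraph; the `Application` type and `select_app` helper are hypothetical and do not parse real CMIS advertising bytes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Application:
    app_sel: int   # index the host would program into the AppSel registers
    name: str      # human-readable application name

# Illustrative list mirroring the example in the text, not read from hardware.
ADVERTISED = [
    Application(1, "400GBASE-DR4"),
    Application(2, "400GBASE-FR4"),
    Application(3, "2x200G"),
    Application(4, "8x50G breakout"),
]

def select_app(advertised: list, wanted_name: str) -> int:
    """Return the AppSel code for the wanted application, or fail loudly."""
    for app in advertised:
        if app.name == wanted_name:
            return app.app_sel
    raise LookupError(f"module does not advertise {wanted_name!r}")

print(select_app(ADVERTISED, "400GBASE-DR4"))
```

Failing with an explicit error when the application is not advertised is deliberate: a refused configuration is far easier to diagnose than a link that comes up with the wrong modulation.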
|
|
||||||
|
|
||||||
CMIS version mismatches between module and host driver are the specific failure mode that most operations teams encounter without recognizing. CMIS 3.0 and CMIS 4.0 share the same high-level architecture but differ in specific register behaviors and state machine transitions. CMIS 4.0 introduces the concept of "Advertisement Pages" for capabilities not present in CMIS 3.0, and certain AppSel and DataPath configuration fields have subtly different semantics between versions. A switch platform with a CMIS 3.0 driver attempting to initialize a CMIS 4.0 module may successfully complete the state machine transitions (both versions have the same basic ModuleLowPwr → ModuleReady → DataPathActivated sequence) but may fail to correctly program the AppSel configuration or may interpret CMIS 4.0-specific status bytes as error conditions. The symptom is typically a module that links up on some platforms and not others, or a module that works on one firmware version of a platform but not a previous version.
|
|
||||||
|
|
||||||
Cisco's NX-OS CMIS implementation has been actively developed across releases and the version history matters. NX-OS 9.3(7) introduced initial QSFP-DD CMIS support; NX-OS 9.3(9) and later significantly improved CMIS 4.0 state machine handling. Cisco Nexus 9336C-FX2 running 9.3(6) has documented issues with specific CMIS 4.0 modules where the DataPath activation polling times out after 10 seconds instead of waiting the full 30 seconds some modules require, leaving the port in a stuck partial-initialization state that appears as "sfpAbsent" in show interface outputs even when the module is physically present. The fix is a NOS upgrade, not a module swap.
|
|
||||||
|
|
||||||
Arista EOS has generally maintained strong CMIS implementation quality across its QSFP-DD portfolio. EOS 4.26.2F and later implement full CMIS 4.0 state machine support including the 30-second DataPath activation timeout. Arista's CMIS implementation is explicitly documented in their transceiver compatibility matrix, and EOS will log a specific message at CMIS initialization failure with the state machine step that failed — making it far easier to diagnose CMIS issues on Arista than on platforms that simply log "transceiver not recognized." For Arista operators, the command "show interfaces ethernet X/Y transceiver" with the detail keyword shows the raw CMIS DataPath state, making it visible whether the module is in DataPathActivated, DataPathDeinit, or an intermediate state.
|
|
||||||
|
|
||||||
Juniper Junos CMIS support has tracked behind Arista and Cisco in the QSFP-DD generation, with production-stable CMIS 4.0 support arriving in Junos 22.1R1 for the QFX5220 and QFX5130 series. Prior to this release, certain CMIS 4.0 modules would be recognized by Junos (the module would show in "show chassis pic") but the DataPath would not activate, producing a port that showed "Link status: Up" at the physical layer PIC view while reporting "Operational link speed: Unknown" at the logical interface level. This is a distinct failure signature from a failed module and from an MSA EEPROM issue — it is specifically a CMIS driver problem.
|
|
||||||
|
|
||||||
For network engineers deploying 400G QSFP-DD at scale, the diagnostic protocol for a port that won't come up should follow this order: first, verify the NOS version against the known CMIS support matrix for the specific module vendor and CMIS version (readable from the module's CMIS version byte at address 01h); second, check the CMIS DataPath state registers directly if the platform provides that visibility; third, verify AppSel configuration matches the intended application. Testing the module in a different platform before concluding it is defective is not just good practice — it is the only reliable way to distinguish module failure from host driver failure, and on CMIS-based 400G infrastructure, the host driver problem is considerably more common than the module failure problem.
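For the first diagnostic step, the revision byte can be decoded by hand or in a script. The sketch assumes the usual nibble encoding, with the major version in the upper four bits and the minor version in the lower four; confirm that against the CMIS revision your module claims before relying on it. How the raw byte is read from the module is platform-specific and not shown.

```python
def decode_cmis_revision(byte_01h: int) -> tuple:
    """Split the revision byte into (major, minor), assuming nibble encoding."""
    if not 0 <= byte_01h <= 0xFF:
        raise ValueError("expected a single byte value")
    return byte_01h >> 4, byte_01h & 0x0F

for raw in (0x30, 0x40, 0x50):
    major, minor = decode_cmis_revision(raw)
    print(f"raw 0x{raw:02X} -> CMIS {major}.{minor}")
```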
|
|
||||||
@ -1,24 +0,0 @@
|
|||||||
---
|
|
||||||
title: "How to Evaluate a Compatible Transceiver Vendor: The 7 Questions That Actually Reveal Quality"
|
|
||||||
type: buying_guide
|
|
||||||
target_audience: sales
|
|
||||||
score: 9/10
|
|
||||||
---
|
|
||||||
|
|
||||||
The compatible transceiver market has a problem that its OEM equivalent does not: the barrier to entry is extremely low, and a vendor who cannot distinguish their quality from a factory-stock relabeler has every incentive to not raise the question. A company can procure generic SFP28 SR modules from a Shenzhen ODM, apply their own label, and sell them into enterprise data centers where they will work acceptably on Arista hardware and fail unpredictably on Cisco or Nokia platforms. The people who get hurt are the operations teams who spend hours debugging "transceiver not recognized" errors that could have been avoided by asking seven specific questions before placing the first purchase order.
|
|
||||||
|
|
||||||
The first question is: where does your EEPROM programming happen, and can you show me the programming record for my specific order? EEPROM programming is the step that determines whether a compatible module will be recognized as supported on a specific switch platform. Every module has a manufacturer-programmed EEPROM from the optical component factory; this factory EEPROM contains the component manufacturer's details and a generic vendor name, not the compatible vendor's platform-specific compatibility data. A quality compatible vendor reprograms this EEPROM in-house (changing vendor name, part number, OUI bytes, and platform-specific compatibility fields) using platform-specific templates developed and tested against actual switch hardware. Flexoptix programs at their facility in Darmstadt; they can provide the exact EEPROM template version and target platform specification used for any given order. A vendor who answers "the modules come programmed from the factory" is telling you they're shipping factory-stock ODM product: the generic EEPROM may work fine on Arista, which does minimal EEPROM validation, and will fail on Cisco Catalyst 9500 or Nokia 7750 at a meaningfully non-zero rate.
|
|
||||||
|
|
||||||
The second question is: what is your burn-in protocol, duration, and temperature profile? Burn-in is the thermal and electrical stress screening process that identifies infant mortality failures before they reach the customer. The Telcordia GR-468 standard for optical transceiver reliability specifies 2,000 device hours at 85°C as the basis for MTBF projection, though the practical standard for incoming burn-in screening is typically 24-168 hours at 70-85°C under operational bias conditions. A 24-hour burn-in at 70°C will catch roughly 60-70% of infant mortality failures; a 168-hour burn-in at 85°C catches over 90%. FS.com and 10Gtek, which compete heavily on price, typically disclose 24-hour burn-in on their data sheets. Flexoptix and ProLabs run 168-hour extended burn-in on their production modules. That 7x difference in burn-in duration translates directly to the field failure rate in the first 90 days of operation — the period when infant mortality failures occur — and that field failure rate shows up in your operations team's time budget.
|
|
||||||
|
|
||||||
The third question is: do you publish actual measured TX power and RX sensitivity distributions for your modules, or only the MSA specification range? There is a meaningful difference between "TX power: -1.0 dBm to +3.5 dBm (SFF-8431 spec range)" and "TX power: 1.8 dBm ± 0.6 dBm (measured distribution from our production lot, n=500 units)." The MSA specification range defines the IEEE 802.3 compliance window; it does not tell you where in that range a given vendor's production typically sits. A module with a production center of -0.5 dBm TX power technically meets MSA spec (minimum is -1.0 dBm) but provides 2 dB less margin than a module centered at 1.5 dBm. In a long-reach application running close to the receiver sensitivity limit, that 2 dB difference is the difference between a solid link and an intermittently erroring one. Vendors who publish actual distribution data are doing production measurements; vendors who can only cite the MSA spec range are not doing lot-level characterization and don't know where their production centers.
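The margin comparison is plain dB subtraction, shown here so the figures can be swapped for a vendor's published distribution. The spec minimum and production centers are the example values from the paragraph; the function name is illustrative.

```python
def tx_margin_db(production_center_dbm: float, spec_min_dbm: float) -> float:
    """Headroom between where production sits and the spec floor, in dB."""
    return production_center_dbm - spec_min_dbm

SPEC_MIN_DBM = -1.0

low_center = tx_margin_db(-0.5, SPEC_MIN_DBM)    # vendor centered near the floor
high_center = tx_margin_db(1.5, SPEC_MIN_DBM)    # vendor centered mid-range

print(f"margin at -0.5 dBm center: {low_center:.1f} dB")
print(f"margin at +1.5 dBm center: {high_center:.1f} dB")
print(f"difference:                {high_center - low_center:.1f} dB")
```

Both modules pass the same compliance check, yet one leaves 0.5 dB of headroom above the floor and the other 2.5 dB, which is the information a spec-range-only data sheet cannot give you.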
|
|
||||||
|
|
||||||
The fourth question is: what is your production RMA rate, and can you break it down by SKU and customer platform? An RMA rate below 0.3% indicates a well-controlled manufacturing and QC process. An RMA rate of 1-2% indicates QC issues or EEPROM programming problems that show up as platform incompatibilities. An RMA rate above 3% is a red flag that usually indicates one or more of: factory-stock ODM product without adequate burn-in, EEPROM templates not validated against current NOS versions, or optical component sourcing from inconsistent suppliers. Most vendors will not publish RMA rates proactively; asking directly, and asking for platform-specific breakdowns, reveals whether they track this data at all. A vendor who doesn't track RMA rates by target platform cannot improve their EEPROM templates because they don't know which templates are producing failures.
|
|
||||||
|
|
||||||
The fifth question is: do you offer firmware or EEPROM update capability for modules in the field? Platform NOS upgrades occasionally change transceiver validation behavior: a Nexus upgrade from NX-OS 9.3(9) to 10.2(1)F may implement stricter checking of EEPROM fields that were previously ignored, causing previously-working modules to generate new warning messages or in edge cases to deactivate. A compatible vendor with in-house EEPROM programming capability can provide updated EEPROM coding for affected modules, either through a field reprogramming tool (Flexoptix provides the FLEXBOX programmer for this purpose) or through module exchange. Vendors who rely entirely on factory-programmed ODM stock cannot respond to this need, and their customers are simply stuck with whatever the factory template programmed until they buy new modules.
|
|
||||||
|
|
||||||
The sixth question is: can you provide a BER test report demonstrating performance on my specific target platform and NOS version? Not a generic "tested on Cisco Nexus" claim, but a specific test report showing: platform model (e.g., Nexus 9336C-FX2), NOS version (e.g., NX-OS 10.2(3)F), line card type (e.g., N9K-X9716D-GX), test methodology (BERT at 10⁻¹² threshold), and measured pre-FEC BER at maximum specified reach on the specific fiber type (OS2, SMF-28). A vendor who provides this level of documentation has an actual test infrastructure. A vendor who says "we're compatible with Cisco" without being able to produce test reports has not done the work. The practical significance: a module that works on a QFX5120-48Y but produces persistent pre-FEC BER of 5 × 10⁻⁴ on a Nexus 9300 due to a CDR tuning difference between the two platforms' host equalization is not "compatible" in any operationally meaningful sense — it's marginal.
|
|
||||||
|
|
||||||
The seventh question is: what is your supply chain for optical components, and can you guarantee source consistency across a large order or repeat orders? ODM-sourced modules from a given factory can change the underlying optical component supplier (laser diode, PIN photodiode, TIA IC) between production lots without changing the part number, because the EEPROM template and mechanical housing are identical. A well-run compatible vendor qualifies their optical components at the bill-of-materials level, not just the finished module level, and maintains component qualification certificates. This matters for large-scale deployments where you need to be confident that module 5,000 in a batch performs identically to module 1 — not just within the MSA spec range but within the same distribution that your initial bench testing characterized.
|
|
||||||
|
|
||||||
Applying these questions honestly against the competitive landscape: Flexoptix programs in-house in Darmstadt, offers the FLEXBOX programmer for field EEPROM updates, and maintains platform-specific EEPROM templates as a core competency rather than an afterthought. ProLabs similarly maintains in-house programming and publishes reasonable test documentation; they're a credible alternative for large enterprise accounts. FS.com and ATGBICS compete primarily on price and do well on high-volume standard SKUs like SFP28 SR and QSFP28 LR4 on Arista and Juniper, but their long-tail SKUs (CWDM QSFP28, specific DWDM channels, exotic reach variants) and their performance on Cisco Catalyst platforms with strict EEPROM validation are where the quality gap becomes visible. 10Gtek and Optcore are factory-stock resellers for the most part; acceptable for Arista-only environments where EEPROM validation is minimal, but not appropriate for mixed-vendor environments where CMIS implementation differences and EEPROM platform hooks create failure modes that generic templates don't address. The market hasn't commoditized to the point where all compatible vendors are equal, and the seven questions above are the instruments that reveal the differences.
|
|
||||||
@ -1,59 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Silicon Photonics in 2026: What's Actually Shipping vs. What's Still a Slide Deck"
|
|
||||||
slug: "silicon-photonics-co-packaging-2026"
|
|
||||||
category: "Technology Trends"
|
|
||||||
tags: ["silicon photonics", "co-packaged optics", "CPO", "800G", "1.6T", "datacenter"]
|
|
||||||
seo_focus_keyword: "silicon photonics co-packaged optics 2026"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: advanced
|
|
||||||
---
|
|
||||||
|
|
||||||
The silicon photonics hype cycle has been running long enough that some of us are getting repetitive stress injuries from rolling our eyes at press releases. Every year, a new round of announcements claims the era of co-packaged optics (CPO) has arrived, that pluggable transceivers are dinosaurs awaiting the meteor, and that your entire optical inventory strategy needs rethinking. Most of it lands somewhere between premature and outright fiction. But underneath the noise, real things are happening — and in 2026, the signal-to-noise ratio has finally improved enough to have an honest conversation.
|
|
||||||
|
|
||||||
**What silicon photonics actually is, briefly**
|
|
||||||
|
|
||||||
Silicon photonics integrates optical components — waveguides, modulators, photodetectors — onto silicon substrates using standard CMOS fabrication. The appeal is obvious: leverage the trillion-dollar semiconductor manufacturing ecosystem to produce optical devices at semiconductor scale and cost. The challenge has always been that silicon is a poor light emitter (indirect bandgap), so lasers must be bonded or coupled from III-V materials, adding process complexity.
|
|
||||||
|
|
||||||
Co-packaged optics takes this a step further: instead of a pluggable transceiver in a front-panel cage, the optics are integrated directly onto the switch ASIC package, reducing the electrical path from chip to fiber to near-zero. This matters enormously above 800G, where driving high-speed SerDes signals across PCB traces and connectors becomes thermally and electrically expensive.
|
|
||||||
|
|
||||||
**What is actually shipping in 2026**
|
|
||||||
|
|
||||||
Let's be precise about "shipping." Shipping means you can buy it in volume, put it in a production network, and get vendor support when it breaks at 2 AM.
|
|
||||||
|
|
||||||
By that standard, silicon photonics transceivers in pluggable form factors — QSFP-DD and OSFP — are genuinely shipping and have been for a couple of years. Coherent 400ZR implementations from vendors like Cisco (QDD-400G-ZR-S), Ciena, and Lumentum's OEM supply chain all use SiPh modulator technology. The 400G FR4 and DR4 modules from multiple vendors — Intel, Inphi (now Marvell), II-VI (now Coherent) — are SiPh-based in varying degrees. This is not a future thing; you're probably already running silicon photonics in your network.
|
|
||||||
|
|
||||||
Where things get murkier is CPO itself. Intel's Integrated Photonics Solutions division demonstrated CPO integration with Tofino derivatives, and their "Co-Packaged Optics Reference Platform" made the rounds. Broadcom has shown CPO integration with their Tomahawk and Trident ASIC families. Arista announced CPO intent. None of this was available for general purchase in production quantities as of Q1 2026. The honest timeline for CPO in production fabrics is 2027–2028 for early adopters, 2029–2030 for broad enterprise availability. Anyone telling you otherwise has confused "sampling to hyperscalers under NDA" with "available."
|
|
||||||
|
|
||||||
**The CPO timeline reality check**
|
|
||||||
|
|
||||||
CPO faces problems that are not merely engineering — they are systemic. The most underappreciated one is serviceability. A pluggable transceiver can be hot-swapped in seconds. A co-packaged optical module is soldered or mechanically bonded to the switch ASIC. When it fails — and eventually, optics fail — you are replacing the entire switch or returning it to depot. For a hyperscaler with a dedicated spares and logistics operation, this is manageable. For an enterprise with 200 switches in three datacenters and a lean network team, the calculus looks very different.
|
|
||||||
|
|
||||||
Then there's the thermal problem. Co-packaging puts a significant heat source — optical transmitters dissipate several watts each, and a 51.2 Tbps switch has a lot of them — directly adjacent to the CMOS switching silicon. Managing this without degrading the ASIC's operational envelope requires sophisticated thermal co-design. Early CPO designs have shown thermal coupling issues that required chassis-level redesigns.
|
|
||||||
|
|
||||||
The fiber management story is also unresolved. Current pluggable deployments allow structured cable management with discrete transceivers. CPO requires fiber connection directly to the switch ASIC area, creating a new set of constraints for cabling density and bend radius management in high-density racks.
|
|
||||||
|
|
||||||
**Which vendors are worth watching seriously**
|
|
||||||
|
|
||||||
Intel is probably the most credible player in silicon photonics CPO, having invested in silicon photonics for well over a decade and built significant IP. Their 2 µm process for photonics is mature by semiconductor standards. The concern is organizational: Intel's restructuring has shuffled the photonics division's priorities multiple times, and the roadmap continuity is not guaranteed.
|
|
||||||
|
|
||||||
Marvell (post-Inphi) has deep DSP expertise and is integrating SiPh transmit/receive into their coherent DSP chiplets. Their 400ZR and 800ZR coherent implementations are technically strong.
|
|
||||||
|
|
||||||
Broadcom's approach is more conservative: they are offering CPO as an option on future ASIC generations rather than making it the primary form factor. This is probably the right call for an ecosystem that has to serve both hyperscalers and the broader market.
|
|
||||||
|
|
||||||
On the startup side, Ayar Labs has been interesting — their in-package optical I/O approach targets chiplet interconnects rather than front-panel ports, which is a different problem. Lightmatter (now focusing on photonic computing, but with relevant interconnect IP) and Ranovus are worth monitoring.
|
|
||||||
|
|
||||||
The OEM ecosystem for coherent transceivers — Acacia (now Cisco), Lumentum, II-VI/Coherent — has largely converged on silicon photonics for the modulator, while maintaining III-V lasers. This hybrid approach is pragmatically solid and likely remains dominant through 2028.
|
|
||||||
|
|
||||||
**What it means for your procurement decisions right now**
|
|
||||||
|
|
||||||
If you're speccing a datacenter refresh today, buy pluggable. The 800G OSFP and QSFP-DD ecosystem is mature enough for production deployment, silicon photonics-based transceivers offer credible alternatives to traditional InP-based optics for coherent applications, and the cost curves are improving. None of this requires betting on CPO timelines.
|
|
||||||
|
|
||||||
If you're designing a greenfield hyperscale-class facility with a 2028+ production date, you should have CPO in your architecture conversations. Not because it will definitely be ready, but because it will likely be ready and you want to design switching room topology, cable plant, and sparing strategies that don't have to be completely unwound when it arrives.
|
|
||||||
|
|
||||||
The thermal envelope reality: current 51.2 Tbps ASICs like Broadcom's Tomahawk5 already push the limits of what pluggable optics can handle at full port density. At 102.4 Tbps and beyond, the physics increasingly favor tighter integration. CPO is not a marketing story — it's a thermodynamics argument, and thermodynamics tends to win.
|
|
||||||
|
|
||||||
**The one thing the press releases never say**
|
|
||||||
|
|
||||||
Silicon photonics manufacturing yield, particularly for CPO, remains below what's needed for commodity pricing. The integration of III-V lasers (still necessary for high-efficiency emission) with silicon waveguides involves bonding processes that are sensitive to temperature gradients and surface cleanliness. Until those yields improve significantly, CPO will carry a cost premium that makes it suitable only for applications where density and power savings outweigh hardware cost — which, at hyperscale, they already do, but at enterprise scale, not yet.
|
|
||||||
|
|
||||||
The honest summary: silicon photonics transceivers are real and in your network today. CPO is real engineering with real demonstrations. Volume production for non-hyperscale customers is a 2028 story at the earliest. Plan accordingly, ignore the press releases, and ask any vendor claiming "shipping CPO" exactly what their MTBF data looks like at 6 months of sustained operation.
|
|
||||||
@ -1,59 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OSFP vs. QSFP-DD for 800G: The Port Density Math Nobody Shows You"
|
|
||||||
slug: "800g-osfp-vs-qsfp-dd-port-density"
|
|
||||||
category: "Hardware Selection"
|
|
||||||
tags: ["800G", "OSFP", "QSFP-DD", "port density", "switch selection", "thermal management"]
|
|
||||||
seo_focus_keyword: "800G OSFP QSFP-DD comparison port density"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
When 800G arrived in production, it brought with it a form factor argument that remains genuinely unresolved. OSFP (Octal Small Form-factor Pluggable) and QSFP-DD (Quad Small Form-factor Pluggable — Double Density) both deliver 800G, both have real products on the market, and both have partisans who will tell you the other one is a dead end. The truth is more useful than either camp admits: they serve different optimization targets, and choosing between them requires doing actual math rather than accepting vendor narrative.
|
|
||||||
|
|
||||||
**The physical reality**
|
|
||||||
|
|
||||||
OSFP is larger. That's the starting point. An OSFP module is approximately 22.58 mm wide and 107.8 mm deep. A QSFP-DD is 18.35 mm wide and 89.4 mm deep. The extra width of OSFP is not an accident — it provides more surface area for heat dissipation and space for the eight 100G electrical lanes that connect to the host ASIC. OSFP was designed with the thermal requirements of 800G optics as a primary constraint, not an afterthought.
|
|
||||||
|
|
||||||
QSFP-DD cages are mechanically backwards-compatible with QSFP28 and QSFP+ modules, which is a significant installed base advantage. If you have 400G QSFP-DD infrastructure, you have the right cage geometry for 800G QSFP-DD modules, though the electrical and thermal specifications differ enough that you should not assume a cage designed for 400G QSFP-DD will simply handle 800G without validation.
|
|
||||||
|
|
||||||
**Port density: the actual numbers**
|
|
||||||
|
|
||||||
On a 1U switch with a standard 19-inch rack width, the front panel real estate is fixed. Let's work with a practical example: Arista's 7800R3 series and comparable Cisco Nexus 9000 platforms.
|
|
||||||
|
|
||||||
A 1U switch supporting OSFP typically achieves 32 OSFP ports. At 800G per port, that's 25.6 Tbps of front-panel bandwidth. A 1U switch supporting QSFP-DD at the same 800G speeds can often achieve 36 or even 40 ports in a dense implementation, or 28.8 to 32 Tbps.
|
|
||||||
|
|
||||||
That's a real port density advantage for QSFP-DD, 12.5% at 36 ports and 25% at 40, and it compounds in spine-layer deployments. If your spine tier runs 128 full-bisection uplinks to a leaf layer, the QSFP-DD spine switch requires fewer chassis units to deliver equivalent bandwidth. In a 3-tier fabric, the cumulative difference in rack units, power draw, and cabling can be significant.
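The arithmetic behind these figures is easy to script when comparing candidate switches. The sketch below is a minimal illustration in Python; the function names are hypothetical, and the port counts are the ones quoted above rather than figures for any specific product.

```python
def front_panel_tbps(ports: int, gbps_per_port: int = 800) -> float:
    """Aggregate front-panel bandwidth in Tbps."""
    return ports * gbps_per_port / 1000


def density_advantage_pct(ports_a: int, ports_b: int) -> float:
    """Percentage by which ports_a exceeds ports_b."""
    return (ports_a - ports_b) / ports_b * 100


OSFP_PORTS = 32  # typical 1U OSFP count quoted in the text

for qsfp_dd_ports in (36, 40):
    tbps = front_panel_tbps(qsfp_dd_ports)
    gain = density_advantage_pct(qsfp_dd_ports, OSFP_PORTS)
    print(f"{qsfp_dd_ports} QSFP-DD ports: {tbps} Tbps, {gain}% more than OSFP")
```

Running the same two functions against a vendor's actual port counts is usually enough to sanity-check a datasheet density claim.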
|
|
||||||
|
|
||||||
However, this density advantage evaporates under thermal load. The maximum power dissipation per OSFP module is specified at 15W for current 800G modules, with some optical variants approaching 20W. QSFP-DD 800G modules target a 14W maximum, but many real-world implementations sit at 12–13W due to the tighter thermal budget imposed by the smaller form factor. Push the cage to 100% utilization with high-power coherent or long-reach optics, and QSFP-DD switches frequently hit thermal throttling thresholds that OSFP switches handle without incident.
|
|
||||||
|
|
||||||
**Thermal limits in practice**
|
|
||||||
|
|
||||||
The critical number to understand is the per-cage thermal budget, not the per-module number. Switch ASIC vendors like Broadcom publish thermal design power (TDP) for the ASIC itself, but the cage management system — the combination of cage size, airflow path, and heat sink geometry — determines whether you can realistically run all ports at full rated optical power.
|
|
||||||
|
|
||||||
For 800G SR8 short-reach optics (100m OM4 reach), module power consumption is typically 8–10W for OSFP and 7–9W for QSFP-DD. These are well within the thermal envelope of both form factors, and full port density is achievable.
|
|
||||||
|
|
||||||
For 800G DR8 optics (500m SMF reach, parallel fiber), modules run 12–14W. Both form factors handle this, but you should verify cooling configuration.
|
|
||||||
|
|
||||||
For 800G FR8 optics (2km SMF reach), some modules approach 18–20W. This is where OSFP's larger thermal mass becomes decisive. Running FR8 at full density in QSFP-DD is often not possible — vendors will explicitly rate those switches at 50–75% port utilization for high-power optical variants.
|
|
||||||
|
|
||||||
For coherent 800G ZR modules (80–120km), power consumption hits 20–25W. These modules are available in OSFP form factor (the 800ZR OSFP modules from Acacia/Cisco and Coherent) but not realistically in QSFP-DD for production use. If coherent is in your application, OSFP is not optional.
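To see how these per-module figures turn into a port utilization limit, the following Python sketch divides a front-panel optics power budget by the module power. The module wattages are the upper-end numbers quoted above; the 560 W budget and the function names are assumptions made purely for illustration, and a real platform's limit comes from the vendor's thermal qualification.

```python
# Upper-end module power figures (watts) taken from the text above.
MODULE_WATTS = {"SR8": 10, "DR8": 14, "FR8": 20, "ZR": 25}


def total_optics_watts(ports: int, module_watts: float) -> float:
    """Optics power if every port carries the same module type."""
    return ports * module_watts


def max_populated_ports(total_ports: int, module_watts: float,
                        optics_budget_watts: float) -> int:
    """Ports that fit inside a front-panel optics power budget."""
    return min(total_ports, int(optics_budget_watts // module_watts))


# Hypothetical 40-port QSFP-DD switch sized for 14 W in every cage.
PORTS = 40
BUDGET_W = 560  # assumption: 40 ports x 14 W

for optic, watts in MODULE_WATTS.items():
    usable = max_populated_ports(PORTS, watts, BUDGET_W)
    print(f"{optic}: {usable}/{PORTS} ports ({usable / PORTS:.0%})")
```

Under these assumed numbers the higher-power optics land well below full population, which is the same pattern as the derated port counts vendors publish.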
|
|
||||||
|
|
||||||
**When each makes sense: a practical decision guide**
|
|
||||||
|
|
||||||
Choose QSFP-DD when your application is primarily short to medium reach (SR8, DR8, FR4 at 800G), your fabric is bandwidth-density limited rather than thermal-limited, and QSFP backward compatibility with existing 400G infrastructure matters. DCI and hyperscale intra-datacenter fabrics running parallel SMF at distances under 2km are the sweet spot. The higher port density genuinely reduces capex in these scenarios.
|
|
||||||
|
|
||||||
Choose OSFP when you need coherent or extended-reach optics, when you're building a long-haul or metro aggregation layer that will run high-power DWDM modules, or when you expect optic generations to increase power consumption over the switch's service life. OSFP's larger thermal envelope is future insurance. The Arista 7130 series, Cisco 8000 series, and Juniper PTX series platforms all offer OSFP configurations for exactly this reason.
|
|
||||||
|
|
||||||
There is also a practical consideration about vendor roadmap alignment. OSFP is the form factor preferred by most coherent transceiver manufacturers for next-generation 1.6T implementations. The 1.6T OSFP specification is further along than the 1.6T QSFP-DD equivalent, in part because the thermal headroom required for 1.6T coherent simply doesn't fit in the QSFP-DD envelope. If you're designing infrastructure with a 5–7 year operational life, and that life includes 1.6T, OSFP gives you a more credible upgrade path.
|
|
||||||
|
|
||||||
**Breakout cabling and what it does to your math**
|
|
||||||
|
|
||||||
Both form factors support breakout: an 800G OSFP or QSFP-DD port can break out to 8×100G or 2×400G using an appropriate breakout cable or cassette. This is where the density comparison becomes context-dependent.
|
|
||||||
|
|
||||||
If you're using 800G ports as 2×400G breakouts for leaf-switch connectivity in a spine-leaf fabric, the QSFP-DD density advantage (more 800G ports = more breakout endpoints) can meaningfully increase your oversubscription headroom. If you're using breakout to 8×100G for server connectivity, the marginal density difference between OSFP and QSFP-DD per switch matters less than the cable management implications of running 8-fiber MPO breakout fans in a high-density rack.
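A quick way to compare breakout capacity is to count logical endpoints per switch. This Python sketch uses the port counts discussed earlier (32 OSFP, 40 QSFP-DD); the function name is hypothetical.

```python
def breakout_endpoints(ports_800g: int, fanout: int) -> int:
    """Logical endpoints when every 800G port is broken out `fanout` ways."""
    return ports_800g * fanout


for label, ports in (("OSFP", 32), ("QSFP-DD", 40)):
    as_400g = breakout_endpoints(ports, 2)  # 2x400G breakout
    as_100g = breakout_endpoints(ports, 8)  # 8x100G breakout
    print(f"{label}: {as_400g} x 400G or {as_100g} x 100G endpoints")
```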
|
|
||||||
|
|
||||||
**The cable plant consideration nobody mentions**
|
|
||||||
|
|
||||||
Both 800G DR8 and SR8 use parallel fiber: 16-fiber MPO16 connectors (or dual MPO8). This is a significant cable plant commitment. If your existing infrastructure uses duplex LC or MPO12, a migration to 800G at meaningful density requires rethinking your fiber trunk architecture, and neither OSFP nor QSFP-DD changes that. FR8 uses eight parallel SMF lanes in a dual-MPO configuration. Only FR4 (four wavelengths on two fibers) and coherent ZR maintain duplex LC compatibility, which is a strong argument for these optic types in campuses and enterprise metro rings where MPO infrastructure isn't already in place.
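The trunk fiber commitment can be estimated from the per-port fiber counts given above. The Python sketch below is illustrative only: the dictionary covers just the optic types whose fiber counts the text states directly, and the function name is hypothetical.

```python
# Fibers per 800G port, as described in the text above.
FIBERS_PER_PORT = {"SR8": 16, "DR8": 16, "FR4": 2, "ZR": 2}


def trunk_fibers(ports_by_optic: dict) -> int:
    """Total fibers needed for a mix of optic types on one switch."""
    return sum(FIBERS_PER_PORT[optic] * count
               for optic, count in ports_by_optic.items())


print(trunk_fibers({"DR8": 32}))            # all-parallel design
print(trunk_fibers({"FR4": 32}))            # all-duplex design
print(trunk_fibers({"DR8": 24, "ZR": 8}))   # mixed design
```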
|
|
||||||
|
|
||||||
The bottom line is simple enough: for dense intra-DC 800G fabrics without coherent requirements, QSFP-DD wins on density. For anything involving coherent optics, high-power extended-reach modules, or roadmapping toward 1.6T, OSFP is the right platform. Buy the switch for the optical application, not the form factor preference.
|
|
||||||
@ -1,69 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400ZR, OpenZR+, and ZR+: Cutting Through the Coherent Pluggable Confusion"
|
|
||||||
slug: "zr-zr-plus-coherent-pluggables-comparison"
|
|
||||||
category: "Coherent Optics"
|
|
||||||
tags: ["400ZR", "OpenZR+", "ZR+", "coherent", "DWDM", "pluggable", "metro", "long-haul"]
|
|
||||||
seo_focus_keyword: "400ZR OpenZR+ ZR+ coherent pluggable comparison"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: advanced
|
|
||||||
---
|
|
||||||
|
|
||||||
If you've spent any time speccing coherent wavelengths recently, you've encountered the naming problem. "ZR" appears in at least three distinct standards, each with different reach, modulation, interoperability, and price profiles — and the vendors marketing them have strong incentives to blur the distinctions. When a product is called "ZR+" by one vendor and "OpenZR+" by another, and the datasheet distances don't match either standard's specification, you're not being careless if you're confused. The ecosystem is genuinely messy.
|
|
||||||
|
|
||||||
Let's establish ground truth.
|
|
||||||
|
|
||||||
**400ZR: the interoperability standard**
|
|
||||||
|
|
||||||
OIF 400ZR (formally: OIF-400ZR-01.0) is an interoperability specification published by the Optical Internetworking Forum in 2020. It defines a coherent 400G interface targeting 80km reaches over single-span DWDM links using DP-16QAM modulation at a net data rate of 400 Gbps. The key design constraint was form factor: 400ZR was specified to fit in QSFP-DD and OSFP, enabling coherent optics in router and switch line cards rather than requiring dedicated transponder chassis.
|
|
||||||
|
|
||||||
The 400ZR specification is precise about what it requires: a target launch power of approximately 0 dBm, OSNR tolerance around 24.5 dB at FEC threshold, chromatic dispersion tolerance of ±2400 ps/nm, and compatibility with standard DWDM channel plans (75 GHz or 100 GHz channel spacing, since the roughly 60 GBd signal is too wide for a 50 GHz slot). The FEC is a concatenated code (C-FEC) built on a staircase outer code, chosen specifically for interoperability: you can mix 400ZR modules from different vendors on the same fiber pair and they will connect.
|
|
||||||
|
|
||||||
This last point is genuinely important and often undersold. The industry has a long history of "coherent" products that work perfectly in single-vendor deployments and fail to interoperate. 400ZR's explicit interoperability mandate, and the testing infrastructure OIF has built around it, means you can run Acacia 400ZR modules at one end and Lumentum 400ZR at the other end of an 80km span and get a functional link. That's not a trivial achievement.
|
|
||||||
|
|
||||||
The limitation is reach. 80km is a single amplifier span in a typical EDFA-amplified network. Multi-span, multi-amplifier metro and regional applications push 400ZR into margin deficit. For 120km, 200km, or continental-distance applications, 400ZR won't make the link budget without external amplification and careful OSNR management.
|
|
||||||
|
|
||||||
**OpenZR+: the flexible rate extension**
|
|
||||||
|
|
||||||
OpenZR+ is a multi-source agreement (MSA) specification, distinct from OIF, that extends the ZR concept to support multiple modulation formats and data rates. Specifically, OpenZR+ supports DP-QPSK at 100G and 200G, DP-8QAM at 300G, and DP-16QAM at 400G, all on the same hardware platform through software-configurable DSP.
|
|
||||||
|
|
||||||
This rate flexibility is the core value proposition. An OpenZR+ module can be configured as 100G DP-QPSK for a 1500km terrestrial link (more robust modulation tolerates more OSNR degradation), 200G DP-QPSK for a 600km regional span, or 400G DP-16QAM for a short 80km metro hop. One SKU for multiple network applications.
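The rate-versus-reach trade-off can be expressed as a simple lookup: pick the fastest mode whose reach covers the span. The Python sketch below uses only the three example rate and reach pairs quoted above, which are illustrative rather than specification limits; the function name is hypothetical.

```python
from typing import Optional

# (rate in Gbps, example reach in km) from the text above, fastest first.
MODES = [(400, 80), (200, 600), (100, 1500)]


def fastest_rate_for_span(span_km: float) -> Optional[int]:
    """Return the highest rate whose example reach covers the span."""
    for rate_gbps, reach_km in MODES:
        if span_km <= reach_km:
            return rate_gbps
    return None  # beyond the longest example reach


for span in (40, 300, 1200, 2500):
    print(span, "km ->", fastest_rate_for_span(span))
```

A production planner would work from OSNR and link budget rather than a distance table, but the selection logic has the same shape.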
|
|
||||||
|
|
||||||
OpenZR+ also specifies a maximum launch power of +1 dBm, slightly higher than 400ZR's 0 dBm target, giving marginally more headroom for longer spans. The FEC is different as well: OpenZR+ uses oFEC, a soft-decision code with more coding gain than the FEC in 400ZR. A module running an OpenZR+ mode will therefore not link to one running 400ZR mode, and this is part of where interoperability gets complicated.
|
|
||||||
|
|
||||||
Here's the catch: OpenZR+ interoperability is specified at the 400G DP-16QAM operating point only, and even there, testing between different vendors' implementations has historically exposed edge cases. At 100G and 200G operating modes, OpenZR+ modules from different vendors may or may not interoperate, depending on DSP implementation choices. The MSA does not mandate the same level of cross-vendor testing that OIF requires for 400ZR. If you need a reliable multi-vendor deployment, 400ZR gives you stronger guarantees.
|
|
||||||
|
|
||||||
**ZR+: where the marketing fog thickens**
|
|
||||||
|
|
||||||
"ZR+" without the "Open" prefix is not a standard. It's a marketing term used by multiple vendors — primarily Cisco (Acacia) and Ciena (WaveLogic) — to describe their proprietary enhanced coherent pluggable products that go beyond 400ZR specifications. These products typically offer:
|
|
||||||
|
|
||||||
Higher reach: Cisco's QSFP-DD-400G-ZR+ targets 120km in single-span and can operate to 1000km+ with external amplification and rate adaptation. Ciena's WaveLogic 5 Nano in pluggable form pushes similar numbers.
|
|
||||||
|
|
||||||
Better sensitivity: Using proprietary soft-decision FEC and higher-performance DSPs (Acacia's Pico DSP, Ciena's WaveLogic silicon), vendor-specific ZR+ products achieve OSNR sensitivity several dB better than the 400ZR interoperability specification.
|
|
||||||
|
|
||||||
Multi-rate support: Like OpenZR+, most vendor ZR+ products support 100G/200G/300G/400G rate adaptation.
|
|
||||||
|
|
||||||
The cost: you are locked into single-vendor deployments for these wavelengths. A Cisco ZR+ module will not interoperate with a Ciena WaveLogic ZR+ at the endpoints of the same span, full stop. This matters enormously for disaggregated network architectures where router vendors and transponder vendors are mixed.
|
|
||||||
|
|
||||||
**Interoperability reality in 2026**
|
|
||||||
|
|
||||||
The ecosystem has matured, but the landmines remain. Here's the practical interoperability matrix:
|
|
||||||
|
|
||||||
400ZR (OIF) modules from any compliant vendor interoperate at 400G DP-16QAM on 80km single-span links. This has been validated extensively, including at OIF plugfests. If you're building a metro ring with multiple vendors' routers and need coherent 400G at moderate distances, this is the safe choice.
|
|
||||||
|
|
||||||
OpenZR+ modules interoperate reliably at 400G DP-16QAM between validated vendor pairs. OpenZR+ is an MSA rather than an OIF specification, so look for the interop results published by the MSA members or by your vendors, and check them. At lower rates, assume single-vendor operation unless you have specific test data.
|
|
||||||
|
|
||||||
Vendor ZR+ products are single-vendor propositions. The technical performance is often excellent — Acacia's modules in particular have a strong reputation for reach and sensitivity — but the ecosystem constraint is real. Plan for it.
|
|
||||||
|
|
||||||
One practical note: many network operators are deploying 400ZR for metro (<80km) and using vendor ZR+ or external transponder solutions for regional and long-haul applications. This hybrid approach optimizes interoperability where it matters (metro, multi-vendor dense deployments) while using vendor-specific performance advantages where reach demands it (regional and long-haul). There's nothing architecturally inconsistent about this; it just requires careful documentation so future engineers don't accidentally mix incompatible modules.
|
|
||||||
|
|
||||||
**What to specify for which application**
|
|
||||||
|
|
||||||
For data center interconnect (DCI) at 80km or under: 400ZR is the correct specification. Lower cost than proprietary solutions, real interoperability, and the reach is sufficient. Typical pricing for 400ZR QSFP-DD modules has dropped to the $2,500–$3,500 range in 2026, making them increasingly cost-competitive with longer-reach legacy solutions.
|
|
||||||
|
|
||||||
For metro rings and regional spans of 80–600km with amplification: OpenZR+ gives you rate flexibility that's genuinely useful for managing different span lengths in the same ring. Validate the specific vendor combination you're deploying against published interop matrices.
|
|
||||||
|
|
||||||
For high-performance long-haul or submarine-adjacent applications: proprietary ZR+ or purpose-built coherent line systems remain the technically correct choice. Don't fight the physics by forcing OpenZR+ into applications where you need 4–5 dB of additional OSNR headroom.
|
|
||||||
|
|
||||||
For anyone building a disaggregated ROADM-based network with open line system (OLS) architecture: 400ZR interoperability becomes critical infrastructure. The ability to swap client-side optics without replacing the line system is the core economic argument for disaggregation, and it only works if your coherent pluggables actually interoperate. Spec 400ZR, validate at plugfest conditions, and treat any vendor claiming "interoperability" without OIF certification with appropriate skepticism.
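The distance-based part of this guidance can be condensed into a few lines. The Python sketch below is a deliberately simplified reading of the recommendations above, with the 80 km and 600 km thresholds taken from the text; the function name and return strings are hypothetical, and it ignores the multi-vendor and OLS considerations that also drive the choice.

```python
def recommend_spec(distance_km: float, amplified: bool = False) -> str:
    """Map a span length to the coherent pluggable class suggested above."""
    if distance_km <= 80:
        return "400ZR"
    if distance_km <= 600 and amplified:
        return "OpenZR+"
    return "proprietary ZR+ or coherent line system"


for km, amp in ((40, False), (300, True), (300, False), (1500, True)):
    print(f"{km} km, amplified={amp}: {recommend_spec(km, amp)}")
```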
|
|
||||||
|
|
||||||
The naming mess will likely persist until the ecosystem consolidates around clear form factor and standards boundaries. Until then, ask vendors specific questions: which standard does this module conform to, what FEC implementation, what is the validated interop partner list, and what are the distance/power/OSNR test conditions behind the datasheet numbers.
|
|
||||||
@ -1,63 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Class 1M Laser Safety: What the Label on Your DWDM Transceiver Actually Means"
|
|
||||||
slug: "laser-safety-class-1m-transceivers"
|
|
||||||
category: "Safety & Compliance"
|
|
||||||
tags: ["laser safety", "Class 1M", "DWDM", "IEC 60825", "fiber handling", "eye safety"]
|
|
||||||
seo_focus_keyword: "Class 1M laser safety DWDM transceivers"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
There's a small label on most high-power DWDM transceivers that reads "Class 1M." Many engineers who handle these modules daily couldn't tell you what it means beyond a vague sense that "it's safe most of the time." That's not entirely wrong, but the nuance in that label matters — both for genuine safety and for understanding which precautions in your lab and datacenter are actually doing something versus which ones are ceremonial.
|
|
||||||
|
|
||||||
**The IEC 60825 laser classification system**
|
|
||||||
|
|
||||||
IEC 60825-1 is the international standard governing laser product safety classification. It establishes a hierarchy based on the combination of optical power, wavelength, pulse characteristics, and the accessible emission limit (AEL) for each class. The classification system runs from Class 1 (safe under all normal conditions of use) through Class 4 (capable of causing immediate serious eye and skin damage, potential fire hazard). Most transceivers fall into Class 1 or Class 1M.
|
|
||||||
|
|
||||||
Class 1 means the laser is inherently safe under all reasonably foreseeable conditions, including extended direct intrabeam viewing. The power level is below the threshold that can cause retinal damage even during prolonged exposure. Most short-reach datacom transceivers (100GBASE-SR4, 10GBASE-SR, typical 25G gray optics) fall here. Wavelengths in the 850nm multimode range, at powers in the -3 to +2 dBm range, pose no realistic eye hazard.
|
|
||||||
|
|
||||||
Class 1M adds a crucial qualifier: the laser is safe provided optical instruments such as magnifiers, microscopes, or collimating lenses are NOT used. The "M" stands for magnification. A Class 1M beam is typically either highly divergent (difficult to focus onto the retina naturally) or of large diameter (again, not efficiently focused by the eye's natural optics). But pass that beam through a magnifying eyepiece, and the convergence properties change dramatically — you're now potentially concentrating kilowatts per square centimeter onto a small retinal area.
|
|
||||||
|
|
||||||
**Why DWDM transceivers are Class 1M**
|
|
||||||
|
|
||||||
High-power DWDM transceivers (the 100G and 400G coherent modules used in carrier networks, metro rings, and long-haul transport) transmit in the 1550nm C-band range (approximately 1530–1565nm). At these wavelengths, water in the cornea and the front of the eye absorbs most of the light before it reaches the retina, so the hazard is mainly to the cornea and lens, and the exposure limits differ from those for the 850nm or 1310nm ranges used in shorter-reach applications.
|
|
||||||
|
|
||||||
The critical issue is optical power. A typical 100G coherent DWDM module may launch at 0 to +3 dBm (1 to 2 mW). That sounds modest. But "high-power" DWDM boosted outputs (think EDFA-launched signals post-amplification) can reach +17 dBm (50 mW) or higher. Even at nominal launch powers without amplification, the combination of 1550nm wavelength characteristics and the beam geometry from a single-mode fiber connector tip creates conditions where optical instrumentation could concentrate enough energy into the eye to cause irreversible damage.
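The dBm figures convert to milliwatts with the standard relation P(mW) = 10^(dBm/10). A minimal Python sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Convert optical power from milliwatts to dBm."""
    return 10 * math.log10(mw)


for level in (0, 3, 17):
    print(f"{level:+d} dBm = {dbm_to_mw(level):.2f} mW")
```

The round numbers in the text (2 mW, 50 mW) are the usual approximations of these values.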
|
|
||||||
|
|
||||||
The Class 1M designation is therefore appropriate and precise: the unaided eye looking at an open single-mode fiber connector carrying a 1550nm DWDM signal at 0 to +3 dBm is not at significant risk. The beam diverges rapidly from the 9µm core, delivering sub-threshold irradiance at the eye. Add a common fiber inspection microscope (the same tool you use to check connector cleanliness) and the situation changes fundamentally.
|
|
||||||
|
|
||||||
**What precautions actually matter**
|
|
||||||
|
|
||||||
The most important practical rule is one that many field engineers know intellectually but occasionally violate under time pressure: never inspect a fiber connector face under magnification without first confirming the fiber is dark. Not "I think I turned off the port." Confirmed dark — power meter on the other end, DOM read-back showing zero TX power, or physical disconnection at the far end.
|
|
||||||
|
|
||||||
Optical fiber inspection microscopes — both bench-top models and handheld probes like the Fluke FiberInspector series or VIAVI FiberChek — concentrate the beam geometry in a way that creates genuine hazard from Class 1M sources. The same microscope you use to diagnose connector contamination will focus a live DWDM signal into a hazardous irradiance level. This is not theoretical; there are documented cases of eye injuries from live fiber inspection in carrier environments.
|
|
||||||
|
|
||||||
For routine operations, the precautions that actually matter are:
|
|
||||||
|
|
||||||
Confirm fiber status before inspection. This is non-negotiable and takes 30 seconds. Use a power meter, a DOM query, or both. Build this into your NOC procedure for any maintenance involving 1550nm or coherent connections.
|
|
||||||
|
|
||||||
Use appropriate inspection tools. Modern video-based inspection probes (camera-equipped fiber scopes) do not present a direct optical path hazard because you're viewing a camera image rather than looking directly through optics. These are preferred for connector inspection precisely because they eliminate the Class 1M hazard path.
|
|
||||||
|
|
||||||
Laser safety eyewear has limited applicability at Class 1M. Standard laser goggles rated for 1550nm will block the wavelength — but they also make it impossible to do most fiber work, and the attenuation they provide may exceed the actual hazard level for most normal operations. The practical approach is to use them when working with known high-power amplified outputs (+17 dBm and above), and to rely on procedural controls (confirm dark) for standard transceiver outputs. Using eyewear as a substitute for confirming fiber status is the wrong approach.
|
|
||||||
|
|
||||||
**The specific case of fiber inspection after installation**
|
|
||||||
|
|
||||||
Installation and maintenance scenarios create the highest risk. When commissioning a DWDM system, you frequently need to inspect connectors while other wavelengths on the same fiber plant may be carrying live traffic. Even if the specific fiber pair you're working on is dark, adjacent fibers in the same duct or cable may be live. The mechanical hazard of accidentally contacting a live adjacent fiber connector during inspection work is low in well-organized patch bays but nonzero in messy cable environments.
|
|
||||||
|
|
||||||
The sensible operational protocol: establish a fiber handling zone for DWDM maintenance that requires two-person confirmation before any connector is handled — one person confirms dark status while the other does the physical work. This is standard in carrier central offices and is worth implementing in enterprise DWDM environments.
|
|
||||||
|
|
||||||
**The theater problem**
|
|
||||||
|
|
||||||
Some of the safety procedures that have grown up around laser handling are genuine protective measures. Others are theater. Knowing the difference matters, because theater creates compliance fatigue and can crowd out the genuinely important procedures.
|
|
||||||
|
|
||||||
Wearing general laser safety eyewear rated for 1550nm during routine switch port maintenance involving 1310nm short-reach optics is theater — the wavelength doesn't match, the power levels don't warrant it, and it reduces situational awareness without providing protection. Following a 14-step power-down checklist before touching a fiber connection on a datacenter 100GBASE-LR4 module running at +2 dBm is theater — the hazard at that power and wavelength does not require it.
|
|
||||||
|
|
||||||
Confirming fiber dark before microscope inspection of any single-mode connector is not theater. It's the specific precaution that maps to the specific hazard profile of Class 1M at 1550nm.
|
|
||||||
|
|
||||||
**An honest risk summary**
|
|
||||||
|
|
||||||
Class 1M DWDM transceivers at nominal output powers (0 to +3 dBm) present a real but conditional hazard. The condition is optical magnification — primarily fiber inspection microscopes. Remove that condition through procedural confirmation (confirm dark before inspection) or by using camera-based inspection tools, and you've eliminated the dominant risk pathway.
|
|
||||||
|
|
||||||
Amplified DWDM outputs (+10 dBm and above) warrant additional respect: laser safety eyewear is appropriate when working near bare fiber in amplified sections, and physical handling of fiber ends in amplified sections should always be with confirmed transmitter shutdown at the optical amplifier.
|
|
||||||
|
|
||||||
The 1550nm window is invisible to the human eye, which removes the reflexive blink response you get with visible lasers. There's no instinctive alarm. That's exactly why the procedural discipline matters more, not less, than it does with other laser classes.
|
|
||||||
@ -1,88 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OSNR and Optical Link Budget: A Working Engineer's Calculation Guide"
|
|
||||||
slug: "osnr-link-budget-practical-guide"
|
|
||||||
category: "Network Engineering"
|
|
||||||
tags: ["OSNR", "link budget", "optical power", "EDFA", "metro", "long-haul", "margin"]
|
|
||||||
seo_focus_keyword: "OSNR optical link budget calculation guide"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: advanced
|
|
||||||
---
|
|
||||||
|
|
||||||
Optical link budgets are one of those topics where the theoretical treatment in textbooks and the practical reality of commissioning a metro ring diverge significantly. The math isn't particularly difficult, but knowing which numbers to trust, which safety margins to apply, and where real systems consistently underperform their datasheet specifications takes experience that no textbook provides. This article walks through the calculations you actually need for metro and regional planning, with the margin tables that vendor application notes tend to omit.
|
|
||||||
|
|
||||||
**Starting with power budget: the basics**
|
|
||||||
|
|
||||||
For a passive point-to-point link (no amplification), the optical power budget is straightforward. Received power equals transmitted power minus all losses in the path:
|
|
||||||
|
|
||||||
P_received = P_transmit − IL_fiber − IL_connectors − IL_splices − IL_components
|
|
||||||
|
|
||||||
Where:
|
|
||||||
- P_transmit is the transceiver output power (in dBm, from the TX power specification)
- IL_fiber is insertion loss from fiber attenuation (typically 0.35 dB/km for SMF-28 at 1310nm, 0.20 dB/km at 1550nm)
- IL_connectors is connector pair insertion loss (budget 0.5 dB per mated pair, though good APC connectors achieve 0.2–0.3 dB)
- IL_splices is splice loss (0.1 dB per fusion splice is achievable; budget 0.2 dB for conservative planning)
- IL_components adds patch panels, WDM multiplexers, splitters, and any passive inline components
|
|
||||||
|
|
||||||
The received power must exceed the transceiver's receiver sensitivity by the required margin. A 100GBASE-LR4 transceiver (four LAN-WDM lanes around 1310nm) typically specifies a minimum receive power of -10.6 dBm and a maximum input of +4.5 dBm per lane. The per-lane transmitter output is specified between -4.3 and +4.5 dBm. A 10km link with good fiber and typical connectors consumes about 3–4 dB, leaving the received signal above sensitivity even at the minimum launch power.
|
|
||||||
|
|
||||||
The headroom between your calculated received power and the receiver sensitivity floor is your margin. You want at least 3 dB of margin for a stable link; 4–5 dB is better for long-term fiber plant aging and component degradation.
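The passive budget and margin check can be sketched in a few lines of Python. The class and field names below are hypothetical, the per-item losses default to the planning values listed above, and the 0 dBm launch power in the example is an assumed figure rather than a datasheet value.

```python
from dataclasses import dataclass


@dataclass
class PassiveLink:
    tx_power_dbm: float
    fiber_km: float
    fiber_db_per_km: float
    connector_pairs: int = 2
    connector_db: float = 0.5   # planning value per mated pair
    splices: int = 0
    splice_db: float = 0.2      # conservative planning value
    component_db: float = 0.0   # muxes, patch panels, splitters

    def total_loss_db(self) -> float:
        return (self.fiber_km * self.fiber_db_per_km
                + self.connector_pairs * self.connector_db
                + self.splices * self.splice_db
                + self.component_db)

    def received_dbm(self) -> float:
        return self.tx_power_dbm - self.total_loss_db()

    def margin_db(self, rx_sensitivity_dbm: float) -> float:
        return self.received_dbm() - rx_sensitivity_dbm


# Assumed example: 10 km at 1310nm, two good connector pairs, 0 dBm launch.
link = PassiveLink(tx_power_dbm=0.0, fiber_km=10, fiber_db_per_km=0.35,
                   connector_pairs=2, connector_db=0.25)
print(f"loss {link.total_loss_db():.1f} dB, "
      f"received {link.received_dbm():.1f} dBm, "
      f"margin {link.margin_db(-10.6):.1f} dB")
```

Comparing `margin_db` against the 3 dB minimum (or the 4–5 dB preferred figure) gives a pass/fail answer for each planned link.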
|
|
||||||
|
|
||||||
**Where passive budget calculations break down**
|
|
||||||
|
|
||||||
For spans beyond about 80km, passive loss exceeds what most transceiver receiver sensitivities can accommodate. A 100km SMF-28 run at 1550nm accumulates 20 dB of fiber loss alone. Add connectors and components, and you're at 22–25 dB. Standard coherent 400ZR transceivers have receive sensitivity around -21 dBm and transmit at 0 dBm, giving a 21 dB passive link budget. That covers the fiber loss of a 100km run with 1 dB to spare, but not the connectors and components on top of it, and it leaves no margin.
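The same comparison in code, using the figures from the paragraph above (0 dBm transmit, -21 dBm sensitivity, 20 to 25 dB of path loss). This is a minimal Python sketch with hypothetical function names.

```python
def passive_budget_db(tx_dbm: float, rx_sensitivity_dbm: float) -> float:
    """Maximum path loss a link can absorb with zero margin."""
    return tx_dbm - rx_sensitivity_dbm


def shortfall_db(path_loss_db: float, budget_db: float) -> float:
    """Positive when the path loses more than the budget allows."""
    return path_loss_db - budget_db


budget = passive_budget_db(0.0, -21.0)
for loss in (20.0, 22.0, 25.0):
    gap = shortfall_db(loss, budget)
    status = "short by" if gap > 0 else "spare"
    print(f"path loss {loss} dB: {status} {abs(gap)} dB")
```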
|
|
||||||
|
|
||||||
This is where OSNR becomes the meaningful metric rather than raw optical power.
|
|
||||||
|
|
||||||
**OSNR: signal-to-noise in amplified links**
|
|
||||||
|
|
||||||
In amplified optical systems using EDFAs (Erbium-Doped Fiber Amplifiers), the limiting factor is not absolute received power but the ratio of signal power to accumulated amplified spontaneous emission (ASE) noise — the Optical Signal-to-Noise Ratio.
|
|
||||||
|
|
||||||
OSNR is defined as the ratio of signal power to noise power measured in a reference bandwidth (typically 12.5 GHz or 0.1 nm; the two are approximately equivalent in the C-band). It's expressed in dB:
|
|
||||||
|
|
||||||
OSNR (dB) = P_signal − P_noise
|
|
||||||
|
|
||||||
For a single EDFA span, the OSNR contribution is approximately:
|
|
||||||
|
|
||||||
OSNR_span = P_launch − NF_EDFA − 10×log10(h×ν×B_ref) − L_span
|
|
||||||
|
|
||||||
Where:
|
|
||||||
- P_launch is the signal power launched into the span (in dBm); the power reaching the amplifier input is P_launch − L_span
- NF_EDFA is the EDFA noise figure (typically 4–6 dB for modern inline amplifiers)
- h×ν×B_ref is the noise photon floor: at 1550nm in 12.5 GHz bandwidth, 10×log10(h×ν×B_ref) ≈ −58 dBm (a constant you can treat as a reference value)
- L_span is the span loss in dB
|
|
||||||
|
|
||||||
For a practical example: an 80 km SMF span with 16 dB loss, an EDFA with a 5 dB noise figure, and 0 dBm launch power:
|
|
||||||
|
|
||||||
OSNR_span ≈ 0 − 5 − (−58) − 16 = 37 dB
|
|
||||||
|
|
||||||
That's the OSNR at the output of the first EDFA. Each additional span adds noise, and for equal-span, equal-amplifier systems the OSNR drops by approximately 10×log10(N_spans) relative to the single-span value. Four spans cost 6 dB (about 31 dB in this example); eight spans cost 9 dB (about 28 dB). For a 400G DP-16QAM signal, you need approximately 24–26 dB OSNR at the receiver (the FEC threshold). Work backwards from there to determine how many spans are feasible.
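The single-span formula and the 10×log10(N_spans) rule can be combined into a short calculator. This is a sketch under the same simplifying assumptions as the text (identical spans, identical amplifiers, no nonlinear or filtering penalties); the function names are illustrative, and `noise_floor_dbm` simply recomputes the −58 dBm reference value from Planck's constant.

```python
import math

PLANCK_J_S = 6.62607015e-34
LIGHT_M_S = 299_792_458.0


def noise_floor_dbm(wavelength_m=1550e-9, ref_bw_hz=12.5e9):
    """h * nu * B_ref expressed in dBm (about -58 dBm at 1550 nm, 12.5 GHz)."""
    power_w = PLANCK_J_S * (LIGHT_M_S / wavelength_m) * ref_bw_hz
    return 10 * math.log10(power_w / 1e-3)


def osnr_db(p_launch_dbm, nf_db, span_loss_db, n_spans=1, floor_dbm=-58.0):
    """OSNR after n identical amplified spans."""
    single_span = p_launch_dbm - nf_db - floor_dbm - span_loss_db
    return single_span - 10 * math.log10(n_spans)


def max_spans(p_launch_dbm, nf_db, span_loss_db, required_osnr_db):
    """Largest span count that still meets the required OSNR (0 if none)."""
    headroom_db = osnr_db(p_launch_dbm, nf_db, span_loss_db) - required_osnr_db
    if headroom_db < 0:
        return 0
    return math.floor(10 ** (headroom_db / 10))


print(osnr_db(0, 5, 16))                  # single 80 km span
print(round(osnr_db(0, 5, 16, 4), 1))     # four spans
print(max_spans(0, 5, 16, 24))            # spans feasible at a 24 dB threshold
```

The span count this returns is an upper bound before any planning margin or system penalty; subtract those from the headroom first for a realistic figure.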
|
|
||||||
|
|
||||||
**Practical margin tables for metro and regional planning**
|
|
||||||
|
|
||||||
The following represents conservative real-world planning margins, not best-case datasheet values. Actual performance will typically exceed these — they're designed to survive six years of fiber aging, connection rematings, and EDFA gain drift.
|
|
||||||
|
|
||||||
| Application | Span Length | Fiber Loss | EDFA NF | Target OSNR | Margin |
|---|---|---|---|---|---|
| Metro DCI, 400ZR | 80 km | 16 dB | 5 dB | 26 dB | 4 dB |
| Metro ring, 100G | 60 km | 12 dB | 5 dB | 22 dB | 5 dB |
| Regional, 400G OpenZR+ | 200 km (3 spans) | 16 dB/span | 5 dB | 24 dB | 3 dB |
| Long-haul, 100G DP-QPSK | 600 km (8 spans) | 15 dB/span | 5 dB | 16 dB | 3 dB |
| Raman-boosted, 400G | 120 km | 24 dB | 4 dB (eff.) | 26 dB | 4 dB |
|
|
||||||
|
|
||||||
The margin column accounts for: connector aging (+0.5 dB over 5 years), splice point accumulation (+0.3 dB), EDFA gain flatness variation (±0.5 dB), chromatic dispersion compensation imperfection (+0.5 dB), and polarization-mode dispersion (PMD) margin (+0.5 dB). Taken at worst case, these sum to 2.3 dB, so after rounding up, 3 dB is genuinely tight; 5 dB is comfortable.
|
|
||||||
|
|
||||||
**Chromatic dispersion: the other constraint**
|
|
||||||
|
|
||||||
High-speed coherent modulation formats are sensitive to chromatic dispersion (CD). Standard SMF-28 has approximately 17 ps/(nm·km) CD at 1550nm. For a 400G DP-16QAM signal at a symbol rate of about 60 GBaud, the CD tolerance of a typical coherent DSP is ±80,000 ps/nm. That sounds large, and it is: enough for about 4,700 km of SMF-28 without compensation. Modern coherent DSPs (Acacia Pico, Marvell Canopus, Ciena WaveLogic 5) compensate dispersion digitally, eliminating the need for the dispersion compensation fiber (DCF) that was mandatory in 10G-era deployments.
|
|
||||||
|
|
||||||
For 10G direct-detect transceivers (10GBASE-ER, 10GBASE-ZR), dispersion remains a real constraint. 10GBASE-ER at 1550nm specifies a maximum of 1,600 ps/nm CD tolerance. At 17 ps/(nm·km), that's about 94km before dispersion compensation is needed. This is why 10G long-haul deployments either use 1310nm (near zero dispersion wavelength, approximately 3 ps/(nm·km)) or require inline dispersion compensation.
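Both reach figures in this section come from the same division: CD tolerance over the fiber's dispersion coefficient. A minimal sketch (the function name is illustrative, and the 17 ps/(nm·km) default is the SMF-28 value quoted above):

```python
def dispersion_limited_reach_km(cd_tolerance_ps_per_nm,
                                fiber_cd_ps_per_nm_km=17.0):
    """Distance at which accumulated dispersion reaches the tolerance."""
    return cd_tolerance_ps_per_nm / fiber_cd_ps_per_nm_km


print(round(dispersion_limited_reach_km(80_000)))  # coherent DSP example
print(round(dispersion_limited_reach_km(1_600)))   # 10G direct-detect example
```

Passing a lower coefficient, such as the roughly 3 ps/(nm·km) quoted for 1310nm, shows why that wavelength stretches the dispersion limit so much further for direct-detect optics.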
|
|
||||||
|
|
||||||
**Common planning mistakes**
|
|
||||||
|
|
||||||
Trusting vendor datasheet OSNR sensitivity without applying a real-world penalty is the most common error. Datasheet values are typically measured with back-to-back configurations, calibrated test equipment, and ideal polarization conditions. Real links accumulate 1–2 dB of effective OSNR penalty from PDL (polarization-dependent loss), filter narrowing through cascaded ROADMs, and nonlinear optical effects at higher launch powers. Apply a 2 dB system penalty to any coherent link with ROADMs in the path.
|
|
||||||
|
|
||||||
ROADM filtering deserves special attention. Each ROADM passthrough adds approximately 0.5–1.0 dB of effective OSNR penalty due to filter bandwidth narrowing. A signal traversing eight cascaded ROADMs accumulates 4–8 dB of filtering penalty that must be included in the budget. Coherent DSPs compensate some of this through adaptive equalization, but not all.
|
|
||||||
|
|
||||||
Launch power optimization is often overlooked. Increasing launch power improves OSNR dB for dB, until nonlinear effects (self-phase modulation, cross-phase modulation, four-wave mixing) kick in and degrade it. For a typical 100 km SMF-28 span, the optimal launch power is around 0 to +2 dBm for 100G coherent. Above +4 dBm, nonlinear penalties start exceeding the OSNR improvement. The sweet spot depends on channel count, baud rate, and fiber type, so it is worth computing explicitly rather than defaulting to maximum launch power.
|
|
||||||
|
|
||||||
Good link budgeting is iterative. Start with the margin tables, apply real-world penalties, check both power budget and OSNR, and revisit if the margin is below 3 dB. If you're within 1 dB of the OSNR threshold, you're operating in the territory where normal day-to-day variation in EDFA gain, fiber temperature, and connector condition can push you into errors.
|
|
||||||
@ -1,65 +0,0 @@
|
|||||||
---
|
|
||||||
title: "How to Detect Counterfeit Transceivers: EEPROM Forensics and the Grey Market Problem"
|
|
||||||
slug: "transceiver-counterfeit-detection"
|
|
||||||
category: "Procurement & Quality"
|
|
||||||
tags: ["counterfeit", "grey market", "EEPROM", "DOM", "authentication", "procurement", "OEM"]
|
|
||||||
seo_focus_keyword: "counterfeit transceiver detection EEPROM"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
The transceiver grey market is large, well-organized, and not going away. Estimates suggest that 10–15% of enterprise transceiver procurement globally involves some degree of counterfeiting, remarking, or unauthorized reprogramming — the exact numbers are hard to pin down precisely because the fraud is, by design, difficult to detect. This isn't a problem that only affects procurement teams chasing bargains on eBay. It shows up in legitimate reseller channels, through authorized distributors with contaminated supply chains, and occasionally from what appear to be reputable secondary market vendors.
|
|
||||||
|
|
||||||
Understanding what "counterfeit" actually covers, how to detect it, and what the practical risks are is more useful than a generic warning about buying cheap.
|
|
||||||
|
|
||||||
**What "counterfeit" actually means in the transceiver market**
|
|
||||||
|
|
||||||
The term covers a spectrum. At one end: completely fabricated modules manufactured without any legitimate IP, using substandard optical components, with falsified EEPROM data claiming to be name-brand products. These are straightforward fraud. At the other end: legitimate transceiver hardware from a tier-1 manufacturer that has been reprogrammed — its EEPROM rewritten — to report as a different product. This second category is technically "remarked" or "reprogrammed" rather than counterfeit in the traditional sense, but the effect from the buyer's perspective is similar: you're not getting what you paid for.
|
|
||||||
|
|
||||||
Between these extremes sits a range of situations: genuine optical modules that have been failed out of hyperscale networks and refurbished without disclosure, modules made with subgrade components that meet original specs for 3–6 months before degrading, and modules with correct hardware but EEPROM programmed to impersonate OEM part numbers (so they pass basic digital ID checks on Cisco, Juniper, or Arista gear).
|
|
||||||
|
|
||||||
The OEM part number impersonation case is particularly common and worth understanding in detail. Router and switch vendors enforce "approved optics" lists through EEPROM checks: the switch reads the EEPROM and compares the vendor name, part number, and OUI (Organizationally Unique Identifier) against an approved list. If the check fails, the port may be disabled or generate warnings. The "compatible" transceiver market (legitimate vendors like Flexoptix, Finisar, InnoLight, and others who manufacture optical modules to the same functional specification) addresses this by programming the EEPROM with appropriate vendor fields. The counterfeit market abuses the same mechanism to impersonate specific OEM part numbers without having the corresponding hardware quality.
|
|
||||||
|
|
||||||
**Physical inspection: what to look for**
|
|
||||||
|
|
||||||
Physical inspection is imperfect but useful as a first pass. Genuine Cisco SFP+ transceivers, for example, have specific label placement, font metrics, and holographic security elements that are difficult to fake well. The Cisco logo on genuine modules uses a specific Pantone color that appears slightly different from the blue used on commodity replacements. Seam lines, surface finish on the housing, and pull tab quality are all indicators: counterfeit modules frequently have slightly rougher housing finishes, imprecise seam alignment, and pull tabs that feel different from originals.
|
|
||||||
|
|
||||||
The best reference for physical inspection is comparison against a known-good genuine module under good lighting. Side by side, differences that are subtle in isolation become obvious. Maintaining a reference sample for each OEM form factor you deploy is worthwhile if you're doing significant volume procurement.
|
|
||||||
|
|
||||||
Inspect the laser aperture area. Genuine high-quality modules have clean, precisely positioned fiber receptacles. Counterfeit modules sometimes show mechanical tolerances that are slightly off — you may feel a loose ferrule engagement or see contamination patterns that suggest the module has been disassembled and reassembled.
|
|
||||||
|
|
||||||
**EEPROM forensics: reading the data**
|
|
||||||
|
|
||||||
The SFF (Small Form Factor) Committee standards define the EEPROM structure for SFP and SFP+ (SFF-8472), QSFP+ and QSFP28 (SFF-8636), and QSFP-DD/OSFP (CMIS specification). Each module stores a standard set of identification fields that can be read via the host system's I2C interface or via external EEPROM readers.
|
|
||||||
|
|
||||||
Key fields to check in the EEPROM data:
|
|
||||||
|
|
||||||
Vendor Name (bytes 20–35 in SFF-8472): This should match the vendor on the physical label. Mismatches between physical labeling and EEPROM vendor name are a definitive red flag — no legitimate manufacturer does this.
|
|
||||||
|
|
||||||
Vendor OUI (bytes 37–39): A 24-bit organizationally unique identifier registered with the IEEE. You can verify whether the OUI actually belongs to the claimed vendor at the IEEE public registry (standards.ieee.org/products-programs/regauth/). A module claiming to be Cisco with an OUI that traces to an unknown Chinese ODM is suspicious.
|
|
||||||
|
|
||||||
Vendor Part Number (bytes 40–55): This should match the module's physical label. Reprogrammed modules frequently show part numbers that don't match the module's actual optical specifications — a module physically capable of 10GBASE-SR reprogrammed to claim it's a 10GBASE-LR, for example.
|
|
||||||
|
|
||||||
Serial Number (bytes 68–83): Genuine OEM modules have serial numbers that trace back to the manufacturer's production records. If you have access to OEM vendor support portals (Cisco TAC, Juniper JTAC), you can often verify whether a serial number is genuine. Duplicate serial numbers across multiple physical modules are a definitive sign of counterfeiting.
|
|
||||||
|
|
||||||
Checksum bytes: SFF-8472 includes CC_BASE and CC_EXT checksum bytes. Legitimate EEPROM programming always produces correct checksums. Counterfeit programming sometimes generates incorrect checksums due to incomplete EEPROM rewrites — this is detectable and is a clear red flag.
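The field checks above can be sketched against a raw dump of the module's A0h page. The field offsets are the ones listed in this section; the checksum positions (CC_BASE at byte 63 covering bytes 0–62, CC_EXT at byte 95 covering bytes 64–94) are my reading of SFF-8472 rather than something stated here, so verify them against the specification. How you obtain the dump (host CLI, I2C reader) is outside the sketch, and the function name is hypothetical.

```python
def parse_sff8472_id(page_a0: bytes) -> dict:
    """Extract identification fields and verify checksums from an A0h dump."""
    if len(page_a0) < 96:
        raise ValueError("need at least the first 96 bytes of page A0h")

    def text(first, last):
        # Fields are ASCII, padded with spaces on the right.
        return page_a0[first:last + 1].decode("ascii", errors="replace").strip()

    return {
        "vendor_name": text(20, 35),
        "vendor_oui": page_a0[37:40].hex(":").upper(),
        "part_number": text(40, 55),
        "serial_number": text(68, 83),
        # Each checksum is the low 8 bits of the sum of the bytes it covers.
        "cc_base_ok": (sum(page_a0[0:63]) & 0xFF) == page_a0[63],
        "cc_ext_ok": (sum(page_a0[64:95]) & 0xFF) == page_a0[95],
    }
```

Compare `vendor_name` and `part_number` with the physical label, look up `vendor_oui` in the IEEE registry, and treat a failed checksum as a reason to inspect the module more closely.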
|
|
||||||
|
|
||||||
**Using DOM as a counterfeit indicator**
|
|
||||||
|
|
||||||
Digital Optical Monitoring (DOM/DDMI) data provides additional forensic value. Read TX power, RX power, bias current, supply voltage, and temperature from a suspect module and compare against the datasheet specification ranges.
|
|
||||||
|
|
||||||
A module claiming to be a 10GBASE-LR (IEEE 802.3 average launch power between −8.2 and +0.5 dBm at 1310nm) but reading TX power at −12 dBm is either failing or was never a genuine LR module. Temperature readings that are implausibly precise (exactly 25.000°C when the environment is 22°C) can indicate hardcoded DOM values rather than real sensor readout, a classic counterfeit tell.
|
|
||||||
|
|
||||||
Bias current is particularly diagnostic for laser quality. Genuine 10G DFB lasers operate at bias currents of 40–70 mA. Cheap FP (Fabry-Perot) lasers substituted in SR-range modules to impersonate LR parts often show different bias current profiles. DOM values that stay completely static across temperature changes also suggest hardcoded rather than measured values.
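One way to test for hardcoded values is to poll DOM several times while the module warms up or the load changes, then look for fields that never move. The sketch below assumes you already have the samples as dictionaries; the field names are illustrative, not taken from any particular switch CLI or API.

```python
def static_fields(samples):
    """Return DOM fields whose value is identical in every sample.

    samples: list of dicts such as {"temperature_c": 41.2, "bias_ma": 52.1},
    ideally captured while temperature or traffic load was changing.
    """
    if len(samples) < 2:
        return []
    first = samples[0]
    return [key for key in first
            if all(sample[key] == first[key] for sample in samples[1:])]


readings = [
    {"temperature_c": 25.0, "bias_ma": 52.1, "tx_power_dbm": -2.1},
    {"temperature_c": 25.0, "bias_ma": 52.4, "tx_power_dbm": -2.0},
    {"temperature_c": 25.0, "bias_ma": 52.9, "tx_power_dbm": -2.1},
]
print(static_fields(readings))
```

A real sensor drifts by at least a few least-significant bits over a warm-up cycle, so a field that stays bit-identical across many samples is worth a closer look. It is not proof on its own.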
|
|
||||||
|
|
||||||
**What "reprogrammed OEM optics" actually are**
|
|
||||||
|
|
||||||
This is the grey area that generates the most confusion. An OEM optic — say, a Cisco GLC-LH-SMD — is manufactured by a third party (often Finisar, InnoLight, or another ODM) to Cisco's specification and programmed with Cisco EEPROM data. Cisco does not manufacture its own optics.
|
|
||||||
|
|
||||||
When a legitimate third-party manufacturer like Flexoptix makes a compatible module, they manufacture to the same functional specification and program appropriate EEPROM data. This is legal, this is disclosed, and the functional performance is typically identical.
|
|
||||||
|
|
||||||
When a grey market operator takes a genuine Flexoptix or generic ODM module and reprograms it to claim it's a Cisco GLC-LH-SMD — specifically to defeat Cisco's optics check — this is deceptive, potentially violates trademark law, and means the buyer paid OEM prices for non-OEM hardware without disclosure.
|
|
||||||
|
|
||||||
The distinction matters practically: reprogramming is not inherently a quality issue (the underlying hardware may be excellent), but the lack of disclosure about what you're actually receiving is. If you buy "compatible" or "third-party" optics from a reputable vendor, you know what you're getting. If you buy what appears to be an OEM optic and it turns out to be a reprogrammed ODM module, you've been deceived regardless of whether the hardware works.
|
|
||||||
|
|
||||||
The most reliable protection is procurement discipline: buy from vendors who clearly disclose the origin and EEPROM programming of their modules, and who provide documentation you can use to verify claims. Spot-check EEPROM data against labels. If a vendor can't tell you who manufactured the module's optical engine, that's a flag.
|
|
||||||
@ -1,79 +0,0 @@
|
|||||||
---
|
|
||||||
title: "DOM Deep Dive: What Every Parameter Actually Tells You About Your Link"
|
|
||||||
slug: "dom-digital-optical-monitoring-guide"
|
|
||||||
category: "Diagnostics & Monitoring"
|
|
||||||
tags: ["DOM", "DDMI", "digital optical monitoring", "SFF-8472", "diagnostics", "link troubleshooting"]
|
|
||||||
seo_focus_keyword: "DOM digital optical monitoring transceiver diagnostics"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Digital Optical Monitoring — also called DDMI (Digital Diagnostic Monitoring Interface), or simply DOM — is one of the most useful diagnostic tools in optical networking and one of the most underused. Most engineers know it exists and can recite "check DOM" as troubleshooting advice. Fewer can look at a set of DOM values, understand which ones are meaningful in context, and correctly distinguish a transceiver that's about to fail from one that's slightly out of optimal operating condition but stable.
|
|
||||||
|
|
||||||
The SFF-8472 standard defines the DOM interface for SFP/SFP+ modules. QSFP and QSFP28 use SFF-8636, and newer CMIS (Common Management Interface Specification) covers QSFP-DD, OSFP, and beyond. The measured parameters are largely consistent across standards: temperature, supply voltage, TX bias current, TX power, and RX power. Here's what each actually means and how to interpret it.
|
|
||||||
|
|
||||||
**Temperature**
|
|
||||||
|
|
||||||
The reported temperature is measured at the module's internal monitor circuit, not necessarily the optical subassembly or the laser junction itself. It reflects the thermal environment the module's electronics are experiencing.
|
|
||||||
|
|
||||||
Normal operating range for commercial-grade modules is 0–70°C case temperature, with the internal sensor typically reading 5–15°C above ambient due to self-heating. A 25°C ambient datacenter environment typically produces internal module temps of 35–45°C. Industrial-grade modules are rated for −40°C to +85°C.
|
|
||||||
|
|
||||||
What temperature anomalies tell you: consistently high temperatures (>65°C internal) suggest inadequate airflow in the cage, a cage with blocked front bezel area, or a very high-power module in a thermally stressed chassis position. Temperatures that drift steadily upward over weeks without HVAC changes suggest slow cage blockage or degrading module thermal contact. Temperatures that spike suddenly without environmental explanation can precede module failures — thermal runaway in the laser driver circuit is a failure mode that DOM temperature can catch early.
|
|
||||||
|
|
||||||
**Supply voltage**
|
|
||||||
|
|
||||||
The supply voltage measurement reads the 3.3V supply rail powering the module's electronics. Nominal is 3.3V; acceptable range is typically 3.135V to 3.465V (±5%).
|
|
||||||
|
|
||||||
Undervoltage conditions (supply below 3.135V) cause instability in the laser driver circuits and TX power fluctuations. Overvoltage above 3.465V can damage module components over time. In practice, supply voltage issues usually trace back to the host switch's SFP cage power delivery or, for active copper and active optical cables, to voltage drop along a long cable run.
|
|
||||||
|
|
||||||
If every module in one chassis reads consistently at the low end of spec (say, 3.18–3.20V) while modules in a different chassis read a normal 3.28V, the first chassis is worth investigating. Power supply regulation quality varies by vendor and platform, and some older chassis show supply droop under high module count loads.
|
|
||||||
|
|
||||||
**TX bias current**
|
|
||||||
|
|
||||||
This is the DC current flowing through the laser diode to establish its operating point. It's one of the most diagnostically valuable DOM parameters because it reflects the laser's actual operating condition.
|
|
||||||
|
|
||||||
Laser diodes age. As they age, they require increasing bias current to maintain the same output power. The automatic power control (APC) circuit in the transceiver increases bias current to compensate for reduced laser efficiency. TX bias current that's trending upward — even if TX power remains stable — is an early indicator of laser aging.
|
|
||||||
|
|
||||||
Typical bias currents: 10G DFB laser for LR/ER applications runs 40–70 mA nominal. At end of life, bias current may climb to 90–110 mA before the APC circuit can no longer compensate and TX power starts dropping. An SFP+ LR module showing 95 mA bias current when it was 50 mA at installation three years ago has burned through most of its compensation headroom and is a candidate for proactive replacement.
|
|
||||||
|
|
||||||
Short-reach VCSELs (used in 850nm SR applications) have different bias characteristics: typically 4–8 mA, lower temperature sensitivity, and different aging profiles. VCSEL bias current tends to change in sudden jumps rather than gradually, and such jumps often indicate a mode stability issue rather than smooth aging.
|
|
||||||
|
|
||||||
**TX power**
|
|
||||||
|
|
||||||
TX power is the optical power in dBm being launched from the transceiver's transmitter port into the fiber. This is the most directly actionable DOM parameter for link health.
|
|
||||||
|
|
||||||
Each transceiver has specified TX power bounds. IEEE 802.3 specifies 10GBASE-LR average launch power between −8.2 and +0.5 dBm. A reading of −2 dBm is nominal. A reading of −10 dBm on that same module is outside specification and indicates either laser degradation or APC circuit failure.
|
|
||||||
|
|
||||||
TX power should be stable over time. Gradual downward drift combined with rising bias current, as described above, is classic laser-end-of-life. Sudden sharp drops in TX power without corresponding bias current changes often indicate contamination on the optical connector face — the transceiver is trying to maintain laser power, but the dirty connector is absorbing or scattering light.
|
|
||||||
|
|
||||||
TX power fluctuations (power that varies by more than 0.5 dB over seconds or minutes) indicate laser instability. This can be thermal (the module has not been at operating temperature long enough for first-order thermal stabilization to complete), mechanical (fiber connector not properly seated, cable strain inducing microbending), or electrical (noisy supply rail causing laser driver instability).
|
|
||||||
|
|
||||||
**RX power**
|
|
||||||
|
|
||||||
RX power is the optical power in dBm being received at the module's input port. This measures what's arriving from the far end after traversing the fiber path.
|
|
||||||
|
|
||||||
RX power combined with TX power from the far-end DOM gives you the end-to-end link loss, which you can compare against your expected loss from the link budget calculation. If your calculated path loss is 5 dB and the measured loss (far-end TX minus near-end RX) is 8 dB, something in the fiber path has changed — likely a dirty or damaged connector, a degraded splice, or fiber damage.
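As a quick illustration of that comparison, the sketch below computes measured loss from the two DOM readings and reports how far it exceeds the budgeted figure. The function names and the sample readings are made up for the example.

```python
def measured_link_loss_db(far_end_tx_dbm, near_end_rx_dbm):
    """End-to-end loss seen by the light travelling from far end to near end."""
    return far_end_tx_dbm - near_end_rx_dbm


def excess_loss_db(far_end_tx_dbm, near_end_rx_dbm, expected_loss_db):
    """Positive values mean the path is lossier than the link budget predicted."""
    return measured_link_loss_db(far_end_tx_dbm, near_end_rx_dbm) - expected_loss_db


# Far end transmits +1.0 dBm, near end receives -7.0 dBm, budget predicted 5 dB
print(excess_loss_db(1.0, -7.0, 5.0))
```

Run it in both directions: each fiber of a duplex pair has its own connectors and splices, so the excess loss can differ between the two.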
|
|
||||||
|
|
||||||
Low RX power — below the receiver sensitivity specification — will cause bit errors and eventual link failure. High RX power — above the receiver's input overload level — causes saturation and nonlinear distortion that also generates errors. Both are detectable from DOM before they reach the alarm threshold on the link itself.
|
|
||||||
|
|
||||||
**Using DOM to diagnose link issues before traffic impact**
|
|
||||||
|
|
||||||
The most valuable DOM workflow is trending, not spot-checking. A single DOM reading tells you the current state. DOM readings recorded over time — daily, or correlated with your monitoring system's polling — tell you trajectory.
|
|
||||||
|
|
||||||
Build a baseline for every transceiver in your critical links: TX power, RX power, bias current, temperature, and supply voltage at initial installation. Then monitor for:
|
|
||||||
|
|
||||||
TX power declining more than 1 dB from baseline: investigate laser health, check bias current trend.
|
|
||||||
|
|
||||||
RX power declining more than 2 dB from baseline with stable far-end TX: check fiber path for new connectors, moved cables, or physical changes in the cable route.
|
|
||||||
|
|
||||||
Bias current increasing more than 15 mA from baseline with stable TX power: flag for replacement within 6–12 months.
|
|
||||||
|
|
||||||
Temperature increasing more than 10°C from baseline: check chassis airflow and cage blockage.
|
|
||||||
|
|
||||||
Supply voltage drifting more than 0.15V from baseline: investigate chassis power delivery.
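The five baseline rules above translate directly into a small check you could run against each poll. This is a sketch, not a monitoring product: the `DomReading` type and the flag strings are hypothetical, and "stable TX power" is assumed to mean within 1 dB of baseline.

```python
from dataclasses import dataclass


@dataclass
class DomReading:
    tx_power_dbm: float
    rx_power_dbm: float
    bias_ma: float
    temperature_c: float
    supply_v: float


def drift_flags(baseline, current, far_end_tx_stable=True):
    """Compare a current DOM reading with the installation baseline."""
    flags = []
    tx_drop = baseline.tx_power_dbm - current.tx_power_dbm
    if tx_drop > 1.0:
        flags.append("tx_power: investigate laser health, check bias trend")
    if far_end_tx_stable and baseline.rx_power_dbm - current.rx_power_dbm > 2.0:
        flags.append("rx_power: check fiber path for physical changes")
    # Assumption: TX power counts as stable if it moved by no more than 1 dB.
    if current.bias_ma - baseline.bias_ma > 15.0 and abs(tx_drop) <= 1.0:
        flags.append("bias: flag for replacement within 6-12 months")
    if current.temperature_c - baseline.temperature_c > 10.0:
        flags.append("temperature: check chassis airflow and cage blockage")
    if abs(current.supply_v - baseline.supply_v) > 0.15:
        flags.append("supply: investigate chassis power delivery")
    return flags


baseline = DomReading(1.0, -5.0, 50.0, 40.0, 3.30)
today = DomReading(0.9, -5.2, 70.0, 41.0, 3.29)
print(drift_flags(baseline, today))
```

Feeding it daily readings rather than a single poll is what turns these rules into the trending workflow described in this section.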
|
|
||||||
|
|
||||||
**Alarm and warning thresholds**
|
|
||||||
|
|
||||||
SFF-8472 defines four threshold levels for each DOM parameter: high alarm, high warning, low warning, low alarm. These are programmed by the transceiver manufacturer and accessible via the EEPROM. Most monitoring systems expose only whether a parameter is "in alarm", but reading the actual threshold values is informative. A TX power low warning threshold set at −4 dBm on a module specifying −1 to +3.5 dBm nominal is a loose threshold that won't warn you until the module is well outside specification. Tighten your monitoring system's alert policy to match the module specification, not just the manufacturer's programmed thresholds (which are often set loosely to minimize false alarms).
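To see how loose a programmed threshold is, convert it to dBm and compare it with the datasheet minimum. The sketch assumes the threshold is stored the way I understand SFF-8472 to encode optical power, as an unsigned 16-bit value in units of 0.1 µW; confirm that against the specification for your module type. The function names are illustrative.

```python
import math


def raw_power_to_dbm(raw):
    """Convert a 16-bit DOM power word (assumed 0.1 uW per LSB) to dBm."""
    if raw <= 0:
        return float("-inf")
    return 10 * math.log10(raw * 1e-4)  # 0.1 uW equals 1e-4 mW


def threshold_slack_db(low_warning_raw, spec_min_dbm):
    """How far below the datasheet minimum the low-warning threshold sits."""
    return spec_min_dbm - raw_power_to_dbm(low_warning_raw)


# A raw value of 3981 is about -4 dBm; compare with a -1 dBm datasheet minimum
print(round(threshold_slack_db(3981, -1.0), 1))
```

Any positive slack is power the module can lose while outside specification without raising a warning, which is the gap your own alert policy needs to close.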
|
|
||||||
|
|
||||||
DOM is not a crystal ball. Catastrophic failures — connector fractures, fiber cuts, sudden laser failure from electrostatic damage — don't announce themselves in DOM trends. But the slow degradation modes that account for the majority of transceiver failures leave clear fingerprints. If you're not regularly reading and trending DOM data on production links, you're leaving predictive diagnostics on the table.
|
|
||||||
@ -1,71 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400G DR4 vs. FR4 vs. LR4: The Reach-Cost-Fiber Tradeoff Matrix"
|
|
||||||
slug: "400g-dr4-fr4-lr4-comparison"
|
|
||||||
category: "Hardware Selection"
|
|
||||||
tags: ["400G", "DR4", "FR4", "LR4", "QSFP-DD", "OSFP", "campus", "DCI", "fiber selection"]
|
|
||||||
seo_focus_keyword: "400G DR4 FR4 LR4 comparison distance tradeoff"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
If you've tried to spec 400G transceivers recently and found yourself staring at a confusing alphabet soup of DR4, FR4, LR4, PLR4, and related variants, you're not alone. IEEE and MSA committees have produced a proliferation of 400G standards that overlap in confusing ways, and vendor datasheets don't always make the tradeoffs obvious. The honest answer is that each of these has a specific application niche, and buying the wrong one — usually over-speccing for reach you don't need — costs real money at scale.
|
|
||||||
|
|
||||||
**The three mainstream variants and what they actually are**
|
|
||||||
|
|
||||||
400GBASE-DR4 (IEEE 802.3bs) uses four parallel single-mode fiber lanes, each carrying 100G using PAM4 modulation at a nominal 1310nm wavelength (all four lanes share the same wavelength, since each runs on its own fiber). The "D" refers to "Datacenter Reach": the specification target is 500 meters. The physical interface uses MPO-12 connectors with 8 fibers (4 TX, 4 RX) or dual MPO-8 configurations depending on the cabling plant. Maximum optical power at the transmitter is approximately +3 dBm per lane, with a receiver sensitivity around −6.9 dBm per lane.
|
|
||||||
|
|
||||||
400GBASE-FR4 uses four wavelengths (CWDM4: 1271, 1291, 1311, 1331 nm) multiplexed onto a single fiber pair with duplex LC connectors. Each wavelength carries 100G, and the four wavelengths are combined at the transmitter by a thin-film WDM element and separated at the receiver by the same. Target reach is 2 kilometers over OS2 single-mode fiber. TX power is similar to DR4 per wavelength, but the WDM element introduces approximately 1.5–2 dB of additional insertion loss compared to a direct parallel approach.
|
|
||||||
|
|
||||||
400GBASE-LR4 is the extended version of FR4: same CWDM4 wavelength plan, same duplex LC fiber interface, same WDM multiplexing architecture, but specified to 10 kilometers. Achieving 10km requires higher transmitter power and better receiver sensitivity than the 2km FR4 specification. LR4 modules are significantly more expensive than FR4, primarily due to the higher-power laser requirements and tighter fabrication tolerances.
|
|
||||||
|
|
||||||
There are also 400GBASE-PLR4 (parallel 500m using PSM4 wavelength plan, eight fibers) and 400GBASE-LR8 (eight wavelengths, 10km, but more commonly seen in 400G CWDM8 MSA form), but for most practical datacenter and campus deployments, DR4/FR4/LR4 covers the relevant options.
|
|
||||||
|
|
||||||
**The cost differential at volume**
|
|
||||||
|
|
||||||
Numbers change, but the relative cost structure has been consistent. As of 2025–2026 street pricing in reasonable volumes (50+ units):
|
|
||||||
|
|
||||||
| Module | Interface | Reach | Approx. Street Price |
|---|---|---|---|
| 400G DR4 | MPO-12, parallel | 500 m | $350–$550 |
| 400G FR4 | Duplex LC, CWDM4 | 2 km | $600–$900 |
| 400G LR4 | Duplex LC, CWDM4 | 10 km | $1,200–$1,800 |
|
|
||||||
|
|
||||||
The DR4-to-FR4 price gap reflects the WDM multiplexing components inside the FR4 module — each thin-film filter element is precisely manufactured, and WDM integration at this density is more expensive than parallel fiber. The FR4-to-LR4 gap reflects the higher-power laser components required for 10km reach.
|
|
||||||
|
|
||||||
When you're deploying 200+ transceivers in a spine-leaf refresh, these differences are meaningful. A fabric that could use DR4 but was specced with FR4 "for future flexibility" wastes $50,000–$100,000 in upfront hardware costs. The flexibility rarely materializes: if you later need longer reach, you replace the modules; you don't reuse FR4 modules in a DR4 application because the fiber plant is parallel anyway.
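The over-spec cost is simple to estimate from the street-price table. The sketch below pairs the low ends and the high ends of the two price ranges, which is an assumption about how prices move together, and the function name is illustrative.

```python
# Approximate street prices in USD (low, high), copied from the table above
STREET_PRICE_USD = {
    "DR4": (350, 550),
    "FR4": (600, 900),
    "LR4": (1200, 1800),
}


def overspec_cost_usd(units, needed, bought):
    """Extra spend (low, high) from buying `bought` where `needed` would do."""
    low = STREET_PRICE_USD[bought][0] - STREET_PRICE_USD[needed][0]
    high = STREET_PRICE_USD[bought][1] - STREET_PRICE_USD[needed][1]
    return units * low, units * high


print(overspec_cost_usd(200, needed="DR4", bought="FR4"))
```

Larger fabrics or less favorable price pairings push the figure higher, so it is worth rerunning with your actual quotes.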
|
|
||||||
|
|
||||||
**The fiber plant decision drives everything**
|
|
||||||
|
|
||||||
This is the constraint that datasheets underemphasize. DR4 requires parallel fiber: eight individual single-mode fibers (or MPO-12 assembly) for each link. FR4 and LR4 require a single fiber pair — two fibers, duplex LC connectors, exactly what most enterprise fiber plants already have in place.
|
|
||||||
|
|
||||||
If your datacenter was built with a structured cabling plant using LC duplex patch panels and OS2 trunk cable, FR4 and LR4 are the natural choices. Every port is a direct cable run, and your existing fiber management infrastructure handles it without change.
|
|
||||||
|
|
||||||
If you're building a new fabric from scratch, or have already moved to MPO-based trunk cabling, DR4 is the cost-effective option for intra-datacenter distances. MPO12/MTP trunk cables with pinned and unpinned ends, breakout cassettes at the patch panel, and 8-fiber allocation per 400G DR4 link — this is a modern high-density cabling approach that many new datacenter builds have already standardized on.
|
|
||||||
|
|
||||||
**Decision tree for the common scenarios**
|
|
||||||
|
|
||||||
Intra-datacenter, same building, distances 10–500 meters: DR4 is the correct answer. New builds should standardize on MPO parallel fiber cabling to enable DR4. Cost savings over FR4 are real, and 500m is sufficient for any intra-row or cross-aisle switch-to-switch path in a standard enterprise or colocation datacenter footprint.
|
|
||||||
|
|
||||||
Campus interconnect or building-to-building links under 2km: FR4 with existing LC duplex OS2 infrastructure. If you already have a fiber-optic building ring with LC duplex patch panels, FR4 drops into that infrastructure cleanly with no fiber plant changes. The WDM cost premium is justified by eliminating fiber plant modifications.
|
|
||||||
|
|
||||||
Metro or extended campus links 2–10km: LR4 is the relevant option. At these distances, the laser power requirements preclude DR4 and make FR4 marginal. LR4 at +3 to +5 dBm per wavelength handles 10km with comfortable margin on OS2 fiber.
|
|
||||||
|
|
||||||
Beyond 10km: 400G coherent (400ZR, OpenZR+) is the appropriate solution. LR4 at 10km is close to its optical power budget limit, and attempting to extend it further with optical amplification runs into dispersion and wavelength-specific issues with the CWDM4 channel plan.
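The decision tree above condenses to a few comparisons. This sketch encodes only the distance and fiber-plant rules from this section; the function name and the `has_parallel_mpo_plant` flag are illustrative, and it deliberately ignores breakout requirements and pricing.

```python
def pick_400g_module(distance_m, has_parallel_mpo_plant):
    """Suggest a 400G optic from link distance and existing fiber plant."""
    if distance_m > 10_000:
        return "400G coherent (400ZR / OpenZR+)"
    if distance_m > 2_000:
        return "LR4"
    if distance_m <= 500 and has_parallel_mpo_plant:
        return "DR4"
    # Up to 2 km, or short links on an LC duplex plant
    return "FR4"


for distance_m, parallel in [(150, True), (150, False), (1_500, True),
                             (8_000, False), (40_000, False)]:
    print(distance_m, parallel, pick_400g_module(distance_m, parallel))
```

Short links on an LC duplex plant come out as FR4 here because the text treats re-cabling as the larger cost; a new build would change that flag, not the logic.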
|
|
||||||
|
|
||||||
**The breakout use case changes the calculus**
|
|
||||||
|
|
||||||
A significant fraction of 400G spine-switch port usage involves breakout: one 400G port broken out to four 100G ports for leaf-switch or server uplinks. In this scenario, the fiber plant question takes on new dimensions.
|
|
||||||
|
|
||||||
400G DR4 to 4×100G DR breakout uses a breakout MPO-12 to 4× duplex LC fan-out cable. Each 100G lane runs on a single fiber pair to a separate device. This is cleanly supported and very common in DCI and hyperscale deployments.
|
|
||||||
|
|
||||||
400G FR4 breakout to 4×100G is more complex because the four wavelengths are WDM-multiplexed. Breakout requires a WDM demultiplexer module to split the wavelengths to separate fiber pairs — this is supported via passive CWDM demux cassettes, but adds cost and complexity compared to the DR4 parallel breakout.
|
|
||||||
|
|
||||||
If a significant portion of your 400G ports will be used as 4×100G breakouts, DR4 is strongly preferred from a cabling simplicity standpoint. The parallel fiber architecture maps cleanly to the breakout topology.
|
|
||||||
|
|
||||||
**One thing that surprises people: the LR4 launch power limitation**
|
|
||||||
|
|
||||||
400GBASE-LR4 specifies per-wavelength launch power of approximately 2–4.5 dBm — higher than FR4 to achieve 10km. This creates a potential issue if your fiber path is significantly shorter than 10km. Connecting two LR4 modules with a 200m patch cord creates a received power near the receiver overload threshold, which generates optical saturation and link errors. LR4 modules in short-reach applications typically require an attenuator at the receive port — usually a 5–10 dB inline attenuator on the LC connector — to bring received power within spec.
|
|
||||||
|
|
||||||
This is well-known but frequently forgotten during lab setups and short-distance cross-connects. If your 400G LR4 link shows high BER or won't link at all over a short fiber run, check the receive power before you start blaming the module.
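The short-link check described above is simple arithmetic, and a rough sketch may help. The fiber attenuation, connector loss, overload threshold, and margin below are illustrative assumptions, not datasheet values; substitute the figures for your actual module and fiber plant.

```python
def received_power_dbm(tx_dbm, length_km, fiber_db_per_km=0.4,
                       connectors=2, connector_loss_db=0.5):
    """Estimate received power from launch power minus path loss.

    The default loss figures are assumed planning values, not measurements.
    """
    return tx_dbm - length_km * fiber_db_per_km - connectors * connector_loss_db


def attenuator_needed_db(rx_dbm, rx_overload_dbm, margin_db=2.0):
    """Return the attenuation needed to sit margin_db below receiver overload."""
    return max(0.0, rx_dbm - (rx_overload_dbm - margin_db))


# Hypothetical example: 4.0 dBm launch power over a 200 m patch cord,
# with an assumed (placeholder) receiver overload threshold of 2.0 dBm.
rx = received_power_dbm(tx_dbm=4.0, length_km=0.2)
print(f"estimated RX power: {rx:.2f} dBm")
print(f"attenuation to add: {attenuator_needed_db(rx, rx_overload_dbm=2.0):.2f} dB")
```

In practice the per-lane receive power reported by DOM is the better input than an estimate, since it reflects the real connectors and patching.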
|
|
||||||
|
|
||||||
The three main 400G variants — DR4, FR4, LR4 — map cleanly to three application domains: intra-datacenter, campus, and metro. Match the module to the distance and fiber plant, do the cost math at volume, and resist the temptation to over-spec "just in case."
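The distance bands above can be written down as a small lookup. This is only a sketch of the rule of thumb in this article: the function name is hypothetical, the 500 m boundary is the nominal 400GBASE-DR4 reach, and real selection also depends on fiber type, breakout plans, and cost.

```python
def pick_400g_optic(distance_m: float) -> str:
    """Map a link distance to the 400G variant suggested by the reach bands."""
    if distance_m <= 500:
        return "DR4"      # intra-datacenter, parallel single-mode (MPO-12)
    if distance_m <= 2_000:
        return "FR4"      # campus, duplex LC on OS2
    if distance_m <= 10_000:
        return "LR4"      # metro or extended campus
    return "coherent (400ZR / OpenZR+)"


for d in (150, 1_200, 8_000, 40_000):
    print(d, "m ->", pick_400g_optic(d))
```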
|
|
||||||
@ -1,67 +0,0 @@
|
|||||||
---
|
|
||||||
title: "CWDM vs. DWDM vs. LWDM vs. MWDM: What Each Is Actually For in 2026"
|
|
||||||
slug: "wavelength-division-multiplexing-primer"
|
|
||||||
category: "Technology Primer"
|
|
||||||
tags: ["CWDM", "DWDM", "LWDM", "MWDM", "WDM", "channel plan", "metro", "datacenter"]
|
|
||||||
seo_focus_keyword: "CWDM DWDM LWDM MWDM comparison wavelength division multiplexing"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Wavelength division multiplexing is one of those topics that starts simply — you're sending multiple colors of light down one fiber — and then branches into a confusing taxonomy of acronyms as you get deeper. CWDM, DWDM, LWDM, MWDM, and their various hybrids all exist because different applications have different requirements for channel count, channel spacing, amplification, cost, and reach. Knowing which is appropriate for which application is practical knowledge, not academic.
|
|
||||||
|
|
||||||
**The core concept: why WDM at all**
|
|
||||||
|
|
||||||
A single-mode optical fiber has enormous bandwidth — theoretically around 50 THz in the low-loss windows. A single 100G signal occupies a tiny fraction of this. WDM exploits the remaining capacity by transmitting multiple distinct wavelengths (channels) simultaneously on the same fiber. At the transmitter, separate optical sources at different wavelengths are combined by a WDM multiplexer. At the receiver, a demultiplexer separates them back to individual detectors.
|
|
||||||
|
|
||||||
This matters for two reasons: it multiplies the capacity of existing fiber plants (avoiding costly new cable deployments), and it enables the construction of amplified long-haul systems where a single EDFA can simultaneously amplify dozens of DWDM channels.
|
|
||||||
|
|
||||||
**CWDM: coarse wavelength division multiplexing**
|
|
||||||
|
|
||||||
CWDM uses widely spaced channels (20 nm spacing) defined in ITU-T G.694.2 across the range 1270–1610 nm. This gives 18 channels total, though practical deployments typically use 8 channels in the 1470–1610 nm range (the S-, C-, and L-band portions of the CWDM grid) because these wavelengths fall within the low-attenuation window of standard SMF.
|
|
||||||
|
|
||||||
The advantage of 20 nm spacing is relaxed wavelength stability requirements for the laser sources. CWDM transceivers use uncooled DFB lasers — no thermoelectric cooler (TEC) to stabilize the laser temperature and therefore the wavelength. This makes CWDM transceivers significantly cheaper than their DWDM equivalents. The CWDM4 channel plan (1271/1291/1311/1331 nm) used in 100GBASE-CWDM4 and 400GBASE-FR4 is a practical application of this: four channels on a single fiber pair, using inexpensive uncooled lasers.
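As a quick illustration, the CWDM grid is regular enough to generate in a few lines. The sketch assumes nominal center wavelengths starting at 1271 nm, matching the CWDM4 values quoted above; the function name is hypothetical.

```python
def cwdm_grid(first_nm: int = 1271, spacing_nm: int = 20, count: int = 18) -> list[int]:
    """Generate nominal CWDM center wavelengths in nm (assumed 1271 nm start)."""
    return [first_nm + i * spacing_nm for i in range(count)]


grid = cwdm_grid()
cwdm4 = grid[:4]        # the four O-band channels used by CWDM4 and FR4
upper_eight = grid[-8:] # the eight channels most common in passive metro CWDM

print(len(grid), "channels:", grid[0], "to", grid[-1], "nm")
print("CWDM4:", cwdm4)
print("upper eight:", upper_eight)
```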
|
|
||||||
|
|
||||||
The limitation is amplification. CWDM channels span multiple fiber loss windows, and erbium-doped fiber amplifiers (EDFAs) only amplify in the C-band (1530–1565 nm) and L-band (1565–1625 nm). CWDM channels outside these windows cannot be amplified by standard EDFAs, which limits CWDM to passive applications — typically under 80 km without amplification. This is fine for intra-datacenter, campus, and metro access applications; it's a hard limit for long-haul.
|
|
||||||
|
|
||||||
**DWDM: dense wavelength division multiplexing**
|
|
||||||
|
|
||||||
DWDM uses the 50 GHz (nominally 0.4 nm) or 100 GHz (0.8 nm) ITU-T G.694.1 channel grid in the C-band and L-band. The standard 50 GHz C-band grid supports 80 channels from 1529.55 to 1567.14 nm. Extended C-band implementations push toward 96 channels.
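The relationship between grid spacing in GHz and spacing in nm can be checked directly. This sketch assumes the G.694.1 convention of a grid anchored at 193.1 THz; the function names are illustrative.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum


def grid_frequency_thz(n: int, spacing_ghz: float = 50.0) -> float:
    """Frequency of grid slot n, counted from the 193.1 THz anchor."""
    return 193.1 + n * spacing_ghz / 1000.0


def wavelength_nm(freq_thz: float) -> float:
    """Convert an optical frequency in THz to a vacuum wavelength in nm."""
    return C_M_PER_S / (freq_thz * 1e12) * 1e9


anchor = wavelength_nm(grid_frequency_thz(0))
neighbour = wavelength_nm(grid_frequency_thz(1))
print(f"anchor channel: {anchor:.2f} nm")
print(f"50 GHz step near the anchor: {anchor - neighbour:.3f} nm")
```

Because wavelength is inversely proportional to frequency, the step in nm is not perfectly constant across the band, which is why the 0.4 nm and 0.8 nm figures are described as nominal.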
|
|
||||||
|
|
||||||
Tight channel spacing requires thermally stabilized lasers — cooled DFB or external cavity lasers with precise wavelength locking. This is why DWDM transceivers cost substantially more than CWDM: the TEC, the wavelength monitor, and the associated control circuitry add cost, power consumption, and complexity.
|
|
||||||
|
|
||||||
The payoff is amplification compatibility. All 80 DWDM C-band channels sit within the EDFA gain bandwidth. A single EDFA boosts all channels simultaneously, enabling cascaded-amplifier long-haul systems carrying 4–8 Tbps of total capacity on a single fiber pair. This is the infrastructure that carries intercontinental internet traffic.
|
|
||||||
|
|
||||||
DWDM also enables ROADMs (Reconfigurable Optical Add-Drop Multiplexers) — wavelength-selective switches that can route individual channels to different destinations without converting to electrical signals. ROADM-based mesh networks are the foundation of modern carrier transport infrastructure.
|
|
||||||
|
|
||||||
For enterprise networks, DWDM is typically deployed in metro rings and regional WAN infrastructure where you need to carry multiple 10G, 100G, or coherent 400G wavelengths on a shared fiber plant. The economics work when you have 4+ channels to multiplex over a route where laying additional fiber is expensive.
|
|
||||||
|
|
||||||
**LWDM: LAN wavelength division multiplexing**
|
|
||||||
|
|
||||||
LWDM (LAN-WDM) is an O-band channel plan built on the 800 GHz (roughly 4.5 nm) grid that IEEE 802.3 uses for the four lanes of 100GBASE-LR4. The extended plan carries up to 12 channels starting at 1269.23 nm. The "L" refers to "LAN", not to electrical lanes, although in practice each lane of a high-speed interface (like 400G or 800G) does map to a distinct optical wavelength.
|
|
||||||
|
|
||||||
LWDM-based transceivers appear in 400G and 800G modules aimed at extended intra-datacenter and DCI applications. The 8-wavelength subset (LWDM8) at 800G provides eight 100G lanes on a single fiber pair, extending the duplex LC fiber plant to higher speeds without switching to parallel MPO cables.
|
|
||||||
|
|
||||||
The practical advantage over CWDM is denser packing: LWDM fits 12 channels into roughly the same O-band window that CWDM4 covers with only 4 channels. The disadvantage compared to DWDM is still the amplification limitation: LWDM channels are in the O-band (1310nm vicinity) and cannot be amplified by standard C-band EDFAs.
|
|
||||||
|
|
||||||
**MWDM: medium wavelength division multiplexing**
|
|
||||||
|
|
||||||
MWDM is a Chinese-origin MSA developed primarily by China Mobile and Huawei for 5G fronthaul applications. It uses 6 wavelengths on 7 nm spacing in the range 1264.5–1299.5 nm. The "M" stands for "Medium". The channels sit in the O-band, where chromatic dispersion is near zero, which keeps dispersion penalties low on multi-kilometer fronthaul links.
|
|
||||||
|
|
||||||
MWDM is relatively niche outside of 5G fronthaul deployments in China and some APAC markets. Its relevance for enterprise network engineers in Western markets is limited, but it appears in discussions of mobile backhaul and fronthaul architectures. The key characteristics — 6 channels, O-band, zero-dispersion wavelength, uncooled lasers — make it cost-effective for short to medium distance fronthaul links.
|
|
||||||
|
|
||||||
**Where each fits in 2026 network architectures**
|
|
||||||
|
|
||||||
CWDM occupies the passive metro access and intra-datacenter niche with cost as the primary driver. CWDM4 specifically (used in FR4 and CWDM4 100G modules) has become the de-facto standard for datacenter 100G and 400G duplex fiber applications under 2km. The 18-channel passive CWDM metro add/drop systems from vendors like CommScope and AFL enable point-to-point capacity multiplication on existing fiber pairs at attractive price points.
|
|
||||||
|
|
||||||
DWDM is the backbone of carrier transport and the correct choice for anything requiring amplification, ROADMs, or more than 4 channels on a shared fiber route. In enterprise contexts, DWDM metro rings connect campus buildings or datacenter sites over carrier-grade fiber. 400ZR coherent DWDM pluggables are making DWDM accessible without dedicated transponder chassis.
|
|
||||||
|
|
||||||
LWDM is finding a place in 400G and 800G DCI applications where the installed fiber plant is duplex LC and the operator wants to avoid a migration to MPO parallel fiber. 400G FR4 is CWDM4-based; 800G FR8 is LWDM-based. If you're planning an 800G refresh in a facility with duplex LC infrastructure, LWDM (FR8 form factor) is the relevant standard.
|
|
||||||
|
|
||||||
MWDM is specific to 5G fronthaul. If that's your application, it's the right answer. If it's not, it's noise.
|
|
||||||
|
|
||||||
**The passive vs. active WDM distinction**
|
|
||||||
|
|
||||||
One more divide worth understanding: passive WDM systems use thin-film filter multiplexers and demultiplexers with no active components — no amplifiers, no electronic control. They're inexpensive, reliable, and completely protocol-agnostic. Active WDM systems add EDFAs, ROADMs, and management electronics. They're more expensive and complex but enable much longer distances and flexible wavelength routing.
|
|
||||||
|
|
||||||
For most enterprise applications — CWDM metro links, DWDM building interconnects under 80km — passive WDM is the appropriate and cost-effective choice. The decision to add active components (amplifiers, ROADMs) is driven by distance and the need for in-service wavelength provisioning, not by the channel plan itself.
|
|
||||||
@ -1,69 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Commercial vs. Industrial vs. Extended Temp Transceivers: What the Grades Actually Mean"
|
|
||||||
slug: "optical-transceiver-temperature-grades"
|
|
||||||
category: "Hardware Selection"
|
|
||||||
tags: ["temperature grade", "industrial", "commercial", "extended temp", "outdoor", "TCO", "reliability"]
|
|
||||||
seo_focus_keyword: "industrial temperature grade optical transceiver"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Temperature grade is one of the most frequently misapplied transceiver specifications in enterprise purchasing. The typical pattern runs like this: someone decides that since the network is "critical infrastructure," they should buy the highest-grade components available. They spec industrial-temperature transceivers for their climate-controlled datacenter because it sounds more robust. They pay 40–80% more for hardware that provides no functional benefit in their application. Meanwhile, somewhere in the same organization, access-layer switches in a genuinely harsh environment are populated with commercial-grade optics because "they were cheaper," and those are the ones failing at inconvenient intervals.
|
|
||||||
|
|
||||||
Getting temperature grades right is not complicated, but it requires understanding what the specification actually measures and matching it to the real thermal environment of the deployment.
|
|
||||||
|
|
||||||
**The temperature grade hierarchy**
|
|
||||||
|
|
||||||
Optical transceivers are specified to one of several case temperature ranges. The most common are:
|
|
||||||
|
|
||||||
Commercial: 0°C to +70°C. This is the standard for most datacenter and office-environment transceivers. The vast majority of SFP, SFP+, QSFP28, and QSFP-DD modules sold globally are commercial grade.
|
|
||||||
|
|
||||||
Extended: −5°C or −10°C to +85°C. Some vendors define extended temperature as 0°C to +85°C (just widening the upper bound), while others include a modest below-freezing lower bound. The terminology is inconsistent across manufacturers, so check the actual numbers rather than relying on the label.
|
|
||||||
|
|
||||||
Industrial: −40°C to +85°C. The genuine industrial grade specification covers operation from −40°C to the same +85°C upper bound. This is what you need for outdoor installations, unheated enclosures, vehicles, and industrial control environments.
|
|
||||||
|
|
||||||
Some vendors offer "wide temperature" or "rugged" variants at −40°C to +100°C or similar, primarily for military and automotive applications. These are niche and priced accordingly.
|
|
||||||
|
|
||||||
**What the specification actually guarantees**
|
|
||||||
|
|
||||||
The temperature specification covers case temperature — the temperature of the module housing — not ambient air temperature and not the module's internal junction temperatures. In a forced-air cooled switch chassis with good airflow, the module case temperature is typically 10–20°C above inlet air temperature due to self-heating. If your datacenter runs at 24°C inlet and your chassis is well-cooled, module case temperatures of 35–45°C are typical. Commercial grade (70°C maximum) has 25–35°C of headroom in that scenario.
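The headroom arithmetic in this paragraph can be sketched as follows. The grade limits come from the ranges listed earlier in this article; the self-heating rise is an input you would measure or estimate, and the names are hypothetical.

```python
# Upper case-temperature limits in degrees C, per the grade definitions above.
GRADE_MAX_CASE_C = {"commercial": 70, "extended": 85, "industrial": 85}


def case_temp_headroom_c(inlet_c: float, self_heating_c: float, grade: str) -> float:
    """Headroom between the estimated case temperature and the grade's upper limit."""
    estimated_case_c = inlet_c + self_heating_c
    return GRADE_MAX_CASE_C[grade] - estimated_case_c


# 24 C inlet with the 10-20 C self-heating rise mentioned in the text.
for rise in (10, 20):
    print(f"rise {rise} C -> headroom {case_temp_headroom_c(24, rise, 'commercial')} C")
```

The exact results (36°C and 26°C) are within a degree of the rounded 25–35°C range quoted above.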
|
|
||||||
|
|
||||||
The specification does not guarantee identical performance across the rated temperature range in all parameters. TX power and receiver sensitivity are specified at nominal temperature (25°C) and at temperature extremes with wider tolerances. A commercial-grade transceiver at 65°C (5°C below its rated maximum) will typically show slightly reduced transmitter power and slightly degraded receiver sensitivity compared to its room-temperature performance. Not enough to matter in a normal installation with appropriate link margin, but worth knowing.
|
|
||||||
|
|
||||||
Industrial-grade modules use different component selections — wider-temperature-range laser diodes, TECs designed for a larger operating range, and sometimes higher-tolerance resistors and capacitors — that maintain specified performance across the full range. The cost premium reflects genuine component differences, not just marketing.
|
|
||||||
|
|
||||||
**Where commercial grade is definitively adequate**
|
|
||||||
|
|
||||||
Any installation inside a building with working HVAC meets the commercial grade thermal requirement with substantial margin. Datacenters, computer rooms, wiring closets, and standard office environments in temperate climates virtually never see air temperatures above 40°C even with HVAC degradation. Module case temperatures in these environments stay well within the 70°C commercial grade limit.
|
|
||||||
|
|
||||||
This includes most "critical" datacenter infrastructure. Calling something critical infrastructure does not change its thermal environment. An SFP28 25G SR module in a Tier IV datacenter has the same thermal environment as one in a small office server room. The criticality argument, if there is one for temperature grade, applies to the redundancy architecture (dual power, redundant paths, spare modules on site), not the transceiver temperature rating.
|
|
||||||
|
|
||||||
Even enterprise outdoor cabinets in temperate climates (central Europe, most of the US outside desert regions) often fall within commercial or extended temperature range. An outdoor cabinet in Germany will rarely exceed 40°C internal temperature even in direct summer sunlight with a solar shield. A proper thermal analysis of the expected temperature range is more useful than defaulting to industrial grade.
|
|
||||||
|
|
||||||
**Where industrial grade is actually necessary**
|
|
||||||
|
|
||||||
Industrial temperature transceivers are genuinely necessary in specific deployment categories:
|
|
||||||
|
|
||||||
Outdoor installations without climate-controlled enclosures in regions with extreme temperatures. Desert environments (Gulf region, Southwest US, Australia inland) can see ambient air temperatures of 45–50°C, and unventilated outdoor cabinets can reach 70–80°C internal temperature in direct sun. Commercial-grade modules at 75°C case temperature are operating outside specification; industrial-grade modules, rated to +85°C, remain within spec at that temperature, though with reduced margin.
|
|
||||||
|
|
||||||
Cold climate outdoor installations. Northern Canada, Russia, Scandinavia — outdoor cabinets without heaters can reach −30°C to −40°C in winter. Commercial-grade transceivers do not specify operation below 0°C. They may work, but you are operating outside the manufacturer's guaranteed range and will see degraded performance (wavelength shift in uncooled lasers, increased noise in photodetector circuits, potential condensation issues on power cycling).
|
|
||||||
|
|
||||||
Industrial environments with variable temperature: manufacturing floors, process control environments, outdoor telco street cabinets, and vehicle-mounted networking equipment. The common thread is that the thermal environment cannot be reliably controlled to datacenter standards.
|
|
||||||
|
|
||||||
Optical transceivers in telecom access equipment deployed at curb-level or on utility poles. The ETSI and NEBS standards that govern outdoor telecom equipment require industrial temperature compliance, and equipment deployed in those environments must meet those standards for support and warranty reasons independent of whether the temperature ever actually reaches the limits.
|
|
||||||
|
|
||||||
**The TCO reality: doing the math**
|
|
||||||
|
|
||||||
Industrial-grade SFP28 25G transceivers typically carry a 40–80% price premium over commercial-grade equivalents. A commercial-grade 25G SFP28 SR module at $45 becomes $65–$80 in industrial temperature spec. For a 200-port deployment, that's $4,000–$7,000 in premium for a datacenter installation where the temperature constraint will never be approached.
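The premium figure is easy to reproduce for your own port counts. The prices below are the illustrative ones from this paragraph, not quotes, and the function name is hypothetical.

```python
def grade_premium_total(commercial_price: int, industrial_price: int, ports: int) -> int:
    """Total extra spend from choosing industrial-grade optics for every port."""
    return (industrial_price - commercial_price) * ports


low = grade_premium_total(commercial_price=45, industrial_price=65, ports=200)
high = grade_premium_total(commercial_price=45, industrial_price=80, ports=200)
print(f"premium for 200 ports: ${low:,} to ${high:,}")
```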
|
|
||||||
|
|
||||||
Contrast this with the cost of a field failure. A failed industrial installation in a −35°C environment requires a service truck roll, potentially in winter conditions, plus the cost of the replacement hardware. The cost differential of a proper upfront industrial-grade spec is trivial compared to an emergency service call.
|
|
||||||
|
|
||||||
The TCO argument, therefore, is symmetric: don't pay industrial premiums for commercial-environment installations, but don't economize with commercial-grade hardware in applications that genuinely need industrial specification. The failure cost in outdoor industrial environments is high enough that the premium pays for itself in avoided incidents.
|
|
||||||
|
|
||||||
**Extended temperature as a middle ground**
|
|
||||||
|
|
||||||
Extended temperature modules (typically 0°C to +85°C, or −10°C to +85°C) occupy a useful middle ground for indoor applications with less controlled thermal environments: unheated warehouse spaces, outdoor-rated but partially conditioned cabinets in mild climates, and industrial control rooms that are temperature-controlled but not to datacenter standards.
|
|
||||||
|
|
||||||
The upper bound extension to 85°C (from commercial's 70°C) is the practically relevant improvement for indoor industrial applications where equipment loading and poor airflow can push case temperatures beyond 70°C. Manufacturing environments where large amounts of heated equipment operate in the same room as networking hardware frequently benefit from the extended upper temperature rating.
|
|
||||||
|
|
||||||
For most planning purposes: datacenter and standard office wiring closet → commercial. Indoor industrial, partially conditioned outdoor → extended. Outdoor in climate extremes, genuinely uncontrolled temperature environments → industrial. Match the specification to the actual thermal environment, not the criticality perception of the installation.
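One way to make this rule mechanical is to pick the lowest grade whose range covers the expected case-temperature extremes. The sketch below assumes −10°C as the extended-grade lower bound, which varies by vendor as noted earlier, so check the actual datasheet range.

```python
# (name, min case C, max case C), ordered from cheapest to most expensive.
# The extended lower bound of -10 C is an assumption; vendors differ.
GRADES = [
    ("commercial", 0, 70),
    ("extended", -10, 85),
    ("industrial", -40, 85),
]


def select_grade(min_case_c: float, max_case_c: float):
    """Return the first grade covering the expected range, or None if none does."""
    for name, low, high in GRADES:
        if low <= min_case_c and max_case_c <= high:
            return name
    return None


print(select_grade(20, 45))    # conditioned datacenter
print(select_grade(5, 78))     # hot indoor industrial space
print(select_grade(-35, 60))   # unheated outdoor cabinet, cold climate
```

A `None` result means the environment exceeds even the industrial range, which points to an enclosure or cooling problem rather than an optics choice.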
|
|
||||||
@ -1,59 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Spine-Leaf Transceiver Strategy: Speed Tiers, Breakout Math, and When to Mix"
|
|
||||||
slug: "spine-leaf-transceiver-strategy"
|
|
||||||
category: "Network Architecture"
|
|
||||||
tags: ["spine-leaf", "datacenter fabric", "breakout", "400G", "100G", "SR4", "DR4", "FR4"]
|
|
||||||
seo_focus_keyword: "spine leaf fabric transceiver strategy breakout"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Spine-leaf is the dominant fabric architecture for modern datacenters, and it has been for about a decade. The topology is well-understood: every leaf switch connects to every spine switch, no switch-to-switch traffic traverses more than two hops, and scale-out happens by adding leaf switches (for host density) or spine switches (for bandwidth). What's less consistently understood is the optics strategy that makes the economics work — specifically, how to tier transceiver speeds across the fabric, how to do the breakout math correctly, and when mixing optic types within the same layer is a pragmatic trade-off versus a long-term maintenance headache.
|
|
||||||
|
|
||||||
**The bandwidth math that determines transceiver tiers**
|
|
||||||
|
|
||||||
In a standard spine-leaf design, each leaf switch has some number of downlink ports facing servers and some number of uplink ports connecting to the spine layer. The ratio of downlink bandwidth to uplink bandwidth determines the oversubscription ratio — a critical design parameter that affects performance under load.
|
|
||||||
|
|
||||||
A typical enterprise approach runs 4:1 oversubscription: if you have 48 downlinks at 25G per leaf (1,200 Gbps total downlink capacity), you need 300 Gbps of spine-facing uplinks at minimum, which might be 3 ports of 100G. Hyperscale and performance-sensitive applications target 2:1 or even 1:1 (non-blocking).
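The uplink count follows directly from the oversubscription target. A minimal sketch, using the numbers from this paragraph (the function name is illustrative):

```python
import math


def uplinks_needed(downlinks: int, downlink_gbps: float,
                   oversub_ratio: float, uplink_gbps: float) -> int:
    """Minimum number of uplink ports that meets the oversubscription target."""
    required_uplink_gbps = downlinks * downlink_gbps / oversub_ratio
    return math.ceil(required_uplink_gbps / uplink_gbps)


# 48 x 25G downlinks per leaf, 100G uplinks.
for ratio in (4, 2, 1):
    print(f"{ratio}:1 ->", uplinks_needed(48, 25, ratio, 100), "x 100G uplinks")
```

Rounding up matters: a target that works out to 2.4 uplinks needs 3 ports, and in practice the count is usually also rounded to a multiple of the number of spine switches.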
|
|
||||||
|
|
||||||
The transceiver tier selection follows directly from this math. If your server-facing downlinks are 25G (SFP28), your leaf-to-spine uplinks are typically 100G or 400G depending on your oversubscription target and leaf port count. If your downlinks are 100G (QSFP28, for high-performance computing or storage), your uplinks should be 400G or 800G to maintain reasonable oversubscription ratios.
|
|
||||||
|
|
||||||
The spine tier typically runs at the highest available speed the ASIC generation supports. For a current-generation spine build (2024–2026), that means 400G ports connected to leaf uplinks, potentially with 800G between spine tiers in multi-stage fabrics.
|
|
||||||
|
|
||||||
**Transceiver selection for each layer**
|
|
||||||
|
|
||||||
Leaf-to-server (downlinks): These are typically the highest-density ports in your fabric, frequently using SFP28 25G SR or SFP56 50G SR optics. In a standard rack where servers and the leaf switch share the same rack, 1–3 meter direct-attach copper (DAC) or active optical cables (AOC) are common for short in-rack connections. For end-of-row or middle-of-row switches with longer runs, 25G SR (100m OM4 reach) is the standard choice.
|
|
||||||
|
|
||||||
Leaf-to-spine (uplinks): This is where the transceiver selection matters most economically. The distance between leaf switches and spine switches in a well-designed datacenter is typically 10–30 meters within a pod, occasionally stretching to 100 meters across a large datacenter floor. These distances are well within 100GBASE-SR4 reach (70m OM3, 100m OM4) and 400GBASE-DR4 reach (500m OS2). The fiber type in your installed cable plant determines which option you use.
|
|
||||||
|
|
||||||
For multimode OM3/OM4 infrastructure: 100G SR4 and 400G SR4 are the relevant choices. Cost-effective, mature, and well-supported.
|
|
||||||
|
|
||||||
For single-mode OS2 infrastructure: 100G LR4 or DR and 400G DR4 or FR4. The 400G DR4 option (MPO-12 parallel SMF) is cheaper than FR4 but requires parallel fiber infrastructure; FR4 uses duplex LC.
|
|
||||||
|
|
||||||
Spine-to-spine (for multi-stage or multi-tier spines): typically the same optic type as leaf-to-spine but at higher aggregate speeds. In multi-stage fabrics where superspine connects to multiple spine tiers, these links may need FR4 or LR4 if the inter-tier distance exceeds DR4's 500m reach.
|
|
||||||
|
|
||||||
**Breakout math: the right way to calculate fiber requirements**
|
|
||||||
|
|
||||||
Breakout is the technique of splitting one high-speed port into multiple lower-speed ports. A 400G QSFP-DD port broken out 4× gives you 4×100G ports. A 400G port broken out 8× gives you 8×50G. Breakout is useful when your spine ports run faster than your leaf uplinks, allowing one expensive spine port to serve multiple leaf uplinks.
|
|
||||||
|
|
||||||
The cable count math is what most planning guides skip. A 400G DR4 to 4×100G breakout uses a breakout MPO-12 to 4× duplex LC fanout assembly. Each 400G DR4 port consumed at the spine side results in 4 duplex LC connections at the leaf side — 4 separate fiber pairs to 4 different leaf switches, all terminating at one spine port via the breakout MPO.
|
|
||||||
|
|
||||||
Calculate your fiber plant requirements this way: for a 32-port spine switch using 400G DR4 ports, if you break out every port 4×, you have 128 leaf uplink endpoints. Each endpoint requires one fiber pair (duplex LC or two fibers of an MPO assembly). Your spine switch needs 32 MPO-12 cables, each fanning out to 4 duplex LC connections. The cable management for 32 MPO-12 breakout fans in a single rack position requires planning — it's a lot of cable.
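The same calculation as a small helper, so it can be rerun for other port counts and fan-out factors. It assumes one fiber pair per breakout endpoint, as described above; the function and key names are hypothetical.

```python
def breakout_plan(spine_ports: int, fanout: int) -> dict:
    """Count trunks, endpoints, and fibers for a fully broken-out spine switch."""
    endpoints = spine_ports * fanout
    return {
        "mpo_trunks": spine_ports,      # one MPO-12 breakout assembly per spine port
        "leaf_endpoints": endpoints,    # duplex LC connections at the leaf side
        "fibers_in_use": endpoints * 2, # one transmit and one receive fiber per endpoint
    }


plan = breakout_plan(spine_ports=32, fanout=4)
print(plan)
```

For the 32-port example this gives 256 active fibers, which is the number the patch-panel and cable-management design has to accommodate in one rack position.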
|
|
||||||
|
|
||||||
For 2× breakout (400G to 2×200G), the fiber management is simpler: a breakout MPO-12 to 2× MPO-8 or a dual-port breakout assembly. Less common but useful for high-speed storage or compute interconnects.
|
|
||||||
|
|
||||||
**When mixing SR4, DR4, and FR4 in the same fabric makes sense**
|
|
||||||
|
|
||||||
The standard advice is to standardize on one optic type per fabric layer. This is operationally sound: uniform spare inventory, simpler troubleshooting, less room for error during maintenance. But real deployments often have constraints that make mixing pragmatic.
|
|
||||||
|
|
||||||
The most common scenario: a datacenter with a mixed fiber plant. The core of the building has OS2 single-mode trunk cable (installed for future proofing or inherited from a previous design), but the horizontal runs to server racks use OM4 multimode. In this case, spine-to-spine connections use 400G DR4 or FR4 (single-mode), while leaf-to-server connections use 25G SR or 100G SR4 (multimode). The mixing is across logical layers, not within the same layer — different transceiver types on different port types, not random mixing on identical ports.
|
|
||||||
|
|
||||||
Within a single layer — say, mixing 400G SR4 and 400G DR4 on different spine-to-leaf links — creates problems: different spare inventories, potential for wrong insertion (the physical form factor is identical; only the optic matters), and operational complexity when troubleshooting. If you're going to mix within a layer, do so with clear documentation, physical or logical port labeling, and spare management that accounts for both types.
|
|
||||||
|
|
||||||
The scenario where mixing within a layer is genuinely justified: expanding an existing fabric where the new leaf switches are in a different physical location, requiring longer runs than the original optic type supports. Adding a new pod to a datacenter that requires 400G FR4 (2km) when the existing fabric uses 400G SR4 (100m OM4) is a legitimate reason to mix. Just manage the operational complexity explicitly.
|
|
||||||
|
|
||||||
**Standardization as a long-term cost driver**
|
|
||||||
|
|
||||||
Standardization reduces costs in ways that aren't always obvious upfront. A consistent transceiver standard across your fabric means: one spare part number for leaf uplinks (or two, if you have a multimode and single-mode split), one DOM monitoring profile applied uniformly, one vendor qualification to maintain, and operational staff who can correctly handle any port without consulting documentation.
|
|
||||||
|
|
||||||
The calculus changes when a new generation makes standardization impossible without a forklift upgrade. Moving from a 100G SR4 leaf-to-spine design to a 400G DR4 design is a port-for-port optics replacement: QSFP-DD cages are designed to accept QSFP28 modules, but a 100G SR4 optic in a 400G port still only delivers 100G, and DR4 runs on single-mode rather than multimode fiber. When you upgrade the spine and leaf ASICs, you're changing all the uplink optics anyway. Plan fabric optic standardization to last one hardware generation (typically 5–7 years), not forever.
|
|
||||||
@ -1,75 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Proactive Transceiver Replacement: The MTBF Data, DOM Thresholds, and the Real Cost Calculus"
|
|
||||||
slug: "roa-replacing-optics-proactively"
|
|
||||||
category: "Operations & Reliability"
|
|
||||||
tags: ["MTBF", "DOM", "proactive replacement", "reliability", "lifecycle", "operations"]
|
|
||||||
seo_focus_keyword: "proactive transceiver replacement MTBF DOM thresholds"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Replace-on-alarm is the default operational mode for optical transceivers in most enterprise networks. Something fails, a link goes down, a technician replaces it, and everyone moves on. It's understandable — proactive replacement programs require investment and discipline, and the "if it ain't broke" instinct is strong. But for networks where link downtime has real operational consequences, the economics of proactive replacement look different than they first appear.
|
|
||||||
|
|
||||||
This is not a philosophical argument for perfect infrastructure. It's a cost analysis.
|
|
||||||
|
|
||||||
**What MTBF numbers actually mean**
|
|
||||||
|
|
||||||
Transceiver manufacturers publish MTBF (Mean Time Between Failures) figures ranging from 100,000 to over 2,000,000 hours depending on the product and calculation methodology. These numbers require interpretation.
|
|
||||||
|
|
||||||
MTBF is a statistical prediction of the mean time between failures for a population of devices under specified operating conditions, calculated using component-level reliability models (typically Telcordia SR-332 or MIL-HDBK-217). A 2,000,000-hour MTBF does not mean an individual module will operate for 228 years (2,000,000 hours divided by 8,760 hours per year). It means that across a large population of modules in their stable operating period, failures accumulate at a rate of about one per 2,000,000 module-hours. A fleet of 2,000 modules logs 17.52 million module-hours per year, so a constant-hazard model predicts roughly nine failures per year.
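To see what an MTBF figure implies for a fleet, the constant-hazard arithmetic can be sketched as below. It uses the exponential reliability model that MTBF calculations assume and therefore ignores infant mortality and wear-out; the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 8_760


def expected_failures_per_year(fleet_size: int, mtbf_hours: float) -> float:
    """Expected annual failures for a fleet under a constant hazard rate."""
    return fleet_size * HOURS_PER_YEAR / mtbf_hours


def survival_probability(years: float, mtbf_hours: float) -> float:
    """Probability that one module survives the period (exponential model)."""
    return math.exp(-years * HOURS_PER_YEAR / mtbf_hours)


print(f"{expected_failures_per_year(2_000, 2_000_000):.2f} failures/year in a 2,000-module fleet")
print(f"{survival_probability(10, 2_000_000):.3f} chance one module survives 10 years")
```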
|
|
||||||
|
|
||||||
The critical limitation: MTBF models assume steady-state operation at nominal conditions. They do not model wear-out failure modes that dominate at end of service life. Optical transceivers have at least two components with distinct wear-out profiles: laser diodes (subject to gradual efficiency degradation as described in the DOM article) and electromechanical connectors (subject to fatigue from repeated mating cycles).
|
|
||||||
|
|
||||||
Real-world transceiver failure rates follow a bathtub curve, not a constant hazard rate. Early failures from manufacturing defects cluster in the first few hundred hours (infant mortality). A long stable operating period follows. Then wear-out failure rates begin increasing as laser diodes exhaust their operational headroom, typically after 7–10 years of continuous operation for standard datacenter modules, somewhat less for high-power DWDM optics under continuous high-temperature stress.
|
|
||||||
|
|
||||||
Published MTBF figures are most meaningful for the stable middle period of the bathtub curve. They tell you approximately nothing about when wear-out begins or how quickly the failure rate climbs thereafter.
|
|
||||||
|
|
||||||
**DOM thresholds that predict failure**
|
|
||||||
|
|
||||||
The DOM parameters most useful for predicting failure are TX bias current trend and TX power. The mechanics are described in the DOM deep-dive article; the operational question is: at what threshold values should a proactive replacement be triggered?
|
|
||||||
|
|
||||||
For standard DFB laser-based transceivers (SFP+, SFP28, QSFP28 LR/ER variants):
|
|
||||||
|
|
||||||
TX bias current exceeding 90% of the high alarm threshold is a strong predictor that the module will fail within 3–12 months. If the high alarm threshold is 80 mA, a reading of 72 mA (90% threshold) should trigger replacement scheduling. This is a proactive signal, not an emergency — there's still operational margin, but the trend is unfavorable.
|
|
||||||
|
|
||||||
TX power declining more than 2 dB from the baseline value recorded at installation, with corresponding high bias current, indicates that the APC compensation headroom is being consumed. Again, not immediate failure, but a 6–12 month horizon is realistic.
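To turn the bias-current rule above into a scheduling estimate, one option is to fit a straight line to periodic readings and extrapolate to the threshold. This is only a sketch: the linear trend is an assumption (laser aging often accelerates near end of life, so the estimate is likely optimistic), and the function name and sample format are hypothetical.

```python
def days_until_threshold(samples, threshold_ma):
    """Estimate days after the last sample until bias current reaches threshold_ma.

    samples: list of (day_number, bias_ma) tuples.
    Returns None when the fitted trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(d for d, _ in samples) / n
    mean_y = sum(b for _, b in samples) / n
    sxx = sum((d - mean_x) ** 2 for d, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    # Ordinary least-squares slope and intercept.
    slope = sum((d - mean_x) * (b - mean_y) for d, b in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_ma - intercept) / slope
    last_day = max(d for d, _ in samples)
    # Clamp to zero if the fitted line has already crossed the threshold.
    return max(0.0, crossing_day - last_day)


if __name__ == "__main__":
    high_alarm_ma = 80.0
    readings = [(0, 60.0), (100, 62.0), (200, 64.0)]
    print(days_until_threshold(readings, 0.90 * high_alarm_ma))
```

With a high alarm of 80 mA, the threshold passed in is 72 mA, matching the 90% rule described above.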
|
|
||||||
|
|
||||||
For VCSEL-based transceivers (SFP, SFP+, QSFP28 SR variants at 850nm):
|
|
||||||
|
|
||||||
VCSELs have different aging profiles. They tend to fail more suddenly than edge-emitting DFBs, but they also have longer operational lives under typical conditions. The most useful VCSEL DOM indicator is TX power: a gradual decline from the installation baseline toward the bottom of the specified launch range (for 10GBASE-SR, IEEE 802.3 specifies average launch power between −7.3 and −1 dBm) suggests wear-out. Sudden TX power drops in VCSELs are more often contamination or mechanical events than laser aging.
|
|
||||||
|
|
||||||
Temperature is a compounding factor. Modules operating consistently above 60°C internal temperature accumulate laser aging more quickly than those operating at 45°C. Modules in chassis with marginal airflow or partially blocked cage areas should be inspected more frequently and replaced sooner.
|
|
||||||
|
|
||||||
**The cost analysis: replace-on-alarm vs. scheduled replacement**
|
|
||||||
|
|
||||||
Replace-on-alarm costs include: the cost of the downtime event itself (labor for emergency response, business impact from link unavailability), the cost of the replacement hardware at unplanned-purchase pricing, and any secondary costs from cascaded failures (traffic rerouting load, backup path congestion).
|
|
||||||
|
|
||||||
Scheduled proactive replacement costs include: the cost of the replacement hardware (purchasable in advance at bulk or planned-procurement pricing), the labor for planned maintenance window replacement (during scheduled downtime), and the residual value of replaced modules that haven't actually failed yet.
|
|
||||||
|
|
||||||
For an enterprise network where each significant link outage incurs 2–4 hours of NOC labor plus potential business interruption costs, the math often favors proactive replacement starting around year 7 for modules in continuous high-availability service. The specific break-even depends on your organization's downtime cost model.
|
|
||||||
|
|
||||||
A practical calculation: suppose a 10GBASE-LR SFP+ module costs $45 in planned procurement. An emergency procurement costs $95 (rush pricing). A link outage costs 3 hours of NOC labor at $80/hour fully loaded ($240), plus whatever business impact applies. A reactive replacement therefore costs $335 before business impact, against $45 for a planned swap. Ignoring planned-window labor, proactive replacement breaks even if roughly one in seven replaced modules (45/335 ≈ 13%) would otherwise have failed in service. For modules in high-utilization critical paths, where business impact dominates, the break-even arrives earlier, typically 2–3 years before expected wear-out failure rates increase.
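The same comparison can be parameterized so that you can plug in your own figures. This is a minimal sketch using the example prices above; the class and field names are hypothetical, planned-window labor is ignored, and the model assumes a planned swap fully prevents the outage.

```python
from dataclasses import dataclass


@dataclass
class ReplacementCosts:
    planned_module: float = 45.0
    emergency_module: float = 95.0
    noc_hours_per_outage: float = 3.0
    noc_hourly_rate: float = 80.0
    business_impact_per_outage: float = 0.0

    def outage_cost(self) -> float:
        """Total cost of one reactive (replace-on-alarm) event."""
        return (self.emergency_module
                + self.noc_hours_per_outage * self.noc_hourly_rate
                + self.business_impact_per_outage)

    def break_even_failure_probability(self) -> float:
        """Failure probability above which a planned swap is the cheaper option."""
        return self.planned_module / self.outage_cost()


if __name__ == "__main__":
    base = ReplacementCosts()
    print(f"outage cost: ${base.outage_cost():.0f}")
    print(f"break-even failure probability: {base.break_even_failure_probability():.1%}")
```

Adding even a modest `business_impact_per_outage` pushes the break-even probability down sharply, which is why critical links justify earlier replacement than the hardware prices alone suggest.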
|
|
||||||
|
|
||||||
**A practical proactive replacement program**
|
|
||||||
|
|
||||||
The program doesn't need to be elaborate. Three operational elements cover most of the value:
|
|
||||||
|
|
||||||
First, establish DOM baselines at installation. For every transceiver in a critical link — define "critical" based on your network topology, not by every port — record the initial TX power, bias current, supply voltage, and temperature in your asset management system. This takes five minutes per link at installation time and provides the reference for trend monitoring.
|
|
||||||
|
|
||||||
Second, implement DOM trending in your monitoring stack. Most modern NMS platforms (Kentik, Auvik, PRTG, LibreNMS, and others) can poll SNMP interfaces for DOM values and graph trends over time. Set alert conditions for:
|
|
||||||
- Bias current rising above 80% of high alarm threshold
|
|
||||||
- TX power declining more than 1.5 dB from baseline
|
|
||||||
- Temperature consistently above 65°C internal
|
|
||||||
- Any parameter entering warning or alarm range
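The first three alert conditions above can be expressed as a small check against a stored baseline. This is a sketch under stated assumptions: the names are hypothetical, it evaluates a single reading rather than a sustained trend (so "consistently above 65°C" would need averaging over a window in practice), and the module's own warning and alarm flags are left to the NMS.

```python
from dataclasses import dataclass


@dataclass
class DomReading:
    bias_ma: float
    tx_power_dbm: float
    temperature_c: float


def dom_alerts(reading, baseline_tx_dbm, bias_high_alarm_ma,
               bias_fraction=0.80, tx_drop_db=1.5, temp_limit_c=65.0):
    """Return the names of the alert conditions that a single reading trips."""
    alerts = []
    if reading.bias_ma > bias_fraction * bias_high_alarm_ma:
        alerts.append("bias_current")
    if baseline_tx_dbm - reading.tx_power_dbm > tx_drop_db:
        alerts.append("tx_power_drop")
    if reading.temperature_c > temp_limit_c:
        alerts.append("temperature")
    return alerts


if __name__ == "__main__":
    aging = DomReading(bias_ma=66.0, tx_power_dbm=-4.0, temperature_c=50.0)
    print(dom_alerts(aging, baseline_tx_dbm=-2.0, bias_high_alarm_ma=80.0))
```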
|
|
||||||
|
|
||||||
Third, implement an age-triggered review. Modules in critical links that have been operating for 7+ years, or that show DOM trend alerts, enter a replacement queue for the next maintenance window. This is distinct from emergency replacement — it's planned, documented, and executed during scheduled maintenance.
|
|
||||||
|
|
||||||
**Which links actually need this level of attention**
|
|
||||||
|
|
||||||
Not every link warrants a proactive replacement program. The operational cost of maintaining DOM trending and replacement schedules is non-trivial, and applying it uniformly to 5,000 access ports in an enterprise campus is probably not justified.
|
|
||||||
|
|
||||||
The reasonable scope: core and distribution layer uplinks in datacenter and campus environments, WAN links and circuit-facing ports where outages affect connectivity for large user populations, spine-to-leaf uplinks in datacenter fabrics where a link failure changes oversubscription ratios materially, and storage network interconnects where path redundancy may be limited.
|
|
||||||
|
|
||||||
Access-layer switch-to-desktop connections, patch panels in non-critical areas, and any link with sufficient redundancy that a single failure causes no service impact are reasonable candidates for replace-on-alarm.
|
|
||||||
|
|
||||||
The discipline that matters most is consistency: if you decide to monitor DOM on core links, actually monitor it, respond to the trends, and close the loop when replacement is indicated. A monitoring system that generates alerts that are routinely ignored is worse than no monitoring system, because it creates the illusion of diligence while providing none of the protection.
|
|
||||||
@ -1,59 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OEM Optic Lock-In Exposed: How It Works, What 'Compatible' Actually Means, and Your Options"
|
|
||||||
slug: "cisco-juniper-arista-optic-lock-in"
|
|
||||||
category: "Procurement & Vendor Strategy"
|
|
||||||
tags: ["OEM lock-in", "compatible optics", "EEPROM", "Cisco", "Juniper", "Arista", "vendor neutral"]
|
|
||||||
seo_focus_keyword: "Cisco Juniper Arista OEM optic lock-in compatible transceivers"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
The OEM optic lock-in debate has been running for fifteen years, and it hasn't been resolved by technical progress or legal precedent. It's been resolved by market pressure. Most major switch vendors have moderated their most aggressive lock-in mechanisms, but the topic still generates enough confusion, misinformation, and occasional genuine customer harm that it deserves a clear-eyed examination.
|
|
||||||
|
|
||||||
**How the technical enforcement works**
|
|
||||||
|
|
||||||
Cisco, Juniper, and Arista all use variants of the same mechanism: when a transceiver is inserted into a port, the switch platform reads the module's EEPROM over the I2C management interface and compares the vendor name, vendor OUI, and part number against an internal allowlist. What happens when a module doesn't match the allowlist varies significantly by vendor, platform, and software version.
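For SFP-family modules, the identity fields the switch reads live at fixed offsets in the A0h page defined by SFF-8472. The sketch below decodes them from a raw byte dump; the allowlist check is purely illustrative, since the actual comparison logic on any vendor's platform is not public, and both function names are hypothetical.

```python
def parse_sfp_identity(a0: bytes) -> dict:
    """Decode identity fields from an SFF-8472 A0h EEPROM page."""
    if len(a0) < 92:
        raise ValueError("need at least 92 bytes of the A0h page")

    def text(start, end):
        # Fields are ASCII, left-aligned and padded with spaces.
        return a0[start:end].decode("ascii", errors="replace").strip()

    return {
        "vendor_name": text(20, 36),       # bytes 20-35
        "vendor_oui": a0[37:40].hex(":"),  # bytes 37-39
        "part_number": text(40, 56),       # bytes 40-55
        "revision": text(56, 60),          # bytes 56-59
        "serial_number": text(68, 84),     # bytes 68-83
        "date_code": text(84, 92),         # bytes 84-91
    }


def is_allowlisted(identity: dict, allowlist: set) -> bool:
    """Illustrative only: match on (vendor name, part number) pairs."""
    return (identity["vendor_name"], identity["part_number"]) in allowlist


if __name__ == "__main__":
    page = bytearray(b" " * 128)
    page[20:36] = b"EXAMPLE VENDOR".ljust(16)
    page[37:40] = bytes([0x00, 0x11, 0x22])
    page[40:56] = b"EX-10G-LR".ljust(16)
    ident = parse_sfp_identity(bytes(page))
    print(ident["vendor_name"], ident["vendor_oui"], ident["part_number"])
```

QSFP-family modules use a different memory map (SFF-8636 or CMIS), so these offsets apply to SFP, SFP+, and SFP28 only.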
|
|
||||||
|
|
||||||
Cisco's implementation on IOS and IOS-XE platforms can generate syslog warnings about unsupported transceivers (for example, `%PHY-4-UNSUPPORTED_TRANSCEIVER`), and in some configurations can place the port in an err-disabled state. The widely known command `service unsupported-transceiver` in Cisco IOS enables non-Cisco optics on most platforms and was added after significant customer pressure. Cisco's official position is that this command voids your optics support entitlement, not your switch support contract, a distinction that is technically valid but has been applied inconsistently by TAC engineers.
|
|
||||||
|
|
||||||
On Cisco's NCS and ASR 9000 series running IOS-XR, the enforcement is different: XR has a whitelist-based approach where modules are specifically approved per platform, and the list is controlled by Cisco's release train. Third-party optics on IOS-XR platforms are more constrained than on IOS-XE, and `service unsupported-transceiver` is not universally available.
|
|
||||||
|
|
||||||
Juniper's approach on Junos uses EEPROM vendor validation and generates syslog warnings for non-Juniper optics. Juniper does not typically disable ports for non-qualified optics on EX and QFX series platforms — they log warnings and rely on support policy enforcement rather than technical lockout. On PTX and MX series, qualification requirements are more strictly enforced in software.
|
|
||||||
|
|
||||||
Arista historically had a more permissive stance: Arista EOS accepted third-party optics with minimal restriction, and Arista's official support documentation acknowledged that "compatible" optics from third-party vendors are acceptable. This positioned Arista favorably with price-conscious buyers and remains part of their market differentiation. The EOS `transceiver unsupported` category exists but triggers warnings rather than port shutdown on most platforms.
|
|
||||||
|
|
||||||
**The EEPROM programming reality**
|
|
||||||
|
|
||||||
Third-party transceiver manufacturers — legitimate ones — address the allowlist check by programming their EEPROM with vendor and part number fields that will pass the switch's validation. This is not hacking or counterfeiting; it's the same approach used by every ODM manufacturer that builds optics for Cisco, Juniper, or Arista under contract. The underlying hardware is manufactured by a relatively small number of optical component ODMs (InnoLight, Lumentum, Fabrinet, II-VI/Coherent, Eoptolink) who supply modules to OEMs and third-party brands alike.
|
|
||||||
|
|
||||||
When Flexoptix (as an example) programs a module to work on Cisco equipment, they are programming EEPROM fields that identify the module appropriately for the platform and ensure the DOM data maps correctly. The module itself is manufactured to the same IEEE/SFF standards as any genuine Cisco-branded module. There is no deception involved when the purchaser buys a "Cisco-compatible" optic from a reputable third-party vendor — the product is described accurately.
|
|
||||||
|
|
||||||
The legal landscape in this area is reasonably well-settled. The EU's competition framework, and to a lesser degree US competition law, prohibits using technical tie-in mechanisms to force customers to purchase accessories. No major switch vendor has successfully sued a legitimate third-party optics vendor for EEPROM compatibility programming. The warranty argument — that using third-party optics voids your switch warranty — is legally weak in most jurisdictions and has been challenged successfully. Using a compatible optic does not constitute modification of the switch platform.
|
|
||||||
|
|
||||||
**What "compatible" actually means: the quality spectrum**
|
|
||||||
|
|
||||||
"Compatible" covers a wide spectrum of quality, and this is where the OEM vendors' concerns have some genuine basis.
|
|
||||||
|
|
||||||
At the high-quality end: established third-party optical vendors who manufacture or source from qualified ODMs, apply rigorous incoming inspection, provide DOM data, and stand behind the product with real warranty support. These modules are functionally equivalent to OEM-branded modules, often come from the same manufacturing sources, and provide identical performance. Flexoptix, FS.com's qualified product lines, Lumentum's channel products, and similar vendors operate here.
|
|
||||||
|
|
||||||
At the low-quality end: grey market modules with unknown provenance, modules manufactured to minimal specifications with low-grade components, and counterfeits. These exist in the market, they cause real network problems, and they are why the "compatible optics" category has a mixed reputation among network engineers who have had bad experiences.
|
|
||||||
|
|
||||||
The distinction between these categories is not visible in a switch's unsupported-transceiver warning. A Cisco TAC engineer who sees that warning has no idea whether the module is a high-quality Flexoptix part or a counterfeit from an unknown source. Their default response is to ask you to replace it with a Cisco-branded module, which is a supportable recommendation regardless of the underlying quality.
|
|
||||||
|
|
||||||
**The practical guidance for your environment**
|
|
||||||
|
|
||||||
For datacenter environments with strict uptime requirements and full vendor support contracts: buy OEM optics for links where TAC involvement during outages is likely and where you want to eliminate the "was it the optic?" question from support interactions. The price premium is real but bounded, and the operational simplicity has value.
|
|
||||||
|
|
||||||
For enterprise and campus environments running mainstream Cisco, Juniper, or Arista platforms where you want competitive pricing without sacrificing reliability: reputable third-party vendors with a clear lineage (who manufactures the optical engine, what quality certifications apply, what warranty they provide) are a reasonable choice. Enable the relevant service command, document it in your change management system, and brief your TAC contacts so they don't immediately redirect you to optic replacement during troubleshooting.
|
|
||||||
|
|
||||||
For Arista shops: the permissive EOS approach means the lock-in argument barely applies. Arista has competed on this basis and the operational overhead of managing the compatibility concern is minimal.
|
|
||||||
|
|
||||||
For carriers and service providers running IOS-XR, Junos on MX/PTX, or Nokia SR OS: the qualification requirements are more stringent, the support contract structure more rigid, and the cost of a support escalation involving optic compatibility questions is higher. OEM optics or formally qualified third-party optics (often available from your NEM via qualified vendor programs) are the safer operational choice.
|
|
||||||
|
|
||||||
**What the "optic tax" actually costs**
|
|
||||||
|
|
||||||
Cisco's SFP28 25G SR optics are listed at approximately $250–$350 in list price. Street prices with typical enterprise discount are $120–$180. Equivalent third-party modules from reputable vendors are $35–$60. The differential per module is $60–$120. For a 48-port leaf switch fully populated with 25G SR optics, the differential is $2,880–$5,760 per switch.
|
|
||||||
|
|
||||||
For a datacenter with 40 leaf switches, the optic cost differential across the fabric is $115,000–$230,000. Even accounting for some increased operational overhead from managing compatibility, that is a number worth taking seriously. Organizations that have done this calculation and made it visible to leadership tend to find that the "it's just simpler to use OEM optics" argument becomes less compelling.
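The fabric-level figure is a straightforward multiplication, sketched here so the inputs can be swapped for your own quotes. The function name is hypothetical; the $60–$120 per-module differential, 48 ports, and 40 switches are the figures used above.

```python
def fabric_optic_differential(per_module_low, per_module_high,
                              ports_per_switch, switches):
    """Return the (low, high) cost differential across a fully populated fabric."""
    modules = ports_per_switch * switches
    return modules * per_module_low, modules * per_module_high


if __name__ == "__main__":
    low, high = fabric_optic_differential(60, 120, ports_per_switch=48, switches=40)
    print(f"${low:,} to ${high:,}")
```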
|
|
||||||
|
|
||||||
The OEM vendors know this, which is why the enforcement mechanisms have become softer over time. The market has made the trade-offs clear, and the vendors who continue aggressive lock-in face real competitive disadvantage. The residual lock-in that persists is in support policy, not primarily in technical enforcement — and support policy is negotiable in ways that EEPROM checks are not.
|
|
||||||
@ -1,76 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OM3 vs. OM4 vs. OM5 Multimode Fiber: Actual Performance Differences and When to Upgrade"
|
|
||||||
slug: "multimode-fiber-om3-om4-om5-guide"
|
|
||||||
category: "Physical Infrastructure"
|
|
||||||
tags: ["OM3", "OM4", "OM5", "multimode fiber", "wideband", "850nm", "SWDM", "datacenter cabling"]
|
|
||||||
seo_focus_keyword: "OM3 OM4 OM5 multimode fiber comparison upgrade"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Most datacenter cabling discussions treat fiber grade as a binary choice between "old multimode that needs replacing" and "current multimode that's fine." The reality involves meaningful performance differences between OM3, OM4, and OM5 that affect what speeds you can run at what distances — and a legitimate question about whether OM5's wideband capability is worth the premium for new installations. The answer depends on where you are in your cabling lifecycle and what speed tier you're planning for.
|
|
||||||
|
|
||||||
**The physics: why OM grades matter**
|
|
||||||
|
|
||||||
All multimode fiber guides light using total internal reflection in a graded-index core with a nominal 50µm diameter. The performance differences between grades come primarily from the modal bandwidth — specifically, the effective modal bandwidth (EMB), which characterizes how well the fiber supports high-speed transmission with the VCSEL laser sources used in multimode transceivers.
|
|
||||||
|
|
||||||
Modal dispersion is the fundamental limitation of multimode fiber. Different light modes travel at different velocities, spreading a pulse over time and limiting the maximum bandwidth-distance product. The graded-index core profile minimizes this by slowing higher-order modes and accelerating lower-order modes, bringing them closer to the same transit time. Grading quality — how precisely the refractive index profile matches the theoretical optimum — directly determines modal bandwidth.
|
|
||||||
|
|
||||||
OM3: minimum EMB of 2000 MHz·km. Maximum EMB in practice for production cable is typically 2000–3500 MHz·km.
|
|
||||||
|
|
||||||
OM4: minimum EMB of 4700 MHz·km — more than double OM3's minimum. High-performance OM4 cable reaches 6000–8000 MHz·km.
|
|
||||||
|
|
||||||
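OM5: minimum EMB of 4700 MHz·km at 850nm, the same as OM4, but critically, also specified at 953nm with a minimum EMB of 2470 MHz·km. (The 3500 and 1850 MHz·km figures often quoted for OM5 are its overfilled-launch bandwidths at those two wavelengths, not EMB.) OM5's primary distinction is its expanded wavelength range for wideband multimode applications (SWDM4), not simply higher modal bandwidth.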
|
|
||||||
|
|
||||||
**Distance limits by speed and grade**
|
|
||||||
|
|
||||||
The practical consequence of these bandwidth differences is reach capability for each speed tier:
|
|
||||||
|
|
||||||
| Speed | OM3 Reach | OM4 Reach | OM5 Reach |
|---|---|---|---|
| 10G (SR) | 300 m | 400 m | 400 m |
| 25G (SR) | 70 m | 100 m | 100 m |
| 40G (SR4) | 100 m | 150 m | 150 m |
| 100G (SR4) | 70 m | 100 m | 100 m |
| 200G (SR4) | 70 m | 100 m | 100 m |
| 400G (SR8) | 70 m | 100 m | 100 m |
| 100G (SWDM4, 2 fibers) | 75 m | 100 m | 150 m |
| 400G (SR4.2 BiDi, 8 fibers) | 70 m | 100 m | 150 m |
|
|
||||||
|
|
||||||
The reach differences between OM3 and OM4 matter most in the 100m range — the standard for in-row and cross-aisle datacenter connections. OM3's 70m reach for 100G SR4 and 25G SR constrains configurations where servers and switches are not in adjacent racks, or where the structured cabling adds patch cord length beyond the direct distance.
|
|
||||||
|
|
||||||
Most modern datacenter structured cabling with OM3 can handle 25G SR and 100G SR4 for in-rack and adjacent-rack connections, but cross-datacenter-floor runs — particularly in large enterprise datacenters where distance from servers to a central MDF exceeds 70m — push the OM3 limits for 100G.
|
|
||||||
|
|
||||||
**The OM3-to-OM4 upgrade decision**
|
|
||||||
|
|
||||||
If your existing infrastructure is OM3 and you're deploying 25G server-facing ports and 100G/400G uplinks, the question is whether OM3 will support your target speeds across all link lengths in your facility.
|
|
||||||
|
|
||||||
The honest answer for most enterprise datacenter environments: OM3 is probably sufficient for 25G server access and 100G uplinks in standard ToR (Top-of-Rack) architectures where the horizontal run is under 50 meters. If your facility has cross-row distances over 70 meters, or your cabling plant includes patch panel hops that add 10–15 meters to nominal runs, OM4 provides meaningful headroom.
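A quick way to screen an existing cable plant is to compare each channel's total length, including patch cords, against the standard reach for the target application. The sketch below uses the standard 10G, 25G, 40G, and 100G reach figures for OM3 and OM4; the function and dictionary names are hypothetical, and a length check alone does not replace loss-budget verification.

```python
# Standard reach in meters, keyed by (application, fiber grade).
REACH_M = {
    ("10G-SR", "OM3"): 300, ("10G-SR", "OM4"): 400,
    ("25G-SR", "OM3"): 70, ("25G-SR", "OM4"): 100,
    ("40G-SR4", "OM3"): 100, ("40G-SR4", "OM4"): 150,
    ("100G-SR4", "OM3"): 70, ("100G-SR4", "OM4"): 100,
}


def link_fits(application, grade, horizontal_m, patch_m=0.0):
    """True if the total channel length is within the standard reach."""
    return horizontal_m + patch_m <= REACH_M[(application, grade)]


if __name__ == "__main__":
    # A 60 m horizontal run with 15 m of patch cords and panel hops.
    for grade in ("OM3", "OM4"):
        print(grade, link_fits("100G-SR4", grade, horizontal_m=60, patch_m=15))
```

In this example the 75 m channel exceeds OM3's 70 m reach for 100G SR4 but fits comfortably on OM4, which is the kind of case where the patch panel hops decide the outcome.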
|
|
||||||
|
|
||||||
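For 400G SR8 (16 fibers, eight lanes in each direction), IEEE 802.3cm specifies 70 m on OM3 and 100 m on OM4 or OM5. OM3 is usable for short in-row links, but the margin is thin once patch panel hops are counted. If 400G SR8 is on your roadmap for anything beyond adjacent racks, plan on OM4.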
|
|
||||||
|
|
||||||
Upgrade cost considerations: replacing structured cabling is expensive — labor typically exceeds hardware cost for any fiber replacement project of scale. If your OM3 plant is less than 10 years old, physically sound, and within spec for your current speed requirements, replacing it for the modest reach improvement of OM4 is difficult to justify financially. If you're planning a datacenter refresh that involves moving switches or rewiring racks, incorporate OM4 in that project. Don't do a standalone fiber replacement for OM3-to-OM4.
|
|
||||||
|
|
||||||
**OM5: the wideband case**
|
|
||||||
|
|
||||||
OM5, standardized in TIA-492AAAE and published in 2016, was developed primarily to enable wideband multimode applications using SWDM4 (Short Wavelength Division Multiplexing with 4 channels). SWDM4 uses four wavelengths (850nm, 880nm, 910nm, 940nm) to multiplex four 10G or 25G lanes, for 40G or 100G in aggregate, on two fibers (duplex LC) instead of the 8 fibers used by parallel SR4 applications.
|
|
||||||
|
|
||||||
The SWDM4 value proposition is significant: 40G or 100G at usable distances over duplex LC fiber infrastructure that's already widely deployed. For organizations with a large investment in duplex LC multimode infrastructure who want to reach 100G without a parallel MPO cabling migration, OM5 + SWDM4 transceivers are the path.
|
|
||||||
|
|
||||||
The practical catch: SWDM4 transceivers are more expensive than SR4 equivalents, and the ecosystem remains smaller than the parallel SR4 mainstream. 100G SWDM4 QSFP28 modules are available from multiple vendors at around $180–$280, versus $35–$80 for 100G SR4. The cabling savings (fewer fibers, existing duplex LC infrastructure reused) can offset this depending on the scale of the deployment, but the calculation is not always favorable.
|
|
||||||
|
|
||||||
OM5 cable itself typically costs 10–20% more than OM4 cable of comparable quality. For new datacenter builds that are standardizing on MPO parallel fiber anyway, OM5 offers no advantage over OM4 — the parallel SR applications (SR4, SR8) perform identically on OM4 and OM5. OM5 is specifically valuable when you are planning SWDM4 deployments or want maximum flexibility for future wideband applications.
|
|
||||||
|
|
||||||
**Color coding and field identification**
|
|
||||||
|
|
||||||
Fiber grade is identified by jacket color: OM3 is aqua (turquoise); OM4 is also aqua under the TIA color scheme, though many manufacturers use violet (often called Erika violet) to distinguish it from OM3; OM5 is lime green. OS2 single-mode is yellow. This color coding helps during physical inspection and fiber plant audits, though non-standard colors appear in some structured cabling brands and legacy installations.
|
|
||||||
|
|
||||||
When auditing a mixed-vintage fiber plant, don't assume color alone. If the jacket color is faded, non-standard, or unlabeled, continuity and loss testing combined with EMB characterization gives the authoritative answer. The cost of a fiber characterization pass before committing to a high-speed upgrade is far less than the cost of failed link commissioning on fiber that turned out to be the wrong grade.
|
|
||||||
|
|
||||||
**The practical recommendation**
|
|
||||||
|
|
||||||
New builds today should deploy OM4 as the baseline for multimode applications. It's the cost-effective standard, widely available, and provides adequate headroom for 25G/100G/400G applications within typical datacenter distances. If you specifically plan SWDM4 or want future-proofing for wideband multimode, OM5 is worth the modest premium.
|
|
||||||
|
|
||||||
Existing OM3 plants: evaluate reach requirements carefully before replacing. OM3 remains viable for 25G and 100G in many deployment scenarios. Plan OM4 replacement in the context of broader infrastructure refresh cycles.
|
|
||||||
|
|
||||||
Existing OM4 plants: there is no compelling reason to replace OM4 with OM5 for parallel SR applications. The upgrade scenario that makes sense is adding OM5 runs specifically for SWDM4 connections in locations where parallel fiber infrastructure is impractical.
|
|
||||||
@ -1,85 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Enterprise Transceiver Lifecycle Management: Inventory, Standardization, and the Real Cost of a Fragmented Fleet"
|
|
||||||
slug: "transceiver-lifecycle-management-enterprise"
|
|
||||||
category: "Operations & Lifecycle"
|
|
||||||
tags: ["lifecycle management", "inventory", "standardization", "EoL", "fleet management", "enterprise"]
|
|
||||||
seo_focus_keyword: "enterprise transceiver lifecycle management inventory"
|
|
||||||
word_count_target: 1200
|
|
||||||
difficulty: intermediate
|
|
||||||
---
|
|
||||||
|
|
||||||
Most enterprise networks have a transceiver problem they haven't fully recognized yet. It looks like this: the datacenter runs six different SFP+ variants across three switch generations, the campus has four different 10G LR optics from three vendors, spare parts are scattered across three storage locations, and when a link fails at 11 PM, the technician on call spends 45 minutes determining whether the right spare actually exists before driving to get it. This is the real cost of a fragmented optic fleet — not the unit price of any individual module, but the accumulated operational tax of unmanaged diversity.
|
|
||||||
|
|
||||||
Transceiver lifecycle management is not glamorous infrastructure work. It rarely appears on a network roadmap. But for organizations with 500+ optics deployed across production infrastructure, the operational and financial impact of getting it right (or wrong) is substantial.
|
|
||||||
|
|
||||||
**What lifecycle management actually involves**
|
|
||||||
|
|
||||||
Lifecycle management for optical transceivers covers four distinct phases that are often handled separately — or not at all — in enterprise environments:
|
|
||||||
|
|
||||||
Procurement and standardization: which SKUs are approved, how purchasing is handled, how vendor selection is made.
|
|
||||||
|
|
||||||
Asset tracking: knowing where every module is, what it is, what firmware/DOM baseline applies, and when it was installed.
|
|
||||||
|
|
||||||
Operational monitoring: the DOM trending and alert management discussed in the proactive replacement article.
|
|
||||||
|
|
||||||
End-of-life planning: managing manufacturer EoL announcements, replacement roadmapping, and fleet upgrade cycles.
|
|
||||||
|
|
||||||
Each phase has failure modes that cost money and operational stability.
|
|
||||||
|
|
||||||
**The standardization argument**
|
|
||||||
|
|
||||||
The case for transceiver standardization is straightforward: every distinct SKU in your inventory represents a separate spare quantity to maintain, a separate vendor relationship to manage, and a separate set of documentation for operational staff. Multiplied across dozens of SKUs in a fragmented fleet, the management overhead is real.
|
|
||||||
|
|
||||||
Consider a 200-switch campus network that has accumulated the following 10G uplink optics across a decade of procurement: Cisco SFP-10G-LR, Cisco SFP-10G-LR-S, Finisar FTLX1471D3BTL, Oplink TRS5020EN, and three different SKUs from a secondary market vendor. These modules are all functionally similar (10G LR, duplex LC, OS2 fiber), but they have different DOM threshold values, different EEPROM vendor fields (which affects switch platform behavior), potentially different compatibility matrices for different switch generations, and definitely different support contact points.
|
|
||||||
|
|
||||||
A standardization effort reduces this to one approved SKU (or one per relevant use case: SR for short reach, LR for long reach) with one vendor relationship, one spare quantity to manage, and one set of monitoring profiles. The one-time labor cost of the standardization analysis and policy documentation is recovered in the first year through reduced operational complexity.
|
|
||||||
|
|
||||||
**How to inventory what you actually have**
|
|
||||||
|
|
||||||
Starting a lifecycle management program requires knowing the current state. For organizations without a disciplined asset tracking history, this means an inventory pass.
|
|
||||||
|
|
||||||
Automated inventory is the right approach at scale. Most modern network management platforms can poll SNMP for EEPROM data, specifically the entPhysicalMfgName, entPhysicalSerialNum, and entPhysicalModelName objects in the ENTITY-MIB, with DOM values exposed through platform-specific sensor MIBs. A Cisco-based network can be polled via SNMP to return the vendor, part number, serial number, and DOM values for every installed transceiver. Arista's eAPI provides the same data in JSON. Juniper's Junos supports NETCONF queries against the interface hardware table.
|
|
||||||
|
|
||||||
The output of an automated inventory sweep gives you a spreadsheet-equivalent of every installed transceiver: chassis, slot, port, vendor, part number, serial number, DOM values at time of poll, and installation indicator (you may be able to infer approximate installation date from uptime data or change management records).
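Once the sweep output is in a structured form, measuring fleet fragmentation is a one-line aggregation. The record layout below is a hypothetical minimal schema, not the output format of any particular NMS, and the sample values are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TransceiverRecord:
    chassis: str
    port: str
    vendor: str
    part_number: str
    serial_number: str


def sku_counts(records):
    """Count installed modules per (vendor, part number) pair."""
    return Counter((r.vendor, r.part_number) for r in records)


if __name__ == "__main__":
    fleet = [
        TransceiverRecord("sw1", "Te1/1/1", "VENDOR-A", "PN-10G-LR", "A001"),
        TransceiverRecord("sw1", "Te1/1/2", "VENDOR-A", "PN-10G-LR", "A002"),
        TransceiverRecord("sw2", "Te1/1/1", "VENDOR-B", "PN-10G-LR-X", "B001"),
    ]
    for (vendor, part), count in sku_counts(fleet).most_common():
        print(f"{vendor:10} {part:12} {count}")
```

The number of distinct keys in the result is the SKU count you are trying to reduce; the long tail of SKUs with one or two installed units is usually the first standardization target.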
|
|
||||||
|
|
||||||
This data is the foundation for everything else. Without it, lifecycle planning is guesswork.
|
|
||||||
|
|
||||||
**The hidden cost of EoL mismanagement**
|
|
||||||
|
|
||||||
Transceiver manufacturers publish end-of-life (EoL) notices with varying lead times, typically 6–12 months' notice before the last-time-buy (LTB) date and 18–36 months before end of support. Large OEMs like Cisco publish these on their Product Lifecycle pages with reasonable predictability. Third-party vendors are less consistent.
|
|
||||||
|
|
||||||
The failure mode is straightforward: an organization is running 200 units of a specific SFP28 module. The manufacturer announces EoL. The organization misses the announcement. The last-time-buy date passes. The modules start failing (they're 7 years old; the bathtub curve is bending upward). Replacement procurement finds the module discontinued with no direct equivalent available. The replacement has a different part number, possibly different EEPROM vendor fields, and may require compatibility verification on the installed switch platform. Emergency procurement at scarcity pricing adds 30–40% to unit cost.
|
|
||||||
|
|
||||||
This scenario is not hypothetical. It plays out regularly in enterprises that don't track EoL status. The consequence is unnecessary cost and operational risk that a $200/year EoL monitoring subscription (or 4 hours of quarterly manual review) would have prevented.
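The quarterly manual review can be reduced to a small check of deployed SKUs against known last-time-buy dates. This is a sketch only: the function name, the SKU identifiers, and the dates are hypothetical, and the date table would have to be maintained by hand from vendor notices.

```python
from datetime import date, timedelta


def skus_needing_action(deployed_skus, ltb_dates, today, horizon_days=180):
    """Return deployed SKUs whose last-time-buy date has passed or falls within the horizon."""
    cutoff = today + timedelta(days=horizon_days)
    return sorted(
        sku for sku in deployed_skus
        if sku in ltb_dates and ltb_dates[sku] <= cutoff
    )


if __name__ == "__main__":
    deployed = {"SKU-A", "SKU-B", "SKU-C"}
    ltb = {"SKU-A": date(2025, 3, 1), "SKU-B": date(2026, 1, 1)}
    print(skus_needing_action(deployed, ltb, today=date(2025, 1, 1)))
```

SKUs with no entry in the date table are silently skipped here, so an empty result means "nothing known", not "nothing pending".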
|
|
||||||
|
|
||||||
**A practical lifecycle management framework**
|
|
||||||
|
|
||||||
For an organization with 500–5000 transceivers across campus and datacenter, the following framework is implementable without dedicated staff:
|
|
||||||
|
|
||||||
Tier 1 (critical path links): full DOM monitoring with trend alerts, proactive replacement at DOM threshold or age >7 years, documented spare quantities at 10% of deployed count minimum. This tier covers datacenter core/spine, WAN circuit-facing ports, and any link where outage causes direct business impact.
|
|
||||||
|
|
||||||
Tier 2 (important but redundant links): DOM monitoring without active trending alerts, reactive replacement with pre-positioned spares, age-triggered review at 8 years. Distribution layer uplinks, datacenter leaf-to-server for high-availability clusters.
|
|
||||||
|
|
||||||
Tier 3 (access and edge): replace-on-alarm, centralized spares rather than per-site, EoL monitoring only.
|
|
||||||
|
|
||||||
The tier assignment is a one-time exercise that maps to your network's logical topology. Tier 1 represents maybe 15–20% of your port count but 80% of your downtime risk.
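The age triggers in the three tiers can be written down as a small lookup, which is useful if the inventory lives in a spreadsheet or CMDB export. This is a minimal sketch: the function name and return strings are hypothetical, and it encodes only the age rules stated above (Tier 1 replaces past 7 years, Tier 2 reviews at 8 years, Tier 3 has no age trigger).

```python
def age_action(tier: int, age_years: float) -> str:
    """Return the age-driven action for a transceiver in the given tier.

    Thresholds come from the tier descriptions in the text; the return
    labels ("replace", "review", "none") are illustrative names only.
    """
    if tier not in (1, 2, 3):
        raise ValueError(f"unknown tier: {tier}")
    if tier == 1 and age_years > 7:
        return "replace"   # Tier 1: proactive replacement past 7 years
    if tier == 2 and age_years >= 8:
        return "review"    # Tier 2: age-triggered review at 8 years
    return "none"          # Tier 3, or not yet at the age trigger


if __name__ == "__main__":
    for tier, age in [(1, 7.5), (2, 8), (3, 12)]:
        print(tier, age, age_action(tier, age))
```

DOM-threshold triggers are deliberately left out; they depend on per-module data that this sketch does not model.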
|
|
||||||
|
|
||||||
**Spare inventory: the right quantity and location**
|
|
||||||
|
|
||||||
Spare transceiver strategy suffers from two failure modes: too few spares (discovered at 2 AM when the only spare is at another site) and too many spares (locked-up capital in modules that age out before use).
|
|
||||||
|
|
||||||
A working heuristic for Tier 1 spare quantities: 10% of deployed count per SKU, minimum 2 units, maximum 10 units for any single site. This handles the realistic range of simultaneous failures in most environments without building excessive inventory.
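The heuristic reduces to a clamp, sketched below. The function name is hypothetical, and rounding the 10% figure up to the next whole unit is an assumption; the text does not say how fractions should be handled.

```python
def tier1_spares(deployed: int) -> int:
    """Spare count for one SKU at one site under the Tier 1 heuristic.

    10% of deployed count, at least 2 and at most 10 units.
    Rounding up is an assumption, not something the heuristic specifies.
    """
    if deployed <= 0:
        return 0  # nothing deployed, nothing to spare
    ten_percent = (deployed + 9) // 10  # integer ceiling of deployed / 10
    return min(10, max(2, ten_percent))


if __name__ == "__main__":
    for count in (5, 45, 200):
        print(count, "deployed ->", tier1_spares(count), "spares")
```

Integer arithmetic is used for the ceiling so that counts such as 30 or 70 are not nudged upward by floating-point error.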
|
|
||||||
|
|
||||||
For Tier 2 and Tier 3, consolidated regional spares rather than per-site inventories reduce total spare count while maintaining reasonable replacement times. A regional spare kit with 5 units of each common SKU, staged at a central location with 4-hour delivery to all covered sites, is operationally adequate for non-critical links.
|
|
||||||
|
|
||||||
Physical spare storage matters. Transceivers are sensitive to static discharge, contamination, and temperature cycling. Store spares in their original packaging or ESD-safe containers, in a temperature-controlled environment, with the dust caps on connectors. Spares that have been stored loose in a toolbox for two years may have contaminated connector faces and degraded optical performance — you don't want to discover this during an emergency replacement.
|
|
||||||
|
|
||||||
**The fleet fragmentation trap and how to exit it**
|
|
||||||
|
|
||||||
The fragmented fleet rarely happens intentionally. It accumulates over time: each hardware refresh picks the best-priced optic available at the time, each emergency replacement uses whatever's available, each acquisition brings a different standard. Exiting the fragmentation trap requires an explicit decision to standardize, a defined migration path, and the organizational discipline to enforce purchasing policy going forward.
|
|
||||||
|
|
||||||
The migration path doesn't require a forklift replacement of all non-standard modules. It uses natural attrition: as modules fail or are replaced for other reasons, they are replaced with the approved standard SKU. New deployments follow the standard without exception. Within one to two hardware generations (7–10 years), the fleet converges.
|
|
||||||
|
|
||||||
The organizational discipline requirement is the hardest part. Someone needs to own the approved SKU list, approve exceptions, and enforce it through procurement processes. Without organizational ownership, the fragmentation reaccumulates within two years of any standardization effort.
|
|
||||||
|
|
||||||
The networks that manage this well treat optical transceivers like any other significant infrastructure component: documented standards, tracked assets, managed lifecycle, owned procurement. The ones that don't spend their operational budget cleaning up consequences.
|
|
||||||
@ -1,87 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Cisco QSFP-28 Compatibility Enforcement: What NX-OS and IOS-XE Actually Check"
|
|
||||||
slug: "cisco-qsfp28-compatibility-list-nxos-iosxe"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Vendor Compatibility"
|
|
||||||
tags: ["Cisco", "QSFP28", "NX-OS", "IOS-XE", "100G", "third-party optics", "compatibility"]
|
|
||||||
seo_focus_keyword: "Cisco QSFP28 compatibility NX-OS IOS-XE"
|
|
||||||
---
|
|
||||||
|
|
||||||
Every few months someone opens a ticket with us because their third-party QSFP-28 works fine in a Nexus 9300 but refuses to initialize in a Catalyst 9500. Same optic, same manufacturer, same part number. The answer is rarely simple, but there's a consistent logic underneath it once you understand what Cisco's compatibility stack actually checks at each layer.
|
|
||||||
|
|
||||||
## The PID Check and What It Covers
|
|
||||||
|
|
||||||
Cisco's transceiver enforcement begins with the Product ID (PID) string stored in the module EEPROM: the Vendor PN field at bytes 40–55 of the SFF-8472 A0h page (for SFP+), or the Vendor Name and Vendor PN fields of SFF-8636 page 00h (for QSFP28). When a Cisco platform detects a QSFP28, it reads the Vendor Name (bytes 148–163) and Vendor Part Number (bytes 168–183). It then performs a lookup against a compatibility matrix that is maintained in the platform's ROMMON/firmware and updated with software releases.
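To see what a platform would read from a QSFP28, the identity fields can be pulled out of a raw EEPROM dump. The sketch below assumes a flat 256-byte dump (lower page followed by upper page 00h) so that list indices equal the SFF-8636 byte numbers; the function name and the sample values in the usage block are hypothetical.

```python
def parse_qsfp28_identity(dump: bytes) -> dict:
    """Extract vendor identity fields from an SFF-8636 EEPROM dump.

    Assumes `dump` is indexed by SFF-8636 byte number (lower page plus
    upper page 00h). Offsets: Vendor Name 148-163, Vendor OUI 165-167,
    Vendor PN 168-183.
    """
    if len(dump) < 184:
        raise ValueError("dump too short to contain the Vendor PN field")

    def ascii_field(first: int, last: int) -> str:
        raw = bytes(dump[first:last + 1])
        # Fields are ASCII, left-aligned and padded with spaces.
        return raw.decode("ascii", errors="replace").rstrip(" \x00")

    return {
        "vendor_name": ascii_field(148, 163),
        "vendor_oui": bytes(dump[165:168]).hex(":"),
        "vendor_pn": ascii_field(168, 183),
    }


if __name__ == "__main__":
    # Synthetic dump with made-up values, for illustration only.
    sample = bytearray(256)
    sample[148:164] = b"EXAMPLE VENDOR".ljust(16)
    sample[165:168] = bytes([0x00, 0x11, 0x22])
    sample[168:184] = b"QSFP28-SR4-TEST".ljust(16)
    print(parse_qsfp28_identity(sample))
```

How the dump is obtained (a programmer box, `ethtool -m` on a Linux host, or a vendor tool) is outside the scope of this sketch.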
|
|
||||||
|
|
||||||
The PID check itself has two modes depending on platform and software version. On older NX-OS releases — 7.x and early 9.x — it was essentially a hard block: if the PID wasn't in the table, the port came up in err-disabled state or the transceiver showed as "not supported." On NX-OS 9.3(5) and later, Cisco introduced a tiered approach where unrecognized PIDs generate a syslog warning but don't necessarily disable the port. The behavior varies by line card, though, and that's where things get complicated.
|
|
||||||
|
|
||||||
## NX-OS: Checking What You've Got
|
|
||||||
|
|
||||||
On a Nexus platform, the canonical command for transceiver status is:
|
|
||||||
|
|
||||||
```
|
|
||||||
show interface ethernet 1/1 transceiver
|
|
||||||
```
|
|
||||||
|
|
||||||
This gives you the basic DOM (Digital Optical Monitoring) readout: temperature, voltage, TX/RX power, and bias current. More useful for compatibility diagnosis is:
|
|
||||||
|
|
||||||
```
|
|
||||||
show interface transceiver details
|
|
||||||
```
|
|
||||||
|
|
||||||
And on newer platforms, specifically for the compatibility state:
|
|
||||||
|
|
||||||
```
|
|
||||||
show hardware internal dev-port-map
|
|
||||||
```
|
|
||||||
|
|
||||||
But the most directly useful command when you're troubleshooting a compatibility failure is:
|
|
||||||
|
|
||||||
```
|
|
||||||
show interface ethernet 1/1 transceiver details | include supported
|
|
||||||
```
|
|
||||||
|
|
||||||
The output will tell you whether the optic is in one of three states: "calibrated and DOM-monitored," "unsupported transceiver type," or—the ambiguous one—"calibrated but unsupported." That last state means the module is electrically functional, DOM is readable, but Cisco won't commit to supporting it.
|
|
||||||
|
|
||||||
For IOS-XE platforms (Catalyst 9000 series), the equivalent is:
|
|
||||||
|
|
||||||
```
|
|
||||||
show interfaces GigabitEthernet0/0/0 transceiver
|
|
||||||
show interfaces transceiver supported-list
|
|
||||||
```
|
|
||||||
|
|
||||||
The `supported-list` command is genuinely useful: it outputs the PIDs in the platform's compatibility table for your specific chassis and line card combination, which saves the guesswork.
|
|
||||||
|
|
||||||
## Why the Same Optic Behaves Differently Across Line Cards
|
|
||||||
|
|
||||||
This is where most network engineers get confused. Cisco's compatibility matrix isn't monolithic—it's per-platform-per-ASIC. A Nexus 9300-EX line card uses a Cisco custom ASIC (referred to as Cloudscale in their documentation) with different firmware than the older Nexus 9300 non-EX cards running Trident-based ASICs. Each has its own compatibility table, and those tables are updated on different cadences.
|
|
||||||
|
|
||||||
The practical implication: a 100GBASE-SR4 QSFP-28 from a Flexoptix-programmed module (properly coded with the right OUI and vendor strings for Cisco compatibility) may work perfectly in a Nexus 93180YC-FX but generate a "transceiver type not supported" on a Nexus 9564PX. The difference isn't the optic. The 9564PX's line card firmware carries a more restrictive compatibility table, one that did not include that particular PID string until NX-OS 9.3(7).
|
|
||||||
|
|
||||||
On Catalyst 9500 and 9600 platforms running IOS-XE, the ASICs are again different (Cisco Silicon One in the 9600 series), and the validation logic is embedded partly in the FPGA bitstream. Firmware-level updates to IOS-XE 17.x progressively loosened some of the stricter checks, but the 9600 series running 17.3.x still rejects certain third-party optics that work fine on 9500 platforms at 17.6.x.
|
|
||||||
|
|
||||||
## The Bypass Approach: What Works and What Doesn't
|
|
||||||
|
|
||||||
Cisco doesn't officially document a "third-party optic bypass" for NX-OS or IOS-XE in the same way Juniper documents `no-ddmi` or Arista offers `xcvr-unsupported-digital-data`. For NX-OS, the closest thing is the `service unsupported-transceiver` global command, which has been available since NX-OS 7.0(3)I6(1):
|
|
||||||
|
|
||||||
```
|
|
||||||
switch(config)# service unsupported-transceiver
|
|
||||||
```
|
|
||||||
|
|
||||||
This command doesn't disable the PID check entirely—it changes the platform's response from hard failure to soft warning. The port will come up, DOM data will be displayed, but the syslog will log `%PLATFORM-5-UNSUPPORTED_TRANSCEIVER` repeatedly. Depending on your NOC's alerting, this can get noisy fast.
|
|
||||||
|
|
||||||
On IOS-XE, there's no equivalent global override. The platform will either accept the optic or it won't. Cisco's position is that IOS-XE platforms require transceivers in the compatibility list. In practice, the compatibility list is updated frequently enough in major 17.x releases that this becomes a software management problem: if a third-party optic doesn't work, upgrading the platform software from 17.6 to 17.9 sometimes adds support without any other changes.
|
|
||||||
|
|
||||||
## DOM Thresholds and False Alarms
|
|
||||||
|
|
||||||
Even when a third-party QSFP-28 gets past the PID check, you can still end up with spurious threshold violations. Cisco's DOM display reads the standard SFF-8636 threshold fields, but then applies additional platform-level sanity checks. If the EEPROM thresholds in your optic are set too broadly—say, a receive power high-alarm threshold of +3 dBm when the link is running at -2 dBm—Cisco platforms will sometimes generate alarm conditions even though the optical power level is perfectly normal for the application.
|
|
||||||
|
|
||||||
The fix here is at programming time, not configuration time. If you're sourcing compatible third-party QSFP-28s, the EEPROM vendor strings and DOM threshold fields need to be programmed appropriately for the target platform. A well-programmed Cisco-compatible QSFP-28 SR4 should have its EEPROM Vendor Name field set to "CISCO-FLEXOPTIX" (or the appropriate OUI), TX/RX power thresholds consistent with the SR4 application (high alarm at +2.4 dBm, low warning at -7.3 dBm, per the 802.3bm spec), and the connector type byte set to 0x0C (MPO-12).
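When checking programmed thresholds against a raw EEPROM dump, it helps to convert between dBm and the stored value. SFF-8636 stores optical power, including the alarm and warning thresholds, as a 16-bit unsigned integer in units of 0.1 µW. The helper names below are hypothetical; this is a sketch of the unit conversion only, not of any vendor's programming tool.

```python
import math


def dbm_to_raw(dbm: float) -> int:
    """Convert a power level in dBm to the 16-bit SFF-8636 encoding (0.1 uW per LSB)."""
    microwatts = 10 ** (dbm / 10) * 1000   # dBm -> mW -> uW
    raw = round(microwatts * 10)           # uW -> units of 0.1 uW
    if not 0 <= raw <= 0xFFFF:
        raise ValueError(f"{dbm} dBm does not fit in a 16-bit field")
    return raw


def raw_to_dbm(raw: int) -> float:
    """Convert a raw 16-bit power value back to dBm."""
    if raw == 0:
        return float("-inf")               # zero power has no finite dBm value
    return 10 * math.log10(raw / 10000)    # 10000 raw units == 1 mW == 0 dBm


if __name__ == "__main__":
    for level in (2.4, -7.3):
        raw = dbm_to_raw(level)
        print(f"{level:+.1f} dBm -> raw {raw} -> {raw_to_dbm(raw):+.2f} dBm")
```

The 16-bit ceiling corresponds to 6.5535 mW, a little over +8 dBm, so any threshold above that cannot be represented at all.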
|
|
||||||
|
|
||||||
## Practical Checklist Before You Deploy
|
|
||||||
|
|
||||||
Before inserting a third-party QSFP-28 into a Cisco platform, it's worth taking 90 seconds to verify: first, check whether `service unsupported-transceiver` is already configured globally (some operators enable this as a matter of policy); second, verify your NX-OS or IOS-XE version against the transceiver vendor's compatibility matrix—this should be explicit, not implied; third, run `show platform` to confirm which ASIC generation your line card uses, since the same chassis can have multiple generations of line cards with different compatibility tables.
|
|
||||||
|
|
||||||
If a third-party optic fails after all that, pull it out and re-examine the EEPROM content with an SFF-8636 reader before assuming the optic is defective. Nine times out of ten, the PID string or vendor name field doesn't match what the platform's firmware expects, and that's a reprogramming problem, not a hardware problem.
|
|
||||||
|
|
||||||
The other 10 percent of the time, it's a genuine hardware compatibility issue—usually a CDR (Clock and Data Recovery) circuit that doesn't meet Cisco's internal PHY requirements for a specific ASIC. In those cases, no amount of EEPROM programming will fix it, and the right answer is a different SKU.
|
|
||||||
@ -1,70 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Juniper EX vs QFX Optic Behavior: Why the Same Transceiver Works on QFX5100 but Alarms on EX4600"
|
|
||||||
slug: "juniper-optic-unlock-ex-qfx-series"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Vendor Compatibility"
|
|
||||||
tags: ["Juniper", "EX4600", "QFX5100", "JunOS", "no-ddmi", "third-party optics", "transceiver alarms"]
|
|
||||||
seo_focus_keyword: "Juniper EX QFX transceiver compatibility"
|
|
||||||
---
|
|
||||||
|
|
||||||
Juniper has a reputation for being more open with third-party transceivers than Cisco, and in broad strokes that's accurate. But "more open" doesn't mean "consistent," and the differences between the EX and QFX product lines on this front have burned enough engineers that it's worth a detailed examination. A transceiver that initializes cleanly on a QFX5100 and passes DOM data without complaint can generate persistent alarm logs on an EX4600, even when the optical performance is identical.
|
|
||||||
|
|
||||||
## The Source of the Divergence: Different Chassis Management Paths
|
|
||||||
|
|
||||||
The QFX series and EX series share JunOS as their operating system, but the underlying chassis management frameworks differ significantly. QFX platforms—particularly the QFX5100, QFX5110, and QFX5200—use a stripped-down chassis management daemon that was designed with data center density in mind. Third-party transceiver support was treated as a practical necessity early in the QFX product lifecycle, partly because the hyperscale data center customers buying these platforms demanded it.
|
|
||||||
|
|
||||||
EX platforms, particularly the EX4600 and to some extent the EX4300 series, carry more of the enterprise-grade chassis management heritage from the EX4200 and EX8200 lineage. The chassis daemon on these platforms performs additional validation steps when a transceiver is inserted. In particular, it checks the EEPROM vendor fields against a soft compatibility list and, depending on JunOS version, will raise a `XCVR_UNSUPPORTED` or `XCVR_DOM_UNSUPPORTED` alarm for any transceiver not in that list—even if the port comes up and traffic passes normally.
|
|
||||||
|
|
||||||
## Specific JunOS Version Behavior
|
|
||||||
|
|
||||||
On JunOS 18.x running on EX4600 hardware, the behavior is: unrecognized transceivers initialize, the port goes up, but the chassis daemon logs repeated alarms of the form:
|
|
||||||
|
|
||||||
```
|
|
||||||
CHASSISD_XCVR_MODULE_UNSUPPORTED: FPC 0 PIC 0 PORT 1: Unsupported optics
|
|
||||||
```
|
|
||||||
|
|
||||||
These alarms don't take the port down, but they fire every few minutes, and on a switch with 48 ports of third-party optics, the syslog volume becomes operationally disruptive.
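If the repeats cannot be stopped at the source, they can be collapsed downstream before they reach an alerting pipeline. The sketch below keeps the first unsupported-optics message per port and drops later repeats. It assumes the message body matches the sample line above; real syslog lines carry timestamp and host prefixes, which the regular expression tolerates because it searches rather than anchors. The function name is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Pattern built from the sample message shown above.
UNSUPPORTED = re.compile(
    r"CHASSISD_XCVR_MODULE_UNSUPPORTED: FPC (\d+) PIC (\d+) PORT (\d+):"
)


def first_occurrences(lines: Iterable[str]) -> Iterator[str]:
    """Yield every line, except repeated unsupported-optics alarms for the same port."""
    seen = set()
    for line in lines:
        match = UNSUPPORTED.search(line)
        if match is None:
            yield line                     # unrelated message: pass through
            continue
        port = tuple(int(g) for g in match.groups())
        if port not in seen:
            seen.add(port)
            yield line                     # first alarm for this FPC/PIC/PORT


if __name__ == "__main__":
    log = [
        "CHASSISD_XCVR_MODULE_UNSUPPORTED: FPC 0 PIC 0 PORT 1: Unsupported optics",
        "CHASSISD_XCVR_MODULE_UNSUPPORTED: FPC 0 PIC 0 PORT 1: Unsupported optics",
        "CHASSISD_XCVR_MODULE_UNSUPPORTED: FPC 0 PIC 0 PORT 2: Unsupported optics",
    ]
    for kept in first_occurrences(log):
        print(kept)
```

A production filter would also want to reset the `seen` set when a module is removed and reinserted; that is left out here.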
|
|
||||||
|
|
||||||
On JunOS 20.2R3 and later, Juniper introduced more granular alarm suppression for the EX4600, and the frequency of these unsupported-optics alarms decreased substantially. On JunOS 21.4 and 22.x, the behavior on EX4600 more closely approximates the QFX5100 behavior: the alarm fires once at insertion time and is then suppressed unless the optical parameters go out of range.
|
|
||||||
|
|
||||||
On QFX5100 running the same JunOS version as an EX4600, the unsupported-optic alarm typically fires once and doesn't repeat, because the QFX chassis daemon's alarm re-evaluation timer is set much longer. This is documented nowhere obvious—you discover it by comparing syslog archives from both platforms.
|
|
||||||
|
|
||||||
## The no-ddmi Workaround
|
|
||||||
|
|
||||||
Juniper provides a configuration knob specifically for third-party optics:
|
|
||||||
|
|
||||||
```
|
|
||||||
set chassis no-ddmi-information-polling
|
|
||||||
```
|
|
||||||
|
|
||||||
Or, for per-interface suppression (available on some platforms and JunOS versions):
|
|
||||||
|
|
||||||
```
|
|
||||||
set interfaces xe-0/0/1 optics-options no-alarm
|
|
||||||
```
|
|
||||||
|
|
||||||
The `no-ddmi-information-polling` command at the chassis level tells JunOS to stop polling DDMI (Digital Diagnostics Monitoring Interface) data from all transceivers. This eliminates the alarm-generation cycle because the chassis daemon never fetches the data that triggers the unsupported-module check.
|
|
||||||
|
|
||||||
The tradeoff is significant: you lose all DOM visibility across the entire chassis. Temperature, TX power, RX power, bias current—none of it appears in `show interfaces diagnostics optics`. For a 48-port ToR switch where you're relying on DOM data to catch degrading optics before they cause an outage, this is a meaningful operational sacrifice. We generally recommend against `no-ddmi-information-polling` as a blanket solution; it's a blunt instrument that trades one problem for another.
|
|
||||||
|
|
||||||
The per-interface `no-alarm` option is more surgical, but it's only available in JunOS 20.x and later, and its behavior differs between EX and QFX platforms. On QFX5100, `no-alarm` suppresses the DDMI-related alarms but preserves DOM data visibility. On EX4600 running pre-21.4 JunOS, the same configuration option suppresses alarms but also disables DOM polling on that interface, which is the opposite of the intended behavior.
|
|
||||||
|
|
||||||
## DOM Data Discrepancies Between EX and QFX
|
|
||||||
|
|
||||||
There's another subtle difference: the DOM data resolution. QFX5100 presents DDMI data in the standard SFF-8472/SFF-8636 floating-point format with full precision. EX4600, particularly on pre-20.x JunOS, rounds some DOM values to integer precision in its internal representation before displaying them. This means a transceiver measuring -3.2 dBm RX power shows as -3 dBm on the EX4600 and -3.20 dBm on the QFX5100.
|
|
||||||
|
|
||||||
This matters in practice for threshold alarm evaluation. If your low-warning threshold is set to -3.0 dBm and actual RX power is -3.1 dBm, the QFX triggers a warning alarm while the EX4600 doesn't—not because the optical power is different, but because of integer rounding in the EX's DOM display path.
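The effect is easy to reproduce in a few lines. This sketch assumes the rounding is to the nearest integer and that the comparison is a strict "less than"; neither detail is documented, so treat both as assumptions. The function name is hypothetical.

```python
def low_warning(rx_dbm: float, threshold_dbm: float, integer_rounding: bool) -> bool:
    """Return True if the reading falls below the low-warning threshold.

    `integer_rounding=True` models a display path that rounds the reading
    to whole dBm before the comparison (assumed nearest-integer rounding).
    """
    value = round(rx_dbm) if integer_rounding else rx_dbm
    return value < threshold_dbm


if __name__ == "__main__":
    reading, threshold = -3.1, -3.0
    print("full precision:", low_warning(reading, threshold, integer_rounding=False))
    print("integer rounded:", low_warning(reading, threshold, integer_rounding=True))
```

With the values from the example, the full-precision path raises the warning and the rounded path does not, even though the optical power is identical.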
|
|
||||||
|
|
||||||
## What the EEPROM Needs to Say
|
|
||||||
|
|
||||||
For transceivers targeting Juniper EX platforms specifically, the EEPROM programming needs to be more careful than for QFX. The Vendor Name field (SFF-8636 bytes 148–163) must match a known-good string. Juniper's internal compatibility check is primarily against the Vendor OUI (bytes 165–167, the IEEE OUI) and a soft-match on the Vendor PN. A properly Juniper-coded QSFP-28 should have the Vendor Name field set appropriately and the Vendor OUI matching the registered OUI of the transceiver manufacturer or the coded-for OUI in Juniper's list.
|
|
||||||
|
|
||||||
The Extended Specification Compliance byte (SFF-8636 byte 192) must also be set correctly. For 100GBASE-SR4, this byte should be 0x02 (the SFF-8024 extended compliance code for 100GBASE-SR4; 0x01 denotes a 100G active optical cable). Leaving it at 0x00 or setting it to an application-specific value that Juniper doesn't recognize will cause the EX4600's chassis daemon to categorize the transceiver as "undefined," which triggers more aggressive alarm behavior than a recognized-but-unlisted optic.
|
|
||||||
|
|
||||||
## Which Platforms Are Actually Consistent
|
|
||||||
|
|
||||||
If you're deploying in a mixed EX/QFX environment and want consistent third-party optic behavior, the QFX5110 and QFX5120 are the most predictable on current JunOS (22.x/23.x). Both platforms have received the most attention in terms of third-party optic compatibility updates, and the chassis daemon behavior on these platforms more closely resembles the documented specification.
|
|
||||||
|
|
||||||
EX4300-48MP and EX4400 series behave better than the EX4600 on this front, partly because they run a newer chassis management stack. If you're stuck with EX4600 hardware and can't upgrade JunOS past 18.x for some platform-compatibility reason, the practical answer is to accept the alarm noise, filter it in your SIEM, and verify optical performance via periodic manual DOM polling rather than relying on automated alarm escalation.
|
|
||||||
|
|
||||||
The fundamental issue is that Juniper's product line spans hardware that was designed over roughly a 15-year period, and "consistent third-party optic support" was retrofitted onto platforms that didn't originally prioritize it. The EX4600 in particular was designed when Juniper's position on third-party optics was more restrictive than it is today. What you're seeing when an optic works on QFX5100 but alarms on EX4600 is that history playing out in your syslog.
|
|
||||||
@ -1,73 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Arista EOS Optical Compatibility: Reading the xcvr Errors and Understanding the Open Stance"
|
|
||||||
slug: "arista-eos-optic-compatibility-xcvr-errors"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Vendor Compatibility"
|
|
||||||
tags: ["Arista", "EOS", "QSFP28", "SFP28", "xcvr-missing", "DOM", "third-party optics"]
|
|
||||||
seo_focus_keyword: "Arista EOS optic compatibility xcvr errors"
|
|
||||||
---
|
|
||||||
|
|
||||||
Arista built its reputation partly on being the switch vendor that doesn't fight you about transceivers. For most of the company's history, that reputation was well-earned. EOS has historically been permissive about third-party optics by default—no service contract dependency, no hard blocks based on PID strings, no require-branded-optics enforcement. But "permissive by default" has always come with qualifications, and the error messages EOS generates around optics deserve a closer reading than they usually get.
|
|
||||||
|
|
||||||
## The Two Error Strings That Matter
|
|
||||||
|
|
||||||
When an optic has issues on an Arista platform, you'll typically see one of two error conditions in `show interfaces status` or `show interfaces ethernet 1/1 transceiver`:
|
|
||||||
|
|
||||||
**xcvr-missing**: The port has no transceiver installed, or the installed transceiver isn't being detected at all. This sounds obvious but is sometimes misleading—it can also appear when a transceiver is physically present but failing the electrical handshake. If you're seeing `xcvr-missing` on an occupied port, the first check is whether the optic is fully seated. The second check is whether the optic is electrically compatible with the port type. Inserting an SFP+ into a QSFP28 port with an adapter, for example, will sometimes show `xcvr-missing` rather than a type-mismatch error.
|
|
||||||
|
|
||||||
**xcvr-dom-not-supported**: This is the more interesting one. It appears when EOS can detect the transceiver and bring the port up, but the DDMI (Digital Diagnostic Monitoring Interface) data isn't readable—either because the transceiver doesn't implement the A2 page of SFF-8472 (for SFP+) or the upper memory pages of SFF-8636 (for QSFP28), or because the A0 address byte 92 (the "diagnostic monitoring type" byte) doesn't correctly indicate that real-time monitoring data is available.
|
|
||||||
|
|
||||||
A port showing `xcvr-dom-not-supported` can still pass traffic normally. The error is cosmetic in terms of link operation, but it means you have no visibility into optical power levels, temperature, bias current, or voltage from that interface.
|
|
||||||
|
|
||||||
## How Arista's Open Stance Works Mechanically
|
|
||||||
|
|
||||||
On EOS 4.20 and later, Arista's default behavior is to accept any transceiver that presents valid SFF-8472 or SFF-8636 EEPROM data and passes the MSA electrical interface tests. There is no PID whitelist enforcement in the default configuration. This is the architecture-level decision that distinguishes Arista from Cisco: the transceiver acceptance logic in EOS is capability-based rather than identity-based.
|
|
||||||
|
|
||||||
What EOS does check is the connector type byte, the transceiver type byte, and the compliance codes. If you plug a 10GBASE-SR SFP+ into a port configured for 25G, EOS will correctly refuse to bring the port up—not because of a compatibility blacklist, but because the speed negotiation doesn't match. This is correct behavior, not a third-party restriction.
|
|
||||||
|
|
||||||
On EOS versions prior to 4.15, there was a transitional period where some 40G QSFP+ ports would generate `xcvr-dom-not-supported` for any optic not in Arista's internal EEPROM vendor list, even if DOM data was actually present and readable. This was addressed in 4.15.2F, which rewrote the DDMI polling logic to query the A2 page directly rather than checking against a vendor list first.
|
|
||||||
|
|
||||||
## EOS Versions That Changed the Strictness
|
|
||||||
|
|
||||||
The most significant loosening came in EOS 4.20.x, which introduced explicit support for the "unknown transceiver" state: the port operates normally, DOM data is displayed if available, and the only indication of an unrecognized transceiver is a low-severity log entry rather than an operational alarm. Before 4.20, some platforms (particularly the 7050CX series) would disable DOM polling entirely for unrecognized transceivers, leading to the `xcvr-dom-not-supported` condition even when the optic was perfectly functional.
|
|
||||||
|
|
||||||
EOS 4.26 introduced `transceiver management`, a configuration subsystem that lets you explicitly set per-interface transceiver expectations. This is mostly useful for enforcing that specific ports always have the correct optic type installed—a data center compliance use case—but it also introduced `transceiver management permitted-xcvr-type` which, if misconfigured, can make an otherwise permissive EOS installation selectively restrictive. If you've upgraded to 4.26 or later and suddenly have new compatibility issues that didn't exist on 4.22, check whether someone has enabled transceiver management policies.
|
|
||||||
|
|
||||||
## DOM Display on Arista: What You Actually See
|
|
||||||
|
|
||||||
The `show interfaces ethernet 1/1 transceiver` command on EOS produces a clean output for supported optics:
|
|
||||||
|
|
||||||
```
|
|
||||||
Ethernet1/1 transceiver is present
|
|
||||||
type is 100GBASE-SR4
|
|
||||||
Manufacturer is Flexoptix
|
|
||||||
SN is FX123456789
|
|
||||||
Temperature is 32.07 Celsius
|
|
||||||
Tx Power is 2.73 dBm
|
|
||||||
Rx Power is -1.47 dBm
|
|
||||||
Bias Current is 55.17 mA
|
|
||||||
```
|
|
||||||
|
|
||||||
For an optic generating `xcvr-dom-not-supported`, the same command produces the top block (type, manufacturer, serial) but the DOM fields are absent. This means the EEPROM A0 page is readable (so transceiver type detection works) but the A2 page or upper memory pages are not properly configured.
|
|
||||||
|
|
||||||
The check command for distinguishing a real DOM problem from an EEPROM programming issue is:
|
|
||||||
|
|
||||||
```
|
|
||||||
show interfaces ethernet 1/1 transceiver detail
|
|
||||||
```
|
|
||||||
|
|
||||||
The `detail` output includes the raw EEPROM compliance bytes and indicates whether the DOM capability flag is set. If DOM capability is not asserted in the EEPROM but physical monitoring data is present on the A2 page, EOS won't poll it—the capability flag is authoritative. Fixing this requires reprogramming the optic's EEPROM byte 92 to correctly assert A2 page monitoring capability.
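For SFP and SFP+ modules, the flag in question lives in SFF-8472 byte 92 on the A0h page, and it can be decoded from a raw dump without any vendor tooling. The sketch below covers the SFF-8472 bit layout only (QSFP28 modules follow SFF-8636 and are not covered), and the function and key names are hypothetical.

```python
def decode_diag_monitoring_type(byte92: int) -> dict:
    """Decode SFF-8472 A0h byte 92 (Diagnostic Monitoring Type)."""
    if not 0 <= byte92 <= 0xFF:
        raise ValueError("byte92 must be a single byte value")
    return {
        "ddm_implemented": bool(byte92 & 0x40),          # bit 6: A2h monitoring present
        "internally_calibrated": bool(byte92 & 0x20),    # bit 5
        "externally_calibrated": bool(byte92 & 0x10),    # bit 4
        "rx_power_is_average": bool(byte92 & 0x08),      # bit 3: 1 = average, 0 = OMA
        "address_change_required": bool(byte92 & 0x04),  # bit 2
    }


if __name__ == "__main__":
    for value in (0x68, 0x00):
        print(f"0x{value:02X}", decode_diag_monitoring_type(value))
```

A module that reports `ddm_implemented` as false will be treated as having no DOM regardless of what the A2h page actually contains, which matches the behavior described above.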
|
|
||||||
|
|
||||||
## Arista vs. Cisco: A Practical Comparison
|
|
||||||
|
|
||||||
The difference in the field is stark. A Cisco Catalyst 9500 with a third-party QSFP-28 that has the wrong Vendor PN in its EEPROM will refuse to bring the port up, full stop, unless you've upgraded to an IOS-XE version that added that PID to the compatibility table. An Arista 7050CX3 with the same optic will bring the port up, display whatever DOM data is available, and generate a log entry that says essentially "I don't recognize this specific optic but it looks electrically fine."
|
|
||||||
|
|
||||||
This matters operationally. With Arista, an improperly programmed EEPROM degrades your operational visibility but doesn't cause an outage. With Cisco, it can cause an outage until the programming is corrected or the software version is updated.
|
|
||||||
|
|
||||||
The practical lesson is that even on Arista platforms, optic EEPROM quality matters—just for different reasons. On Cisco, a bad EEPROM causes port failures. On Arista, it causes monitoring gaps. Neither outcome is acceptable in production.
|
|
||||||
|
|
||||||
## The 400G Wrinkle
|
|
||||||
|
|
||||||
On Arista's 400G platforms (7060X4, 7080X4, and similar), the permissive EOS stance runs into a hardware constraint: OSFP and QSFP-DD modules use Arista's in-house thermal management system to maintain optic temperatures within safe operating bounds. The thermal management system needs to know the optic type to set fan curves correctly. For recognized optics, this happens automatically. For unrecognized QSFP-DD optics, EOS 4.28 and later will accept the optic but fall back to a conservative (higher fan speed) thermal profile.
|
|
||||||
|
|
||||||
This isn't a compatibility block—the port comes up—but in a dense deployment, the fan speed increase across a full chassis of unrecognized 400G optics generates enough acoustic noise to be a practical concern in some environments. If 400G deployment is in your near-term roadmap, verifying that your optic vendor's 400G QSFP-DD PIDs are recognized by Arista is worth more than a cursory check.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "100GBASE-SR4 Over OM3/OM4/OM5: Real-World Distance Limits vs. What the Spec Sheet Says"
|
|
||||||
slug: "100g-sr4-multimode-distance-limits-om3-om4-om5"
|
|
||||||
type: analysis
|
|
||||||
category: "Fiber & Cabling"
|
|
||||||
tags: ["100GBASE-SR4", "OM3", "OM4", "OM5", "multimode fiber", "QSFP28", "distance limits"]
|
|
||||||
seo_focus_keyword: "100GBASE-SR4 distance limits OM3 OM4"
|
|
||||||
---
|
|
||||||
|
|
||||||
The 802.3bm specification is clear enough on paper. 100GBASE-SR4 over OM3 runs 70 meters. Over OM4, 100 meters. Over OM5, 150 meters with SWDM4 (though that's a different standard). Network engineers quote these numbers in design documents, procurement teams buy fiber accordingly, and then somewhere in the commissioning process someone discovers the link won't train at 65 meters over OM3 fiber that, on paper, should make it. The gap between specification distance and operational distance has causes, and most of them are predictable.
|
|
||||||
|
|
||||||
## The Spec's Assumptions Are Ideal
|
|
||||||
|
|
||||||
The 802.3bm distance specifications are derived from a link power budget model that assumes: a specific fiber bandwidth-distance product (OM3 is specified at 2000 MHz·km at 850 nm, OM4 at 4700 MHz·km), a total allocation of 1.5 dB for all connection and splice loss in the channel, a maximum of 2 connectors per channel, launch conditions within the restricted launch area (RLA) definition, and no significant bend-induced loss.

Pull any one of those assumptions and the 70-meter or 100-meter number becomes optimistic. In practice, few real-world fiber installations are running 2 connectors total in a 70-meter run. Data center fiber infrastructure typically involves a patch panel at each end plus the patch cords, so you're looking at a minimum of 4 connector matings, not 2. Every additional mating adds loss that wasn't in the original spec budget, and SR4 has very little to give: the maximum channel insertion loss is 1.8 dB at the OM3 rated distance and 1.9 dB at the OM4 rated distance, fiber attenuation included.

Do the math: four connector matings at a modest 0.5 dB apiece consume 2.0 dB on connectors alone, while the entire SR4 specification allocates only 1.5 dB for connection loss. This is why SR4 links fail at distances well inside the specification: the specification assumes an installation quality that doesn't match typical data center cabling practice.
|
|
||||||
|
|
||||||
## Modal Bandwidth: The Real Ceiling
|
|
||||||
|
|
||||||
For SR4, the distance-limiting factor under real-world conditions isn't usually fiber attenuation—850 nm over OM3 or OM4 has attenuation of roughly 3.5 dB/km and 3.0 dB/km respectively, which at sub-100m distances is trivially low. The limiting factor is modal bandwidth.
|
|
||||||
|
|
||||||
100GBASE-SR4 runs four lanes at 25.78125 Gbps each over parallel fiber. At 25G per lane, the NRZ signal has a bandwidth requirement that approaches the upper boundary of OM3's effective modal bandwidth at distances beyond about 60 meters. OM3's minimum effective modal bandwidth (EMB) of 2000 MHz·km translates to approximately 2.0 GHz at 1 km, or equivalently 2000 GHz at 1 meter, which sounds like a lot until you realize that 25G NRZ requires something like 12.5 GHz of bandwidth and that the usable bandwidth scales inversely with distance.
|
|
||||||
|
|
||||||
At 70 meters over OM3, you're operating at a modal bandwidth of roughly 28 GHz—just barely sufficient for 25G NRZ with the standard's margin assumptions. If your specific fiber spool has EMB closer to the minimum specification (some OM3 fiber is closer to 2000 MHz·km than to the typical installed value of 3500 MHz·km), 60 meters can become the practical limit rather than 70.
|
|
||||||
|
|
||||||
OM4 fiber, with a minimum EMB of 4700 MHz·km, gives you considerably more headroom—at 100 meters, effective bandwidth is around 47 GHz, which provides genuine margin for real-world losses.
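The inverse-length scaling used in the last few paragraphs can be written out as a short calculation. This is a rough sketch that applies only the simple EMB / length relationship from the text; the function name is illustrative, and real link models include more terms than this.

```python
def modal_bandwidth_ghz(emb_mhz_km: float, length_m: float) -> float:
    """Approximate modal bandwidth in GHz for a given fiber length.

    Uses the simple inverse-length scaling from the text: EMB / length.
    """
    length_km = length_m / 1000.0
    return emb_mhz_km / length_km / 1000.0  # MHz -> GHz


for label, emb, length in [
    ("OM3 minimum", 2000, 70),
    ("OM3 minimum", 2000, 60),
    ("OM4 minimum", 4700, 100),
]:
    print(f"{label}: {modal_bandwidth_ghz(emb, length):.1f} GHz at {length} m")
```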
|
|
||||||
|
|
||||||
## Connector Loss: The Dominant Variable
|
|
||||||
|
|
||||||
In practice, most SR4 link failures before the rated distance trace to connector loss rather than fiber bandwidth. The IEC 61754-7 specification for MTP/MPO connectors allows up to 0.5 dB insertion loss per mating (the standard defines high-performance as under 0.35 dB). But field-installed MPO connectors in data centers frequently measure 0.8–1.2 dB, especially after several matings and moderate contamination.
|
|
||||||
|
|
||||||
An SR4 link running 70 meters over OM3 with four connector matings at 1.0 dB average would see 4 dB of connector loss alone, well beyond the full link power budget of roughly 1.9 dB channel insertion loss plus the 0.7 dB power penalty budget. That link will either fail to train or will operate with essentially zero margin, making it sensitive to any further optical degradation.
|
|
||||||
|
|
||||||
The connector insertion loss problem is compounded in SR4 specifically because it's parallel optics: a 4x25G MPO interface means all 8 fibers in a 12-fiber MPO (4 TX, 4 RX, 4 unused for SR4) must have acceptable loss simultaneously. A single fiber with 2 dB connector loss will cause that lane's power level to drop below the receiver's sensitivity floor even while the other three lanes are fine.
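Because every lane has to pass on its own, a per-lane check is more useful than an average. The sketch below assumes you already have a measured loss per lane; the sample values and the function name are hypothetical, and the 1.9 dB limit is the channel insertion loss figure quoted earlier.

```python
def failing_lanes(lane_loss_db: list[float], max_loss_db: float) -> list[int]:
    """Return the indexes of lanes whose measured loss exceeds the limit."""
    return [i for i, loss in enumerate(lane_loss_db) if loss > max_loss_db]


# Hypothetical per-lane measurements in dB; lane 2 has one bad fiber.
measured = [0.9, 1.0, 2.4, 0.8]
bad = failing_lanes(measured, max_loss_db=1.9)
link_ok = not bad  # the link is only usable if no lane fails
print(f"failing lanes: {bad}, link ok: {link_ok}")
```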
|
|
||||||
|
|
||||||
## Bend Radius and Where It Sneaks Up
|
|
||||||
|
|
||||||
Bend-induced loss in multimode fiber is often overlooked because OM3/OM4 has relatively good bend performance, but it's not zero. The minimum bend radius for conventional OM3/OM4 cable is typically 50 mm under pulling tension and 30 mm once installed. Inside a cable management tray with tight radii, or in a patch panel with a 1U cable entry radius under 30 mm, OM3/OM4 can add 0.1–0.3 dB of additional loss per tight bend at 850 nm.
|
|
||||||
|
|
||||||
On an SR4 link that's already at the edge of its budget due to multiple connectors, those small bend losses are the straw that breaks the link. The solution isn't cable management prayer. It's building margin into the design from the start.
|
|
||||||
|
|
||||||
## When SR4 Fails Before the Spec Predicts
|
|
||||||
|
|
||||||
If an SR4 link fails before its rated distance, the diagnostic sequence is: first, measure connector loss at each MPO interface with an insertion loss test set (an actual light source and power meter, not a visual fault locator). Second, check polarity. SR4 uses a 12-fiber MPO in a specific polarity type (Type B for a direct connection), and wrong polarity means TX fibers are connected to TX fibers, which results in no signal at all rather than a degraded signal. Third, verify the actual fiber category: OM3 cables are aqua, OM4 is typically aqua or violet, OM5 is lime green, but cable markings have been wrong often enough that, when the distance is marginal, it's worth checking the jacket print legend and confirming the installed length with an OTDR.
|
|
||||||
|
|
||||||
The practical design rule we use: for OM3, plan SR4 distances to 50 meters for high-reliability installations (zero margin anxiety), or 60 meters if you're confident in your connector quality and can verify loss after installation. For OM4, 80 meters is the real-world safe ceiling unless you can verify every connector mating is under 0.35 dB. The last 20 meters of OM4 specification distance are for installers who take fiber contamination personally.
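The design rule above reduces to a small lookup. This is a minimal sketch of that rule only; the function name and the `connectors_verified` flag are illustrative, and "verified" here means post-installation loss measurements you trust (for OM4, every mating under 0.35 dB).

```python
# (conservative limit, limit with verified connector loss) in meters, per the rule above.
PLANNING_LIMITS_M = {
    "OM3": (50, 60),
    "OM4": (80, 100),
}


def sr4_planning_limit_m(fiber: str, connectors_verified: bool) -> int:
    """Return the planning distance for 100GBASE-SR4 on the given fiber type."""
    conservative, verified = PLANNING_LIMITS_M[fiber]
    return verified if connectors_verified else conservative


print(sr4_planning_limit_m("OM3", connectors_verified=False))
print(sr4_planning_limit_m("OM4", connectors_verified=True))
```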
|
|
||||||
|
|
||||||
## OM5 and the SWDM4 Story
|
|
||||||
|
|
||||||
OM5 was standardized to enable short wavelength division multiplexing (SWDM4) over a single fiber pair, supporting 40G and 100G over a 2-fiber MPO or LC duplex connection. For 100G applications, this means SWDM4 at four wavelengths: 850, 880, 910, and 940 nm.
|
|
||||||
|
|
||||||
However, 100GBASE-SR4 does not use SWDM4. It uses parallel 4-fiber-pair transmission. OM5 fiber is backward compatible with SR4 and will support the full 100-meter OM4 distance (OM5 meets OM4 minimum EMB specs), but you get no additional SR4 distance by switching to OM5. The 150-meter OM5 number applies to SWDM4-capable transceivers, which are a separate SKU from QSFP28 SR4. Conflating the two is a mistake that's easy to make when reading fiber vendor marketing materials.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Fiber Connector Contamination: The $50 Problem That Kills $5,000 Transceivers"
|
|
||||||
slug: "fiber-connector-cleaning-protocol-iec-61300"
|
|
||||||
type: guide
|
|
||||||
category: "Fiber & Cabling"
|
|
||||||
tags: ["fiber cleaning", "MPO", "LC connector", "IEC 61300-3-35", "contamination", "one-click cleaner", "field maintenance"]
|
|
||||||
seo_focus_keyword: "fiber connector cleaning protocol"
|
|
||||||
---
|
|
||||||
|
|
||||||
The most common cause of transceiver failure in data center environments isn't heat or electrostatic discharge or a bad EEPROM. It's a dirty connector end face. It costs almost nothing to clean a connector properly, and it costs nothing to develop the habit. The optics community talks about fiber cleaning the way dentists talk about flossing: everyone knows they should do it, almost nobody does it consistently, and the consequences show up later at inconvenient times.
|
|
||||||
|
|
||||||
## What Contamination Actually Is
|
|
||||||
|
|
||||||
Under a 400x fiber inspection scope, a "clean" connector end face is a polished ceramic ferrule with the fiber core centered in the middle and no visible scratches, pits, or contamination. What you're looking for under magnification falls into distinct categories: scratches (permanent, require refinishing or replacement), contamination (removable), and chips or fractures (permanent damage, replace the connector).
|
|
||||||
|
|
||||||
Contamination types have different sources and different impacts. Dust particles typically sit on the cladding and cause modest insertion loss increases of 0.1–0.5 dB depending on particle size and proximity to the core. Oil contamination—fingerprints, skin oil transferred during handling, lubricants from cable jacketing—is more insidious because oil spreads across the end face and into the core area, causing insertion loss of 0.3–3 dB and, more critically, can become semi-permanent if it polymerizes under laser exposure.
|
|
||||||
|
|
||||||
This last point is worth emphasizing: the laser in a transceiver operates at power levels that are low enough to be eye-safe (Class 1) but high enough to heat oil contamination that sits on the fiber core, where the light is concentrated. After sustained laser exposure, oil contamination on an LC connector end face can bake into a partially transparent film that doesn't wipe off cleanly. At that point, you're replacing the connector or the patch cord, not just cleaning it.
|
|
||||||
|
|
||||||
## IEC 61300-3-35: The Standard You Should Know
|
|
||||||
|
|
||||||
IEC 61300-3-35 is the international standard for fiber optic connector end face inspection. It defines four inspection zones on a ferrule end face, expressed as diameters centered on the fiber: Zone A is the fiber core (0–25 μm), Zone B is the cladding (25–120 μm), Zone C is the adhesive (120–130 μm), and Zone D is the contact zone on the ferrule (130–250 μm).
|
|
||||||
|
|
||||||
The standard specifies maximum allowable defect sizes per zone. Zone A allows zero defects for single-mode and a maximum scratch width of 3 μm for multimode. Zone B allows scratches up to 10 μm wide for single-mode. The standard further differentiates between scratches, which are linear marks, and dig/pits, which are point defects.
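To make the zone layout concrete, here is a small sketch that maps a defect position to a zone letter. It uses the zone boundaries quoted above and treats them as diameters centered on the fiber, which is an assumption of this sketch; the function name is illustrative and this is not how any particular inspection probe implements grading.

```python
# (zone, inner diameter, outer diameter) in micrometers, as quoted in the text.
ZONES_UM = [
    ("A", 0, 25),     # core
    ("B", 25, 120),   # cladding
    ("C", 120, 130),  # adhesive
    ("D", 130, 250),  # contact zone
]


def zone_for_radial_offset(offset_um: float):
    """Return the zone letter for a defect at this distance from the fiber center.

    The offset is doubled because the zone limits are treated as diameters.
    Returns None when the position lies outside all four zones.
    """
    diameter = 2 * offset_um
    for name, inner, outer in ZONES_UM:
        if inner <= diameter < outer:
            return name
    return None


for offset in (5, 30, 62, 100, 200):
    print(offset, zone_for_radial_offset(offset))
```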
|
|
||||||
|
|
||||||
Most automatic fiber inspection probes (Viavi P5000i, AFL FI-7000, EXFO FIP-435B) can perform IEC 61300-3-35 grading automatically: insert the probe, press a button, and get a pass/fail based on the standard. The "fail" output tells you what zone the defect is in and whether it's a scratch, pit, or contamination. For a data center operation doing high-volume cable work, automated inspection is the only practical approach—manual interpretation of 400x microscope images at scale is slow and inconsistent.
|
|
||||||
|
|
||||||
## One-Click Cleaners vs. Cassette Cleaners vs. Wet/Dry
|
|
||||||
|
|
||||||
Three cleaning technologies dominate the field:
|
|
||||||
|
|
||||||
**One-click cleaners** (Fujikura NTT-AT CT-30, Ilsintech CLE series) are cartridge-based tools that advance a fresh section of cleaning fabric with each stroke. For LC and SC connectors, they're the fastest method: cap off, click, inspect, done. The one-click cleaner works best for lightly contaminated connectors—dust and light oil. For heavily contaminated connectors with dried oil or particulates that have adhered to the core, a single stroke may not be sufficient.
|
|
||||||
|
|
||||||
**Cassette cleaners** (Cletop-S, AFL CassetteClean) use a ribbon fabric that you pull past the connector end face manually. These give slightly more control over cleaning pressure and number of strokes, which makes them preferable for stubborn contamination. The tradeoff is that you can reuse sections of fabric that have already collected contamination, which transfers dirt back to a connector if you're not disciplined about advancing to fresh fabric.
|
|
||||||
|
|
||||||
**Wet/dry cleaning** uses an IPA (isopropyl alcohol, 99%+ purity) swab or drop on the end face followed immediately by a dry wipe. This is the most effective method for heavy oil contamination. The wet step dissolves and lifts the oil, the dry step removes it before it can re-deposit. The critical detail is "immediately"—IPA evaporates in seconds, and if you apply IPA and then hesitate before wiping, the evaporation process can concentrate residue rather than removing it.
|
|
||||||
|
|
||||||
For MPO/MTP connectors, cleaning is more complex. The 12 or 24 fiber cores in an MPO ferrule can't be individually accessed with standard one-click cleaners. MPO-specific tools (Fujikura CT-70, Optikos CleanBlast Pro) use a wider cleaning surface designed for the array format. Compressed air alone is never sufficient—it moves debris around the end face rather than removing it, and can drive particles into the ferrule bore where they're impossible to remove without disassembly.
|
|
||||||
|
|
||||||
## The Field Cleaning Discipline That Actually Works
|
|
||||||
|
|
||||||
The single most important habit is: always inspect before you connect, and always inspect after you clean. The inspection-clean-inspect loop sounds redundant, but it's the only way to know whether your cleaning action succeeded or whether it moved contamination from one zone to another.
|
|
||||||
|
|
||||||
The second most important habit is: cap everything that isn't connected. Dust caps exist for a reason. A fiber port sitting uncapped in a rack is accumulating dust continuously, and the dust concentration in typical data center air handling environments is high enough to contaminate a connector end face in under an hour of exposed operation.
|
|
||||||
|
|
||||||
For patch cord storage, the caps that come with the cords are adequate for short-term protection but not for long-term storage or repeated re-use. If you're maintaining a spare parts inventory, store patch cords in sealed bags, not just with the rubber caps that came on them.
|
|
||||||
|
|
||||||
The connectors that get overlooked most often are the ones on the transceiver side: the LC or MPO receptacle inside the transceiver housing. These are small, recessed, and difficult to inspect with standard probes. Transceiver receptacle cleaners (push-type cleaners in 1.25 mm versions for LC and 2.5 mm versions for SC, or MPO transceiver cleaners) address this. The contamination that forms on an uncapped transceiver receptacle from months in a storage drawer or on a shelf contributes directly to insertion loss when the transceiver is installed.
|
|
||||||
|
|
||||||
## Scope Magnification and What Each Level Shows
|
|
||||||
|
|
||||||
A 200x scope shows you gross contamination—large particles, obvious smears. It's adequate for quick field screening but not for IEC 61300-3-35 compliance. At 400x, you can distinguish scratch width and identify contamination zone by zone. At 800x and above (available on some lab-grade microscopes), you can see polishing quality and micro-scratches that aren't visible at 400x.
|
|
||||||
|
|
||||||
For production data center work, 400x auto-inspection probes cover the gap between "fast but blind" and "slow but thorough." For splicing quality verification or characterizing connector damage during an RCA, a 400x bench microscope with calibrated measurement overlay is worth having. For field work during a maintenance window at 2 AM with a flashlight in one hand, a one-click cleaner and a handheld 200x probe is the realistic baseline.
|
|
||||||
|
|
||||||
The principle is simple: keeping fiber connectors clean is one of the few infrastructure practices where a $50 cleaning kit and 30 seconds of discipline can prevent a $5,000 transceiver return and an unplanned maintenance window.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "100G Form Factor Fragmentation: CFP, CFP2, CFP4, QSFP28, and What's Actually Dead"
|
|
||||||
slug: "cfp2-cfp4-qsfp28-form-factor-migration-100g"
|
|
||||||
type: analysis
|
|
||||||
category: "Transceivers & Form Factors"
|
|
||||||
tags: ["CFP", "CFP2", "CFP4", "QSFP28", "OSFP", "100G", "form factor", "carrier", "enterprise"]
|
|
||||||
seo_focus_keyword: "CFP2 CFP4 QSFP28 100G form factor comparison"
|
|
||||||
---
|
|
||||||
|
|
||||||
When 100G first came to market around 2010–2012, there were four separate form factor efforts underway simultaneously, each backed by different parts of the industry with different design priorities. The result was a fragmentation problem that still echoes through procurement decisions today: a 100G transceiver is not a transceiver, it's a family of six distinct physical formats, several of which are technically alive but commercially marginal, and the correct choice depends on who you ask and what decade their equipment was designed in.
|
|
||||||
|
|
||||||
## How CFP, CFP2, and CFP4 Differ
|
|
||||||
|
|
||||||
CFP (C Form-factor Pluggable) was the first standardized 100G form factor, defined by the CFP MSA starting around 2009. The original CFP module is enormous by modern standards: 144.75mm × 82mm × 13.6mm, drawing up to 32 watts. The physical size was driven by the optical and electrical component technology available in 2009—coherent DSP chips were large, high-power, and required substantial heat management. CFP was designed primarily for carrier coherent applications: 100G DWDM, OTN transport, submarine-class interfaces.
|
|
||||||
|
|
||||||
CFP2 arrived around 2013 at roughly half the width of CFP (41.5 mm versus 82 mm), with a power budget reduced to 12W for most applications. The density improvement was significant: a linecard that could hold two CFP modules could now hold four CFP2 modules. This made CFP2 the preferred format for next-generation coherent linecards (Cisco's NCS 5500, Ciena's 6500, Nokia's 1830 PSS), and it remains the dominant form factor for 100G and 200G coherent applications in carrier gear today.
|
|
||||||
|
|
||||||
CFP4 is smaller still: 21.5 mm wide, roughly a quarter the width of the original CFP, with a power limit of about 6W for non-coherent applications. CFP4 was designed as a high-density client-side form factor, primarily for 100GBASE-LR4 and 100GBASE-ER4 in campus and metro applications. The market timing was unfortunate: by the time CFP4 production volumes were sufficient to drive prices down to competitive levels, QSFP28 had captured most of the enterprise market and CFP4 was left without a clear constituency.
|
|
||||||
|
|
||||||
## QSFP28: Why It Won the Enterprise Market
|
|
||||||
|
|
||||||
QSFP28 is mechanically identical to QSFP+ (40G), which is mechanically identical to QSFP (also 40G) from the original Finisar design. The dimensional continuity was a deliberate strategy: equipment vendors could design linecards with QSFP28 ports that were backward compatible with QSFP+ optics for 40G applications, giving customers a migration path from 40G to 100G without replacing linecards.
|
|
||||||
|
|
||||||
At 18.35mm × 72.4mm and a maximum power budget of 3.5W for standard applications (up to 7W for enhanced thermal variants), QSFP28 offered density that CFP4 couldn't match—36 ports per 1U linecard in standard designs versus 20 ports per 1U for CFP4. When the 802.3bm standard formalized 100GBASE-SR4 and 100GBASE-LR4 in QSFP28 packaging in 2015, the enterprise market converged rapidly.
|
|
||||||
|
|
||||||
By 2018, QSFP28 was the standard form factor for enterprise 100G deployments. CFP4 never recovered a distinct market position. Today, CFP4 transceivers are manufactured in small quantities for specific applications—predominantly 100GBASE-LR4 in older chassis that were designed with CFP4 slots—and prices are higher than QSFP28 equivalents because of low volume, not because of superior technology.
|
|
||||||
|
|
||||||
## What's Still Being Shipped in Carrier vs. Enterprise
|
|
||||||
|
|
||||||
The carrier/enterprise divergence is the key to understanding why multiple form factors persist.
|
|
||||||
|
|
||||||
In carrier optical transport networks, CFP2 is actively shipping in significant volume in 2024 and 2025. The reason is coherent optics. 100G and 200G coherent transceivers for DWDM transport remain CFP2 form factor because coherent DSP implementations require power budgets (typically 12–18W) that QSFP28 can't handle thermally. The coherent optical market has been slow to adopt smaller form factors for high-power applications—CFP2-DCO (Digital Coherent Optic) modules from Ciena, Acacia (now Cisco), and II-VI (now Coherent Corp.) are still the standard for backbone transport provisioning.
|
|
||||||
|
|
||||||
The picture is changing with CFP2-ACO (Analog Coherent Optic) and more recently with QSFP-DD and OSFP coherent solutions for 400G ZR applications. But for 100G coherent in existing carrier linecards, CFP2 is not going away on any near-term horizon—the installed base of CFP2-capable router linecards is enormous, and operators have no economic incentive to replace functioning infrastructure.
|
|
||||||
|
|
||||||
In enterprise networks, CFP, CFP2, and CFP4 are effectively legacy formats except in specific legacy equipment contexts. Any new enterprise purchase for 100G short-reach or medium-reach applications should be QSFP28 unless the hardware forces otherwise. The pricing difference is significant: a QSFP28 100GBASE-LR4 typically runs 30–40% less than an equivalent CFP4 module due to volume economics, and a QSFP28 100GBASE-SR4 is typically under €100 at market rates versus €200+ for a CFP4 equivalent.
|
|
||||||
|
|
||||||
## Where OSFP and QSFP-DD Fit
|
|
||||||
|
|
||||||
OSFP (Octal Small Form-factor Pluggable) and QSFP-DD (Quad Small Form-factor Pluggable Double Density) are the current 400G form factors that are increasingly relevant for 100G discussions. Both support 400G via 8×50G lanes, but both can also operate in breakout configurations at 4×100G.
|
|
||||||
|
|
||||||
QSFP-DD was designed with backward compatibility to QSFP28 in mind—a QSFP-DD port can accept a QSFP28 module in most implementations, which provides a migration path similar to QSFP28's backward compatibility with QSFP+. OSFP is larger and has higher power budget (15W vs. 12W for QSFP-DD) but does not accept QSFP28 modules. The OSFP vs. QSFP-DD competition is still ongoing in 400G infrastructure, with Arista and Cisco favoring QSFP-DD while Juniper's QFX platforms support both.
|
|
||||||
|
|
||||||
For 100G applications specifically, neither OSFP nor QSFP-DD adds anything over QSFP28—the cost savings from running native QSFP28 100G optics are clear. Where OSFP and QSFP-DD become relevant is in the migration from 100G to 400G without chassis replacement.
|
|
||||||
|
|
||||||
## The Practical Implication for Procurement
|
|
||||||
|
|
||||||
When you're ordering replacement optics for existing infrastructure, the form factor question is answered by the hardware: if the slot is CFP2, you need CFP2. The interesting decisions arise during new deployments or upgrades.
|
|
||||||
|
|
||||||
For any new 100G switch/router deployment in enterprise, QSFP28 is the unambiguous answer—the density, pricing, and ecosystem support are superior to all alternatives. For carrier coherent applications, CFP2 or CFP2-DCO remains the practical standard for linecards designed in the 2015–2022 window. For new 400G-capable infrastructure that needs to handle 100G in the near term, QSFP-DD slots with QSFP28 backward compatibility offer the best migration path.
|
|
||||||
|
|
||||||
The CFP form factor ecosystem isn't dead—it's stratified. CFP2 coherent is a healthy market with active development. CFP4 is a narrow market for legacy deployments. Original CFP is end-of-life for new designs and increasingly difficult to source at scale. Treating "CFP" as a monolith misses the carrier/enterprise split that explains why these form factors have such different trajectories.
|
|
||||||
@ -1,58 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Transceiver Inventory Management: Why Excel Breaks at Scale and What a CMDB Actually Needs"
|
|
||||||
slug: "transceiver-inventory-management-excel-cmdb"
|
|
||||||
type: guide
|
|
||||||
category: "Operations & Management"
|
|
||||||
tags: ["inventory management", "CMDB", "transceiver", "end-of-life", "serial number", "network operations"]
|
|
||||||
seo_focus_keyword: "transceiver inventory management CMDB"
|
|
||||||
---
|
|
||||||
|
|
||||||
Network teams at most organizations manage transceiver inventory in one of three ways: a shared Excel spreadsheet that nobody believes anymore, a CMDB that was populated accurately once in 2019 and hasn't been updated since, or a proprietary NMS module that tracks interfaces but not the physical optics in them. All three approaches eventually produce the same outcome: a reactive purchase at a premium price because someone discovered at 11 PM that they're out of the right optic for a failing interface.
|
|
||||||
|
|
||||||
The argument for doing this properly isn't purely operational hygiene. There are real financial consequences to transceiver inventory mismanagement, and they compound: over-purchasing creates dead stock that becomes end-of-life before deployment, under-purchasing creates emergency procurement situations with 3-5x cost premiums, and lack of per-port serial number tracking makes warranty claims and failure analysis nearly impossible.
|
|
||||||
|
|
||||||
## Why Excel Fails at Scale
|
|
||||||
|
|
||||||
The specific failure modes of spreadsheet-based transceiver tracking are predictable. The first is concurrent update conflicts: when two network engineers update the same spreadsheet simultaneously, one overwrites the other's changes. This is tolerable at 50 transceivers and catastrophic at 5,000. The second is search and filter limitations—Excel can filter by column value, but correlating "which ports have optics approaching end-of-support" requires cross-referencing three or four columns in ways that demand intermediate knowledge of pivot tables or VLOOKUPs that most network operations staff don't maintain.
|
|
||||||
|
|
||||||
The third, and most consequential, failure mode is schema drift. A spreadsheet that starts as "Part Number | Location | Quantity" gains columns over time: install date, procurement cost, PO number, firmware version, assigned engineer. Within 18 months, the schema is inconsistent—some rows have serial numbers, others don't; some locations are rack-level, others are port-level; "Type" means different things in rows populated by different people. At this point, the spreadsheet is a collection of data that can't be queried reliably.
|
|
||||||
|
|
||||||
## The Fields That Actually Matter
|
|
||||||
|
|
||||||
A functional transceiver CMDB record needs a specific set of fields, not an exhaustive one. The temptation is to track everything; the operational requirement is to track what you act on.
|
|
||||||
|
|
||||||
**Physical identity**: Serial number (mandatory, per optic), manufacturer part number, and the Flexoptix or vendor SKU. The serial number is the primary key—everything else is an attribute of a specific physical module. Part number lets you query inventory by type.
|
|
||||||
|
|
||||||
**Location**: Chassis hostname, slot/linecard, port. This needs to be port-granular, not rack-level. "IDF-3 rack 4" is useless when you're troubleshooting a DOM alarm at 2 AM. "core-switch-01 ethernet 1/24" is actionable.
|
|
||||||
|
|
||||||
**Status**: Installed, spare, failed-RMA, decommissioned. Spares need location too—which shelf, which bin. "Spare" without a physical location is notional inventory.
|
|
||||||
|
|
||||||
**Lifecycle**: Install date, purchase date, first-seen-in-network date, vendor end-of-support date, vendor end-of-life date. These five dates tell you the full lifecycle picture. Purchase date and install date can differ significantly (you may buy inventory six months before deployment), and both differ from the date a device first appeared in a network scan.
|
|
||||||
|
|
||||||
**Financial**: Purchase price, PO reference. Useful for depreciation accounting and for budget forecasting when a model goes end-of-life.
|
|
||||||
|
|
||||||
**Operational**: Firmware version (for optics with updatable firmware, like some coherent modules), last DOM reading timestamp, last physical inspection date. DOM readings in the CMDB are optional but valuable for trend analysis.
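The field groups above could be expressed as a single record type. The sketch below is one possible shape, not a schema from any particular CMDB product; every class and field name is illustrative, and the location check simply encodes the rule that installed optics need a port and spares need a shelf or bin.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum
from typing import Optional


class Status(Enum):
    INSTALLED = "installed"
    SPARE = "spare"
    FAILED_RMA = "failed-rma"
    DECOMMISSIONED = "decommissioned"


@dataclass
class TransceiverRecord:
    # Physical identity: the serial number is the primary key.
    serial_number: str
    part_number: str
    vendor_sku: Optional[str] = None
    # Status and location (port-granular when installed, shelf/bin when spare).
    status: Status = Status.SPARE
    hostname: Optional[str] = None
    slot: Optional[str] = None
    port: Optional[str] = None
    storage_location: Optional[str] = None
    # Lifecycle dates.
    purchase_date: Optional[date] = None
    install_date: Optional[date] = None
    first_seen_date: Optional[date] = None
    end_of_support_date: Optional[date] = None
    end_of_life_date: Optional[date] = None
    # Financial.
    purchase_price: Optional[float] = None
    po_reference: Optional[str] = None
    # Operational.
    firmware_version: Optional[str] = None
    last_dom_reading_at: Optional[datetime] = None
    last_inspection_date: Optional[date] = None

    def has_actionable_location(self) -> bool:
        """Installed optics need host and port; spares need a physical location."""
        if self.status is Status.INSTALLED:
            return bool(self.hostname and self.port)
        if self.status is Status.SPARE:
            return bool(self.storage_location)
        return True


installed = TransceiverRecord(
    serial_number="FX123456789",
    part_number="QSFP28-SR4",  # hypothetical part number
    status=Status.INSTALLED,
    hostname="core-switch-01",
    port="ethernet 1/24",
    install_date=date(2023, 4, 15),
)
notional_spare = TransceiverRecord(serial_number="FX000000001", part_number="QSFP28-SR4")
print(installed.has_actionable_location(), notional_spare.has_actionable_location())
```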
|
|
||||||
|
|
||||||
## End-of-Life Tracking: The Audit You Don't Want to Discover Reactively
|
|
||||||
|
|
||||||
Transceiver end-of-life dates are published by manufacturers on varying schedules, are often buried in product bulletin PDFs, and frequently change. The CMDB needs a process for ingesting these updates, not just a field to store them.
|
|
||||||
|
|
||||||
The practical approach is a three-tier alert system based on the end-of-life date: green when more than 24 months remain on support, yellow at 12–24 months (begin planning replacements in the next budget cycle), red at under 12 months (active replacement planning required). This maps to budget cycles in most organizations: a red-tier optic with 8 months of support remaining needs a replacement project scoped in the current quarter.
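The three tiers translate directly into a date comparison. This is a minimal sketch using only the thresholds described above; the function names are illustrative, and the handling of February 29 is a choice made for this example.

```python
from datetime import date


def shift_years(d: date, years: int) -> date:
    """Return the same calendar day a number of years later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # February 29 shifted into a non-leap year: fall back to February 28.
        return d.replace(year=d.year + years, day=28)


def eol_tier(end_of_life: date, today: date) -> str:
    """Classify an optic as green, yellow, or red by time left before end-of-life."""
    if end_of_life > shift_years(today, 2):
        return "green"   # more than 24 months remain
    if end_of_life >= shift_years(today, 1):
        return "yellow"  # 12 to 24 months remain
    return "red"         # under 12 months remain


today = date(2025, 1, 15)
for eol in (date(2027, 6, 1), date(2026, 6, 1), date(2025, 9, 15)):
    print(eol, eol_tier(eol, today))
```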
|
|
||||||
|
|
||||||
The end-of-life problem is particularly acute for transceivers because the replacement SKU for a discontinued optic may not be a drop-in substitute. A 10GBASE-ZR SFP+ that goes end-of-life from one manufacturer may be replaced with a different form factor or different wavelength specifications from the available alternatives. That's not a simple swap: it requires validation, and validation takes time. The yellow tier, which opens 24 months ahead of the end-of-life date, exists specifically to create that time.
|
|
||||||
|
|
||||||
## Per-Port Serial Number Discipline
|
|
||||||
|
|
||||||
The most common omission in transceiver CMDBs is tracking serial numbers per port rather than per batch. "We have 200 QSFP28 SR4 modules, here's the batch" is almost useless for operational purposes. "Port Ethernet 1/24 on core-switch-01 has serial number FX123456789, installed 2023-04-15" is actionable for failure analysis, warranty claims, and mean-time-between-failure calculations.
|
|
||||||
|
|
||||||
Collecting serial numbers is not difficult—every transceiver EEPROM contains the vendor serial number in SFF-8472 bytes 68–83 (for SFP+) or SFF-8636 bytes 196–211 (for QSFP28). On most switch platforms, `show interfaces ethernet 1/24 transceiver detail` or the equivalent includes the serial number in its output. The challenge is systematic collection: doing this manually at initial installation is error-prone, and doing it retrospectively for an existing network is a project.
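If you already have a raw EEPROM dump, pulling the serial number out is a byte slice. The sketch below assumes a flat 256-byte dump in which the offsets quoted above apply directly (for QSFP28, that means lower page followed by upper page 00h); the function name and the sample dump are hypothetical.

```python
# Inclusive byte ranges for the vendor serial number, as quoted in the text.
SERIAL_BYTES = {
    "sfp+": (68, 83),      # SFF-8472
    "qsfp28": (196, 211),  # SFF-8636
}


def vendor_serial(eeprom: bytes, module_type: str) -> str:
    """Extract the space-padded ASCII serial number from an EEPROM dump."""
    first, last = SERIAL_BYTES[module_type]
    raw = eeprom[first:last + 1]
    return raw.decode("ascii", errors="replace").strip()


# Build a fake 256-byte dump with a serial number in the QSFP28 position.
dump = bytearray(256)
dump[196:212] = b"FX123456789".ljust(16)
print(vendor_serial(bytes(dump), "qsfp28"))
```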
|
|
||||||
|
|
||||||
The automation-friendly approach is to script regular collection of serial numbers from all switch interfaces (via SNMP using the ENTITY-MIB, for example `entPhysicalSerialNum`, or via vendor APIs such as Arista eAPI, Juniper NETCONF, or Cisco RESTCONF) and feed the results into the CMDB. Most modern network automation frameworks (Nautobot, NetBox, Ansible with NAPALM) can pull transceiver serial numbers as part of routine inventory collection. The data is there; the gap is usually the CMDB workflow to consume it.
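The consuming side of that workflow is mostly a comparison between what the CMDB believes and what the network reports. Here is a minimal sketch of that reconciliation, assuming both sides have already been reduced to a mapping from serial number to a (hostname, port) pair; the function name and sample data are hypothetical, and the collection step itself is out of scope.

```python
Location = tuple[str, str]  # (hostname, port)


def reconcile(cmdb: dict[str, Location], observed: dict[str, Location]):
    """Compare CMDB locations with freshly collected ones, keyed by serial number."""
    new = {s: loc for s, loc in observed.items() if s not in cmdb}
    moved = {
        s: (cmdb[s], loc)
        for s, loc in observed.items()
        if s in cmdb and cmdb[s] != loc
    }
    missing = {s: loc for s, loc in cmdb.items() if s not in observed}
    return new, moved, missing


cmdb = {
    "FX123456789": ("core-switch-01", "ethernet 1/24"),
    "FX222222222": ("core-switch-01", "ethernet 1/25"),
}
observed = {
    "FX123456789": ("core-switch-02", "ethernet 1/1"),  # moved
    "FX333333333": ("core-switch-01", "ethernet 1/25"),  # not yet in the CMDB
}
new, moved, missing = reconcile(cmdb, observed)
print(new, moved, missing, sep="\n")
```

Serials in `missing` are the interesting ones for follow-up: they may have been pulled as spares, failed, or simply left the network without a record.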
|
|
||||||
|
|
||||||
## The Reactive Audit Scenario and How to Avoid It
|
|
||||||
|
|
||||||
The moment when poor transceiver inventory discipline becomes expensive is the reactive audit: a vendor announces end-of-sale on a specific SKU with 90 days' notice, and someone asks "how many of these do we have in production and what are they going to cost to replace?" If the answer requires manually SSHing into 200 switches and running `show inventory`, you're looking at days of work and a procurement decision made under time pressure.
|
|
||||||
|
|
||||||
The proactive equivalent—a CMDB query that returns part numbers, locations, counts, and end-of-life dates in a report that runs in 30 seconds—costs roughly the same to build as one reactive audit takes to perform. The difference is whether you build it before or after you need it.
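As an illustration of how small that query can be, the sketch below counts installed optics of one part number per switch. It assumes records are available as plain dictionaries with `part_number`, `status`, and `hostname` keys; those key names, the function name, and the sample data are all hypothetical.

```python
from collections import Counter


def end_of_sale_exposure(records: list[dict], part_number: str) -> dict:
    """Summarize how many installed optics of a given part number are in production."""
    installed = [
        r for r in records
        if r["part_number"] == part_number and r["status"] == "installed"
    ]
    per_host = Counter(r["hostname"] for r in installed)
    return {
        "part_number": part_number,
        "installed": len(installed),
        "per_host": dict(per_host),
    }


records = [
    {"part_number": "SKU-A", "status": "installed", "hostname": "core-switch-01"},
    {"part_number": "SKU-A", "status": "installed", "hostname": "core-switch-01"},
    {"part_number": "SKU-A", "status": "spare", "hostname": None},
    {"part_number": "SKU-B", "status": "installed", "hostname": "core-switch-02"},
]
report = end_of_sale_exposure(records, "SKU-A")
print(report)
```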
|
|
||||||
|
|
||||||
Organizations that manage transceiver inventory well typically have three things: an automated serial number collection job that runs weekly and updates a CMDB, an end-of-life notification process tied to manufacturer announcements, and a quarterly review cycle where the CMDB report generates the replacement forecast for the next hardware budget. None of this is technologically complicated. It's workflow discipline, and it pays for itself the first time a 90-day end-of-sale notice arrives on a SKU you have 400 units of deployed across the network.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "100G ZR Coherent Pluggables and Timing: Why These Transceivers Care About PTP and SyncE"
|
|
||||||
slug: "100g-zr-coherent-pluggable-timing-ptp-synce"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Coherent Optics"
|
|
||||||
tags: ["100G ZR", "coherent", "PTP", "SyncE", "timing", "QSFP28", "DSP", "DWDM"]
|
|
||||||
seo_focus_keyword: "100G ZR coherent pluggable timing PTP SyncE"
|
|
||||||
---
|
|
||||||
|
|
||||||
The 100G ZR specification (OIF-400ZR and its 100G subset implementations, as well as the slightly older OpenROADM coherent pluggable standards) introduced a category of transceiver that behaves fundamentally differently from SR4 or LR4 optics in ways that aren't immediately obvious from the QSFP28 form factor they share. The most overlooked of these differences is timing sensitivity. A 100GBASE-SR4 optic doesn't know or care about the time of day. A 100G ZR coherent module contains a DSP that absolutely does.
|
|
||||||
|
|
||||||
## What's Inside a Coherent Pluggable
|
|
||||||
|
|
||||||
A QSFP28 SR4 module contains a VCSEL array, a photodetector array, and passive optical components. The signal encoding is straightforward NRZ (Non-Return-to-Zero) at 25.78125 Gbps per lane. There's no local oscillator, no carrier phase recovery, no DSP performing coherent signal processing.
|
|
||||||
|
|
||||||
A 100G ZR module—take the Acacia (Cisco) QSFP28 ZR or the Lumentum QSFP28 DWDM coherent as examples—contains a narrow-linewidth tunable laser, an IQ modulator, a coherent receiver with 90° optical hybrid, and a coherent DSP chip. The modulation format is DP-QPSK (Dual Polarization Quadrature Phase Shift Keying) for 100G at 50 GHz spacing. The DSP performs chromatic dispersion compensation, polarization mode dispersion tracking, carrier frequency recovery, and phase noise compensation—all in real time.
|
|
||||||
|
|
||||||
The coherent DSP needs a frequency reference to maintain its internal timing and, more critically, to define the DSP's FEC (Forward Error Correction) frame timing. If the DSP's frequency reference drifts, the FEC frame alignment drifts, and once the FEC frame is misaligned, the error correction engine stops working and BER (Bit Error Rate) rises sharply. The transition from functional link to failed link can happen in seconds when timing loss occurs.
|
|
||||||
|
|
||||||
## Frequency vs. Phase: What Coherent DSPs Need
|
|
||||||
|
|
||||||
The precision timing requirements for coherent pluggables exist at two levels: frequency accuracy and phase alignment.
|
|
||||||
|
|
||||||
Frequency accuracy affects the coherent DSP's ability to lock its carrier recovery loop to the incoming optical signal. The local oscillator (the tunable laser) and the incoming optical carrier from the far end must be within the DSP's carrier recovery pull-in range, which is typically ±1.5 GHz for modern coherent receivers. This frequency accuracy requirement is met by the laser tuning accuracy, not by network timing—it's a hardware specification of the coherent module itself.
|
|
||||||
|
|
||||||
Phase alignment is where SyncE and PTP matter. Coherent pluggables used in OTN (Optical Transport Network) or Ethernet transport roles often need to pass through timing information from one end to the other. More directly relevant: the host router or switch port feeding the coherent pluggable must provide a sufficiently clean transmit clock to the module. The ZR specification requires that the host-side electrical interface provide a clock with accuracy better than ±20 ppm under normal conditions and better than ±100 ppm under holdover.
|
|
||||||
|
|
||||||
## The PTP Connection: Why It's Not Just for Telcos Anymore
|
|
||||||
|
|
||||||
PTP (Precision Time Protocol, IEEE 1588-2008 and the newer IEEE 1588-2019) distributes sub-microsecond timing accuracy across packet networks. In the telecom world, PTP is mandatory for LTE and 5G base station timing. In the coherent transport world, PTP becomes relevant when the coherent transport link itself needs to be timestamped or when the coherent module participates in a timing chain.
|
|
||||||
|
|
||||||
For 100G ZR specifically, PTP matters in two scenarios. First, if the ZR link is carrying timing-sensitive traffic (SyncE over Ethernet, 1588 timing streams), the coherent DSP needs to preserve timing transparency—it cannot introduce asymmetric delay that would corrupt PTP offset calculations. Second, if the router port that hosts the ZR module is a PTP Boundary Clock (BC) or Transparent Clock (TC), the ZR link's latency characteristics need to be known and stable for the BC/TC to account for link-side delay correctly.
|
|
||||||
|
|
||||||
Modern coherent ZR modules from Coherent Corp., Acacia/Cisco, and Lumentum specify a per-module propagation delay and a delay variation (jitter) floor. The propagation delay through the DSP is typically in the range of 1–3 μs, which is significant for PTP sub-microsecond applications. The delay variation—the variation in DSP processing time between packets—is typically under 100 ns, which is within acceptable bounds for most G.8275.1 (telecom profile) PTP applications.
|
|
||||||
|
|
||||||
## SyncE: The Physical Layer Timing Standard
|
|
||||||
|
|
||||||
SyncE (Synchronous Ethernet, defined by ITU-T G.8262 and G.8264) distributes frequency synchronization via the Ethernet physical layer clock. The idea is simple: the Ethernet PHY on a SyncE-capable port slaves its transmit clock to the received clock, making the physical layer timing chain a frequency distribution network.
|
|
||||||
|
|
||||||
The interaction with coherent pluggables is subtle. A 100G ZR module that is used as the physical layer for a SyncE link needs to preserve the input clock frequency across the coherent DWDM span. The ZR specification requires that the module's clock recovery from the host electrical interface be SyncE-transparent—meaning the module retains the timing information encoded in the electrical lane and forwards it optically to the far end.
|
|
||||||
|
|
||||||
Not all 100G ZR implementations are equally SyncE-transparent. Some first-generation ZR implementations used their internal DSP clock as the retiming reference, effectively breaking the SyncE chain across the coherent span. This was a known issue with certain early Acacia modules and was addressed in firmware updates. Before deploying 100G ZR in a SyncE timing chain, verify that the specific module firmware version is SyncE-transparent. This is documented in vendor release notes but is frequently missed during evaluation.
|
|
||||||
|
|
||||||
## What Happens When You Ignore Timing Requirements
|
|
||||||
|
|
||||||
The failure mode for ignoring timing requirements in a coherent ZR deployment is not dramatic—the link typically comes up and passes traffic initially. The problems emerge over time.
|
|
||||||
|
|
||||||
First: frequency wander. If the host router port is not providing a stable frequency reference (because SyncE is not configured, or because the port's reference clock is coming from a free-running oscillator rather than a locked source), the coherent DSP's frequency tracking loop will see long-term frequency drift. The DSP's acquisition range is wide enough to handle this for weeks or months, but eventually the cumulative drift can exceed the pull-in range and the link will drop. The troubleshooting path is non-obvious because the link was working fine the previous week.
|
|
||||||
|
|
||||||
Second: timing chain corruption. In a network where the coherent ZR link is part of a PTP timing path, a SyncE-opaque ZR module introduces an asymmetric delay that biases PTP offset calculations. This appears as a slowly growing time error on PTP slaves downstream of the coherent link—the clocks appear stable but are systematically offset from true time.
|
|
||||||
|
|
||||||
Third: holdover failure. Coherent DSPs in ZR modules maintain an internal holdover oscillator to ride through brief reference clock interruptions. The holdover accuracy is typically on the order of ±100 ppm, which is the tolerance of a free-running Ethernet clock and far looser than the ±4.6 ppm free-run accuracy that ITU-T G.8262 requires of a SyncE equipment clock (EEC). If the network relies on ZR modules for timing distribution and the reference clock fails, the DSP's holdover quality determines how long the timing chain remains within acceptable bounds before alarms are triggered.
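The asymmetry effect in the second failure mode follows directly from the standard IEEE 1588 delay request-response arithmetic, which assumes the forward and reverse path delays are equal. The sketch below simulates one exchange; the delay values are illustrative assumptions, not measurements of any module.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Standard IEEE 1588 delay request-response arithmetic (all times in ns).
    t1: master sends Sync, t2: slave receives Sync,
    t3: slave sends Delay_Req, t4: master receives Delay_Req."""
    offset = ((t2 - t1) - (t4 - t3)) / 2
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, mean_path_delay


def simulate_exchange(true_offset_ns, fwd_delay_ns, rev_delay_ns):
    """Generate the four timestamps for a slave whose clock is ahead of the
    master by true_offset_ns, then run the PTP arithmetic on them."""
    t1 = 0                                      # master clock
    t2 = t1 + fwd_delay_ns + true_offset_ns     # slave clock
    t3 = t2 + 1000                              # slave clock, 1 us later
    t4 = (t3 - true_offset_ns) + rev_delay_ns   # master clock
    return ptp_offset_and_delay(t1, t2, t3, t4)


# Symmetric link: the computed offset equals the true offset.
print(simulate_exchange(true_offset_ns=0, fwd_delay_ns=2000, rev_delay_ns=2000))

# 200 ns of asymmetry: the computed offset is wrong by half the asymmetry.
print(simulate_exchange(true_offset_ns=0, fwd_delay_ns=2000, rev_delay_ns=1800))
```

The error is constant rather than noisy, which is why downstream clocks look stable while being systematically offset.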
|
|
||||||
|
|
||||||
The summary for operators: deploy 100G ZR in timing-sensitive networks only after confirming SyncE transparency in the specific firmware version you're running, verify that the host router port provides a SyncE-locked or PTP-disciplined reference clock, and document the ZR DSP propagation delay for any PTP Boundary Clock calculations. These checks take less than an hour on a lab unit before deployment and prevent a category of subtle failure that is otherwise very difficult to diagnose in production.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Burn-In Testing Transceivers Before Deployment: What 72 Hours Catches That Incoming Inspection Misses"
|
|
||||||
slug: "optic-burn-in-testing-deployment-infant-mortality"
|
|
||||||
type: guide
|
|
||||||
category: "Testing & Quality"
|
|
||||||
tags: ["burn-in testing", "transceiver testing", "infant mortality", "quality assurance", "optical modules", "data center"]
|
|
||||||
seo_focus_keyword: "transceiver burn-in testing before deployment"
|
|
||||||
---
|
|
||||||
|
|
||||||
The failure rate of optical transceivers follows a pattern that engineers familiar with the Weibull distribution or the bathtub curve will recognize immediately: elevated failures in the first hours to days of operation (infant mortality), a long stable period of low failure rate (useful life), and eventual wear-out failures at end of life. The infant mortality region is the one that burn-in testing addresses, and the time investment is straightforwardly justified by the cost of discovering those failures in production.
|
|
||||||
|
|
||||||
## The Infant Mortality Curve for Optical Modules
|
|
||||||
|
|
||||||
The physics of early-life failures in transceivers are dominated by three mechanisms: VCSEL (Vertical Cavity Surface Emitting Laser) defects that manifest under sustained forward bias, solder joint micro-fractures that propagate under thermal cycling, and EEPROM data corruption that surfaces when the module is first powered in a live environment.
|
|
||||||
|
|
||||||
VCSEL defects are the most common. A transceiver that has never been operated may contain a VCSEL array where one or more emitters has a crystalline defect at the p-n junction. These defects don't cause immediate failure at room temperature—they pass initial electrical testing, they pass optical power measurements at room temperature. Under sustained operation at elevated temperatures (a QSFP28 in a dense switch runs its internal components at 45–75°C depending on airflow and ambient), these defects propagate. A VCSEL that measures -1.0 dBm at room temperature after 10 minutes of operation may measure -3.5 dBm after 48 hours at 70°C internal temperature.
|
|
||||||
|
|
||||||
Solder joint micro-fractures follow a similar pattern. The thermal cycling from room temperature to operating temperature—repeated over the first 24–48 hours of operation—stresses solder joints that have marginal formation. A joint that is electrically continuous at room temperature may become intermittent after 10–15 thermal cycles. The failure signature is intermittent optical power dropout rather than a clean dead module.
|
|
||||||
|
|
||||||
EEPROM issues are rarer but exist. Some early-life failures trace to EEPROM cells that stored data correctly at the time of manufacture but have marginal retention characteristics. The module passes all tests in the factory but loses calibration data after being powered for the first time in a customer environment.
|
|
||||||
|
|
||||||
## What 72-Hour Soak Testing Catches
|
|
||||||
|
|
||||||
A standard burn-in protocol runs modules under continuous electrical and optical load for 72 hours at elevated temperature (typically 70°C for QSFP28 modules, consistent with the upper end of the commercial temperature range). The 72-hour duration is derived from empirical data on VCSEL defect propagation rates: most infant mortality failures in VCSELs manifest within the first 48 hours at elevated temperature; 72 hours provides a margin that catches the slower-propagating defects without running into the useful-life failure curve.
|
|
||||||
|
|
||||||
What this catches that incoming inspection misses: any failure mode that requires sustained thermal stress to manifest. Incoming inspection typically involves a 15–30 minute functional test at room temperature: power on, verify optical output, check DOM data, verify electrical interface, done. This catches dead-on-arrival modules but not marginal modules.
|
|
||||||
|
|
||||||
A marginal module that passes incoming inspection will either fail in production within the first week—at an inconvenient time, requiring an emergency maintenance window—or, if the defect is slow-progressing, will degrade gradually over 3–6 months and generate chronic low-power alarms before eventual failure. Neither outcome is acceptable in environments where uptime matters.
|
|
||||||
|
|
||||||
The 72-hour burn-in catches approximately 85–90% of infant mortality failures, based on published data from module manufacturers' internal testing and from hyperscale data center operators who have shared aggregated failure statistics. The remaining 10–15% fail in the first week of production but survive the burn-in—typically because their failure mechanism is triggered by specific traffic patterns or mechanical stress in the production environment rather than purely thermal stress.
|
|
||||||
|
|
||||||
## Practical Burn-In Rack Setup for High-Volume Deployments
|
|
||||||
|
|
||||||
A burn-in rack for transceivers doesn't need to be elaborate, but it needs to provide three things: sustained optical load (active data transmission or loopback), controlled temperature, and monitoring.
|
|
||||||
|
|
||||||
The most common setup uses a rack-mounted switch or media converter platform specifically configured for burn-in duty, with all ports occupied and looped back using fiber loopback connectors. For QSFP28 SR4, a simple MPO loopback plug (which routes the four TX fibers of the module's single MPO connector back to its four RX fibers) is sufficient: the module transmits into its own receiver, DOM data shows active optical power, and thermal load is representative of production conditions.
|
|
||||||
|
|
||||||
Temperature is managed either by placing the burn-in rack in a chamber (preferred for controlled conditions) or by restricting airflow to allow natural convection heating to bring the module temperature up to range. Most QSFP28 modules operating in a low-airflow environment with active loopback will reach 60–70°C internal temperature within 30 minutes. An IR thermometer on the external QSFP28 cage shows external temperatures of 40–50°C when internal module temperatures are in the 60–70°C range.
|
|
||||||
|
|
||||||
Monitoring during burn-in should capture DOM data at regular intervals—every 5 minutes is adequate. The monitoring output should track TX power, RX power, temperature, and bias current over time. Automated monitoring with threshold alerting is preferable to manual checks: you want to know if TX power drops by 1 dB between hour 24 and hour 48, because a 1 dB drift is the early indicator of a VCSEL defect before the module fails completely.
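A minimal sketch of the threshold alerting described above, assuming DOM readings have already been collected as (hours since start, TX power in dBm) pairs. The function name, the data layout, and the 24-hour comparison window are assumptions made for illustration.

```python
def tx_power_drift_alerts(samples, window_hours=24.0, max_drop_db=1.0):
    """samples: list of (hours_since_start, tx_power_dbm), sorted by time.
    Returns (t_reference, t_now, drop_db) for every sample whose TX power is
    at least max_drop_db below the highest reading in the preceding window."""
    alerts = []
    for i, (t_now, p_now) in enumerate(samples):
        window = [(t, p) for t, p in samples[:i] if t_now - t <= window_hours]
        if not window:
            continue
        t_ref, p_ref = max(window, key=lambda s: s[1])
        drop = p_ref - p_now
        if drop >= max_drop_db:
            alerts.append((t_ref, t_now, round(drop, 2)))
    return alerts


# Illustrative readings: a module that loses 1.1 dB between hour 24 and hour 48.
readings = [(0, -1.0), (24, -1.1), (48, -2.2), (72, -2.3)]
print(tx_power_drift_alerts(readings))
```

With 5-minute polling the same logic applies unchanged; only the number of samples in each window grows.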
|
|
||||||
|
|
||||||
For organizations processing fewer than 50 modules per quarter, a commercial burn-in platform (Spirent AX/100G test chassis) or a repurposed ToR switch is usually sufficient. For higher volumes (major data center buildouts or cloud infrastructure deployments consuming hundreds of QSFP28 modules per month), dedicated test equipment from EXFO, Spirent, or Viavi with automated pass/fail logging and per-serial-number records provides traceability that pays off during vendor warranty claims.
|
|
||||||
|
|
||||||
## The Economics: When Does Burn-In Pay for Itself?
|
|
||||||
|
|
||||||
The calculation is straightforward. An infant mortality failure discovered in production costs: an unplanned maintenance window (minimum 2–4 hours of engineer time), potential service impact (varies enormously by deployment context), and the replacement optic cost. In a carrier-grade or critical infrastructure environment, the maintenance window cost alone exceeds €500–€2,000 in labor and potential SLA exposure.
|
|
||||||
|
|
||||||
A burn-in rack running 48 ports continuously has a setup cost of roughly €3,000–€10,000 depending on the platform and instrumentation chosen, amortized over the rack's useful life of 5+ years. The per-module cost of burn-in time and labor is typically €5–€15 per module. That cost is recovered from the first 2–3 infant mortality failures avoided.
|
|
||||||
|
|
||||||
The break-even analysis depends on your failure rates and your cost of downtime. For enterprise deployments with tolerant maintenance windows, burn-in may not be economically justified at low volumes. For data center, carrier, or any application where an optical failure causes automated failover events, service alarms, or SLA exposure, burn-in is justified from the first deployment. The right answer depends on knowing your actual infant mortality rate from your transceiver supplier, which is something worth asking for explicitly.
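The break-even point can be estimated with simple arithmetic. The sketch below uses mid-range values from the figures quoted in this section; the annual module volume is an assumption, and the infant mortality rate is exactly the number to request from the supplier.

```python
def annual_net_benefit(modules_per_year, infant_mortality_rate, catch_rate,
                       cost_per_failure, burn_in_cost_per_module,
                       setup_cost, rack_life_years):
    """Avoided production-failure cost minus burn-in spend, per year."""
    avoided = modules_per_year * infant_mortality_rate * catch_rate * cost_per_failure
    spend = modules_per_year * burn_in_cost_per_module + setup_cost / rack_life_years
    return avoided - spend


def break_even_rate(modules_per_year, catch_rate, cost_per_failure,
                    burn_in_cost_per_module, setup_cost, rack_life_years):
    """Infant mortality rate at which burn-in exactly pays for itself."""
    spend_per_module = burn_in_cost_per_module + setup_cost / (rack_life_years * modules_per_year)
    return spend_per_module / (catch_rate * cost_per_failure)


# Assumed inputs: 1,000 modules/year, EUR 1,000 per production failure,
# EUR 10 per module for burn-in, EUR 5,000 rack amortized over 5 years,
# 85% of infant mortality failures caught.
rate = break_even_rate(1000, 0.85, 1000, 10, 5000, 5)
print(f"Break-even infant mortality rate: {rate:.2%}")
print(annual_net_benefit(1000, 0.02, 0.85, 1000, 10, 5000, 5))
```

If the supplier's actual infant mortality rate is above the break-even rate, burn-in is a net saving under these assumptions; below it, it is a cost.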
|
|
||||||
|
|
||||||
## The Incoming Inspection That Still Matters
|
|
||||||
|
|
||||||
Burn-in testing doesn't replace incoming inspection—it complements it. Incoming inspection catches DOA modules (typically 0.1–0.5% of a large batch) and EEPROM programming errors before they're installed. Burn-in catches marginal modules that pass inspection. Running both in sequence means a module that makes it into production has been functional for at least 72 hours under thermal stress, has verified DOM data, and has passed a clean incoming inspection. That's a defensible position when your infrastructure director asks why you spent the extra 72 hours before a major deployment.
|
|
||||||
@ -1,50 +0,0 @@
|
|||||||
---
|
|
||||||
title: "100GHz vs 50GHz DWDM Channel Plans: The C-Band Math and Why Your Old Gear Limits You More Than You Think"
|
|
||||||
slug: "dwdm-channel-plan-100ghz-vs-50ghz-c-band"
|
|
||||||
type: deep-dive
|
|
||||||
category: "DWDM & Coherent"
|
|
||||||
tags: ["DWDM", "channel plan", "C-band", "100GHz", "50GHz", "flex-grid", "EDFA", "optical amplifier"]
|
|
||||||
seo_focus_keyword: "DWDM channel plan 100GHz 50GHz C-band"
|
|
||||||
---
|
|
||||||
|
|
||||||
The C-band (conventional amplification band) in optical communications spans roughly 1530 nm to 1565 nm—the wavelength range over which Erbium-Doped Fiber Amplifiers (EDFAs) provide practical gain. The ITU-T has divided this spectrum into channels with two dominant spacings that remain relevant in deployed networks today: 100 GHz spacing (ITU-T G.694.1 fixed grid) and 50 GHz spacing (also G.694.1, but the denser variant). Understanding which works in your network and which doesn't requires clarity on the math, the equipment constraints, and where the real bottleneck usually lives.
|
|
||||||
|
|
||||||
## The C-Band Arithmetic
|
|
||||||
|
|
||||||
The ITU-T 100 GHz channel grid defines center frequencies at 193.1 THz + n×100 GHz, where n is an integer (positive, negative, or zero). In wavelength terms, the reference is 1552.52 nm (193.1 THz), and channels are separated by approximately 0.8 nm. The full C-band at 100 GHz spacing provides approximately 40 usable channels from C21 (192.1 THz, 1560.61 nm) to C61 (196.1 THz, 1528.77 nm) in the common channel-numbering convention, depending on EDFA bandwidth and system design margin.
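The grid arithmetic is easy to reproduce. This short sketch converts grid index n to center frequency and wavelength using the 193.1 THz anchor; the function names are illustrative.

```python
C_NM_THZ = 299_792.458  # speed of light expressed in nm * THz


def grid_frequency_thz(n, spacing_ghz=100.0):
    """Center frequency of grid index n on the 193.1 THz-anchored grid."""
    return 193.1 + n * spacing_ghz / 1000.0


def wavelength_nm(freq_thz):
    """Vacuum wavelength for a given optical frequency."""
    return C_NM_THZ / freq_thz


for n in (-10, 0, 30):
    f = grid_frequency_thz(n)
    print(f"n={n:+d}: {f:.1f} THz, {wavelength_nm(f):.2f} nm")

# Spacing between two neighboring 100 GHz channels near the anchor, in nm.
print(round(wavelength_nm(193.1) - wavelength_nm(193.2), 2))
```

Passing `spacing_ghz=50.0` gives the denser grid, with twice as many indices over the same frequency range.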
|
|
||||||
|
|
||||||
At 50 GHz spacing, you double the channel count to approximately 80 channels in the same spectral region. Each channel occupies half the spectral width, which has direct implications for the modulation format—a 50 GHz channel occupies a spectral slot that's much tighter, requiring narrower optical filter passbands and modulation formats with lower spectral occupancy. For 10G and 25G per channel, this is manageable. For 100G per channel over 50 GHz spacing with legacy modulation formats (NRZ OOK), the spectral efficiency requirements are extremely tight and filter narrowing starts to impair signal integrity.
|
|
||||||
|
|
||||||
The flex-grid standard (ITU-T G.694.1 amendment) moves away from fixed channel positions entirely, defining spectrum as a series of 12.5 GHz slots that can be allocated in any combination. Flex-grid is the native habitat of modern coherent DWDM—a 100G DP-QPSK signal needs roughly a 37.5 GHz slot (3 × 12.5 GHz), while a 400G DP-16QAM signal needs approximately 75 GHz. Flex-grid lets you mix and match channel widths, which maximizes spectral efficiency across a mixed-rate DWDM system.
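To make the slot arithmetic concrete, the sketch below rounds a signal's spectral width up to whole 12.5 GHz slots and compares the spectrum used by a mixed-rate channel set against a fixed 100 GHz grid. The channel mix is an arbitrary example.

```python
import math

SLOT_GHZ = 12.5  # flex-grid slot width granularity


def slots_needed(signal_width_ghz):
    """Number of 12.5 GHz slots required to hold a signal of the given width."""
    return math.ceil(signal_width_ghz / SLOT_GHZ)


def flex_grid_spectrum_ghz(signal_widths_ghz):
    """Total spectrum allocated when each signal gets only the slots it needs."""
    return sum(slots_needed(w) * SLOT_GHZ for w in signal_widths_ghz)


# Example mix: ten 100G DP-QPSK channels (37.5 GHz) and five 400G DP-16QAM (75 GHz).
mix = [37.5] * 10 + [75.0] * 5
print(slots_needed(37.5), slots_needed(75.0))
print(flex_grid_spectrum_ghz(mix), "GHz on flex-grid")
print(len(mix) * 100.0, "GHz on a fixed 100 GHz grid")
```

The gap between the two totals is the spectrum that right-sizing recovers for this particular mix.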
|
|
||||||
|
|
||||||
## Why 100 GHz DWDM Equipment Limits Expansion
|
|
||||||
|
|
||||||
The ROADM (Reconfigurable Optical Add-Drop Multiplexer) generation determines what's possible in your optical network. ROADMs from the 2005–2012 era were designed around fixed 100 GHz channel plans using thin-film filter (TFF) technology. TFF filters have a fixed passband of approximately 80 GHz FWHM (Full Width at Half Maximum) at the specified center frequency. They literally cannot pass a 50 GHz-spaced channel—the adjacent channel falls within the filter's stopband.
|
|
||||||
|
|
||||||
WSS (Wavelength Selective Switch) based ROADMs from the 2012–2018 period use liquid crystal on silicon (LCoS) technology with programmable filter shapes, but the first-generation WSS designs (Finisar WSS-1×9, JDSU/II-VI equivalents) typically have a minimum achievable passband of around 37.5 GHz and were characterized for 50 GHz channel spacing. These can support 50 GHz channels but not true flex-grid fractional slots.
|
|
||||||
|
|
||||||
Second-generation WSS ROADMs with flex-grid capability (available from Lumentum, Finisar, and II-VI from around 2015 onward) support 12.5 GHz granularity. These are what coherent 400G systems require, and if your ROADM nodes predate 2015, the answer to "can we deploy 400G ZR on our DWDM network" is probably "not without node upgrades."
|
|
||||||
|
|
||||||
The EDFA gain flatness profile is the second constraint. C-band EDFAs have a gain spectrum that is inherently not flat—the gain is higher around 1530–1535 nm and lower around 1555–1560 nm. Gain flattening filters (GFFs) embedded in EDFA amplifier units compensate for this, but GFFs are designed for a specific channel loading scenario. An EDFA designed for 40-channel × 100 GHz loading with a specific tilt compensation in the GFF will have a different residual gain tilt when loaded with 80-channel × 50 GHz operation. This isn't a catastrophic failure, but it means the optical power levels per channel shift, and your system's OSNR (Optical Signal-to-Noise Ratio) margin calculations change.
|
|
||||||
|
|
||||||
## Wideband Amplifiers and the L-Band Option
|
|
||||||
|
|
||||||
Standard C-band EDFAs cover 1530–1565 nm. Wideband C+L amplifiers extend coverage to include the L-band (1565–1625 nm), effectively doubling the available spectrum. This is the capacity expansion path for systems that have fully loaded the C-band and can't reduce channel spacing further.
|
|
||||||
|
|
||||||
The practical implication of adding L-band is cost and complexity: C+L amplification requires separate amplifier paths for C and L bands (combined into a single module in modern designs but still requiring separate pump lasers and gain media stages), and the ROADM nodes require WSS elements characterized for the full C+L spectral range. Not all existing ROADM node designs have an L-band upgrade path.
|
|
||||||
|
|
||||||
For networks that are capacity-constrained on existing C-band infrastructure, the evaluation path is: first, can channel spacing be reduced from 100 GHz to 50 GHz? (Requires WSS-capable ROADMs with sub-50 GHz filter granularity and coherent transceivers with adequate spectral efficiency.) Second, can flex-grid allocation improve spectral efficiency by right-sizing channels? (Requires second-generation WSS ROADMs.) Third, if C-band is fully exploited, is C+L upgrade viable? (Requires assessment of every ROADM node in the path.) In most cases, the bottleneck in the first two assessments turns out to be equipment generation, not fiber capacity.
|
|
||||||
|
|
||||||
## The Coherent Modulation Connection
|
|
||||||
|
|
||||||
The 100 GHz vs. 50 GHz question has a direct dependency on which modulation format your transponders use. Legacy 10G DWDM systems used OOK (On-Off Keying) with optical duobinary or NRZ modulation, occupying 10–20 GHz of spectrum per channel—easily accommodated in 100 GHz spacing with margin to spare. 100G DP-QPSK occupies roughly 37.5 GHz in the OIF-100G-SR specification, fitting into a 50 GHz channel with 12.5 GHz guard band. 100G DP-16QAM (used in high-capacity short-haul systems) occupies approximately 25 GHz, fitting into a 50 GHz channel with more margin.
|
|
||||||
|
|
||||||
The 400G case is where spacing starts to bite: 400G DP-16QAM at 64 GBaud occupies approximately 75 GHz of spectrum. At 100 GHz channel spacing, a 400G channel fits with 25 GHz guard band. At 50 GHz spacing, a 400G channel won't fit. This is why networks designed for 50 GHz channel spacing have limited 400G capacity if they can't migrate to flex-grid operation.
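The fit check behind these statements is a subtraction, sketched here with the approximate spectral widths quoted in this section. The dictionary of formats is illustrative, not a vendor specification.

```python
def guard_band_ghz(channel_spacing_ghz, signal_width_ghz):
    """Spare spectrum left in a fixed-grid channel.
    A negative result means the signal does not fit."""
    return channel_spacing_ghz - signal_width_ghz


# Approximate spectral widths as quoted in the text above.
signals = {"100G DP-QPSK": 37.5, "400G DP-16QAM": 75.0}

for name, width in signals.items():
    for spacing in (50.0, 100.0):
        margin = guard_band_ghz(spacing, width)
        verdict = "fits" if margin >= 0 else "does not fit"
        print(f"{name} on {spacing:.0f} GHz grid: {verdict} ({margin:+.1f} GHz)")
```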
|
|
||||||
|
|
||||||
## What You Should Actually Plan For
|
|
||||||
|
|
||||||
The architecture guidance that applies most broadly: any new DWDM infrastructure investment should be flex-grid capable from the outset. The incremental cost of flex-grid WSS hardware over fixed-grid hardware is modest—typically under 10% of the WSS node cost—and it's the difference between a system that can accommodate 400G and beyond versus one that's locked into 100G channel rates.
|
|
||||||
|
|
||||||
For existing 100 GHz infrastructure, the practical capacity expansion options before a full ROADM replacement are: migration to coherent 100G on existing channels (replacing legacy 10G OOK transponders with coherent 100G, which doesn't increase channel count but multiplies per-channel capacity by 10), and evaluation of whether WSS-capable ROADMs in the network can support 50 GHz re-spacing. If even one legacy TFF-based ROADM node exists in the path, 50 GHz migration requires that node to be upgraded first.
|
|
||||||
|
|
||||||
The 10-year-old DWDM gear constraint is real and specific: amplifier ASE (amplified spontaneous emission) levels, fixed TFF filter passbands, and non-flex-grid WSS elements are not software upgradeable. The bottleneck is the optical hardware, and identifying exactly which nodes in a multi-span DWDM network are the limiting element is the prerequisite for any capacity planning discussion.
|
|
||||||
@ -1,60 +0,0 @@
|
|||||||
---
|
|
||||||
title: "400G ZR Interoperability Reality: Which Vendor Pairs Actually Work and What to Test Before You Commit"
|
|
||||||
slug: "400g-zr-interoperability-matrix-testing"
|
|
||||||
type: analysis
|
|
||||||
category: "Coherent Optics"
|
|
||||||
tags: ["400G ZR", "interoperability", "OIF", "coherent", "QSFP-DD", "DSP", "DWDM"]
|
|
||||||
seo_focus_keyword: "400G ZR interoperability vendor matrix"
|
|
||||||
---
|
|
||||||
|
|
||||||
The OIF (Optical Internetworking Forum) 400ZR Implementation Agreement was ratified in 2020 with the explicit goal of enabling multi-vendor interoperability in 400G coherent DWDM. Two and a half years after the first compliant modules shipped, the honest assessment is: interoperability works, within well-defined conditions, and there are enough asterisks on the "works" to fill a whitepaper. Knowing which vendor pairs have been validated and what the OIF implementation agreement actually covers—versus what it leaves to vendor interpretation—is essential before committing a production network to a specific configuration.
|
|
||||||
|
|
||||||
## What OIF-400ZR Specifies
|
|
||||||
|
|
||||||
The OIF-400ZR IA defines a specific, narrow operating point: 400G using DP-16QAM modulation, FEC using CFEC (a concatenated scheme with a staircase outer code and a Hamming inner code), at a single carrier frequency within the C-band, with a maximum reach of approximately 120 km on an amplified link (EDFA-amplified, no inline DCM) and a shorter, loss-limited reach on an unamplified link.
|
|
||||||
|
|
||||||
The IA specifies the modulation format, baud rate (approximately 60 GBaud), spectral occupancy (approximately 75 GHz), FEC algorithm, DSP framing, 400GbE client mapping, and key electrical interface parameters for the QSFP-DD host connector. Within these constraints, a module from Vendor A should be interoperable with a module from Vendor B.
|
|
||||||
|
|
||||||
What the IA does not specify: the specific DSP implementation, laser linewidth characteristics beyond the minimum requirement, pre-emphasis and equalization algorithms, and—critically—firmware update sequences and initialization timing. Each DSP vendor (Acacia, InPhi/Marvell, Coherent Corp./II-VI, Lumentum, Broadcom) implements the coherent signal processing differently, and these differences are the source of most practical interoperability issues.
|
|
||||||
|
|
||||||
## The DSP Version Problem
|
|
||||||
|
|
||||||
The most consequential compatibility issue in 400G ZR deployment is DSP firmware versioning. The OIF-400ZR IA defines the protocol, but each DSP generation implements that protocol with different FEC coefficients, different carrier recovery loop parameters, and different chromatic dispersion compensation ranges.
|
|
||||||
|
|
||||||
A specific example: early DSP implementations used a CD (Chromatic Dispersion) acquisition range of ±8,000 ps/nm. The specification required a minimum of ±2,400 ps/nm (for uncompensated links up to 120 km on G.652 fiber, which accumulates roughly 2,000–2,400 ps/nm of CD). Early Acacia AC400 and early InPhi Porrima (CP200) DSP implementations had acquisition ranges well within spec but differed in how they signaled acquisition state to the host. If one end acquired lock and began transmitting live traffic before the far end had completed its initial carrier recovery, the mismatched initialization state caused a transient failure that resolved itself within seconds but occasionally triggered the host router's interface error threshold and bounced the link.
|
|
||||||
|
|
||||||
This specific issue was addressed in firmware updates released in 2021–2022 for most first-generation 400ZR DSP implementations. But it illustrates the class of problem: OIF-400ZR compliance means protocol compliance, not implementation-level behavioral compatibility, and the implementation differences show up in edge cases like acquisition timing, fault recovery, and behavior under marginal OSNR.
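The chromatic dispersion figures in the example above come from a simple product of link length and fiber dispersion coefficient. The sketch below reproduces them, assuming a G.652 coefficient of 17 to 20 ps/(nm·km) near 1550 nm, which is the range implied by the 2,000–2,400 ps/nm figure.

```python
def accumulated_cd_ps_per_nm(length_km, dispersion_ps_nm_km):
    """Chromatic dispersion accumulated over an uncompensated span."""
    return length_km * dispersion_ps_nm_km


def cd_margin_ps_per_nm(acquisition_range_ps_nm, length_km, dispersion_ps_nm_km):
    """How much acquisition range remains after the link's accumulated CD."""
    return acquisition_range_ps_nm - accumulated_cd_ps_per_nm(length_km, dispersion_ps_nm_km)


for d in (17.0, 20.0):  # assumed G.652 dispersion coefficients, ps/(nm*km)
    cd = accumulated_cd_ps_per_nm(120, d)
    print(f"120 km at {d} ps/(nm*km): {cd:.0f} ps/nm, "
          f"margin vs 8000 ps/nm range: {cd_margin_ps_per_nm(8000, 120, d):.0f} ps/nm")
```

The large margin is consistent with the point made above: the early problems were about acquisition signaling and timing, not about running out of compensation range.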
|
|
||||||
|
|
||||||
## Which Vendor Pairs Have Been Validated
|
|
||||||
|
|
||||||
The most current publicly available interoperability validation data comes from two sources: the OIF's own interoperability demonstrations (conducted at OFC and other industry events since 2021) and operator validation reports from major telcos and cloud providers who have published their findings.
|
|
||||||
|
|
||||||
Validated pairs that have been publicly demonstrated to operate in bidirectional coherent ZR mode include:
|
|
||||||
|
|
||||||
Acacia (Cisco) AC400/AC1200 with InPhi (Marvell) Porrima CP200 and CP400: demonstrated at OFC 2021 and 2022, with confirmed firmware version requirements published. This pair works reliably at firmware revisions specified in the OIF demo documentation.
|
|
||||||
|
|
||||||
Lumentum 400ZR QSFP-DD with Coherent (II-VI) CFP2-DCO ZR: validated in lab testing by multiple European operators. The CFP2 to QSFP-DD pairing works because the ZR specification is form-factor independent—the optical interface is the standard; the electrical host interface is separate.
|
|
||||||
|
|
||||||
Broadcom Bakerfield-based implementations (used by Innolight, HG Tech, and others in merchant silicon modules) with Acacia and InPhi: generally validated at the protocol level, with some firmware version sensitivity around CD acquisition timing.
|
|
||||||
|
|
||||||
Combinations that have known issues or limited validation: any first-generation 400ZR module at firmware predating mid-2021 against any other first-generation module. The 2020–early 2021 firmware was when the "implementation agreement is ratified but implementations are still maturing" period was most visible. If you have first-generation 400ZR modules in inventory that haven't received firmware updates since deployment, treat their interoperability with new second-generation modules as unvalidated.
|
|
||||||
|
|
||||||
## How to Test Before You Commit
|
|
||||||
|
|
||||||
The validation process for a 400G ZR vendor pair should cover more than "does the link come up." A complete interoperability test covers:
|
|
||||||
|
|
||||||
**Link acquisition testing**: power both ends up simultaneously from a cold start and measure time to link establishment. Repeat 20 times. Any failure to establish link within 60 seconds (the OIF-400ZR acquisition time requirement) is a bug. Any consistent delay beyond 30 seconds warrants investigation.
|
|
||||||
|
|
||||||
**Marginal OSNR testing**: use a variable optical attenuator to reduce the received signal level incrementally (with ASE noise loading where available, so that OSNR rather than only power is degraded) and measure FEC error rate and pre-FEC BER at each attenuation step. The FEC threshold (the point where corrected errors appear) and the hard decision threshold (the point where uncorrected errors appear) should be consistent with the OIF-400ZR specification. A DSP pair that shows a larger gap between FEC threshold and hard decision threshold is more robust under impaired conditions.
|
|
||||||
|
|
||||||
**Link restoration after failure**: simulate fiber cut and restoration (loopback disconnection and reconnection) and measure time to link re-establishment. Coherent DSP reacquisition times vary from 5 to 60 seconds depending on implementation and condition history.
|
|
||||||
|
|
||||||
**FEC performance verification**: at nominal OSNR, FEC corrected error count should be low (on the order of 10^-4 to 10^-5 pre-FEC BER). A link running at consistently high pre-FEC BER with active FEC correction is operating with less margin than specification implies, and multi-vendor pairs may have slightly different pre-FEC BER characteristics at the same optical power level.
|
|
||||||
|
|
||||||
**Firmware version documentation**: record the specific firmware version on both ends before and after any firmware update, and re-run the test matrix after updates. Firmware updates that change DSP coefficients can shift interoperability behavior.
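The link acquisition criteria above can be turned into a small pass/fail summary. The following is a minimal sketch, assuming the acquisition times have already been collected by hand or by a test harness. The function and field names are illustrative; only the 60-second and 30-second thresholds come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the text: 60 s is the hard limit,
# 30 s is the level at which a delay warrants investigation.
FAIL_AFTER_S = 60.0
INVESTIGATE_AFTER_S = 30.0


@dataclass
class AcquisitionSummary:
    trials: int
    failures: int              # no link within FAIL_AFTER_S, or no link at all
    slow: int                  # link came up, but later than INVESTIGATE_AFTER_S
    worst_s: Optional[float]   # slowest successful acquisition, if any


def summarize_acquisition(times_s: List[Optional[float]]) -> AcquisitionSummary:
    """times_s holds one entry per cold-start trial; None means no link."""
    ok = [t for t in times_s if t is not None and t <= FAIL_AFTER_S]
    failures = len(times_s) - len(ok)
    slow = sum(1 for t in ok if t > INVESTIGATE_AFTER_S)
    return AcquisitionSummary(len(times_s), failures, slow, max(ok) if ok else None)


if __name__ == "__main__":
    # Hypothetical measurements from five cold starts.
    print(summarize_acquisition([12.0, 35.5, None, 61.0, 8.2]))
```

Recording the summary next to the firmware versions on both ends makes it easier to compare runs before and after an update.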
|
|
||||||
|
|
||||||
## The Operational Reality
|
|
||||||
|
|
||||||
Production 400G ZR networks running interoperable multi-vendor configurations exist and are operationally stable—this isn't a theoretical exercise. The conditions for success are: confirmed firmware version compatibility (both ends on validated firmware revisions), tested and documented link acquisition behavior, and a change control process that requires re-validation after DSP firmware updates.
|
|
||||||
|
|
||||||
The operational risk of 400G ZR interoperability is not that it doesn't work—it's that the conditions under which it works are specific, and changes to those conditions (new firmware, new module generation on one end, changed optical path characteristics) can shift interoperability behavior without obvious warning. Treating the validation matrix as a living document, updated with each significant change, is the practice that distinguishes networks that manage 400G ZR coherent well from those that manage it reactively.
|
|
||||||
@ -1,66 +0,0 @@
|
|||||||
---
|
|
||||||
title: "G.652 vs G.657: When Bend-Insensitive Fiber Matters and When It's Just a Premium You Don't Need"
|
|
||||||
slug: "single-mode-fiber-g652-g657-bend-insensitive"
|
|
||||||
type: guide
|
|
||||||
category: "Fiber & Cabling"
|
|
||||||
tags: ["G.652", "G.657", "single-mode fiber", "bend-insensitive", "SMF-28", "attenuation", "data center access"]
|
|
||||||
seo_focus_keyword: "G.652 vs G.657 bend-insensitive fiber"
|
|
||||||
---
|
|
||||||
|
|
||||||
G.657 bend-insensitive fiber has a legitimate use case. It's a real improvement in specific installation scenarios, and for those scenarios, it's worth the 20–40% price premium over G.652. For every other scenario, you're paying for a fiber characteristic that your installation will never stress. Fiber specifications are an area where the gap between marketing materials and engineering requirements tends to be wide, and the G.652 vs. G.657 question is a good case study in why reading the actual spec matters.
|
|
||||||
|
|
||||||
## What G.652 Actually Specifies
|
|
||||||
|
|
||||||
ITU-T G.652 is the foundational standard for single-mode optical fiber, covering what is commonly marketed as "Standard SMF" or "OS2" in data center contexts. G.652 defines four sub-variants (A through D), of which G.652D is the current standard and what essentially all new single-mode fiber deployments use. G.652D specifies:
|
|
||||||
|
|
||||||
Attenuation: maximum 0.4 dB/km at 1310 nm and 0.3 dB/km at 1550 nm; typical installed fiber from major manufacturers measures 0.18–0.20 dB/km at 1550 nm in practice.
|
|
||||||
|
|
||||||
Zero dispersion wavelength: in the range of 1300–1324 nm, with chromatic dispersion coefficient D ≤ 3.5 ps/(nm·km) at 1285 nm and ≤ 3.5 ps/(nm·km) at 1330 nm.
|
|
||||||
|
|
||||||
Mode field diameter: 8.6–9.2 μm at 1310 nm.
|
|
||||||
|
|
||||||
Macrobend performance: G.652D specifies macrobend-induced additional attenuation of ≤ 0.1 dB for 100 turns around a 30 mm radius mandrel, measured at 1625 nm. This bend performance specification is the baseline: adequate for structured cabling installations with standard bend radii in cable trays, conduit, and patch panels.
|
|
||||||
|
|
||||||
## What G.657 Adds
|
|
||||||
|
|
||||||
G.657 (bend-insensitive single-mode fiber) is defined in ITU-T G.657, with two categories A and B, each with sub-variants. The relevant comparison is:
|
|
||||||
|
|
||||||
G.657A1 and A2: fully backward compatible with G.652D in splice and connector behavior. Mode field diameter and chromatic dispersion characteristics are within G.652D tolerance. The difference is macrobend performance.
|
|
||||||
|
|
||||||
G.657A1: ≤ 0.75 dB additional attenuation for 1 turn at 10 mm bend radius at 1550 nm (versus G.652D's requirement, which is specified only at 30 mm radius).
|
|
||||||
|
|
||||||
G.657A2: ≤ 0.5 dB additional attenuation for 1 turn at 7.5 mm bend radius at 1550 nm, and ≤ 0.1 dB for 1 turn at 10 mm.
|
|
||||||
|
|
||||||
G.657B2 and B3: enhanced bend insensitivity at even tighter radii, but with mode field diameters that may differ from G.652D—meaning splicing to G.652D fiber introduces additional splice loss and G.657B3 in particular is not splice-compatible without splicing loss penalty.
|
|
||||||
|
|
||||||
The critical parameter is radius. G.652D specifies performance at 30 mm bend radius. G.657A2 specifies performance at 7.5 mm bend radius. The question for any installation is: will fiber actually be bent to these radii?
|
|
||||||
|
|
||||||
## When Bend-Insensitive Fiber is Genuinely Justified
|
|
||||||
|
|
||||||
The use case that actually justifies G.657A2 is the data center access layer—specifically, the runs from patch panels to servers in high-density racks, and inside server trays or cable management systems where fiber must navigate very tight radii.
|
|
||||||
|
|
||||||
In a standard 1U cable management panel, fiber patch cords routing from a 48-port LC patch panel to server ports can encounter bend radii under 20 mm at the cable entry points and inside tightly packed cable trays. At 30 mm, G.652D performs fine. At 15–20 mm, G.652D begins to show increased attenuation: a tight bend lets part of the guided mode leak out of the core into the cladding, and this produces additional insertion loss of 0.1–0.5 dB per bend, which compounds across multiple tight bends in a dense patch panel run.
|
|
||||||
|
|
||||||
G.657A2 installed in a dense data center access layer with tight cable management radii will show consistently lower connector-to-connector insertion loss on the same physical path, because the fiber doesn't add bend-induced loss at the radii that the cabling actually encounters. Over a 3–5 meter patch cord with four tight bends, the difference can be 0.3–0.8 dB total, which is meaningful on a link budget for short-reach single-mode optics that have limited power budget margin.
|
|
||||||
|
|
||||||
The other justified use case is inside buildings where fiber must be routed through conduit with unavoidable tight bends at conduit elbows, or in wall-mounted enclosures where space constraints force tight coiling of excess fiber length. G.657A2 is the standard specification for FTTH (Fiber to the Home) drop cable precisely because the in-building routing environment is full of 15–20 mm bend radii that would cause unacceptable loss on G.652D.
|
|
||||||
|
|
||||||
## When It Isn't Justified
|
|
||||||
|
|
||||||
Trunk fiber runs between buildings, in underground conduit, or in overhead cable trays do not encounter 15 mm bend radii under normal installation conditions. The minimum bend radius for trunk cable is limited by the cable itself (typically 20× cable diameter under load, 10× at rest), and for a standard 12-fiber SMF-28 indoor/outdoor cable, the minimum bend radius at rest is 30–40 mm. G.652D handles this without any performance penalty versus G.657A2.
|
|
||||||
|
|
||||||
Campus fiber backbone runs—even in tight conduit pathways—rarely produce sustained bend radii below 25 mm. G.652D is appropriate. Inter-rack connections in data centers using pre-terminated trunk cables with MPO connectors operate above the bend threshold where G.657A2 adds value. G.652D is appropriate.
|
|
||||||
|
|
||||||
The splice compatibility caveat for G.657B2/B3 is worth emphasizing: if you install G.657B3 fiber in a backbone where it needs to be spliced to existing G.652D plant, you will incur a splice loss penalty of 0.1–0.3 dB per splice due to mode field diameter mismatch. On a long span with multiple splices, this penalty eliminates the cost justification quickly. G.657A1 and A2 avoid this problem because they are genuinely splice-compatible with G.652D.
|
|
||||||
|
|
||||||
## Attenuation Differences at Operating Wavelengths
|
|
||||||
|
|
||||||
G.657A2 fiber from major manufacturers (Corning ClearCurve, OFS AllWave FLEX+, Prysmian BendBright XS) has attenuation at 1550 nm of approximately 0.18–0.20 dB/km, essentially identical to G.652D from the same manufacturers. The bend insensitivity improvement comes from a modified refractive index profile (typically a depressed cladding or trench-assisted design) that confines the mode more tightly without significantly affecting the propagation characteristics at the 1310 nm and 1550 nm operating wavelengths.
|
|
||||||
|
|
||||||
The bend-induced attenuation addition at 1625 nm (L-band) is higher than at 1550 nm for G.652D at tight bend radii, and G.657A2 provides better performance at L-band bend radii as well. For networks considering C+L band operation over existing fiber plant, the bend performance at 1625 nm is a consideration if the installed plant includes tight-bend sections.
|
|
||||||
|
|
||||||
## The Purchasing Decision
|
|
||||||
|
|
||||||
For new data center cabling projects where fiber patch cords will navigate dense 1U cable management panels with radii below 20 mm: specify G.657A2. The price premium (typically 25–35% per meter over equivalent G.652D patch cord) is justified by the measurable improvement in dense-patch-panel insertion loss performance.
|
|
||||||
|
|
||||||
For all other structured cabling applications—backbone runs, inter-building connections, standard rack-to-rack connections with normal cable management: specify G.652D. The bend insensitivity premium provides no operational benefit in installation environments where fiber radii stay above 25 mm, and the lower unit cost of G.652D is more appropriately directed toward improved connector quality and cleaning protocol, which have demonstrably higher impact on link budget than fiber grade selection in normal-bend environments.
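The decision rule in this section reduces to two radius thresholds plus the cable-diameter rule of thumb quoted earlier. The sketch below is one way to encode it, under stated assumptions: the function names are illustrative, and the "review" result for radii between 20 mm and 25 mm is an assumption, since the text does not say which grade to pick in that band.

```python
def min_cable_bend_radius_mm(cable_diameter_mm: float, under_load: bool) -> float:
    """Rule of thumb from the text: 20x cable diameter under load, 10x at rest."""
    return cable_diameter_mm * (20 if under_load else 10)


def recommend_fiber(tightest_radius_mm: float) -> str:
    """Map the tightest bend radius expected on the path to a fiber grade."""
    if tightest_radius_mm < 20:
        return "G.657A2"   # dense patching and tight cable management
    if tightest_radius_mm > 25:
        return "G.652D"    # backbone, inter-building, normal rack-to-rack runs
    return "review"        # 20-25 mm: not covered by the text, decide case by case


if __name__ == "__main__":
    print(recommend_fiber(15))                     # tight 1U cable manager
    print(recommend_fiber(30))                     # standard tray routing
    print(min_cable_bend_radius_mm(3.0, False))    # hypothetical 3 mm cable at rest
```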
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "10G to 25G Migration: When the Per-Port Economics Justify the Switch"
|
|
||||||
slug: "25g-vs-10g-upgrade-path-sfp28-sfp-plus"
|
|
||||||
type: analysis
|
|
||||||
category: "Migration & Upgrades"
|
|
||||||
tags: ["25G", "10G", "SFP28", "SFP+", "migration", "TOR switch", "server connectivity", "enterprise"]
|
|
||||||
seo_focus_keyword: "10G to 25G migration SFP28 upgrade decision"
|
|
||||||
---
|
|
||||||
|
|
||||||
The move from 10G to 25G server connectivity has been underway for long enough that the decision tree is reasonably well-established, but the number of enterprises still running 10G ToR infrastructure in new deployments suggests the economic case isn't as clear as the bandwidth case. The honest answer is: 25G is almost always the right choice for new deployments, and the reasons why many organizations still choose 10G have more to do with procurement inertia and cabling assumptions than actual economics.
|
|
||||||
|
|
||||||
## The Per-Port Cost Comparison in 2024
|
|
||||||
|
|
||||||
The hardware economics have shifted significantly over the past five years. In 2019, 10G SFP+ SR optics were approximately 30% cheaper than 25G SFP28 SR equivalents, and 10G ToR switches were substantially cheaper than 25G switches in 48-port configurations. By 2024, the economics look different:
|
|
||||||
|
|
||||||
A 25G SFP28 SR optic from a quality third-party manufacturer runs €25–€45 depending on volume. A 10G SFP+ SR optic from the same manufacturer runs €15–€25. The per-port cost delta is €10–€20, or roughly 65–80% more for 25G. That's a real premium.
|
|
||||||
|
|
||||||
The switch-level comparison is more nuanced. A 48-port 25G ToR switch (e.g., Arista 7050SX3-48YC12, Cisco Nexus 93180YC-FX) runs approximately €8,000–€15,000 at current street prices depending on vendor and optic count. A comparable 48-port 10G ToR switch runs €4,000–€8,000. The 25G switch premium is roughly €5,000–€7,000 per switch, or approximately €100–€150 per port, versus the €10–€20 per-port optic cost delta.
|
|
||||||
|
|
||||||
The capital cost comparison thus comes out to roughly €110–€170 per port more expensive for 25G versus 10G in a fresh deployment (switch premium plus optic delta). Over a 5-year hardware lifecycle, this is approximately €22–€34 per port per year.
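The per-port arithmetic is simple enough to keep in a script so it can be rerun when quotes change. This is a minimal sketch that sums the two per-port ranges quoted above and spreads them over the lifecycle; the variable and function names are illustrative, and the figures are the article's estimates rather than vendor quotes.

```python
LIFECYCLE_YEARS = 5

# Per-port ranges (low, high) in EUR, as quoted in the text.
optic_delta_eur = (10, 20)                 # 25G SR optic minus 10G SR optic
switch_premium_per_port_eur = (100, 150)   # 25G ToR premium spread over 48 ports


def add_ranges(a, b):
    """Add two (low, high) ranges element-wise."""
    return (a[0] + b[0], a[1] + b[1])


def per_year(cost_range, years):
    """Spread a (low, high) range evenly over a number of years."""
    return (cost_range[0] / years, cost_range[1] / years)


capex_delta = add_ranges(optic_delta_eur, switch_premium_per_port_eur)
annual_delta = per_year(capex_delta, LIFECYCLE_YEARS)

if __name__ == "__main__":
    print(f"25G premium per port: EUR {capex_delta[0]}-{capex_delta[1]}")
    print(f"Per port per year:    EUR {annual_delta[0]:.0f}-{annual_delta[1]:.0f}")
```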
|
|
||||||
|
|
||||||
## The Bandwidth Economics and Lifecycle Consideration
|
|
||||||
|
|
||||||
A 25G port delivers 2.5× the bandwidth of a 10G port at roughly 1.5–1.7× the cost. The cost-per-Gbps comparison favors 25G: approximately €5–€7 per Gbps for 25G versus €8–€12 per Gbps for 10G at current prices.
|
|
||||||
|
|
||||||
The lifecycle argument is stronger than the initial cost argument. Infrastructure installed today will remain in service for 5–7 years under typical enterprise refresh cycles. The server connectivity requirements at the end of that cycle—2029 or 2030—will reflect application workloads that are being planned and deployed now. AI/ML inference workloads, high-frequency data analytics, containerized microservices with high east-west traffic volumes, NVMe-over-Fabric storage—all of these drive higher bandwidth utilization per server than a 10G link was designed to sustain under the workloads of 2015.
|
|
||||||
|
|
||||||
Installing 10G in a new deployment in 2024 means accepting mid-cycle obsolescence around 2027–2028 when server bandwidth requirements start to exceed 10G sustained utilization rates, requiring either a refresh ahead of the planned cycle or bandwidth-constrained application performance.
|
|
||||||
|
|
||||||
## SFP28 vs. SFP+ Physical Compatibility
|
|
||||||
|
|
||||||
SFP28 and SFP+ share the same SFP mechanical form factor (the SFF-8432 module and cage dimensions; SFF-8402 defines the 28 Gb/s SFP28 variant). An SFP28 module will physically fit in an SFP+ port, and vice versa. However, the electrical interface specifications differ:
|
|
||||||
|
|
||||||
SFP+ operates at the 10GBASE-SR/LR Ethernet or 10G Fibre Channel electrical interface rate. SFP28 operates at 25GBASE-SR/LR electrical interface rate. The host port's SerDes (Serializer/Deserializer) determines which speeds are supported.
|
|
||||||
|
|
||||||
In practice: an SFP28 module inserted into an SFP+ port will run at 10G only if the module supports dual-rate (10G/25G) operation and the platform accepts it; otherwise it will fail to link. It will not run at 25G in an SFP+ port regardless of the module's specifications. Conversely, an SFP+ module in an SFP28 port will typically run at 10G if the switch supports 10G on that port interface type.
|
|
||||||
|
|
||||||
This backward compatibility is useful for mixed-generation migrations. A 25G ToR switch with 48 SFP28 ports can accept SFP+ 10G modules in ports that connect to servers not yet upgraded to 25G NICs—a common scenario during a phased server refresh where ToR switches are upgraded before all servers. The SFP28 port runs at 10G with the SFP+ module without any configuration change in most implementations.
|
|
||||||
|
|
||||||
## TOR Cabling Scenarios: What Changes and What Doesn't
|
|
||||||
|
|
||||||
The fiber plant between ToR switches and servers doesn't change at all for SR applications. 25GBASE-SR uses 850 nm VCSEL over OM3/OM4 on LC duplex—the same fiber and connector type as 10GBASE-SR. A server rack with OM4 patch cords pre-installed for 10G can have its transceivers swapped to 25G without touching the fiber. This is a material point for existing data centers: the cabling investment is preserved.
|
|
||||||
|
|
||||||
The cabling considerations that do change are at the uplink level. If you're also upgrading uplinks from 10G to higher speed, the uplink ports on 25G ToR switches are typically 100G QSFP28 (4×25G) rather than 40G QSFP+. This changes the cable type for uplinks, but the ToR-to-aggregation fiber runs are typically shorter-reach and are being installed fresh in most refresh projects anyway.
|
|
||||||
|
|
||||||
For direct attach copper (DAC) connections, common in top-of-rack server to switch connections up to 3–5 meters, the change from 10G to 25G means new DAC cables. SFP+ 10G DAC cables are not rated for 25G operation in SFP28 ports: the connectors are mechanically interchangeable, but the electrical signaling rate is different, and the passive twinax or active copper assembly must be rated for the appropriate speed. Existing 10G DAC cables need replacement in a 25G migration.
|
|
||||||
|
|
||||||
## Why Most Enterprises Do This Wrong
|
|
||||||
|
|
||||||
The typical suboptimal migration pattern is: an organization approves a server refresh, new servers arrive with dual 25G NICs, and procurement orders 10G ToR switches "because we have 10G SFP+ switches from the last cycle and we want to standardize." This decision optimizes for the wrong variable: it minimizes short-term capital spend on network hardware at the cost of deploying infrastructure that is obsolete by design.
|
|
||||||
|
|
||||||
The second common mistake is upgrading ToR switches to 25G while keeping 10G uplinks, creating a 25G-to-10G speed mismatch at the aggregation layer that immediately limits the achievable bandwidth per ToR switch to the uplink capacity rather than the server-facing capacity. If you're migrating to 25G at the server layer, the aggregation layer uplinks need to be part of the same migration plan.
|
|
||||||
|
|
||||||
The correct migration sequence is: upgrade ToR switches and uplinks first (25G ports, 100G uplinks), then migrate servers as they're refreshed. This allows existing 10G servers to run at 10G on the new infrastructure (using SFP+ backward compatibility) while new 25G servers get full-speed connectivity immediately. The infrastructure investment is complete at the start, and no second migration is required when the last 10G server is replaced.
|
|
||||||
|
|
||||||
The economics justify 25G for essentially any enterprise deploying more than 100 server ports in a new facility or major refresh today. The argument for 10G boils down to "we're at end of lease and this will be torn down in 18 months"—which is a legitimate exception, not a default.
|
|
||||||
@ -1,78 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Building a Proper Optical Link Budget Calculator: From Component Losses to EDFA Placement"
|
|
||||||
slug: "optical-budget-calculator-guide-dwdm-span"
|
|
||||||
type: tutorial
|
|
||||||
category: "Engineering & Design"
|
|
||||||
tags: ["link budget", "optical power budget", "EDFA", "DWDM", "attenuation", "splice loss", "optical design"]
|
|
||||||
seo_focus_keyword: "optical link budget calculator DWDM"
|
|
||||||
---
|
|
||||||
|
|
||||||
An optical link budget is a power accounting exercise—add up all the losses a signal encounters between transmitter and receiver, subtract that from the transmitter's launch power, and verify that the result exceeds the receiver's sensitivity floor by an acceptable margin. The concept is simple. The practice requires accounting for variables that are individually small but collectively determine whether a link works on day one and continues to work five years later after fiber aging, connector degradation, and splices that were optimistic at acceptance testing.
|
|
||||||
|
|
||||||
## The Component-by-Component Loss Model
|
|
||||||
|
|
||||||
A complete link budget starts with the transmitter launch power and the receiver sensitivity; the difference between them is the nominal budget before any system effects. The choice of optic matters before any arithmetic is done. A 100GBASE-ER4 QSFP28, for example, is a 40 km interface: it runs four 25G lanes on LAN-WDM wavelengths between roughly 1295 and 1310 nm over single-mode fiber.

**Fiber attenuation** shows why ER4 cannot serve an 80 km span: G.652D fiber at 1310 nm has typical attenuation of 0.34–0.36 dB/km. For 80 km, that's 27.2–28.8 dB of fiber loss alone, and conventional EDFAs don't operate at these wavelengths. The 80 km application over single-mode fiber uses ZR or extended-range coherent optics at 1550 nm.

The worked example below therefore uses a 100G DP-QPSK ZR optic deployed on a DWDM system at 1550 nm (C-band, EDFA-compatible), targeting 80 km without inline amplification.
|
|
||||||
|
|
||||||
A 100G ZR QSFP28 module specifies a launch power of approximately +1.5 dBm (depending on manufacturer, typically 0 to +3 dBm) and a receiver sensitivity of approximately -22 dBm (with soft-decision FEC). Nominal budget: approximately 23.5 dB.
|
|
||||||
|
|
||||||
**Fiber attenuation** at 1550 nm over G.652D: 0.19 dB/km typical, 0.22 dB/km maximum. For 80 km: 15.2–17.6 dB. Using 0.20 dB/km as a design value: 16 dB.
|
|
||||||
|
|
||||||
**Splice loss**: properly fusion-spliced G.652D connections produce 0.02–0.05 dB per splice. For an 80 km run with splices every 2 km, that's approximately 40 splices at 0.03 dB average: 1.2 dB.
|
|
||||||
|
|
||||||
**Connector insertion loss**: two connector pairs at each end (one at the optic, one at the patch panel), four in total. Using 0.3 dB per mating for well-maintained LC/APC connectors: 4 × 0.3 dB = 1.2 dB.
|
|
||||||
|
|
||||||
**Total channel insertion loss**: 16 + 1.2 + 1.2 = 18.4 dB.
|
|
||||||
|
|
||||||
Budget remaining: 23.5 - 18.4 = 5.1 dB. That is the total margin available before the received power falls below the sensitivity floor.
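The component-by-component model above maps directly onto a few lines of code. The following is a minimal sketch using the design values from this example; the `Span` class and its field names are illustrative, and the splice count is approximated as span length divided by splice spacing, as in the text.

```python
from dataclasses import dataclass


@dataclass
class Span:
    length_km: float
    fiber_db_per_km: float = 0.20     # design value at 1550 nm
    splice_spacing_km: float = 2.0
    splice_loss_db: float = 0.03
    connector_pairs: int = 4
    connector_loss_db: float = 0.30

    def loss_db(self) -> float:
        """Total channel insertion loss: fiber + splices + connectors."""
        splices = round(self.length_km / self.splice_spacing_km)
        return (self.length_km * self.fiber_db_per_km
                + splices * self.splice_loss_db
                + self.connector_pairs * self.connector_loss_db)


def power_budget_db(launch_dbm: float, rx_sensitivity_dbm: float) -> float:
    """Nominal budget: launch power minus receiver sensitivity."""
    return launch_dbm - rx_sensitivity_dbm


if __name__ == "__main__":
    span = Span(length_km=80)
    budget = power_budget_db(1.5, -22.0)
    print(f"loss   = {span.loss_db():.1f} dB")
    print(f"budget = {budget:.1f} dB")
    print(f"margin = {budget - span.loss_db():.1f} dB")
```

Keeping each loss term as a named field makes it straightforward to substitute measured values (OTDR attenuation, actual splice count) once the span has been characterized.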
|
|
||||||
|
|
||||||
## Aging and Temperature Margin
|
|
||||||
|
|
||||||
A link budget that shows 5.1 dB total margin at installation is comfortable today but needs to remain viable over the fiber plant's service life. Several degradation mechanisms consume margin over time:
|
|
||||||
|
|
||||||
**Fiber aging**: G.652D fiber increases in attenuation at approximately 0.001 dB/km/year for the first few years, then stabilizes. Over 10 years, this adds 0.01 dB/km, or 0.8 dB for an 80 km span—a meaningful consumption of margin.
|
|
||||||
|
|
||||||
**Connector degradation**: connectors that are maintained properly (cleaned, capped when not in use) degrade negligibly. Connectors in poorly maintained environments can increase from 0.3 dB to 1.0 dB or more over 5–7 years. Budget 0.15 dB of additional connector loss per connection point over the system life as an aging allowance, which gives 0.6 dB total for 4 connector pairs on our 80 km example.
|
|
||||||
|
|
||||||
**Temperature effects**: optical power levels in transceivers vary with temperature. Commercial-temperature QSFP28 modules are specified to operate across 0°C to 70°C case temperature. Launch power at 70°C case temperature may be 1–2 dB lower than at 25°C for some module designs. Budget a temperature derating of 1 dB.
|
|
||||||
|
|
||||||
**Safety margin**: standard practice is to include 3 dB of safety margin for unaccounted losses—measurement uncertainties, OTDR dead zones, repair splices after future fiber cuts, and the inevitable "where did that 0.5 dB go" at commissioning.
|
|
||||||
|
|
||||||
Total system margin requirement: aging (1.4 dB) + temperature (1.0 dB) + safety (3.0 dB) = 5.4 dB.
|
|
||||||
|
|
||||||
In our 80 km example, the available margin of 5.1 dB is less than the required system margin of 5.4 dB. The link is marginally under-designed for the 80 km distance. The options are to reduce span length by 5–8 km (at 5 km shorter, available margin becomes ≈6.2 dB), to specify better connectors and budget 0.25 dB instead of 0.30 dB per mating (this recovers only 0.2 dB and requires verified connector quality), or to reconsider whether an amplified design is appropriate.
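The margin requirement can be computed the same way as the loss. This is a sketch under stated assumptions: the function name and parameters are illustrative, and the 0.15 dB per-connection aging allowance is chosen so that four connection points total the 0.6 dB used in the 1.4 dB aging figure above.

```python
def required_margin_db(length_km: float,
                       service_years: int = 10,
                       fiber_aging_db_per_km_year: float = 0.001,
                       connection_points: int = 4,
                       connector_aging_db: float = 0.15,  # assumed per-point allowance
                       temperature_db: float = 1.0,
                       safety_db: float = 3.0) -> float:
    """Sum of aging, temperature, and safety allowances for one span."""
    fiber_aging = length_km * fiber_aging_db_per_km_year * service_years
    connector_aging = connection_points * connector_aging_db
    return fiber_aging + connector_aging + temperature_db + safety_db


if __name__ == "__main__":
    available = 5.1                      # margin from the 80 km example
    required = required_margin_db(80)
    print(f"required  = {required:.1f} dB")
    print(f"shortfall = {required - available:.1f} dB")
```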
|
|
||||||
|
|
||||||
## The 120 km DWDM Span with EDFA: Worked Example
|
|
||||||
|
|
||||||
For longer-reach DWDM applications where a single amplified span is justified, the budget methodology extends to include EDFA characteristics.
|
|
||||||
|
|
||||||
Consider a 120 km DWDM span using a single EDFA at the midpoint (60 km from each end). The same 100G ZR optic with 23.5 dB budget launches into the first 60 km segment.
|
|
||||||
|
|
||||||
**First fiber segment** (0–60 km): 60 km × 0.20 dB/km = 12.0 dB, plus 30 splices × 0.03 dB = 0.9 dB, plus two connector pairs = 0.6 dB. First segment loss: 13.5 dB.
|
|
||||||
|
|
||||||
**EDFA parameters**: a typical C-band EDFA for 100G coherent applications provides gain of 20–23 dB with noise figure of 5–6 dB. Using 20 dB gain (conservative, to minimize gain tilt on a loaded C-band system) and 5.5 dB noise figure.
|
|
||||||
|
|
||||||
**OSNR calculation**: for coherent systems the OSNR after the amplifier must stay above the receiver's OSNR sensitivity floor. For a single EDFA, the OSNR at the amplifier output is:

OSNR_out (dB) = P_in (dBm) - NF (dB) - 10·log10(h·ν·Bref)

where P_in is the signal power at the EDFA input, h·ν is the photon energy at 1550 nm (~1.28 × 10^-19 J) and Bref is the reference bandwidth (12.5 GHz for the 0.1 nm reference bandwidth convention). The noise floor term 10·log10(h·ν·Bref), with the product expressed in milliwatts, evaluates to approximately -58 dBm. After a 13.5 dB loss segment, with launch power +1.5 dBm, EDFA input power is -12 dBm. OSNR out of the EDFA is approximately -12 - 5.5 + 58 = 40.5 dB (single-amplifier approximation, ignoring the finite OSNR of the transmitter itself). The amplifier gain does not appear in this expression because gain raises signal and ASE noise together.

**Second fiber segment** (60–120 km): 12.0 dB fiber + 0.9 dB splices + 0.6 dB connectors = 13.5 dB. With 20 dB of gain and 13.5 dB of second-segment loss, the signal arrives at the receiver at approximately: -12 + 20 - 13.5 = -5.5 dBm.

Receiver sensitivity for 100G ZR is approximately -22 dBm, giving 16.5 dB of power margin, but OSNR is the actual constraint for coherent systems once amplifiers are involved. Passive fiber loss after the amplifier attenuates signal and ASE equally, so the roughly 40 dB OSNR is preserved to the receiver; in practice transmitter OSNR and receiver noise cap the delivered figure somewhat lower. 100G DP-QPSK ZR requires approximately 13–14 dB OSNR for BER below the FEC threshold, so a single mid-span EDFA at these parameters leaves ample OSNR margin. The margin shrinks quickly when amplifiers are cascaded or when per-channel input power to an amplifier drops.
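The noise floor term and the single-amplifier OSNR can be checked numerically. The sketch below uses the standard single-amplifier approximation (input power minus noise figure minus the quantum noise floor in the reference bandwidth); the function names are illustrative, and the result ignores transmitter OSNR and receiver noise.

```python
import math

PLANCK_J_S = 6.62607015e-34
LIGHT_SPEED_M_S = 299_792_458.0


def noise_floor_dbm(wavelength_nm: float = 1550.0, bref_hz: float = 12.5e9) -> float:
    """10*log10(h * nu * Bref), with the product converted from W to mW."""
    nu_hz = LIGHT_SPEED_M_S / (wavelength_nm * 1e-9)
    return 10 * math.log10(PLANCK_J_S * nu_hz * bref_hz / 1e-3)


def edfa_osnr_db(p_in_dbm: float, nf_db: float,
                 wavelength_nm: float = 1550.0, bref_hz: float = 12.5e9) -> float:
    """Single-amplifier OSNR approximation; amplifier gain cancels out."""
    return p_in_dbm - nf_db - noise_floor_dbm(wavelength_nm, bref_hz)


if __name__ == "__main__":
    p_in = 1.5 - 13.5   # launch power minus first-segment loss, in dBm
    print(f"noise floor = {noise_floor_dbm():.1f} dBm")
    print(f"OSNR        = {edfa_osnr_db(p_in, 5.5):.1f} dB")
```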
|
|
||||||
|
|
||||||
## Putting the Calculator Together
|
|
||||||
|
|
||||||
A functional optical link budget spreadsheet for single-mode systems needs five inputs per segment: span length, fiber attenuation coefficient, average splice spacing and loss, connector count and loss, and any passive splitting or filtering losses. The outputs are: total segment loss, available power budget, operating margin, and recommended EDFA placement if margin is insufficient.
|
|
||||||
|
|
||||||
The aging margin (0.01 dB/km/year × planned life in years), temperature margin (1 dB for standard commercial transceivers), and safety margin (3 dB minimum, 4 dB for carrier-grade applications) are constants applied to the available budget.
|
|
||||||
|
|
||||||
For EDFA-amplified spans, the additional inputs are EDFA gain, noise figure, and the OSNR sensitivity floor for the modulation format, which changes from 13 dB for 100G DP-QPSK to approximately 21 dB for 400G DP-16QAM. For multi-amplifier spans or cascaded EDFA designs, ASE noise accumulates additively in linear units (the reciprocals of the per-amplifier OSNRs add) and the calculation extends iteratively per span, which is where dedicated DWDM planning tools (Ciena's GreenPlanner, Cisco's Network Planner, or open-source alternatives) add value over manual spreadsheet calculations.
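For a cascade, the per-amplifier OSNR contributions combine as reciprocals in linear units. A minimal sketch of that step, assuming each amplifier's individual OSNR has already been computed in dB; the function name is illustrative.

```python
import math
from typing import Iterable


def cascade_osnr_db(per_amp_osnr_db: Iterable[float]) -> float:
    """Combine per-amplifier OSNRs: 1/OSNR_total = sum(1/OSNR_i) in linear units."""
    inverse_sum = sum(10 ** (-osnr / 10) for osnr in per_amp_osnr_db)
    return -10 * math.log10(inverse_sum)


if __name__ == "__main__":
    # Two identical amplifiers cost about 3 dB relative to one;
    # ten identical amplifiers cost 10 dB.
    print(f"{cascade_osnr_db([40.0, 40.0]):.2f} dB")
    print(f"{cascade_osnr_db([40.0] * 10):.2f} dB")
```

Comparing the combined value against the OSNR floor for the modulation format gives the OSNR margin for the whole chain.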
|
|
||||||
|
|
||||||
The manual worked-example approach remains valuable for understanding what the tools are actually computing and for validating their outputs against known engineering principles.
|
|
||||||
@ -1,80 +0,0 @@
|
|||||||
---
|
|
||||||
title: "MTP/MPO and Cassette Fiber Management in 40G/100G/400G: Polarity, Gender, and the Array Loss Problem"
|
|
||||||
slug: "mtp-mpo-cassette-fiber-management-40g-100g-400g"
|
|
||||||
type: guide
|
|
||||||
category: "Fiber & Cabling"
|
|
||||||
tags: ["MTP", "MPO", "cassette", "polarity", "fiber management", "40G", "100G", "400G", "base-12", "base-24"]
|
|
||||||
seo_focus_keyword: "MTP MPO cassette fiber management polarity"
|
|
||||||
---
|
|
||||||
|
|
||||||
MTP/MPO connectors solve a real problem: plugging in 12 or 24 fibers simultaneously instead of one at a time. But they introduce a set of secondary problems—polarity, gender, connector loss at scale—that are collectively responsible for more 40G and 100G link failures than any other single cause in structured cabling deployments. These aren't obscure edge cases; they're the normal failure modes of array fiber systems when the installation team doesn't have a clear mental model of what they're building.
|
|
||||||
|
|
||||||
## MTP vs. MPO: The Terminology
|
|
||||||
|
|
||||||
MPO (Multi-fiber Push On) is the connector standard defined by IEC 61754-7. MTP is US Conec's brand name for their MPO implementation. The terms are used interchangeably in the industry, though purists will note that MTP includes some proprietary improvements (better ferrule float, removable housing) that the base MPO standard doesn't specify. For practical purposes, MTP and MPO connectors intermate freely: an MTP male connector mates with an MPO female connector through a standard adapter, and vice versa. When this article uses MTP/MPO, it means both.
|
|
||||||
|
|
||||||
## Polarity: The Core Concept
|
|
||||||
|
|
||||||
Optical fiber has a direction: the TX port on device A needs to connect to the RX port on device B, and vice versa. With LC duplex connectors, polarity is enforced mechanically—the LC connectors are keyed so that TX connects to RX. With MPO/MTP array connectors, polarity becomes a management problem because a single 12-fiber MPO connector carries multiple TX fibers and multiple RX fibers, and the physical connector looks identical regardless of polarity type.
|
|
||||||
|
|
||||||
TIA-568 defines three polarity methods:
|
|
||||||
|
|
||||||
**Method A (Type A / Straight)**: fibers are numbered 1–12 in the same sequence at both ends of a trunk cable. The connector keys face opposite directions (one up, one down) at the two ends. The practical result is that fiber 1 at end A connects to fiber 1 at end B. This preserves fiber position but does not cross transmit to receive: a signal launched on position 1 at end A arrives on position 1 at end B, so the TX-to-RX flip has to be made elsewhere in the channel, typically in a patch cord or cassette.
|
|
||||||
|
|
||||||
**Method B (Type B / Reversed)**: fibers are reversed end-to-end. Fiber 1 at end A connects to fiber 12 at end B. Both connectors have keys facing the same direction. The fiber reversal means that TX at one end connects to RX at the other. For a 12-fiber MPO used in 40GBASE-SR4 or 100GBASE-SR4 (each 4 TX and 4 RX, with the middle four fibers unused), Type B polarity implements the required TX-to-RX connection without any adapter modification.
|
|
||||||
|
|
||||||
**Method C (Type C / Paired Swap)**: adjacent pairs of fibers are swapped (1↔2, 3↔4, etc.). This is used less frequently and primarily for specific legacy applications.
|
|
||||||
|
|
||||||
The dominant standard for 40GBASE-SR4 and 100GBASE-SR4 direct connections is Type B polarity in the trunk cable—this is the approach specified in IEEE 802.3ba and 802.3bm for parallel optic applications. A Type B trunk cable between two QSFP+ SR4 or QSFP28 SR4 modules produces the correct TX-to-RX connectivity without any polarity adapter cassette.
|
|
||||||
|
|
||||||
## Where Cassettes Complicate Polarity
|
|
||||||
|
|
||||||
Cassettes (also called modules) are fiber breakout devices that convert between MPO connectors (at the trunk side) and LC duplex or SC connections (at the equipment side). They're used to connect MPO-cabled infrastructure to LC-port switches and routers without running individual LC patch cords from the rack.
|
|
||||||
|
|
||||||
The problem is that cassettes introduce their own polarity conversion. A Type A cassette maintains fiber sequence—fiber 1 of the MPO becomes the first LC pair. A Type B cassette reverses the sequence. When you combine trunk cables and cassettes, the total polarity depends on the combination of trunk type and cassette type.
|
|
||||||
|
|
||||||
The combination that produces correct TX-to-RX connectivity for 40G/100G parallel optics:
|
|
||||||
- Type B trunk cable + Type A cassette: correct
|
|
||||||
- Type A trunk cable + Type B cassette: correct
|
|
||||||
- Type B trunk cable + Type B cassette: incorrect (double-reversal, same as no reversal)
|
|
||||||
- Type A trunk cable + Type A cassette: incorrect
|
|
||||||
|
|
||||||
If your 100GBASE-SR4 link comes up with no light received (RX power absent on all four lanes simultaneously), polarity is usually the diagnosis. If it comes up with some lanes working and others not, the problem may be loss or damage on specific fibers rather than polarity.
|
|
||||||
|
|
||||||
## Gender Management: Male and Female MPO Connectors
|
|
||||||
|
|
||||||
MPO/MTP connectors have physical gender: male connectors have guide pins (two small steel pins that protrude from the ferrule face), female connectors have guide holes. Two male connectors cannot mate; two female connectors cannot mate. A male connector mates with a female connector via an adapter.
|
|
||||||
|
|
||||||
The convention in structured cabling is: trunk cables terminate in male MPO connectors, cassettes have female MPO ports on the trunk side. This means a trunk cable male MPO plugs directly into a cassette female MPO port, and the cassette's LC ports face the equipment.
|
|
||||||
|
|
||||||
Gender problems arise when trunk cables are constructed or terminated incorrectly (both ends male or both ends female), or when field-installed MPO connectors are made by installers who don't follow the male-at-trunk-end convention. The symptom is obvious (connectors won't mate) but the resolution—replacing a cable, adding a gender adapter, or re-terminating—can be disruptive in a completed installation.
|
|
||||||
|
|
||||||
Gender adapters (MF to FF, or MM to MM via a barrel adapter) exist for field fixes, but they add an additional connector mating with associated insertion loss and should be treated as temporary solutions rather than permanent installations.
|
|
||||||
|
|
||||||
## The Array Connector Loss Problem
|
|
||||||
|
|
||||||
Insertion loss in MPO/MTP connectors is systematically higher than in LC connectors, for a fundamental mechanical reason. An MPO connector aligns 12 or 24 fibers simultaneously using two guide pins that reference the ferrule body, not the individual fiber positions. The positional accuracy of each fiber within the MPO ferrule depends on the precision of the ferrule boring, the fiber position within the ferrule holes, and the compression of the ferrule during mating.
|
|
||||||
|
|
||||||
IEC 61754-7 specifies maximum insertion loss of 0.5 dB per mating for multi-fiber connectors. High-performance MPO (APC-polished, precision-ferrule construction from US Conec, Senko, or Radiall) achieves 0.35 dB per mating average. Low-cost MPO connectors—particularly field-terminated MPO assemblies with less rigorous alignment control—regularly measure 0.6–1.0 dB per mating.
|
|
||||||
|
|
||||||
The loss problem compounds with array size. A 24-fiber MPO has 24 fibers that all need to be within their positional tolerance simultaneously. The statistical probability of all 24 fibers meeting their positional accuracy specification is lower than for a 12-fiber MPO, which is lower than for an LC connector aligning 1 fiber. The result: 24-fiber MPO connectors have consistently higher average insertion loss than 12-fiber MPO connectors from the same manufacturer.
|
|
||||||
|
|
||||||
For 100GBASE-SR4, which uses 8 of the 12 fibers in a standard MPO (4 TX, 4 RX), the 4 unused fibers in a base-12 MPO are not energized but their ferrule positions affect the alignment accuracy of the 8 active fibers. Pre-terminated MPO assemblies from quality manufacturers specify which fiber positions are active and optimize ferrule manufacturing around those positions.
|
|
||||||
|
|
||||||
## When to Use Harness Cables vs. Cassettes
|
|
||||||
|
|
||||||
Harness cables (fan-out cables) are a single MPO connector at one end that breaks out into multiple LC or SC connectors at the other end. They're a direct connection with no additional connector interfaces in the signal path. Cassettes use two connector interfaces (MPO-to-cassette, cassette-to-LC), adding approximately 0.3–0.7 dB compared to a harness cable.
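A rough way to compare the two options is to add up the mated connector pairs in the path. The sketch below is only an illustration: `channel_loss_db` is a hypothetical helper, and the per-mating and per-kilometre values are assumptions (0.5 dB per MPO mating from the figure quoted earlier, 0.3 dB per LC mating and 3.0 dB/km for multimode fiber at 850 nm as placeholder values). Substitute the numbers from your own datasheets.

```python
def channel_loss_db(mpo_matings, lc_matings, length_m,
                    mpo_db=0.5, lc_db=0.3, fiber_db_per_km=3.0):
    """Sum connector and fiber loss for one channel (all defaults are assumptions)."""
    connector_loss = mpo_matings * mpo_db + lc_matings * lc_db
    fiber_loss = length_m / 1000 * fiber_db_per_km
    return connector_loss + fiber_loss


# One end of a 50 m link: the trunk lands on a harness or on a cassette.
harness = channel_loss_db(mpo_matings=1, lc_matings=0, length_m=50)
cassette = channel_loss_db(mpo_matings=1, lc_matings=1, length_m=50)

print(f"harness:  {harness:.2f} dB")
print(f"cassette: {cassette:.2f} dB")
print(f"extra loss from the cassette: {cassette - harness:.2f} dB")
```

With these placeholder values the cassette costs one additional LC mating per end, which is where the extra loss comes from.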
|
|
||||||
|
|
||||||
The tradeoff is flexibility. Cassettes in a structured cabling deployment allow individual LC patch cord changes without disturbing the trunk infrastructure. Harness cables require the entire harness to be replaced if any individual LC connection needs rerouting. For high-density, frequently-reconfigured environments like data center interconnect or co-location hosting, cassette-based infrastructure is operationally preferable despite the higher connector loss.
|
|
||||||
|
|
||||||
For environments with fixed or infrequently-changed connections—campus fiber backbone, inter-building connections—harness cables offer better loss performance at lower per-unit cost. The decision comes down to operational flexibility requirements versus link budget constraints.
|
|
||||||
|
|
||||||
## Base-12 vs. Base-24 Planning
|
|
||||||
|
|
||||||
Base-12 (12-fiber MPO) and base-24 (24-fiber MPO) cassette infrastructure have different density implications. A 1U cassette panel holding 6 base-12 cassettes provides 6 × 12 = 72 fiber connections (36 LC duplex ports). A 1U cassette panel holding 4 base-24 cassettes provides 4 × 24 = 96 fiber connections (48 LC duplex ports)—a 33% density increase.
|
|
||||||
|
|
||||||
Base-24 is more efficient for 100GBASE-SR4 and 40GBASE-SR4, both of which use 8 active fibers per link in a 12-fiber MPO housing (leaving 4 fibers unused). A base-24 MPO supports three SR4 links (8 fibers × 3 = 24) with zero waste. A base-12 MPO supports one SR4 link with 4 fibers unused—67% fiber utilization.
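The utilization figures follow from simple integer arithmetic. A minimal sketch, with `sr4_utilization` as a hypothetical helper name:

```python
def sr4_utilization(fibers_per_connector, fibers_per_link=8):
    """Return (whole links that fit, fraction of fibers actually used)."""
    links = fibers_per_connector // fibers_per_link
    used = links * fibers_per_link
    return links, used / fibers_per_connector


for base in (12, 24):
    links, fraction = sr4_utilization(base)
    print(f"base-{base}: {links} SR4 link(s), {fraction:.0%} of fibers used")
```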
|
|
||||||
|
|
||||||
For 400GBASE-SR8 (8× 50G PAM4 lanes on 16 fibers, using two 8-fiber groups in a 16-fiber or 24-fiber MPO), base-24 is essentially required for efficient utilization. Planning new data center fiber infrastructure for 400G deployment should specify base-24 MPO throughout, or plan for harness cable breakouts using the full 24-fiber MPO for three 8-fiber 400G links.
|
|
||||||
|
|
||||||
The practical guidance: new structured cabling deployments for 40G/100G and above should use pre-terminated, factory-tested MPO assemblies from a single-source manufacturer, specify Type B polarity for direct 40G/100G SR connections, and plan the cassette vs. harness decision based on reconfiguration frequency rather than initial cost. The connector insertion loss accounting matters from the first design—not as an afterthought when a link won't train.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "SFF-8024 Transceiver Identifier Codes Decoded"
|
|
||||||
slug: "sff-8024-transceiver-id-codes"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Standards & Compatibility"
|
|
||||||
tags: [SFF-8024, EEPROM, transceiver-identification, NOS-compatibility, IDPROM, compatible-optics]
|
|
||||||
seo_focus_keyword: "SFF-8024 transceiver identifier codes"
|
|
||||||
---
|
|
||||||
|
|
||||||
Every transceiver carries a small autobiography in its EEPROM. The first few bytes tell the host system what kind of device it's talking to — and network operating systems use this information to decide whether to bring up the port at all. SFF-8024, the SFF Committee's identifier mapping document, is the Rosetta Stone for this. It's also where most "why won't my optic come up?" investigations eventually land.
|
|
||||||
|
|
||||||
## The Three Bytes That Matter Most
|
|
||||||
|
|
||||||
Address 0x00 is the identifier byte. This is the top-level declaration: what form factor am I? The values are standardized across SFF-8024 Table 4-1. A value of 0x03 means SFP/SFP+/SFP28. 0x0D is QSFP+. 0x11 is QSFP28. 0x18 is QSFP-DD. 0x19 is OSFP. If you're staring at a hex dump from `ethtool -m` or `show interface transceiver` and the first byte reads 0xFF or 0x00, the module either failed to respond during power-up or the interface isn't initialized. That's not a compatibility problem; that's a hardware problem.
|
|
||||||
|
|
||||||
Address 0x01 is the extended identifier in SFP-family modules (SFF-8472). In QSFP28 (SFF-8636) modules, the extended identifier sits at byte 129 (0x81) of upper page 00h and encodes power class and CDR (Clock Data Recovery) capability. Bits 7:6 define power classes 1 (1.5 W max) through 4 (3.5 W max), bits 1:0 extend that to classes 5 through 7 (up to 5 W), and bit 5 flags power class 8, whose maximum is declared in a separate byte. Bit 3 indicates whether a CDR is present in the TX path, bit 2 for RX. This matters because a host system configured for a low-power module will assert a power-override signal differently than for a high-power module. Mismatches here cause brown-outs that look exactly like a failing laser.
|
|
||||||
|
|
||||||
The third byte is the connector type, at address 0x02 in SFF-8472 and at byte 130 (0x82) in SFF-8636. The SFF-8024 Table 4-3 list is long: LC (0x07), MPO 1x12 (0x0C), MPO 2x16 (0x0D), CS (0x25), SN (0x26), and so on. This byte exists partly for inventory systems and partly so that NOS platforms can sanity-check whether the physical connection makes sense for the interface type. In practice, most platforms don't gate port bring-up on this byte, but some do log warnings, and a few vendor-specific implementations do use it in their validation logic.
|
|
||||||
|
|
||||||
## What "Compatible" Vendors Actually Write
|
|
||||||
|
|
||||||
Third-party transceiver vendors face a choice when programming EEPROM: fill the identifier fields exactly per specification, or copy the OEM bytes verbatim. The answer varies by vendor and matters enormously.
|
|
||||||
|
|
||||||
A well-programmed compatible module populates the identifier, extended identifier, and connector type bytes per SFF-8024 exactly. The vendor OUI at bytes 165–167 (0xA5–0xA7 in SFF-8636) will be the compatible vendor's own OUI, not Cisco's or Juniper's. The vendor name field at bytes 148–163 (0x94–0xA3) will say something like "FLEXOPTIX" or "FINISAR" or whatever the actual manufacturer is. This is the honest approach and works reliably on well-implemented platforms.
|
|
||||||
|
|
||||||
A "reprogrammed" module is a different animal. These are typically genuine OEM optics — often pulled from decommissioned equipment — where the EEPROM has been overwritten to match a specific OEM's signature: vendor name, part number, serial number suffix, and sometimes even the OUI. The identifier bytes at 0x00–0x07 stay correct (they're structural, not branding), but the upper vendor fields now claim to be something they're not. The practical implication is that the module will pass vendor authentication checks that read those fields, including Cisco's IDPROM authentication on some Nexus platforms.
|
|
||||||
|
|
||||||
This raises an obvious question about authenticity and one less obvious question about safety. The authenticity question is a policy issue for your organization. The safety question is more interesting: if the EEPROM claims capabilities the hardware doesn't actually have, you can get unexpected behavior under marginal conditions — particularly around power classes and TX disable behavior.
|
|
||||||
|
|
||||||
## How NOS Platforms Use These Bytes for Gating
|
|
||||||
|
|
||||||
Cisco IOS-XE on Catalyst platforms performs what Cisco calls "transceiver validation." The platform reads the identifier bytes, checks the vendor OUI against an internal allow-list, and makes a port enable decision. If the vendor OUI isn't recognized, the port typically comes up anyway but with a log message: `%TRANSCEIVER-3-NOT_QUALIFIED: Transceiver is not qualified`. The port still passes traffic. This is the "warning but allow" model.
|
|
||||||
|
|
||||||
NX-OS on Nexus 9000 series is more aggressive. With `service unsupported-transceiver` not configured, the platform will refuse to bring up a port with an unrecognized vendor OUI, writing `%SFF8472-5-THRESHOLD_VIOLATION` events and effectively keeping the interface down. Enabling unsupported transceiver support is a global command that many operators apply during initial deployment and then forget about, until they migrate to a new chassis where it hasn't been set.
|
|
||||||
|
|
||||||
Junos on MX and PTX platforms reads identifier bytes during PIC initialization. The behavior differs between Junos versions prior to 19.1 and after, where Juniper tightened vendor validation for 400G optics specifically. On QFX switches, the behavior is closer to the Catalyst model — warn and allow. On PTX, particularly for coherent pluggables, validation is stricter.
|
|
||||||
|
|
||||||
Arista EOS is by far the most permissive of the major platforms. It reads the identifier bytes for informational purposes, logs the vendor string, and brings up the port. This is partly why Arista gear is often used for initial compatibility testing of new optics — the platform itself won't be the obstacle.
|
|
||||||
|
|
||||||
## The "Reprogrammed" Question in Plain Terms
|
|
||||||
|
|
||||||
When someone offers you "reprogrammed Cisco optics" at a significant discount, what they're selling is a module that has had its EEPROM overwritten to impersonate a Cisco-branded part. The structural bytes (identifier, extended identifier, connector type) will be correct per SFF-8024 because they describe the actual hardware. The vendor branding fields will claim Cisco, often including a plausible-looking serial number.
|
|
||||||
|
|
||||||
On platforms that gate port enable on vendor OUI validation, this module will pass the check. On platforms that do EEPROM cryptographic signing (Cisco's later implementations on high-end platforms do perform a signed validation for certain optics), it will fail, because you can copy bytes but you can't forge the signature without the private key.
|
|
||||||
|
|
||||||
The risk profile of reprogrammed optics is not primarily about signal quality. The optical hardware is typically genuine. The risks are: loss of traceable provenance for compliance purposes, behavior under firmware updates that tighten validation (the port you were relying on stops coming up after a scheduled maintenance window), and inability to get vendor TAC support even for problems unrelated to the optic.
|
|
||||||
|
|
||||||
## Reading the EEPROM Yourself
|
|
||||||
|
|
||||||
On Linux, `ethtool -m ethX` gives you a decoded view. For raw hex, `ethtool --dump-module-eeprom ethX hex on` is more useful when you want to check specific bytes against SFF-8024. On Cisco NX-OS, `show interface ethernet X/Y transceiver detail` parses the EEPROM and presents the vendor fields. On Junos, `show chassis pic fpc-slot X pic-slot Y` and `show interfaces diagnostics optics` give you the decoded view.
|
|
||||||
|
|
||||||
If you're doing systematic inventory of a large install base, writing a small script that checks byte 0x00 against expected values and flags mismatches is a ten-minute investment that pays off every time you inherit someone else's network. The alternative is discovering that your "100G LR4" port is actually running a 25G SR optic that someone labeled wrong after the last migration, while you're standing in a data center at 2 AM.
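A minimal sketch of such a check, assuming the raw EEPROM bytes have already been collected per port (for example from a hex dump). The function name `check_identifier` and the idea of an "expected form factor" per port are hypothetical, and the lookup table is deliberately a small subset of SFF-8024 Table 4-1 rather than the full list.

```python
# Subset of SFF-8024 identifier values; extend from the current spec as needed.
IDENTIFIERS = {
    0x03: "SFP/SFP+/SFP28",
    0x0D: "QSFP+",
    0x11: "QSFP28",
    0x18: "QSFP-DD",
}


def check_identifier(eeprom: bytes, expected: str) -> str:
    """Compare byte 0x00 of a module EEPROM against the expected form factor."""
    if not eeprom or eeprom[0] in (0x00, 0xFF):
        # Module did not answer or the interface is not initialized
        return "no-response"
    actual = IDENTIFIERS.get(eeprom[0], f"unknown (0x{eeprom[0]:02X})")
    if actual == expected:
        return "ok"
    return f"mismatch: expected {expected}, found {actual}"


# Hypothetical inventory: port name -> (raw EEPROM bytes, expected form factor)
inventory = {
    "eth1": (bytes([0x11, 0x00]), "QSFP28"),
    "eth2": (bytes([0x03, 0x04]), "QSFP28"),
    "eth3": (bytes([0xFF, 0xFF]), "QSFP28"),
}
for port, (eeprom, expected) in inventory.items():
    print(port, check_identifier(eeprom, expected))
```

In this made-up inventory, `eth2` is flagged because an SFP-family module sits where a QSFP28 was expected, and `eth3` is reported as not responding.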
|
|
||||||
|
|
||||||
Understanding SFF-8024 is not about memorizing lookup tables. It's about knowing which three bytes define what the hardware claims to be, which fields define what the vendor claims to have built, and what your NOS does with the difference between those claims and what it expects to see. Most compatibility issues reduce to exactly that gap.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "EDFA vs. Raman Amplifiers for Long-Haul: What Actually Differs"
|
|
||||||
slug: "optical-amplifier-edfa-raman-basics"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Long-Haul & Transmission"
|
|
||||||
tags: [EDFA, Raman-amplifier, long-haul, noise-figure, pump-laser, optical-amplification, hybrid-amplifier]
|
|
||||||
seo_focus_keyword: "EDFA vs Raman amplifier"
|
|
||||||
---
|
|
||||||
|
|
||||||
The question of EDFA versus Raman comes up every time someone designs a span longer than 80 km and discovers that a single amplifier won't do what they hoped. Both technologies add gain to an optical signal without converting it to electrical form. Beyond that statement, almost everything is different — the physics, the component requirements, the noise characteristics, the deployment model, and the failure modes.
|
|
||||||
|
|
||||||
## How EDFA Works and Why It's Dominant
|
|
||||||
|
|
||||||
An Erbium-Doped Fiber Amplifier runs signal light through a short segment of silica fiber that has been doped with erbium ions — typically 10 to 30 meters of fiber with erbium concentration around 100–1000 ppm. A pump laser, usually operating at 980 nm or 1480 nm, excites the erbium ions to a higher energy state. When a signal photon in the C-band (1530–1565 nm) or L-band (1565–1625 nm) passes through, it triggers stimulated emission from the excited erbium ions, producing a second photon of identical wavelength and phase. That's optical amplification.
|
|
||||||
|
|
||||||
The practical implications of this mechanism: EDFA gain is confined to the erbium emission spectrum, which maps almost exactly onto the C-band and L-band — the same bands used by DWDM systems. This alignment is not a coincidence; it's a significant reason why DWDM developed the way it did. Typical gain values for a single-stage EDFA are 20–35 dB with output power up to +23 dBm for a booster amplifier. Inline amplifiers typically run 15–25 dB of gain.
|
|
||||||
|
|
||||||
Noise figure for a well-designed EDFA is 4–6 dB in the C-band. This number is fundamental: it directly degrades OSNR at every amplifier stage, and since long-haul routes may have 20 or more amplifier sites, the accumulated amplifier noise is the dominant factor in system reach. ASE noise power adds from span to span, so a chain of 20 identical spans with 6 dB noise figure EDFAs ends up with an OSNR roughly 10·log10(20) ≈ 13 dB lower than a single span would deliver. That degradation forces you into stronger FEC or shorter spans.
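The cascade effect can be estimated with the widely used rule of thumb OSNR ≈ 58 + P_ch − L_span − NF − 10·log10(N), which assumes identical spans, a 0.1 nm reference bandwidth near 1550 nm, and amplifiers that exactly compensate span loss. The sketch below is an illustration under those assumptions; the launch power and span loss values are made up.

```python
import math


def cascaded_osnr_db(p_ch_dbm, span_loss_db, nf_db, n_spans):
    """Rule-of-thumb OSNR (dB, 0.1 nm reference) after n identical amplified spans."""
    return 58.0 + p_ch_dbm - span_loss_db - nf_db - 10 * math.log10(n_spans)


# Assumed example: 0 dBm per channel, 20 dB span loss, 6 dB noise figure
one_span = cascaded_osnr_db(0.0, 20.0, 6.0, 1)
twenty_spans = cascaded_osnr_db(0.0, 20.0, 6.0, 20)

print(f"1 span:   {one_span:.1f} dB")
print(f"20 spans: {twenty_spans:.1f} dB")
print(f"penalty:  {one_span - twenty_spans:.1f} dB")
```

Every doubling of the span count costs about 3 dB, which is why a lower noise figure per amplifier matters more as the route gets longer.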
|
|
||||||
|
|
||||||
Pump laser requirements for EDFA are manageable. A 980 nm pump delivering 200–300 mW of optical power is standard for an inline EDFA. These pumps are single-mode InGaAs devices, well-understood, available from multiple suppliers, and typically rated for 25+ year MTBF at normal operating temperatures. The pump is inside the EDFA housing; the fiber plant doesn't need to carry pump light.
|
|
||||||
|
|
||||||
## How Raman Amplification Works
|
|
||||||
|
|
||||||
Raman amplification uses stimulated Raman scattering (SRS) in standard transmission fiber. A high-power pump laser — typically 500 mW to 1.5 W — is injected into the fiber span, either co-propagating or counter-propagating with the signal. The pump photons interact with molecular vibrations in the silica, downshifting in frequency by approximately 13 THz. If that downshifted frequency coincides with a signal wavelength, the signal experiences gain.
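The pump-to-signal relationship is easy to check numerically by converting the pump wavelength to frequency, subtracting the Raman shift, and converting back. This is a sketch only: the 13 THz shift is the approximate figure quoted above, the 1455 nm pump is just an example value, and `raman_gain_peak_nm` is a hypothetical helper name.

```python
C_NM_THZ = 299_792.458  # speed of light expressed in nm * THz


def raman_gain_peak_nm(pump_nm, shift_thz=13.0):
    """Approximate wavelength of peak Raman gain for a given pump wavelength."""
    pump_thz = C_NM_THZ / pump_nm
    return C_NM_THZ / (pump_thz - shift_thz)


peak = raman_gain_peak_nm(1455.0)
print(f"1455 nm pump -> gain peak near {peak:.0f} nm")
```

A 1455 nm pump lands the gain peak in the middle of the C-band, which is why pumps in the 14xx nm range are used for C-band systems.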
|
|
||||||
|
|
||||||
The most useful characteristic of Raman amplification: it can be distributed. Rather than a discrete amplifier at a point, the gain is spread across the fiber span itself. A counter-pumped Raman configuration injects pump light at the amplifier site and the gain occurs in the preceding 80 km of fiber, with most of it accumulating in the last 20–30 km before the amplifier. This means the signal power never falls as low at the end of the span as it does with lumped EDFA amplification, so the same OSNR can be reached with a lower launch power, and lower launch power means fewer nonlinear impairments like cross-phase modulation and four-wave mixing.
|
|
||||||
|
|
||||||
The noise figure for Raman amplification is where it gets interesting. Distributed Raman can achieve an effective noise figure of 0 to -3 dB when the on/off gain (gain measured with pump on versus pump off) is 10–15 dB. That negative noise figure is only meaningful in the context of OSNR calculations — it represents that the signal degradation is less than what a theoretically ideal lumped amplifier would impose, because the signal never fell as low before being amplified.
|
|
||||||
|
|
||||||
The pump laser requirements are where Raman becomes difficult. You need 500 mW to 1.5 W of pump power per span to achieve 10–15 dB of Raman gain in standard SMF-28. These pumps are high-power multimode Fabry-Perot or DFB devices, significantly more expensive and less reliable than EDFA pump lasers. The pump light travels through the transmission fiber, which means the connectors, splices, and any ROADMs in the path all interact with pump power. Raman-pumped spans require careful attention to fiber connectors; a contaminated connector heating up under 1 W of pump light is a genuine safety concern.
|
|
||||||
|
|
||||||
Raman amplification also creates gain tilt across the C-band. The peak gain frequency is fixed relative to the pump, but the gain is not flat across the signal band. Multi-pump Raman configurations — using two or more pump wavelengths — can flatten the gain profile, but this adds cost and control complexity.
|
|
||||||
|
|
||||||
## What You Cannot Swap
|
|
||||||
|
|
||||||
The systems implications of the pump location difference are profound. EDFA requires electrical power at each amplifier site. Raman can, in principle, allow you to skip electrical power at an intermediate site by propagating pump light from a far-end site, an approach usually discussed alongside remote optically pumped amplifiers (ROPAs). Submarine cable systems have used this since the 1990s. Terrestrial operators have used remote Raman pumping to avoid building access roads to intermediate sites in difficult terrain.
|
|
||||||
|
|
||||||
But you cannot just insert a Raman amplifier where an EDFA was. The fiber plant, channel power, OSNR budget, and gain equalization all need to be re-engineered. Raman gain is not flat, so gain-flattening filters designed for EDFA profiles will not compensate correctly. The dispersion accumulated before the Raman gain site is different from the lumped-gain case, affecting the nonlinear phase accumulated.
|
|
||||||
|
|
||||||
EDFAs cannot achieve negative effective noise figures. If your OSNR budget is already tight and you need lower noise, Raman is the tool — but not as a drop-in.
|
|
||||||
|
|
||||||
## The Hybrid Case That Actually Makes Sense
|
|
||||||
|
|
||||||
Ultra-long-haul systems — spans of 120 km or longer, or systems targeting 8,000+ km terrestrial reach — routinely combine distributed Raman preamplification with a conventional EDFA booster. The hybrid architecture works like this: Raman pumps at the amplifier site inject counter-propagating light into the span. The distributed gain amplifies the signal throughout the last 40 km of fiber before the amplifier, reducing the minimum signal power in the span and suppressing noise accumulation. Then an EDFA provides the bulk gain needed to compensate for the full 120 km of fiber loss.
|
|
||||||
|
|
||||||
The OSNR improvement from a hybrid configuration is typically 3–5 dB compared to EDFA-only on the same span. On a 20-span system, that 3–5 dB improvement either extends reach by several hundred kilometers or allows you to run higher-order modulation (say, 64QAM instead of 16QAM) at the same reach, which raises spectral efficiency by 50% (6 bits per symbol instead of 4) and with it the capacity you can deploy on the fiber.
|
|
||||||
|
|
||||||
The OFS TrueWave RS and OFS AllWave fibers, both common in North American terrestrial long-haul, have Raman gain coefficients well-characterized for this type of hybrid design. SSMF (G.652D) has a Raman gain coefficient of approximately 0.4 (W·km)^-1 at 1455 nm pump wavelength. Other fibers such as Corning SMF-28 Ultra or OFS TrueWave REACH have slightly different Raman gain profiles, but the hybrid technique applies to all of them.
|
|
||||||
|
|
||||||
Vendors like Ciena (on the GeoMesh platform), Nokia (on the PSI-M), and Infinera design EDFA+Raman hybrid amplifier nodes as standard offerings for terrestrial systems over 80 km spans. The additional cost over a pure EDFA solution — primarily the high-power Raman pump modules and the associated safety interlock systems for the fiber — is justified whenever the alternative is building a new amplifier hut at a site where no infrastructure exists.
|
|
||||||
|
|
||||||
The choice between EDFA, Raman, and hybrid ultimately comes down to three numbers: span loss, OSNR target, and budget. EDFA works for the vast majority of cases. Raman becomes worth considering above 80 km spans or when OSNR margins are under 3 dB. Hybrid is the answer when both are true simultaneously.
|
|
||||||
@ -1,60 +0,0 @@
|
|||||||
---
|
|
||||||
title: "The 800G QSFP-DD Ecosystem in 2026: What's Shipping, What's Not"
|
|
||||||
slug: "qsfp-dd-800g-ecosystem-2026"
|
|
||||||
type: analysis
|
|
||||||
category: "High-Speed Optics"
|
|
||||||
tags: [800G, QSFP-DD, Tomahawk5, Trident5, silicon-photonics, hyperscale, datacenter]
|
|
||||||
seo_focus_keyword: "800G QSFP-DD ecosystem 2026"
|
|
||||||
---
|
|
||||||
|
|
||||||
800G has been "almost here" for long enough that it's worth taking a clear look at what is actually in production, what is sampling, and what remains a roadmap slide. The distinction matters because the gap between hyperscale reality and enterprise availability is currently about 18 to 24 months, and decisions made based on hyperscale deployment announcements often fail to account for that lag.
|
|
||||||
|
|
||||||
## The ASICs That Drive the Deployment Timeline
|
|
||||||
|
|
||||||
Broadcom's Tomahawk 5 (BCM78900) is the primary silicon moving 800G from concept to shipped product. It delivers 51.2 Tbps across 512 SerDes lanes at 112G PAM4, which maps to 64 ports of 800G or 128 ports of 400G. The chip entered production in 2023, and by late 2024, Arista, Cisco, and several ODMs (Wistron, Accton, Celestica) had production 800G switches based on it. The key specification for optics: Tomahawk 5 uses 112G PAM4 SerDes on the ASIC interface, which means the optic-to-ASIC electrical interface is fundamentally different from 400G systems using 56G PAM4.
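The port counts follow directly from the lane arithmetic. A small sketch, treating each SerDes lane as carrying 100 Gb/s of payload (the "112G" label refers to the raw lane rate class); the constant and function names are illustrative only.

```python
TOTAL_LANES = 512   # SerDes lanes on the ASIC, per the figures above
LANE_GBPS = 100     # payload rate per lane


def port_count(port_speed_gbps, lanes=TOTAL_LANES, lane_gbps=LANE_GBPS):
    """Number of front-panel ports of a given speed the lanes can be grouped into."""
    lanes_per_port = port_speed_gbps // lane_gbps
    return lanes // lanes_per_port


total_tbps = TOTAL_LANES * LANE_GBPS / 1000
print(f"switching capacity: {total_tbps} Tbps")
print(f"800G ports: {port_count(800)}")
print(f"400G ports: {port_count(400)}")
```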
|
|
||||||
|
|
||||||
Broadcom's Trident 5-X (BCM78800) targets the 12.8 Tbps to 25.6 Tbps range for disaggregated switching use cases. It supports 400G ports natively and 800G in break-out configurations. This chip matters because enterprise access and aggregation will migrate to it before true 800G edge ports become common in enterprise networks.
|
|
||||||
|
|
||||||
Intel's Tofino 3 (from the Barefoot Networks line that Intel acquired in 2019, with further Tofino development halted by Intel in 2023) supports 25.6 Tbps with programmable P4 forwarding. Tofino 3 is relevant for 800G primarily in the context of software-defined networking and telco use cases where per-packet programmability is needed. It hasn't driven high-volume optics demand.
|
|
||||||
|
|
||||||
The ASIC picture from Marvell (Teralynx 10, targeting 51.2 Tbps) and Cisco's in-house silicon (Silicon One G200 at 51.2 Tbps) rounds out the merchant and captive silicon landscape. Neither has driven 800G optic volumes comparable to Tomahawk 5.
|
|
||||||
|
|
||||||
## What Optic Form Factors Are Shipping
|
|
||||||
|
|
||||||
For 800G, QSFP-DD is the dominant form factor in datacenter deployments. The QSFP-DD MSA specifies 8 electrical lanes, and at 800G each lane carries 100G. The two electrical interface options are 8x100G PAM4 (most common for current transceivers) and 2x400G (relevant for future coherent pluggables). Mechanically, QSFP-DD fits in the same footprint as QSFP, with a second row of electrical contacts — though QSFP-DD modules are longer than QSFP28 by about 5 mm.
|
|
||||||
|
|
||||||
OSFP (Octal Small Form Factor Pluggable) is the competing form factor. OSFP is larger than QSFP-DD, supports higher power (up to 20W vs. QSFP-DD's current ~16W practical limit), and is preferred by hyperscalers who need the thermal headroom for coherent 800G and 1.6T roadmap modules. Meta and Microsoft have standardized on OSFP for spine deployments; most merchant silicon switch vendors offer both form factors.
|
|
||||||
|
|
||||||
For 800G in production, the optic types that are actually shipping and available from multiple vendors as of early 2026:
|
|
||||||
|
|
||||||
**800G SR8**: 8 lanes over OM4 multimode fiber, 100 meters. Primarily used for GPU-to-switch in AI clusters. Requires MPO-16 or dual MPO-12 connectivity. Vendors shipping production quantities include Innolight, Eoptolink, II-VI (now Coherent), and Lumentum. Price per unit is approximately $800–1,200 for 100m SR8 from tier-1 compatible vendors.
|
|
||||||
|
|
||||||
**800G DR8**: 8 lanes over OS2 single-mode fiber, 500 meters. It uses 8 separate laser transmitters and 8 receivers over 8 parallel fiber pairs (a PSM8 architecture, 16 fibers in total). This is the dominant spine-to-spine optic in hyperscale deployments. Availability has improved significantly through 2025.
|
|
||||||
|
|
||||||
**800G 2xFR4**: Two independent 400G FR4 streams, each wavelength-multiplexed onto its own fiber pair, reaching 2 km. Technically demanding, sampling from major optics vendors, not yet broadly available as a standard catalog item from compatible vendors.
|
|
||||||
|
|
||||||
**800G ZR/ZR+**: Coherent pluggables for DCI, from roughly 80–120 km for ZR up to 1,000+ km for ZR+ on amplified line systems. This is where OSFP thermal headroom matters. Limited production, primarily deployed by operators running open line systems. Ciena, Lumentum, and Acacia (Cisco) are the primary suppliers.
|
|
||||||
|
|
||||||
## Where Enterprise Deployment Actually Stands
|
|
||||||
|
|
||||||
Enterprise networks are not deploying 800G today in any meaningful volume, with narrow exceptions in HPC and research networks. The blockers are not primarily optics availability — they're economics and use case fit.
|
|
||||||
|
|
||||||
The economic argument: current 400G pricing (roughly $200–400 for 400G SR4 or DR4 compatible optics) combined with mature 400G switch silicon means the cost-per-bit of 400G is still lower than 800G when you factor in switch, optic, and fiber costs together. 800G only makes economic sense when port density drives the decision — which happens at hyperscale ToR densities, not typical enterprise core switches.
|
|
||||||
|
|
||||||
The use case argument: most enterprise applications don't saturate 400G links at the server level. NVMe-oF storage, high-performance compute, and AI training workloads are exceptions. For those workloads, 400G is already standard and 800G is appearing in late 2025/2026 greenfield deployments.
|
|
||||||
|
|
||||||
Realistic enterprise 800G deployment timeline: significant enterprise adoption for AI/HPC applications starts in 2026, general-purpose datacenter spine deployments in 2027–2028. This is roughly the same adoption curve that 400G followed — hyperscale leading by 2–3 years.
|
|
||||||
|
|
||||||
## The Cabling Infrastructure Implication
|
|
||||||
|
|
||||||
One detail that the silicon and optic roadmap discussions often obscure: 800G SR8 requires 8-fiber-per-direction connectivity (or 16-fiber per cable), compared to 400G SR4's 4-fiber-per-direction. A data center pre-wired for 400G with MPO-12 trunk cables (12 fibers, supporting 400G SR4) needs significant cabling infrastructure upgrades to support 800G SR8. MPO-16 or dual-MPO-12 breakout cassettes are available but add cost and complexity.
|
|
||||||
|
|
||||||
DR8 and FR8 variants use single-mode fiber and can, in some configurations, fit on existing single-mode plant depending on exact fiber counts. But the move from 4-fiber-per-direction 400G to 8-fiber-per-direction 800G is a recurring theme that affects every hyperscale facility and will affect enterprise retrofits.
|
|
||||||
|
|
||||||
This infrastructure consideration is probably the least-discussed factor in 800G deployment planning, and it's the one most likely to push enterprise timelines to the right.
|
|
||||||
|
|
||||||
## What to Actually Buy Today
|
|
||||||
|
|
||||||
If you're sourcing for 400G deployments today, the optics market is well-supplied and pricing has fallen substantially. For 800G planning purposes, begin evaluating QSFP-DD 800G SR8 for AI cluster deployments where you're building new fiber plant — you can design for MPO-16 from the start. For everything else, 400G is the right purchasing decision for the next 18 months of enterprise projects.
|
|
||||||
|
|
||||||
The 800G ecosystem is real. It's shipping at hyperscale. The compatible vendor supply chain is maturing. But the enterprise on-ramp is still 18–24 months away from making 800G the default recommendation for general datacenter use.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Fiber Patch Cord Standards That Actually Matter"
|
|
||||||
slug: "fiber-optic-patch-cord-standards"
|
|
||||||
type: guide
|
|
||||||
category: "Fiber Infrastructure"
|
|
||||||
tags: [patch-cord, insertion-loss, return-loss, UPC, APC, IEC-61754-4, fiber-endface, connector-standards]
|
|
||||||
seo_focus_keyword: "fiber optic patch cord standards"
|
|
||||||
---
|
|
||||||
|
|
||||||
Patch cords don't get much attention until something doesn't work. Then they get a lot of attention. Most of the problems we see traced back to patch cords are either insertion loss above what the link budget allows or return loss below what the transmitter requires — and both of these come down to standards that are well-defined but frequently ignored at purchasing time.
|
|
||||||
|
|
||||||
## Insertion Loss: What IEC 61754-4 Actually Specifies
|
|
||||||
|
|
||||||
IEC 61754-4 is the international standard for LC connectors. It defines the physical dimensions of the ferrule, the mating geometry, and the performance requirements. The key insertion loss specification: for a typical LC/UPC connector pair, the maximum insertion loss is 0.35 dB per mating. That's the standard. Good connectors from quality manufacturers consistently measure 0.10–0.20 dB. Bad connectors — or good connectors with contaminated endfaces — can measure 1.0 dB or more.
|
|
||||||
|
|
||||||
The 0.35 dB figure is often misunderstood as a per-connector budget. A patch cord has two connectors. A typical connection through two patch cords has four connectors and two couplers. At worst-case 0.35 dB per connector and 0.2 dB per coupler, you've consumed 1.8 dB of budget before the signal has traveled a meter of real fiber. On a 100G SR4 link with a 1.9 dB channel loss budget (OM4 at 100 m), that's essentially all of your headroom. This is why cheap patch cords in high-density environments cause intermittent errors rather than hard failures: you're operating right at the edge.
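As a rough sketch of that arithmetic, the helper below adds up connector and coupler losses for a patch path. The function names and the 0.15 dB "good connector" figure are illustrative assumptions; the 0.35 dB and 0.2 dB defaults are the worst-case numbers quoted above.

```python
def patch_path_loss_db(connectors: int, couplers: int,
                       loss_per_connector_db: float = 0.35,
                       loss_per_coupler_db: float = 0.2) -> float:
    """Sum the passive loss of a patch path, in dB."""
    return connectors * loss_per_connector_db + couplers * loss_per_coupler_db


def headroom_db(channel_budget_db: float, path_loss_db: float) -> float:
    """Budget left for the fiber itself after connectors and couplers."""
    return channel_budget_db - path_loss_db


worst_case = patch_path_loss_db(connectors=4, couplers=2)
typical = patch_path_loss_db(connectors=4, couplers=2, loss_per_connector_db=0.15)
print(f"worst case: {worst_case:.2f} dB, typical: {typical:.2f} dB")
```

Pass the channel budget for your own application code to `headroom_db`; the gap between the worst-case and typical results is the argument for buying connectors that measure well below the maximum.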
|
|
||||||
|
|
||||||
IEC 61754-7 covers MPO connectors, with slightly different loss specifications: 0.35 dB per mating for a standard type, better for angled-physical-contact (APC) MPO variants. This is the relevant connector standard for 400G SR4 and 800G SR8 applications using MPO-12 or MPO-16 connectors, where polarity (Types A, B, C) also has to be specified.
|
|
||||||
|
|
||||||
Return loss is specified separately. For a standard PC (physical contact) polish, minimum return loss is 26 dB. UPC (ultra physical contact) spec is 50 dB minimum. APC spec is 60 dB minimum. The distinction matters significantly for single-mode systems where Rayleigh backscattering and connector reflections can interfere with laser stability.
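To make those return loss figures concrete, this small sketch converts a return loss in dB into the fraction of incident power reflected back toward the source. The function name is hypothetical; the conversion itself is the standard dB-to-power-ratio relation.

```python
def reflected_fraction(return_loss_db: float) -> float:
    """Fraction of incident optical power reflected, given return loss in dB."""
    return 10 ** (-return_loss_db / 10)


for polish, rl_db in (("PC", 26), ("UPC", 50), ("APC", 60)):
    percent = reflected_fraction(rl_db) * 100
    print(f"{polish}: {rl_db} dB return loss reflects {percent:.5f}% of incident power")
```

Each additional 10 dB of return loss cuts the reflected power by a factor of ten, which is why the step from PC to UPC matters far more than the dB numbers alone suggest.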
|
|
||||||
|
|
||||||
## The PC/UPC/APC Endface Geometry Explained
|
|
||||||
|
|
||||||
The three endface geometries — PC, UPC, and APC — are defined by the angle and curvature of the fiber end relative to the ferrule.
|
|
||||||
|
|
||||||
PC (physical contact) polish produces a slightly curved end face with the apex roughly centered on the fiber core. When two PC connectors mate, the fiber cores touch at the apex. This eliminates the air gap that causes large reflections in older flat-polished connectors, achieving the 26 dB return loss spec.
|
|
||||||
|
|
||||||
UPC (ultra physical contact) uses the same geometry as PC but with a finer polish. The apex offset limit is the same 50 µm as for PC, but the surface quality is better and the surface roughness is lower. The result is better contact and higher return loss: the 50 dB minimum spec. For practical purposes, use UPC wherever you use PC; the cost difference between PC and UPC patch cords is negligible and you should just standardize on UPC.
|
|
||||||
|
|
||||||
APC (angled physical contact) cuts the ferrule endface at an 8-degree angle from perpendicular. When two APC connectors mate, the angled faces align and the fiber cores meet at the angle. Any back-reflection travels at an 8-degree angle, which means it doesn't couple back into the fiber core — it travels into the cladding and is absorbed. This geometry achieves the 60 dB return loss specification and is used wherever reflections cause problems: CATV systems, PON OLT ports, optical amplifier outputs, and any single-mode laser that's sensitive to back-reflections destabilizing the cavity.
|
|
||||||
|
|
||||||
The critical point about APC that causes problems: APC connectors have green-colored housings by convention, and they will mate with UPC connectors physically but catastrophically optically. An APC-to-UPC mating produces a massive air gap between the fiber cores because the 8-degree angle prevents proper contact. Insertion loss of 4–8 dB is typical for a mismatched APC-UPC mating. Return loss drops to roughly 14 dB. The link will almost certainly fail — or run with such high error rates that it appears to fail intermittently.
|
|
||||||
|
|
||||||
The most common place this happens: PON infrastructure. An OLT port using APC connectors (standard for PON) connected to a UPC patch cord in the IDF because someone grabbed the wrong spool. The symptoms — intermittent errors on the downstream, customer complaints about slow speeds — look exactly like a failing transceiver. Checking connector colors before calling the field service team is a good first step.
|
|
||||||
|
|
||||||
## Repeatability Over 500 Mating Cycles
|
|
||||||
|
|
||||||
IEC 61754-4 specifies that LC connectors must maintain their loss and return loss specifications over 500 mating cycles. This number is often treated as if it means the connector fails at cycle 501 — it doesn't. It means that a connector meeting the standard will not show measurable degradation over 500 matings under controlled conditions.
|
|
||||||
|
|
||||||
In practice, the failure mode is contamination, not mechanical wear. Ferrule ceramic is harder than the mating adapter, so wear is slow. But each mating cycle transfers debris from the connector endface to the adapter and back. In a data center environment where cleaning protocols are inconsistent, connectors can show measurable insertion loss increases after 20–30 mating cycles if they're not cleaned.
|
|
||||||
|
|
||||||
The practical recommendation: clean both connectors and adapters before every insertion. Inspect endfaces against IEC 61300-3-35 criteria (which require a flaw-free endface with no scratches near the fiber core) before deployment, and clean with isopropyl alcohol wipes or a reel-type fiber cleaner during maintenance. If you're getting field complaints about a specific patch panel, the first diagnostic step is cleaning every connector in the panel and remeasuring insertion loss, not ordering replacement patch cords.
|
|
||||||
|
|
||||||
## When APC Is the Wrong Choice
|
|
||||||
|
|
||||||
Despite APC's superior return loss performance, it's specifically wrong in several situations.
|
|
||||||
|
|
||||||
Multimode systems don't use APC. The laser sources in multimode transceivers (VCSELs, typically at 850 nm) are not sensitive to back-reflections in the same way as single-mode DFB or tunable lasers. Applying APC to a 40G SR4 or 100G SR4 link is wasteful (APC MPO connectors exist but serve no purpose on multimode) and creates mating compatibility risks.
|
|
||||||
|
|
||||||
Any existing UPC or PC infrastructure. Mixing APC and UPC in a network creates a mating hazard. If your patch panels use UPC adapters, inserting an APC transceiver cable directly into the adapter produces the catastrophic result described above. You either standardize the entire segment on APC or keep it on UPC/PC.
|
|
||||||
|
|
||||||
Short-reach datacenter interconnects over OM3/OM4 fiber. Zero reason to use APC here. Use UPC, keep all your connectors the same color, reduce installation errors.
|
|
||||||
|
|
||||||
The use cases where APC is mandatory: GPON OLT ports (the standard requires it), any external cavity or coherent laser that needs more reflection suppression than the 50 dB return loss UPC provides, fiber-to-the-premises termination points per G.657A specifications, and cable TV transmission equipment.
|
|
||||||
|
|
||||||
Getting patch cord selection right is mostly about avoiding predictable failures. The standards exist, they're reasonable, and the delta between cheap-and-non-compliant and compliant patch cords is small enough that there's no economic argument for buying bad ones.
|
|
||||||
@ -1,50 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Transceiver Failure Root Cause Analysis: A Systematic Approach"
|
|
||||||
slug: "transceiver-failure-root-cause-analysis"
|
|
||||||
type: tutorial
|
|
||||||
category: "Troubleshooting"
|
|
||||||
tags: [transceiver-failure, DOM, root-cause-analysis, laser-failure, ESD, contamination, troubleshooting]
|
|
||||||
seo_focus_keyword: "transceiver failure root cause analysis"
|
|
||||||
---
|
|
||||||
|
|
||||||
When a transceiver fails, the instinct is to replace it and move on. That works operationally, but it leaves the underlying cause unaddressed. If the root cause is contamination, you'll have the same failure in two weeks. If it's a firmware incompatibility, every optic in that platform is at risk. If it's ESD damage during installation, you have a handling problem that will continue generating failures. Systematic root cause analysis changes the economics of transceiver lifecycle management.
|
|
||||||
|
|
||||||
## The Five Failure Categories
|
|
||||||
|
|
||||||
Transceiver failures divide into five categories with distinct signatures: laser degradation, receiver saturation, contamination, ESD damage, and firmware/software incompatibility. Each has characteristic symptoms, DOM data signatures, and distinguishing tests.
|
|
||||||
|
|
||||||
**Laser degradation** is the natural end-of-life failure mode. VCSEL and DFB laser diodes degrade over time due to facet oxidation, dark line defects propagating from material dislocations, and catastrophic optical mirror damage (COMD) from operating above rated power. Laser degradation is a slow process — the module typically shows increasing bias current over months before the optical output drops below threshold. The DOM data for a laser nearing end-of-life: TX bias current increasing toward the high alarm threshold while TX output power is flat or slowly declining, followed by a rapid collapse in TX output as threshold current is no longer met. Average laser lifetimes for good VCSELs are 200,000 hours at rated temperature. But "rated temperature" is doing a lot of work in that sentence.
|
|
||||||
|
|
||||||
**Receiver saturation** is a failure mode that engineers often misidentify as a link problem or a remote-end transmitter issue. The DOM data signature is RX power reading at or above the high alarm threshold — often showing the maximum value the ADC can read, like -1.0 dBm or higher, while the link still shows bit errors or complete failure. The receiver photodiode or TIA (transimpedance amplifier) is overdriven by too much optical power. This happens when: the transmitter at the far end is running at maximum output while the fiber path loss is minimal (short single-mode links with no attenuator), or when the modulation frequency response of the receiver degrades with age and the high-frequency components cause peak power excursions above the saturation threshold. Fix: add a fiber attenuator. 5–10 dB of inline attenuation on a receiver-saturated link is completely normal and correct.
|
|
||||||
|
|
||||||
**Contamination** is the most common cause of premature failure and the easiest to prevent. Endface contamination (oil from fingerprints, dust, cleaning residue) causes localized hot spots as the optical power density at the fiber core (roughly 9 µm in diameter for single-mode, 50 µm for multimode) hits contamination particles. At 100G and higher power densities, this can physically damage the endface within minutes of operation. The DOM data doesn't always show contamination clearly: you may see slightly elevated TX power as the laser drive circuit compensates for loss, or normal TX power with abnormal link errors. The definitive test is visual inspection with an IEC 61300-3-35 grade fiber microscope: the fiber core should be completely clean, and anything visible in the 0–25 µm zone is a problem.
|
|
||||||
|
|
||||||
**ESD damage** causes immediate or latent failure. Immediate ESD damage is obvious: the module doesn't respond at all after installation, shows no DOM data, and the TX disable may be stuck. Latent ESD damage is worse because the module appears to work but has degraded performance, typically manifesting as elevated TX bias current (the laser junction resistance has changed), poor receiver sensitivity (the TIA input has degraded), or intermittent DOM readout failures as the EEPROM interface is compromised. ESD damage is particularly common at port 1 and the last port in a row, on switch chassis whose grounding straps aren't actually grounded, and during module swaps performed without ESD wrist straps.
|
|
||||||
|
|
||||||
**Firmware and software incompatibility** presents as the module initializing but failing to come up, or coming up with degraded performance, or reporting correct DOM values but with intermittent link flaps. This failure mode has increased significantly with CMIS 4.0 and 5.0 modules on older NOS versions that don't implement the initialization state machine correctly. The distinguishing characteristic: the same physical module works in a different platform or a different NOS version.
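For the receiver-saturation case described above, a minimal sketch of attenuator selection follows. It assumes you know the receiver's high and low alarm thresholds from DOM, aims for the middle of that window, and picks from a hypothetical list of fixed attenuator values; none of these names or values come from a vendor specification.

```python
def pick_attenuator_db(rx_dbm: float, rx_high_alarm_dbm: float, rx_low_alarm_dbm: float,
                       available_db=(1, 2, 3, 5, 7, 10, 15, 20)) -> int:
    """Choose a fixed attenuator that moves RX power toward mid-window.

    Returns 0 when the reading is already below the high alarm threshold.
    """
    if rx_dbm < rx_high_alarm_dbm:
        return 0
    target_dbm = (rx_high_alarm_dbm + rx_low_alarm_dbm) / 2
    needed_db = rx_dbm - target_dbm
    # Nearest available fixed value (assumed stock list, not a standard).
    return min(available_db, key=lambda value: abs(value - needed_db))


# Hypothetical reading: 0.0 dBm received, alarms at -1.0 dBm (high) and -14.0 dBm (low).
print(pick_attenuator_db(0.0, -1.0, -14.0))
```

After fitting the attenuator, re-read RX power from DOM to confirm the link now sits inside the window.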
|
|
||||||
|
|
||||||
## Reading DOM Data Post-Mortem
|
|
||||||
|
|
||||||
When you pull a failed module, check the DOM values before you ship it back. Most modules retain their last-valid DOM readings in EEPROM. Four fields matter most for post-mortem: TX bias current, TX output power, RX input power, and temperature.
|
|
||||||
|
|
||||||
TX bias current approaching or exceeding the high alarm threshold (typically 100 mA for SFP28, 13 mA per lane for QSFP28) suggests laser degradation or thermal stress. If the current is normal but TX output is low, the laser itself may be intact but the TOSA coupling efficiency has degraded — potentially from contamination damage on the lens.
|
|
||||||
|
|
||||||
RX input power below the low alarm threshold (typically -20 to -23 dBm for 100G SR4) during a link failure could indicate far-end TX failure, fiber break, or severe contamination on the receive side. RX power above the high alarm threshold is receiver saturation as discussed.
|
|
||||||
|
|
||||||
Temperature deserves attention. An SFP+ module rated for 0–70°C that was consistently running at 68°C has been operating at the edge of its rated range. That's not failure per se, but it explains why it's the second module to fail in that same slot. Check the ambient temperature and airflow at that chassis position.
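The post-mortem rules in this section can be sketched as a small checker. The dataclass names, the hint wording, and the 5 °C temperature guard band are assumptions made for illustration; real thresholds should come from the module's own alarm fields.

```python
from dataclasses import dataclass


@dataclass
class DomSnapshot:
    tx_bias_ma: float
    tx_power_dbm: float
    rx_power_dbm: float
    temp_c: float


@dataclass
class DomThresholds:
    tx_bias_high_ma: float
    tx_power_low_dbm: float
    rx_power_low_dbm: float
    rx_power_high_dbm: float
    temp_max_c: float


def post_mortem_hints(dom: DomSnapshot, thr: DomThresholds, temp_guard_c: float = 5.0) -> list:
    """Map last-known DOM values to the likely causes discussed in the text."""
    hints = []
    if dom.tx_bias_ma >= thr.tx_bias_high_ma:
        hints.append("TX bias at high alarm: laser degradation or thermal stress")
    elif dom.tx_power_dbm < thr.tx_power_low_dbm:
        hints.append("bias normal but TX power low: TOSA coupling may have degraded")
    if dom.rx_power_dbm < thr.rx_power_low_dbm:
        hints.append("RX power low: far-end TX failure, fiber break, or RX-side contamination")
    elif dom.rx_power_dbm > thr.rx_power_high_dbm:
        hints.append("RX power high: receiver saturation")
    if dom.temp_c >= thr.temp_max_c - temp_guard_c:
        hints.append("temperature near rated maximum: check airflow at this slot")
    return hints


# Hypothetical SFP+ reading: healthy optics, but running at 68 C against a 70 C rating.
snapshot = DomSnapshot(tx_bias_ma=40.0, tx_power_dbm=-2.0, rx_power_dbm=-3.0, temp_c=68.0)
limits = DomThresholds(100.0, -8.0, -14.0, 0.5, 70.0)
print(post_mortem_hints(snapshot, limits))
```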
|
|
||||||
|
|
||||||
## The Distinguishing Test Sequence
|
|
||||||
|
|
||||||
When you have a failed module and want to determine root cause, this sequence takes about 15 minutes and answers most questions.
|
|
||||||
|
|
||||||
First, inspect the endface under a fiber microscope before doing anything else. If you see contamination or physical damage on the endface, that's probably your answer. Document it photographically.
|
|
||||||
|
|
||||||
Second, check the DOM history if available. Some NOS platforms log DOM readings over time (Junos has `show interfaces diagnostics optics extensive` with historical data on some platforms; Arista EOS has similar). A gradual trend toward threshold is laser degradation. A sudden step change is ESD or contamination damage.
|
|
||||||
|
|
||||||
Third, try the module in a different chassis slot and a different fiber patch cord. If it works, the problem is in the original slot — dirty adapter, incompatible firmware, thermal issue in that specific position. If it still doesn't work, the module itself is the issue.
|
|
||||||
|
|
||||||
Fourth, use an optical power meter to verify optical output from the TX if the module powers up. If the TX is producing measurable output but below spec, that's a partially-degraded laser or TOSA alignment issue. If there's no TX output at all, the laser driver or the laser itself has failed.
|
|
||||||
|
|
||||||
Fifth, if everything else checks out, check the NOS firmware version against the module vendor's compatibility matrix. This is where the compatible optics documentation from your vendor matters — a good compatible vendor publishes the NOS versions and feature sets their modules have been validated against.
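The second step above distinguishes a gradual trend from a sudden step in the DOM history. A minimal way to sketch that distinction, assuming you have exported a series of readings (for example TX bias in mA) from your NOS, is to compare the largest single jump with the total change. The function name and both thresholds are illustrative assumptions.

```python
def classify_dom_trend(samples, min_change: float = 1.0, step_share: float = 0.5) -> str:
    """Label a DOM time series as 'stable', 'gradual', or 'step'.

    'step' means one sample-to-sample jump accounts for at least
    step_share of the total change across the series.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    total_change = abs(samples[-1] - samples[0])
    if total_change < min_change:
        return "stable"
    largest_jump = max(abs(b - a) for a, b in zip(samples, samples[1:]))
    return "step" if largest_jump >= step_share * total_change else "gradual"


print(classify_dom_trend([30, 31, 32, 33, 34, 35, 36]))  # slow drift toward threshold
print(classify_dom_trend([30, 30, 30, 38, 38, 38]))      # one abrupt change
```

A "gradual" result points toward laser degradation; a "step" result points toward ESD or contamination damage, as described in the second step.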
|
|
||||||
|
|
||||||
Skipping to "replace and move on" is fine for a single failure. For recurring failures in a specific slot, a specific chassis, or across a deployment, the 15-minute analysis pays for itself many times over.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Cisco Nexus vs. Catalyst Optical Behavior: Where You'll Get Burned"
|
|
||||||
slug: "cisco-nexus-vs-catalyst-optic-behavior"
|
|
||||||
type: analysis
|
|
||||||
category: "Vendor Compatibility"
|
|
||||||
tags: [Cisco, Nexus, Catalyst, NX-OS, IOS-XE, IDPROM, compatible-optics, transceiver-compatibility]
|
|
||||||
seo_focus_keyword: "Cisco Nexus Catalyst transceiver compatibility"
|
|
||||||
---
|
|
||||||
|
|
||||||
Cisco has two dominant switching platforms in the enterprise and datacenter market, and they do not treat optical transceivers the same way. If you're migrating from Catalyst to Nexus infrastructure, or sourcing optics for a mixed environment, or trying to understand why the QSFP28 that worked fine on a Catalyst 9500 is generating compatibility warnings on a Nexus 9300, the answer is in how the two platforms read and act on EEPROM data. These differences are documented, but not conspicuously.
|
|
||||||
|
|
||||||
## The IDPROM Parsing Difference
|
|
||||||
|
|
||||||
Both NX-OS and IOS-XE read the transceiver's EEPROM during module initialization. The data is parsed into what Cisco calls the IDPROM (Identifier PROM). But what each platform does with that data diverges significantly.
|
|
||||||
|
|
||||||
IOS-XE on Catalyst platforms performs vendor validation by checking the vendor name field (bytes 148–163, 0x94–0xA3, of upper page 00h in SFF-8636 for QSFP modules) against an internal list of qualified vendors. If the name isn't recognized, the platform logs a warning and continues operating. The port comes up. You get `%TRANSCEIVER-3-NOT_QUALIFIED: Transceiver in Gi1/0/X is not qualified` in the syslog, and traffic flows normally. This has been the Catalyst behavior for years and is well understood.
|
|
||||||
|
|
||||||
NX-OS on Nexus platforms performs a more complex validation sequence. In addition to the vendor name check, NX-OS on many Nexus 9000 series platforms checks the vendor OUI (bytes 165–167, 0xA5–0xA7, in SFF-8636), the part number field, and on some line card types, performs an IDPROM integrity check. The consequence of a failed check is not a warning: the port stays in the `sfpInvalid` state and doesn't pass traffic. The command `service unsupported-transceiver` globally enables third-party transceiver support on NX-OS, but this command is not enabled by default and many operators are surprised when the Nexus installation doesn't match the Catalyst behavior they're accustomed to.
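To see what the platforms are reading, the sketch below pulls the identity fields out of a raw 256-byte QSFP EEPROM dump (lower page followed by upper page 00h). The byte offsets follow the SFF-8636 upper page 00h layout as I understand it (vendor name 148–163, OUI 165–167, part number 168–183, revision 184–185, serial 196–211); the function name and the sample values are hypothetical, and how you obtain the dump depends on your platform.

```python
def parse_sff8636_identity(dump: bytes) -> dict:
    """Extract vendor identity fields from a 256-byte SFF-8636 dump."""
    if len(dump) < 256:
        raise ValueError("expected lower page plus upper page 00h (256 bytes)")

    def text(start: int, end: int) -> str:
        # Fields are ASCII, left-aligned and space-padded.
        return dump[start:end].decode("ascii", errors="replace").strip()

    return {
        "vendor_name": text(148, 164),
        "vendor_oui": "-".join(f"{b:02X}" for b in dump[165:168]),
        "part_number": text(168, 184),
        "revision": text(184, 186),
        "serial_number": text(196, 212),
    }


# Build a synthetic dump with made-up values to exercise the parser.
sample = bytearray(256)
sample[148:164] = b"EXAMPLE VENDOR".ljust(16)
sample[165:168] = bytes([0x00, 0x11, 0x22])
sample[168:184] = b"QSFP28-TEST".ljust(16)
sample[184:186] = b"A1"
sample[196:212] = b"SN0001".ljust(16)
print(parse_sff8636_identity(bytes(sample)))
```

Comparing these fields between a module that works and one that lands in `sfpInvalid` is a quick way to see which part of the identity data the platform objects to.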
|
|
||||||
|
|
||||||
The exact behavior varies by Nexus platform. Nexus 9300-EX/FX/GX line cards are more permissive than older Nexus 9300 series fixed-configuration switches. Nexus 7000 with N7K-M324FQ-25L line cards is notably strict about coherent optics vendor validation. Check the specific platform and software version before assuming behavior.
|
|
||||||
|
|
||||||
## The NX-OS Version Factor
|
|
||||||
|
|
||||||
NX-OS behavior has changed across versions in ways that create unexpected operational surprises. NX-OS 9.3(x) introduced stricter IDPROM validation for 100G optics on certain Nexus 9300 series platforms, breaking optics that had worked under 9.2(x). This was a deliberate change, documented in the release notes as a "security enhancement" — which it arguably is, from Cisco's perspective.
|
|
||||||
|
|
||||||
NX-OS 10.x introduced CMIS support for 400G QSFP-DD modules, but also changed how the power class fields are parsed. Modules that negotiated power class correctly under 9.3(x) sometimes need updated firmware or revised EEPROM programming to properly initialize under 10.x. If you're upgrading NX-OS and you have third-party 400G optics, test a subset before committing the entire fleet.
|
|
||||||
|
|
||||||
IOS-XE version differences are less dramatic for transceiver behavior, but Catalyst 9000 series running IOS-XE 17.x added what Cisco calls "Transceiver Type Verification" which rejects certain QSFP form factors based on the electrical interface specification claimed in the EEPROM rather than just vendor identity. This matters if you're inserting a QSFP28 module into a Catalyst 9500 running newer IOS-XE — the platform checks whether the claimed electrical interface type is consistent with the expected interface for that port.
|
|
||||||
|
|
||||||
## The Practical Impact for Compatible Optics
|
|
||||||
|
|
||||||
A well-programmed compatible QSFP28 that works on Catalyst will work on Nexus provided `service unsupported-transceiver` is enabled. This is the default assumption, and it's correct for the large majority of cases. But there are specific scenarios where it breaks down.
|
|
||||||
|
|
||||||
Nexus 9000 platforms with the -EX or -FX suffix use ASIC-level optical management. Some of these platforms read the EEPROM and configure the port SERDES parameters based on optic type — particularly the CDR bypass and TX emphasis settings. If a compatible module's EEPROM claims a different CDR configuration than what the hardware actually implements, the ASIC may configure the lane incorrectly, producing link flaps or elevated BER that look exactly like a bad fiber run. This is rare but occurs with lower-quality compatible modules that copy OEM EEPROM bytes without understanding the implication of the CDR control fields.
|
|
||||||
|
|
||||||
Nexus platforms with port-level optical policies (configured via `interface ethernet X/Y` with `transceiver-type` checks in some Nexus 9000-V configurations) may reject modules based on declared nominal bit rate rather than form factor alone. This matters when inserting a 100G QSFP28 module that is programmed with a nominal bit rate of 100 Gbps into a port that has been configured for 40G operation. The platform sees a capability mismatch and keeps the port down.
|
|
||||||
|
|
||||||
## Migrating Between Platforms: The Checklist
|
|
||||||
|
|
||||||
When you're moving infrastructure from Catalyst to Nexus, or managing a mixed environment, these steps prevent the majority of optic-related surprises.
|
|
||||||
|
|
||||||
Enable `service unsupported-transceiver` before the first Nexus deployment if you're using any non-Cisco-branded optics. Verify it persists across configuration save/restore cycles — it should, but confirm it explicitly.
|
|
||||||
|
|
||||||
Check DOM threshold configuration. Catalyst platforms allow per-interface DOM threshold customization. NX-OS handles DOM thresholds differently, and the default thresholds in NX-OS may cause alarm events on optics that were quiet on Catalyst. Review `show interface transceiver details` across a sample of ports after migration.
|
|
||||||
|
|
||||||
For 400G QSFP-DD optics specifically, verify the CMIS version that your modules implement against the NX-OS version's CMIS support. Modules implementing CMIS 4.0 or 5.0 require NX-OS 9.3(7) or later for proper initialization. Earlier NX-OS versions may bring up the port but fail to configure the module correctly, resulting in FEC mismatch or incorrect modulation settings.
|
|
||||||
|
|
||||||
Test autonegotiation behavior. Catalyst 9000 series has different default autonegotiation settings than Nexus 9300 series for some port speeds. A 100G port on Catalyst may default to autoneg off while the same port on Nexus 9300 may default to autoneg on, which matters for optics that don't handle autoneg gracefully.
|
|
||||||
|
|
||||||
The underlying reality is that both platforms work well with compatible optics when configured correctly. The differences are navigable. But they exist, they're not consistently documented in one place, and discovering them at 2 AM during a migration window is avoidable if you've done the pre-migration testing.
|
|
||||||
|
|
||||||
## The One Command That Saves Time
|
|
||||||
|
|
||||||
On NX-OS, `show interface ethernet X/Y transceiver` is your primary diagnostic. If the output shows `sfpInvalid` in the SFP field, the IDPROM validation failed and `service unsupported-transceiver` is either not enabled or the module has a specific EEPROM issue that needs investigation. If it shows `sfpNotPresent`, the module isn't being detected at all — check seating and try reseating. If it shows vendor name and part number correctly but the interface is still down, the optic is detected but there's a link-level issue separate from compatibility.
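The decision logic in the paragraph above can be written down as a small lookup, which is handy in a runbook or a script that post-processes collected output. This is only a sketch: the function name is hypothetical, and the `state` argument is whatever value you read from the SFP field yourself, since the exact output layout varies by platform and release.

```python
def triage_nxos_transceiver(state: str, interface_up: bool) -> str:
    """Suggest a next step from the SFP state and interface status."""
    if state == "sfpInvalid":
        return ("IDPROM validation failed: confirm 'service unsupported-transceiver' "
                "is enabled, then investigate the module EEPROM")
    if state == "sfpNotPresent":
        return "module not detected: check seating and reseat"
    if not interface_up:
        return "optic detected: investigate link-level issues, not compatibility"
    return "optic detected and interface up"


print(triage_nxos_transceiver("sfpInvalid", interface_up=False))
```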
|
|
||||||
|
|
||||||
On IOS-XE, `show interfaces GigabitEthernet X/Y/Z transceiver detail` gives you the IDPROM fields plus DOM values. The `Transceiver Type` field tells you what the platform parsed from EEPROM — if this doesn't match what you expect, the EEPROM is either corrupted or programmed differently than the module's actual specification.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "PAM4 vs. NRZ Modulation in Transceivers: The Practical Implications"
|
|
||||||
slug: "pam4-vs-nrz-modulation-transceivers"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Modulation & Signal Integrity"
|
|
||||||
tags: [PAM4, NRZ, modulation, CDR, link-budget, 400G, fiber-quality, signal-integrity]
|
|
||||||
seo_focus_keyword: "PAM4 vs NRZ transceiver modulation"
|
|
||||||
---
|
|
||||||
|
|
||||||
The migration from NRZ (Non-Return-to-Zero) to PAM4 (Pulse Amplitude Modulation, 4-level) is the defining signal engineering change of the 100G-to-400G transition. Every engineer deploying 400G or higher-speed optics needs to understand what PAM4 actually means for their fiber plant, their link budget assumptions, and what happens when the two technologies coexist in the same infrastructure.
|
|
||||||
|
|
||||||
## NRZ: The Baseline
|
|
||||||
|
|
||||||
NRZ is the modulation scheme that has dominated optical networking since digital transmission began. In NRZ, each bit period contains a single signal level representing either a 1 or a 0. A 100G NRZ system uses 4 parallel lanes, each running at 25 Gbps NRZ, so the baud rate (symbol rate) equals the bit rate per lane. This is conceptually clean: one symbol = one bit.
|
|
||||||
|
|
||||||
The noise sensitivity of NRZ is determined by the eye opening — the vertical and horizontal separation between the 0 and 1 signal levels at the decision point. A clean NRZ eye has roughly half the optical power range available for separating 0 from 1. Optical sensitivity for a 25G NRZ receiver is typically around -10 to -14 dBm minimum for a good transceiver, depending on the specific implementation.
|
|
||||||
|
|
||||||
25G NRZ at the lane level has been successfully deployed on OM3 and OM4 multimode fiber at standard reaches of 70 m (OM3) and 100 m (OM4), and on OS2 single-mode fiber for distances up to 10 km (10GBASE-LR profile scaled to 25G lane rates). The technology is proven, the fiber requirements are well-characterized, and the installed base is enormous.
|
|
||||||
|
|
||||||
## PAM4: Four Levels Instead of Two
|
|
||||||
|
|
||||||
PAM4 encodes 2 bits per symbol by using four distinct amplitude levels instead of two. A 25 Gbaud PAM4 signal carries 50 Gbps per lane, and a 50 Gbaud PAM4 signal carries 100 Gbps. This is why 400G QSFP-DD can use 8 electrical lanes at roughly 25 Gbaud PAM4 rather than requiring 16 lanes at 25 Gbaud NRZ: the lane count stays manageable as bit rates scale.
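The relationship between baud rate, amplitude levels, and lane count is simple enough to compute directly. The sketch below uses nominal rates and ignores FEC and line-coding overhead (real lanes run slightly faster, for example about 26.56 GBd rather than 25); the function names are hypothetical.

```python
import math


def lane_rate_gbps(baud_gbd: float, levels: int) -> float:
    """Nominal bit rate of one lane: symbol rate times bits per symbol."""
    return baud_gbd * math.log2(levels)


def lanes_needed(total_gbps: float, baud_gbd: float, levels: int) -> int:
    """Number of lanes required to carry total_gbps at the given signaling."""
    return math.ceil(total_gbps / lane_rate_gbps(baud_gbd, levels))


for label, baud, levels in (("25 GBd NRZ", 25, 2), ("25 GBd PAM4", 25, 4), ("50 GBd PAM4", 50, 4)):
    print(f"400G over {label}: {lanes_needed(400, baud, levels)} lanes "
          f"at {lane_rate_gbps(baud, levels):.0f} Gbps each")
```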
|
|
||||||
|
|
||||||
The fundamental tradeoff: PAM4 squeezes four amplitude levels into the same voltage or optical power range that NRZ used for two. The three eye openings in a PAM4 signal (between levels 0-1, 1-2, and 2-3) are each only one-third the size of the single NRZ eye opening. This means the noise margin for each level transition is significantly reduced. Specifically, for the same peak-to-peak amplitude, each PAM4 eye has 20·log10(3) ≈ 9.5 dB less signal-to-noise margin than the NRZ eye (about 4.8 dB when expressed in optical power).
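The one-third eye figure translates into decibels as follows. This sketch assumes equally spaced levels and the same peak-to-peak swing as NRZ; the function name is hypothetical. The 20·log10 form applies to electrical amplitude (SNR), the 10·log10 form to optical power.

```python
import math


def pam_eye_penalty_db(levels: int, optical: bool = False) -> float:
    """Eye-opening penalty of PAM-N relative to NRZ at equal peak-to-peak swing."""
    if levels < 2:
        raise ValueError("need at least two levels")
    eyes = levels - 1  # each eye is 1/eyes of the NRZ eye height
    scale = 10 if optical else 20
    return scale * math.log10(eyes)


print(f"PAM4 electrical penalty: {pam_eye_penalty_db(4):.2f} dB")
print(f"PAM4 optical-power penalty: {pam_eye_penalty_db(4, optical=True):.2f} dB")
```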
|
|
||||||
|
|
||||||
That 9.5 dB is why PAM4 systems require much stronger FEC (Forward Error Correction). 100G NRZ links get by with lighter protection: 100GBASE-SR4 uses KR4 FEC, and 100GBASE-LR4 runs without FEC. 400G PAM4 over SR8 or DR4 mandates KP4 FEC: without it, the raw BER from a PAM4 system running at the edge of its link budget would be unacceptable. KP4 FEC can correct a raw BER of up to 2.4×10^-4 to below 10^-15, which is what makes PAM4 practical despite the noise margin reduction.
|
|
||||||
|
|
||||||
## The Link Budget Difference
|
|
||||||
|
|
||||||
A 100G SR4 link (4 lanes, 25G NRZ per lane) over OM4 has an application code loss budget of 1.9 dB for the channel loss at 850 nm (this covers the fiber, connectors, and splices, not the transceiver internal loss). The minimum launch power per lane is typically -4 dBm and the receiver sensitivity is around -9.5 dBm, a 5.5 dB power budget that leaves 3.6 dB of component margin beyond the fiber channel budget.
|
|
||||||
|
|
||||||
A 400G SR8 link (8 lanes, 50G PAM4 per lane) over OM4 has an application code channel loss budget of 1.9 dB as well — the same number. But the minimum launch power per lane is now 0 dBm (versus -4 dBm) and the receiver sensitivity is -6.5 dBm for a good implementation. The effective operating range is similar on paper, but the PAM4 system is running the lasers harder and requiring better receiver performance to achieve the same optical channel.
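One way to read these figures side by side is to compute the launch-to-sensitivity power budget and what remains after the channel loss allocation. The function names are hypothetical and the inputs are simply the typical values quoted in the two paragraphs above.

```python
def power_budget_db(min_launch_dbm: float, rx_sensitivity_dbm: float) -> float:
    """Total optical budget between minimum launch power and receiver sensitivity."""
    return min_launch_dbm - rx_sensitivity_dbm


def margin_after_channel_db(min_launch_dbm: float, rx_sensitivity_dbm: float,
                            channel_loss_db: float) -> float:
    """Budget left once the channel loss allocation is spent."""
    return power_budget_db(min_launch_dbm, rx_sensitivity_dbm) - channel_loss_db


links = {
    "100G SR4 (25G NRZ lanes)": (-4.0, -9.5, 1.9),
    "400G SR8 (50G PAM4 lanes)": (0.0, -6.5, 1.9),
}
for name, (launch, sensitivity, channel) in links.items():
    print(f"{name}: budget {power_budget_db(launch, sensitivity):.1f} dB, "
          f"margin after channel {margin_after_channel_db(launch, sensitivity, channel):.1f} dB")
```

These raw numbers do not include the reduced PAM4 eye margin, so a larger paper budget on the PAM4 link should not be read as more real-world headroom.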
|
|
||||||
|
|
||||||
The practical consequence: a fiber infrastructure that "works fine" for 100G SR4 may be marginal for 400G SR8, even at the same distances, if:
|
|
||||||
|
|
||||||
The connectors are at or near the 0.35 dB per mating specification rather than the 0.10–0.15 dB typical for good connectors. On a 100G NRZ link, 5 connector pairs at 0.30 dB each (1.5 dB total) leaves 0.4 dB of headroom. On a 400G PAM4 link where the PAM4 eye margin reduction has already consumed most of the engineering margin, the same 1.5 dB of connector loss may push the link outside the acceptable operating region, especially as temperature varies.
|
|
||||||
|
|
||||||
The fiber itself has higher attenuation than nominal. OM4 is specified at 3.5 dB/km at 850 nm. Old OM4 that has been routed through tight bends, patched dozens of times, or is running warm in poorly ventilated trays may measure 4.0–4.5 dB/km at 850 nm. For a 50 m run, that's at most 0.05 dB extra (under 0.25 dB of fiber loss in total), which is not significant. For a 100 m run at the edge of spec, it matters.
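The attenuation arithmetic in the last point is easy to check. The sketch below uses hypothetical function names and the dB/km figures quoted above.

```python
def fiber_loss_db(length_m: float, attenuation_db_per_km: float) -> float:
    """Fiber attenuation over a run, in dB."""
    return length_m / 1000 * attenuation_db_per_km


def extra_loss_db(length_m: float, nominal_db_per_km: float, measured_db_per_km: float) -> float:
    """Additional loss caused by fiber that measures worse than nominal."""
    return fiber_loss_db(length_m, measured_db_per_km - nominal_db_per_km)


for length in (50, 100):
    total = fiber_loss_db(length, 4.5)
    extra = extra_loss_db(length, 3.5, 4.5)
    print(f"{length} m at 4.5 dB/km: {total:.3f} dB total, {extra:.3f} dB above nominal")
```

On its own the extra fiber loss is small; it becomes relevant when it stacks on top of connector loss that has already used most of the channel budget.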
|
|
||||||
|
|
||||||
## CDR Requirements and What Happens Without Them
|
|
||||||
|
|
||||||
Clock and data recovery (CDR) is the function that synchronizes the receiver's sampling clock to the incoming signal. In NRZ systems at 25G and below, a CDR inside the module is helpful but not always mandatory: many short-reach multimode links run without one because the eye opening is large enough for a simple limiting receiver. This is why some 10G SFP+ and 25G SFP28 short-reach modules are sold as "CDR-free" variants, which are cheaper and have lower latency.
|
|
||||||
|
|
||||||
PAM4 systems at 50G per lane require retiming in both the transmit and receive directions. In a module that bridges 25G NRZ host lanes to 50G PAM4 optics, the transmit-side DSP recovers two NRZ electrical lanes from the host ASIC and combines them into one PAM4 lane before the optical interface. The receive side performs the inverse: one PAM4 optical lane becomes two NRZ electrical lanes. There is no "CDR-free" PAM4 transceiver for 50G+ lane rates because the DSP is integral to the modulation scheme.
|
|
||||||
|
|
||||||
The implication: PAM4 transceivers have higher power consumption (more DSP), more latency (CDR adds roughly 20–50 ns), and more points of failure (the DSP itself) compared to equivalent-bandwidth NRZ systems. This is a known and accepted tradeoff at 400G and above.
|
|
||||||
|
|
||||||
## When Your Old Fiber Plant Fails
|
|
||||||
|
|
||||||
The most common deployment failure pattern with PAM4 is this: an operator deploys 400G DR4 or FR4 transceivers on existing single-mode fiber infrastructure that was installed for 10G or 40G. The fiber tests clean at 1310 nm with an OTDR. The links come up. A few weeks later, some links are flapping or showing elevated FEC counters.
|
|
||||||
|
|
||||||
The fiber plant may be fine in terms of attenuation. The problem is chromatic dispersion accumulation. DR4 and FR4 run PAM4 at 100G per lane, about 53 Gbaud, and that signal is more sensitive to dispersion than 10G NRZ for two reasons. The symbol period is shorter (roughly 19 ps at 53 Gbaud versus roughly 97 ps for 10G NRZ), and the 4-level eye opening is smaller, so dispersion-induced pulse spreading closes an already-small PAM4 eye faster than it closes an NRZ eye.
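
Symbol period is just the reciprocal of the baud rate, so the comparison is easy to reproduce. The sketch below is illustrative only: `symbol_period_ps` is a hypothetical helper, and the baud rates are the standard Ethernet lane rates for 10G NRZ (10.3125 GBd), 50G PAM4 (26.5625 GBd), and 100G PAM4 (53.125 GBd).

```python
def symbol_period_ps(baud_gbd):
    """Symbol period in picoseconds for a baud rate given in gigabaud."""
    return 1000.0 / baud_gbd


# Lane rates in gigabaud (standard Ethernet signaling rates).
lane_rates_gbd = {
    "10G NRZ": 10.3125,
    "50G PAM4": 26.5625,
    "100G PAM4": 53.125,
}

for name, gbd in lane_rates_gbd.items():
    print(f"{name:>9}: {gbd:8.4f} GBd -> {symbol_period_ps(gbd):5.1f} ps per symbol")
```

A fixed amount of pulse spreading therefore consumes about five times more of the symbol period at 53 GBd than at 10 GBd, before the smaller PAM4 eye is even taken into account.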
|
|
||||||
|
|
||||||
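400G DR4 is specified for 500 m and has a dispersion tolerance of approximately ±50 ps/nm. Standard single-mode fiber has its zero-dispersion wavelength near 1310 nm, so very little dispersion accumulates per kilometer at the DR4 wavelength, and for runs under 500 m dispersion is not the issue. For runs of 1–2 km on older SMF with slightly higher dispersion per km, particularly on FR4's outer CWDM lanes, which sit farther from the zero-dispersion wavelength, it can be.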
|
|
||||||
|
|
||||||
The practical takeaway is that "the fiber worked before" is not sufficient qualification for PAM4. Test the actual insertion loss at the operating wavelength, count the connector matings in the path, and check the fiber type specification. For 400G and above, the fiber infrastructure needs to meet a tighter tolerance than it did at 100G NRZ, even if the nominal link budget numbers look similar.
|
|
||||||
@ -1,62 +0,0 @@
|
|||||||
---
|
|
||||||
title: "PON Optics for Enterprise Engineers: GPON, XGS-PON, and Why They're Different"
|
|
||||||
slug: "pon-gpon-xgspon-optics-explainer"
|
|
||||||
type: guide
|
|
||||||
category: "Access & PON"
|
|
||||||
tags: [PON, GPON, XGS-PON, NG-PON2, burst-mode, OLT, ONT, access-optics]
|
|
||||||
seo_focus_keyword: "GPON XGS-PON optics explained"
|
|
||||||
---
|
|
||||||
|
|
||||||
Most optical networking engineers who work in datacenters or enterprise backbone environments have never touched a PON transceiver, and the assumption that PON optics work like datacenter optics is natural but wrong. PON (Passive Optical Network) transceivers have fundamentally different operating principles, and the differences explain why they're cheaper, less interchangeable, and architecturally constrained in ways that datacenter optics are not.
|
|
||||||
|
|
||||||
## The Architecture That Defines the Optics
|
|
||||||
|
|
||||||
A PON system consists of an OLT (Optical Line Terminal) at the operator's central office or equipment room, connected via a passive 1:32 or 1:64 optical splitter to multiple ONTs (Optical Network Terminals) at subscriber premises. The word "passive" is key: there are no active amplifiers in the distribution network. The splitter simply divides the optical signal.
|
|
||||||
|
|
||||||
This means the OLT transmitter must produce enough power to reach the farthest ONT through the splitter loss. A 1:32 splitter has about 15 dB of splitting loss, and fiber attenuation over runs that may extend 20 km comes on top of that. The OLT transmits at +2 to +5 dBm (for GPON class B+ and C+), and the ONT receiver must handle received power as low as -28 dBm. That 30+ dB working range is, in dB terms, roughly three times wider than a typical datacenter transceiver's operating range.
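
The downstream budget described above can be sketched as a short calculation. This is a rough illustration under stated assumptions: the helper names are hypothetical, the 0.25 dB/km fiber attenuation is an assumed round figure for the downstream wavelength, and `excess_db` stands in for splitter excess loss and connectors, which real designs must also include.

```python
import math


def ideal_split_loss_db(split_ratio):
    """Ideal power-division loss of a 1:N splitter, in dB."""
    return 10.0 * math.log10(split_ratio)


def received_power_dbm(tx_dbm, split_ratio, fiber_km,
                       fiber_db_per_km=0.25, excess_db=0.0):
    """Estimated ONT receive power for a single splitter stage (assumed model)."""
    return (tx_dbm
            - ideal_split_loss_db(split_ratio)
            - fiber_km * fiber_db_per_km
            - excess_db)


print(f"1:32 split loss: {ideal_split_loss_db(32):.2f} dB")
print(f"1:64 split loss: {ideal_split_loss_db(64):.2f} dB")

# +3 dBm launch, 1:32 split, 20 km, 3 dB assumed for excess and connector loss.
rx = received_power_dbm(3.0, 32, 20, excess_db=3.0)
print(f"Estimated ONT receive power: {rx:.2f} dBm (sensitivity floor about -28 dBm)")
```

Moving from a 1:32 to a 1:64 splitter costs another 3 dB, which is why split ratio and reach trade off directly against each other.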
|
|
||||||
|
|
||||||
The upstream direction (ONT to OLT) is even more interesting. ONTs sit at different distances from the OLT, so their upstream transmissions travel through different amounts of fiber and arrive at the OLT at different power levels. The OLT receiver must handle upstream bursts from different ONTs that may differ in power by 20–30 dB from one burst to the next, arriving back to back within the same 125 µs upstream frame with only a short guard interval between them.
|
|
||||||
|
|
||||||
This requirement — burst-mode reception with rapid power level adaptation — is the defining technical challenge of PON optics.
|
|
||||||
|
|
||||||
## Burst-Mode Receivers: The Core Difference
|
|
||||||
|
|
||||||
A datacenter SFP28 receiver amplifies the incoming signal continuously. The TIA (transimpedance amplifier) is designed for continuous-mode operation: it settles to a stable gain and offset setting and maintains it. A PON OLT receiver must reset its operating point on every upstream burst, typically in less than 1 µs for GPON and less than 800 ns for XGS-PON.
|
|
||||||
|
|
||||||
This burst-mode requirement means the OLT transceiver's receiver uses a different TIA architecture — typically a burst-mode TIA that uses fast automatic gain control (AGC) circuitry to set the decision threshold independently for each upstream burst. This is significantly harder to implement than continuous-mode reception and is a major reason why OLT transceivers cost substantially more than the ONT transceivers they communicate with.
|
|
||||||
|
|
||||||
ONT transceivers, by contrast, use burst-mode transmitters. The ONT must switch its laser on and off according to the time slot allocated by the OLT (TDMA scheduling), and the burst must ramp to full power within a tight preamble window before the payload data begins. The laser driver in an ONT transceiver is designed for rapid on/off cycling — millions of times per day in normal operation.
|
|
||||||
|
|
||||||
Datacenter transceivers do have TX disable functionality, but it's not designed for sub-microsecond burst operation. Using a datacenter SFP+ as an ONT transmitter would produce garbled timing on the burst preamble and fail to meet the G.984 timing specifications.
|
|
||||||
|
|
||||||
## GPON vs. XGS-PON vs. NG-PON2: What Changes in the Optics
|
|
||||||
|
|
||||||
GPON (G.984) operates at 2.488 Gbps downstream and 1.244 Gbps upstream, using 1490 nm for downstream and 1310 nm for upstream. The downstream uses NRZ modulation at 2.5G. This is mature technology — GPON has been deployed since 2004 and is the dominant residential broadband technology globally.
|
|
||||||
|
|
||||||
XGS-PON (G.9807.1) is the symmetrical 10G successor: 9.953 Gbps downstream at 1577 nm and 9.953 Gbps upstream at 1270 nm. The "XGS" designation means 10G symmetrical, distinguishing it from XG-PON1 (asymmetrical 10G downstream/2.5G upstream). The optics are significantly more demanding — the upstream rate is 8x GPON, requiring faster burst-mode receivers and transmitters, tighter wavelength control, and better receiver sensitivity.
|
|
||||||
|
|
||||||
XGS-PON OLT transceivers use DFB lasers for the downstream transmitter (as do GPON OLT transceivers for the 1490 nm downstream) and have receivers capable of burst-mode operation at 10G. The burst-mode reset time requirement drops to under 800 ns at 10G, versus on the order of a microsecond for GPON.
|
|
||||||
|
|
||||||
NG-PON2 (G.989) uses TWDM (Time and Wavelength Division Multiplexing) with 4 or 8 wavelength pairs, each carrying 10G, for aggregate capacities of 40G or 80G per PON port. The OLT transceiver for NG-PON2 is a tunable DWDM device, fundamentally more complex and expensive than a fixed-wavelength GPON or XGS-PON transceiver. NG-PON2 deployment is primarily in greenfield telco access builds; retrofitting GPON infrastructure to NG-PON2 is possible but requires tunable ONT transceivers at every premises.
|
|
||||||
|
|
||||||
The practical compatibility picture: GPON and XGS-PON can coexist on the same fiber infrastructure using wavelength division. GPON uses 1490/1310 nm, XGS-PON uses 1577/1270 nm, and they don't interfere. An XGS-PON OLT port and a GPON OLT port can feed the same passive splitter through a WDM coexistence element, serving a mix of GPON and XGS-PON ONTs. This is the standard migration path for operators upgrading from GPON.
|
|
||||||
|
|
||||||
## APC Connectors: The PON Standard
|
|
||||||
|
|
||||||
One practical detail that catches enterprise engineers: PON OLT ports use APC (angled physical contact) connectors, specifically LC/APC or SC/APC. This is mandated by G.984 and G.9807 because the back-reflection at PON power levels into the OLT receiver is sufficient to cause problems with a UPC connector's 50 dB return loss spec. APC's 60 dB return loss reduces this further.
|
|
||||||
|
|
||||||
If you're connecting PON equipment into patch panels designed for datacenter use with UPC adapters, you will get the APC/UPC mating disaster described elsewhere — catastrophically high insertion loss and a completely non-functional link. PON infrastructure needs APC patch panels and APC patch cords throughout.
|
|
||||||
|
|
||||||
## Where PON Makes Sense Outside Telco Access
|
|
||||||
|
|
||||||
Enterprise campuses have found PON useful for several scenarios that don't look like traditional GPON residential deployment.
|
|
||||||
|
|
||||||
Passive cabling infrastructure for office buildings: a single OLT card in the IDF can serve 32–64 offices via a passive splitter, eliminating active switching in each floor's closet. For low-bandwidth endpoints (IoT sensors, access control, modest CCTV loads), the asymmetry of GPON (2.5G down, 1.25G up) is rarely a limitation.
|
|
||||||
|
|
||||||
Industrial facilities where powered infrastructure in hazardous areas is problematic: a passive fiber plant with all active equipment in a safe room at the OLT side satisfies electrical safety requirements for areas where powered Ethernet switches would require explosion-proof enclosures.
|
|
||||||
|
|
||||||
Long-distance building connects: GPON's 20 km operating range without amplification covers campus-to-campus connections that would otherwise require either very expensive coherent optics or an intermediate active repeater site.
|
|
||||||
|
|
||||||
The constraint in all these enterprise applications is the shared bandwidth model. PON is a shared medium — the 2.5G GPON downstream or 10G XGS-PON downstream is divided among all ONTs on that splitter. For enterprise applications where individual tenants need guaranteed bandwidth, PON's TDMA scheduling means you're sharing with everyone else on the same splitter tree.
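
A quick way to see the constraint is to divide the line rate by the number of active ONTs. The sketch below is a simplification with a hypothetical helper name: it ignores framing and FEC overhead, and it assumes every ONT is busy at once, whereas real dynamic bandwidth allocation lets idle capacity be reused.

```python
def fair_share_mbps(line_rate_gbps, active_onts):
    """Equal-share throughput per ONT in Mbps, ignoring protocol overhead."""
    if active_onts < 1:
        raise ValueError("active_onts must be at least 1")
    return line_rate_gbps * 1000.0 / active_onts


# Downstream line rates quoted in this article.
for name, rate_gbps in (("GPON", 2.488), ("XGS-PON", 9.953)):
    for split in (32, 64):
        share = fair_share_mbps(rate_gbps, split)
        print(f"{name:8s} 1:{split}: {share:6.1f} Mbps per ONT under full load")
```

On a fully loaded 1:64 GPON tree the equal share is under 40 Mbps per ONT, which is the number to compare against any per-tenant bandwidth guarantee.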
|
|
||||||
|
|
||||||
Understanding PON optics is primarily about understanding the burst-mode and shared-medium constraints that make these transceivers different from everything else in your infrastructure. The wavelength plan, the connector standards, and the APC requirement are all downstream of that fundamental architectural difference.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "IP-Optical Integration and Disaggregation: What's Real in 2026"
|
|
||||||
slug: "ip-optical-integration-disaggregation"
|
|
||||||
type: opinion
|
|
||||||
category: "Network Architecture"
|
|
||||||
tags: [disaggregation, open-line-system, ROADM, OpenConfig, ONOS, IP-optical, coherent]
|
|
||||||
seo_focus_keyword: "IP optical disaggregation open line system"
|
|
||||||
---
|
|
||||||
|
|
||||||
The disaggregation narrative in optical networking has been running for about a decade. The pitch is straightforward: separate the ROADM hardware from the amplifiers, separate the amplifiers from the coherent transponders, use open APIs to control everything, and stop being locked into single-vendor end-to-end optical systems. The reality in 2026 is more nuanced than either the enthusiasts or the skeptics claim.
|
|
||||||
|
|
||||||
## What Disaggregation Actually Means
|
|
||||||
|
|
||||||
Traditional optical networking sells you a complete system: ROADM nodes from one vendor, with that vendor's coherent transponder cards inside, managed by that vendor's NMS, with amplifier modules designed specifically for that chassis. Ciena's WaveLogic, Nokia's PSE, Infinera's ICE: all traditionally vertically integrated stacks.
|
|
||||||
|
|
||||||
Disaggregation separates these components. An "open line system" (OLS) provides the ROADM function — wavelength switching, amplification, and optical performance monitoring — from one vendor (or a white-box OLS vendor), while the coherent transponders come from a different vendor or sit in an IP router as pluggable ZR/ZR+ modules. The control plane — allocating wavelengths, configuring gain, monitoring OSNR — runs on a separate controller using open APIs.
|
|
||||||
|
|
||||||
The practical driver for operators: coherent transceiver technology has advanced to where pluggable modules built to the OIF 400ZR agreement and the OpenZR+ MSA deliver acceptable performance for many use cases, particularly metro and regional DCI. Operators can buy QSFP-DD 400ZR modules from multiple vendors (Acacia/Cisco, Lumentum, Viavi, Flexoptix) and insert them into routers, eliminating the separate transponder shelf. The "open line system" then becomes the common infrastructure that all those pluggables share.
|
|
||||||
|
|
||||||
## Where OpenConfig Actually Runs
|
|
||||||
|
|
||||||
OpenConfig has become the primary data model for IP-optical control. The OpenConfig optical transport model covers the channel (OCH), optical media channel, and amplifier configuration. Vendors including Ciena, Nokia, Infinera, Fujitsu, and Lumentum all publish OpenConfig YANG model support for their optical equipment, which theoretically means a controller can configure any of them using the same data model.
|
|
||||||
|
|
||||||
"Theoretically" carries significant weight here. OpenConfig model coverage is vendor-specific. One vendor may implement 80% of the optical amplifier model and leave gain tilt configuration as a proprietary extension. Another may implement the full model but with subtly different semantics for target power versus actual power reporting. The test of real interoperability is not whether the YANG is there but whether you can actually manage a multi-vendor optical layer from a single controller without platform-specific workarounds.
|
|
||||||
|
|
||||||
ONOS (Open Network Operating System) from the Open Networking Foundation has an optical southbound layer that supports OpenConfig and NETCONF. Organizations like AT&T, NTT, and several European operators have deployed ONOS-controlled optical networks in production, with the important caveat that those deployments typically involve one or two vendor platforms, not arbitrary multi-vendor mixing.
|
|
||||||
|
|
||||||
The ONF Transport API (TAPI) provides a higher-level abstraction for topology and service provisioning on top of the device-level OpenConfig models. TAPI is where real multi-vendor automation lives, and it's where the most active standards development is happening. Operatr and similar open-source tooling have made TAPI more accessible for operators building northbound OSS integration.
|
|
||||||
|
|
||||||
## The Proprietary Wall That Still Holds
|
|
||||||
|
|
||||||
Despite a decade of disaggregation progress, several areas remain proprietary or de-facto single-vendor in 2026.
|
|
||||||
|
|
||||||
Optical performance monitoring at the level needed for margin prediction is still mostly proprietary. Measuring OSNR, CD (chromatic dispersion), PMD (polarization mode dispersion), and Q-factor across a multi-span optical link requires access to DSP internals of the coherent transceiver. OpenZR+ defines a management interface but the actual impairment measurement detail varies by vendor DSP implementation. If you're building an automated margin-based routing system, you're relying on vendor-specific telemetry in most cases.
|
|
||||||
|
|
||||||
FEC performance monitoring data is partially standardized (the G.709 OTN management model covers some of it) but the per-vendor FEC algorithm performance curves are not public. A link that's "above threshold" per the open API may have 3 dB of actual margin or 0.1 dB, depending on which vendor's DSP is in the transponder. This makes cross-vendor OSNR margin comparisons difficult in automated systems.
|
|
||||||
|
|
||||||
Amplifier gain tilt optimization — adjusting gain profiles across the C-band to equalize power across WDM channels — is handled differently by every amplifier vendor. OpenConfig has models for this, but the closed-loop algorithms that actually flatten the spectrum are proprietary and vendor-specific. You can read and set target values via OpenConfig, but the adaptive behavior that achieves those targets is in proprietary firmware.
|
|
||||||
|
|
||||||
## The Case for Open Line Systems
|
|
||||||
|
|
||||||
Despite the limitations, open line systems make genuine economic sense in specific deployment patterns.
|
|
||||||
|
|
||||||
For metro and regional DCI (distances under 400 km), the 400G ZR pluggable in a router eliminates the transponder shelf entirely. The "open line system" provides the optical amplification and switching that the ZR module needs, but the ZR module itself is now a commodity item available from multiple vendors. The economics are compelling: a QSFP-DD 400ZR module from a compatible vendor costs roughly $3,000–5,000; a comparable proprietary transponder card is $15,000–25,000. The OLS amplifier cost is similar either way. Multiplied across a 40-wavelength metro network, the savings justify the integration effort.
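
The scale of that saving is easy to bound using the price ranges quoted above. The following is a back-of-the-envelope sketch, not a business case: the function name is hypothetical, and it assumes each wavelength needs one module at each end, which the article does not state explicitly.

```python
def transponder_savings_usd(wavelengths, pluggable_usd, transponder_usd,
                            ends_per_wavelength=2):
    """Capex difference between transponder cards and pluggables (assumed model)."""
    modules = wavelengths * ends_per_wavelength
    return modules * (transponder_usd - pluggable_usd)


# Conservative case: priciest pluggable against cheapest transponder card.
low = transponder_savings_usd(40, 5_000, 15_000)
# Optimistic case: cheapest pluggable against priciest transponder card.
high = transponder_savings_usd(40, 3_000, 25_000)

print(f"40-wavelength network: ${low:,} to ${high:,} saved on terminal optics")
```

Even the conservative end of that range leaves substantial room to absorb integration and testing effort, which is the point the paragraph above makes.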
|
|
||||||
|
|
||||||
For operators with multi-vendor transport networks — which includes most tier-1 operators running networks acquired through mergers — a common control plane via OpenConfig/TAPI provides operational benefits even if it doesn't achieve full technical interoperability. Being able to view all optical network elements in one NMS topology regardless of vendor reduces MTTR significantly.
|
|
||||||
|
|
||||||
For research and education networks (R&E networks), where operational complexity tolerance is higher and wavelength counts are modest, full disaggregation has been successfully implemented. GÉANT, Internet2, and several national R&E networks run disaggregated optical with multi-vendor hardware. These deployments are valid proof points, but they also have teams that tolerate a level of platform-specific tuning that most commercial operators don't.
|
|
||||||
|
|
||||||
## The Honest Summary
|
|
||||||
|
|
||||||
Open line systems are real, deployed at scale by tier-1 operators, and continue to grow as a percentage of new optical deployments. The economics of 400G ZR pluggables versus proprietary transponders are compelling enough that the adoption curve has accelerated past the "early adopter" phase.
|
|
||||||
|
|
||||||
The proprietary wall hasn't fallen — it's moved. Vertical integration still exists at the amplifier and monitoring level. OpenConfig gets you to 70–80% of what you need for automated optical control. The last 20–30% — adaptive margin management, optimal gain tilt, per-channel impairment prediction — still requires either vendor-specific tools or a team willing to build the integration layer themselves.
|
|
||||||
|
|
||||||
The honest recommendation for operators evaluating disaggregation in 2026: build your business case around the ZR/ZR+ pluggable economics, which are solidly favorable. Plan for proprietary integration requirements at the amplifier management level, and evaluate whether your operational team has the capacity to manage multi-vendor optical layer complexity versus the cost savings from avoiding vertical integration lock-in.
|
|
||||||
@ -1,60 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Fibre Channel vs. Ethernet SFPs: Why They're Not Interchangeable"
|
|
||||||
slug: "fcoe-fibre-channel-sfp-differences"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Storage Networking"
|
|
||||||
tags: [Fibre-Channel, SFP, storage-networking, 32G-FC, dual-rate, FCoE, SAN-optics]
|
|
||||||
seo_focus_keyword: "Fibre Channel SFP vs Ethernet SFP"
|
|
||||||
---
|
|
||||||
|
|
||||||
Fibre Channel and Ethernet transceivers look identical. The physical connector is the same SFP or SFP+ housing, the fiber interface is the same LC duplex connector, and the form factor dimensions are interchangeable. Put an 8G Fibre Channel SFP+ next to a 10G Ethernet SFP+ and you'd need to read the label to tell them apart. Plug one into the wrong port type, however, and nothing will work, and the error messages won't necessarily tell you why.
|
|
||||||
|
|
||||||
## The Encoding Difference That Explains Everything
|
|
||||||
|
|
||||||
The fundamental difference between Fibre Channel and Ethernet at equivalent bit rates is the line encoding and the resulting actual baud rate.
|
|
||||||
|
|
||||||
Ethernet at 10 Gbps uses 64B/66B encoding, which carries 10 Gbps of user data at a line rate of 10.3125 Gbps (the 64/66 overhead is about 3%). The SerDes in a 10GbE SFP+ runs at 10.3125 Gbaud.
|
|
||||||
|
|
||||||
10G Fibre Channel (10GFC) also uses 64B/66B encoding, but it's defined to operate at exactly 10.51875 Gbps line rate. The nominal bit rate for 10GFC is 10 Gbps, but the actual line rate differs from 10GbE by roughly 2%.
|
|
||||||
|
|
||||||
For lower-speed Fibre Channel (1G, 2G, 4G, and 8G), the encoding is 8B/10B, not 64B/66B. 8B/10B sends 10 line bits for every 8 data bits, so 20% of the line rate is encoding overhead. 8G Fibre Channel runs at a line rate of 8.5 Gbaud and carries 6.8 Gbps of data. That is well below the 10.3125 Gbaud of 10GbE, and the encoding is completely different as well, so the host SerDes must be configured for both the correct rate and the correct encoding.
|
|
||||||
|
|
||||||
16G Fibre Channel uses 64B/66B encoding (Fibre Channel moved from 8B/10B to 64B/66B at 16G), running at a line rate of 14.025 Gbps. 32G Fibre Channel runs at a 28.05 Gbps line rate and, like 64G, uses 256B/257B encoding. 64G Fibre Channel reaches 57.8 Gbps by running PAM4 at 28.9 Gbaud.
|
|
||||||
|
|
||||||
The SerDes behind a 10GbE switch port is configured for 10.3125 Gbps with 64B/66B framing. The SerDes behind a Fibre Channel HBA port is configured for an FC rate: 8.5 Gbps with 8B/10B for 8G, or 14.025 Gbps with 64B/66B for 16G. The clock rates are different. Plugging an FC SFP into an Ethernet port leaves the SerDes trying to lock to a signal at the wrong clock rate with the wrong encoding. It won't train, and you'll get no link.
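
The mismatch can be made concrete with a small table of line rates and encodings. This is an illustrative sketch: the `LinkType` structure and the `can_interoperate` check are hypothetical simplifications, and the line rates are the commonly published figures for each protocol (8.5 GBd for 8GFC is one of them).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class LinkType:
    name: str
    line_rate_gbd: float
    encoding: str
    efficiency: Fraction  # data bits per line bit

    @property
    def data_rate_gbps(self):
        return self.line_rate_gbd * float(self.efficiency)


LINKS = {
    "10GbE": LinkType("10GbE", 10.3125, "64B/66B", Fraction(64, 66)),
    "8GFC": LinkType("8GFC", 8.5, "8B/10B", Fraction(8, 10)),
    "10GFC": LinkType("10GFC", 10.51875, "64B/66B", Fraction(64, 66)),
    "16GFC": LinkType("16GFC", 14.025, "64B/66B", Fraction(64, 66)),
}


def can_interoperate(a, b):
    """Simplified SerDes-level check: same line rate and same encoding."""
    return a.line_rate_gbd == b.line_rate_gbd and a.encoding == b.encoding


for link in LINKS.values():
    print(f"{link.name:6s} {link.line_rate_gbd:9.5f} GBd  {link.encoding:8s} "
          f"{link.data_rate_gbps:6.2f} Gbps data")

print(can_interoperate(LINKS["10GbE"], LINKS["10GFC"]))
```

The final check prints `False`: even 10GbE and 10GFC, which share an encoding, differ in line rate by about 2%, and that is enough to prevent the link from training.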
|
|
||||||
|
|
||||||
## The 8G/16G/32G FC Hierarchy
|
|
||||||
|
|
||||||
Fibre Channel has a clean speed hierarchy: 1G, 2G, 4G, 8G, 16G, 32G, 64G, 128G. Each generation doubles the bandwidth. The transceiver types for the dominant data center speeds:
|
|
||||||
|
|
||||||
8G FC uses OM3 or OM4 multimode fiber with an 850 nm VCSEL, supporting distances up to 150 m on OM3 and 190 m on OM4 (50 m on older OM2). The SFP+ form factor is standard.
|
|
||||||
|
|
||||||
16G FC uses OM3 or OM4 multimode, 850 nm VCSEL, up to 100 m on OM3 and 125 m on OM4. Also SFP+ form factor. The line rate is 14.025 Gbps, faster than 10GbE.
|
|
||||||
|
|
||||||
32G FC uses OM4 multimode (25–100 m), 850 nm VCSEL, and also supports single-mode fiber with 1310 nm DFB for longer distances (up to 10 km for 32G SFP+ LW). SFP+ or SFP28 depending on vendor, both physically compatible with SFP+ cages.
|
|
||||||
|
|
||||||
64G FC is a single-lane PAM4 interface in an SFP-class form factor, while four-lane 128G FC uses QSFP28. These are relatively new and primarily found in cutting-edge storage arrays and directors.
|
|
||||||
|
|
||||||
The key practical point: 32G FC SFP28 modules will not work in SFP+ ports on older HBAs or switches that don't support 32G, even if the connectors fit. The speed negotiation on FC is not automatically downward-compatible in the same way Ethernet auto-negotiation works.
|
|
||||||
|
|
||||||
## What "Dual-Rate" Means and Its Limitations
|
|
||||||
|
|
||||||
"Dual-rate" FC transceivers are programmable modules that can operate at two different FC speeds — typically 8G/16G or 16G/32G. The transceiver uses a configurable SerDes that can be switched between the two baud rates by the host system via the SFP management interface (I2C commands to the transceiver EEPROM).
|
|
||||||
|
|
||||||
Dual-rate transceivers are useful for infrastructure upgrades: you can deploy 16G/32G dual-rate modules in a fabric that's currently 16G, then upgrade the HBAs and switches to 32G and switch the optics to 32G mode without replacing any hardware. This reduces upgrade costs in large SAN environments.
|
|
||||||
|
|
||||||
The limitations: dual-rate doesn't mean any-rate. A 16G/32G dual-rate SFP28 cannot run at 8G. The only SerDes clock configurations supported are those designed into the transceiver. Some vendors offer 8G/16G/32G "tri-rate" transceivers, but these are specialized products, not catalog items.
|
|
||||||
|
|
||||||
Dual-rate FC transceivers also cannot switch protocols. A dual-rate FC module cannot operate in an Ethernet port regardless of its rate support, because the fundamental encoding and framing difference remains.
|
|
||||||
|
|
||||||
## Storage Network Optic Selection in Practice
|
|
||||||
|
|
||||||
For a SAN deployment, optic selection follows straightforward rules. Match the optic speed to the HBA and switch fabric speed — there's no benefit to mismatching. Use multimode fiber for all in-datacenter runs (the distances are short, multimode is cheap and flexible). Only use single-mode if you need to extend beyond 100 m, which typically means inter-datacenter or inter-building SAN extensions.
|
|
||||||
|
|
||||||
Use vendor-qualified optics from your HBA and FC switch vendors when you need TAC support. Broadcom (Emulex), Marvell (QLogic), and Brocade/Broadcom FC switch platforms all publish qualified transceiver lists. The compatible transceiver market for FC exists but is smaller than for Ethernet, and the qualification testing is less extensive because the use cases are more specialized.
|
|
||||||
|
|
||||||
FCoE (Fibre Channel over Ethernet) introduces a different set of tradeoffs. FCoE encapsulates FC frames in Ethernet frames, allowing FC traffic to run on lossless Ethernet infrastructure using DCB (Data Center Bridging). FCoE uses standard Ethernet SFP+ or QSFP+ transceivers — not FC transceivers — because the electrical interface to the CNA (Converged Network Adapter) is Ethernet, even though the traffic is FC.
|
|
||||||
|
|
||||||
FCoE never achieved the market adoption that was predicted around 2010. The complexity of implementing lossless Ethernet (Priority Flow Control, ETS) combined with the management complexity of a converged storage/networking fabric did not deliver the promised cost savings over separate Ethernet and FC networks. Most new SAN deployments in 2026 use either iSCSI (straightforward, uses standard Ethernet optics) or native FC (uses FC optics as described). FCoE exists in production but is not the growth technology.
|
|
||||||
|
|
||||||
The takeaway for engineers who encounter both storage and networking: the reason your 8G FC SFP doesn't work in a 10GbE switch is not a vendor lock-in conspiracy. It's the baud rate and encoding mismatch that makes the two incompatible at the SerDes level. Understanding this prevents a class of troubleshooting sessions that start with "but they're both SFP+ modules" and end 45 minutes later.
|
|
||||||
@ -1,68 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Transceiver RMA Done Right: The Process That Saves Arguments"
|
|
||||||
slug: "transceiver-rma-process-best-practices"
|
|
||||||
type: guide
|
|
||||||
category: "Operations"
|
|
||||||
tags: [RMA, transceiver-failure, DOA, returns, quality-control, procurement, grey-market]
|
|
||||||
seo_focus_keyword: "transceiver RMA process best practices"
|
|
||||||
---
|
|
||||||
|
|
||||||
The transceiver RMA process is one of those operational workflows that organizations don't think about until they need it — at which point they discover they have no documentation, no baseline data, and no way to distinguish a failed module from one that was damaged during deployment. Getting this right before you need it is straightforwardly valuable. Getting it right after a contentious RMA dispute with a vendor is also possible but less pleasant.
|
|
||||||
|
|
||||||
## What to Collect Before the Call
|
|
||||||
|
|
||||||
The single most common reason RMA claims get rejected or delayed is inadequate documentation. What the vendor needs to process a valid RMA is different from what your network team thinks it needs.
|
|
||||||
|
|
||||||
The vendor needs: the original order number or invoice, the module's serial number (readable from the label, from the EEPROM via `show interface transceiver` or `ethtool -m`, or from the vendor's packaging), the specific failure symptom with a timestamp, and evidence that the failure is in the module rather than the infrastructure.
|
|
||||||
|
|
||||||
The evidence step is what most people skip. "The link was down" is not evidence of a transceiver failure. A link can be down due to a bad patch cord, a dirty connector, a failed remote-end transceiver, a misconfigured port, or the transceiver itself. Before initiating an RMA, document: the DOM values at time of failure (TX bias current, TX output power, RX input power, temperature), the result of inserting the module in a known-good slot on a known-good switch with a known-good patch cord, and whether the replacement module works in the same slot.
|
|
||||||
|
|
||||||
That last test, whether a replacement works in the same slot, is the critical one. If the replacement also fails in the same slot, the transceiver was probably not the problem: you'll be returning a good module and then asking for a second RMA for the replacement. If the replacement works and the original doesn't, you have good evidence of a module failure.
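
The evidence checklist and the swap test can be captured as a small record plus a triage rule. This is a sketch of one possible internal workflow, not a vendor requirement: every class, field, and function name below is hypothetical, and the DOM values in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RmaEvidence:
    serial_number: str
    failure_timestamp: str           # when the symptom was observed
    tx_bias_ma: float                # DOM readings at time of failure
    tx_power_dbm: float
    rx_power_dbm: float
    temperature_c: float
    suspect_works_in_known_good_slot: bool
    replacement_works_in_original_slot: bool


def triage(evidence):
    """Classify the case from the two swap tests described above."""
    if not evidence.replacement_works_in_original_slot:
        return "infrastructure suspect: check slot, patch cord, and far end"
    if not evidence.suspect_works_in_known_good_slot:
        return "module failure likely: open RMA with this evidence"
    return "inconclusive: reseat, inspect and clean endfaces, then retest"


# Example values are illustrative only.
case = RmaEvidence(
    serial_number="EXAMPLE-SN-0001",
    failure_timestamp="2026-01-15T09:30:00Z",
    tx_bias_ma=0.0,
    tx_power_dbm=-40.0,
    rx_power_dbm=-3.1,
    temperature_c=41.5,
    suspect_works_in_known_good_slot=False,
    replacement_works_in_original_slot=True,
)
print(triage(case))
```

Recording both swap results alongside the DOM snapshot means the RMA request already answers the first questions a vendor is likely to ask.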
|
|
||||||
|
|
||||||
## DOA vs. Deployment Error: The Distinction
|
|
||||||
|
|
||||||
DOA (Dead on Arrival) modules genuinely fail to function on first use with no observable damage or installation error. Deployment errors are modules that fail because of how they were installed, stored, or used.
|
|
||||||
|
|
||||||
True DOA rate for quality transceivers from established vendors runs 0.1–0.3% of shipped units. If you're seeing DOA rates above 1%, you have either a receiving/storage problem (modules being damaged before installation) or an installation problem (ESD damage, mechanical damage) rather than a vendor quality issue. This matters because different problems need different solutions: a vendor QA issue is an RMA conversation, an installation problem is a training and process conversation.
|
|
||||||
|
|
||||||
Common deployment errors that look like DOA:
|
|
||||||
|
|
||||||
ESD damage during installation, especially in low-humidity environments or with ungrounded technicians. The module initializes and the EEPROM responds, but laser output is low or zero, or receiver sensitivity is degraded. The module may not have failed in the "everything stops working" sense, but its performance is off-spec. It gets reported as DOA when the technician only checks for link-up immediately after installation instead of measuring optical power.
|
|
||||||
|
|
||||||
Incorrect seating — the module appears inserted but the electrical contacts aren't fully engaged. Some SFP+ cages require a firm push to latch; others have detents that can make the module feel locked without being fully mated. Symptom: intermittent transceiver detection, `sfpNotPresent` alternating with `sfpPresent` in the event log. Not DOA, just needs to be pushed in correctly.
|
|
||||||
|
|
||||||
Wrong optic for the application — 100G SR4 installed in a port intended for LR4, immediately failing because the 100 m fiber run is actually 1.5 km. Not DOA. Module works perfectly in a short-reach application.
|
|
||||||
|
|
||||||
Contaminated endface on first insertion: the transceiver was new and clean, but the patch cord connector or adapter it was mated to was dirty, and the first mating transferred contamination onto the transceiver's optical interface. The first measurement shows high insertion loss, which looks like a DOA module but is actually a contamination problem.
|
|
||||||
|
|
||||||
Document the inspection findings before initiating an RMA. If the endface shows contamination or physical damage, take a photograph. This protects both parties: it tells you the failure mechanism, and it prevents a dispute about whether the vendor shipped a contaminated module.
|
|
||||||
|
|
||||||
## Why Grey-Market Returns Are a Problem
|
|
||||||
|
|
||||||
Returning a failed module to a grey-market vendor — a reseller without a formal relationship with the original manufacturer — creates a specific set of risks that aren't present with returns to the original vendor or a first-tier compatible vendor.
|
|
||||||
|
|
||||||
Traceability ends. A grey-market vendor processing an RMA return cannot trace the module to original manufacturing records, cannot perform a root-cause analysis against manufacturing parameters, and cannot improve future production based on field failure data. The module goes into a pool of returned units, gets tested with a basic pass/fail bench test, and either gets re-refurbished and resold or scrapped.
|
|
||||||
|
|
||||||
The re-refurbished module risk is significant. A module that failed due to latent ESD damage may pass a basic bench test after the ESD-damaged circuits have partially recovered, get cleaned and repackaged, and ship to the next customer — where it fails again under field operating conditions. This is not speculation; it's a documented failure pattern in the grey-market transceiver supply chain.
|
|
||||||
|
|
||||||
For modules from reputable first-tier compatible vendors (those with ISO 9001-certified manufacturing, published MTBF data, and factory refurbishment programs), the RMA process includes actual failure analysis, not just pass/fail testing. The manufacturer can identify whether a returned module was damaged post-shipment (voiding the warranty) or failed in manufacturing (triggering quality improvement actions).
|
|
||||||
|
|
||||||
## The Inspection Checklist
|
|
||||||
|
|
||||||
Before any RMA submission, document the following. This list is not bureaucratic — each item answers a question that the vendor will ask:
|
|
||||||
|
|
||||||
Module serial number and part number: confirm these match what was ordered and match the EEPROM data. Mismatch here indicates potential mislabeling at shipping or an EEPROM reprogramming issue.
|
|
||||||
|
|
||||||
Physical condition: any visible damage to the housing, bail latch, connector ferrule, or electrical contacts. Photograph any damage. Bent contacts or a cracked ferrule are deployment damage, not manufacturing defects.
|
|
||||||
|
|
||||||
Connector endface condition: inspect with a fiber microscope (≥200x). Photograph the result. Note whether contamination is present and characterize it (scratch, particle, smear). This is the most important physical inspection step.
|
|
||||||
|
|
||||||
DOM data at time of failure: TX bias current, TX output power, RX input power, temperature, and voltage. Pull this from the NOS logs if available. If not available because the module completely failed to respond, note that.
|
|
||||||
|
|
||||||
Operating history: how long was the module in service? How many mating cycles approximately? Was it in a high-temperature environment? Was it a port in a frequently-accessed patch area?
|
|
||||||
|
|
||||||
Replacement test result: did a replacement module in the same slot work? Did the original module fail in a different slot?
|
|
||||||
|
|
||||||
This documentation takes 30–45 minutes to compile for a single module. For a bulk RMA (10+ modules), it's 3–4 hours of work. That investment is worth it: it prevents rejected claims, speeds up resolution, and builds the data you need to identify systemic problems versus isolated failures.
|
|
||||||
|
|
||||||
The vendors who process RMAs fastest and most fairly are the ones who get the most useful data from their customers. The process serves both parties.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Coherent DSP Power Consumption Reality: What 400G ZR Does to Your Switch"
|
|
||||||
slug: "coherent-dsp-power-consumption"
|
|
||||||
type: analysis
|
|
||||||
category: "Coherent Optics"
|
|
||||||
tags: [coherent, DSP, power-consumption, 400G-ZR, CFP2-DCO, QSFP-DD, thermal, ZR-plus]
|
|
||||||
seo_focus_keyword: "coherent DSP power consumption 400G ZR"
|
|
||||||
---
|
|
||||||
|
|
||||||
The first time someone inserts a 400G ZR QSFP-DD module into a router and checks the chassis power draw, the reaction is typically surprise. The module draws 15–20W. The 100G SR4 QSFP28 it's replacing drew 3.5W. That's roughly a 4–6x increase in power per port, and in a 32-port switch, the implications for cooling, power infrastructure, and total cost of ownership are significant enough to require explicit planning.
|
|
||||||
|
|
||||||
## Why Coherent DSP Consumes So Much Power
|
|
||||||
|
|
||||||
A 400G ZR pluggable contains a complete coherent transceiver in a QSFP-DD housing. The coherent DSP (Digital Signal Processor) handles: advanced modulation (16QAM or higher), soft-decision FEC with overhead of roughly 15–27%, chromatic dispersion compensation across hundreds of km of accumulated fiber dispersion, polarization mode dispersion tracking, and nonlinear impairment compensation. This is genuinely massive computational work happening in real time.
|
|
||||||
|
|
||||||
The DSP chip in a 400G ZR module — implementations from companies like Acacia (Cisco), Lumentum, and Coherent (formerly II-VI) — runs at roughly 7 nm or 5 nm CMOS process nodes to keep power manageable. Even so, the DSP alone typically consumes 8–12W. The coherent optical engine — tunable laser, modulator, coherent receiver — adds another 6–8W. Total: 14–20W depending on implementation quality and operating margins.
|
|
||||||
|
|
||||||
Compare to 400G SR4 (short-reach, intensity modulation): four VCSELs at roughly 200 mW each = 0.8W for transmitters, four photodiodes and TIAs at roughly 0.5W each = 2W for receivers, plus CDR/equalization DSP at roughly 1W. Total: about 3.8W, or 3.5–4W in practice. The coherent module is burning 4–5x more power to achieve something fundamentally different: not just 400 Gbps in a box, but 400 Gbps over an amplified DCI span of up to roughly 120 km for plain 400ZR, and hundreds of kilometers or more over an amplified line system for ZR+.
|
|
||||||
|
|
||||||
The ZR+ variants (OpenZR+ and vendor-specific ZR+ implementations, some targeting reach up to 2,000 km or 800G capacity) push power consumption higher still: 18–22W is typical for current-generation 400G ZR+ implementations. Stronger FEC, higher transmit output power, and additional reach margin all demand more from the DSP and the optical engine.
|
|
||||||
|
|
||||||
## Switch Port Density Implications
|
|
||||||
|
|
||||||
A Cisco Nexus 9364D-GX2A has 64 QSFP-DD ports. Configured for 400G SR4, the optics contribution to switch thermal load is approximately 64 × 3.5W = 224W. Fully populated with 400G ZR pluggables at roughly 17W each, the optics thermal load jumps to 64 × 17W = 1,088W. That is 864W of additional heat in the same 2RU chassis, on top of what the switch ASIC itself generates.
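The arithmetic behind these figures is easy to rerun for other port counts or module types. This is a minimal sketch; the function name is hypothetical and the per-module wattages are the approximate values quoted above, not datasheet numbers.

```python
def optics_thermal_load_w(ports: int, watts_per_module: float) -> float:
    """Total optics heat load for a fully populated chassis, in watts."""
    return ports * watts_per_module


sr4_load = optics_thermal_load_w(64, 3.5)   # short-reach optics in every port
zr_load = optics_thermal_load_w(64, 17.0)   # coherent ZR optics in every port
extra_heat = zr_load - sr4_load             # additional load the chassis must cool

print(f"SR4: {sr4_load:.0f} W, ZR: {zr_load:.0f} W, additional: {extra_heat:.0f} W")
```

Swapping in the real per-port maximum from a module datasheet gives a number that can be compared directly with the chassis specification.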
|
|
||||||
|
|
||||||
Most datacenter switches are designed with a thermal budget that assumes pluggable optics at 3–5W per port. The switch chassis airflow, heat sink design, and power supply rating are based on that assumption. Fully populating a switch with high-power coherent pluggables can exceed the designed thermal envelope for the chassis.
|
|
||||||
|
|
||||||
Vendors have responded in different ways. Cisco Nexus 9300-GX2 explicitly supports high-power QSFP-DD up to 20W per port in all 64 slots, with enhanced fan speed control and a revised thermal design. Arista 7800R3 series supports per-port power class handling via CMIS 4.0: the module advertises its power class and maximum power, and the switch decides whether to allow high-power operation. A module advertising Class 8 (above 14W, for example a declared 20W maximum) in a slot that can only allocate Class 6 (12W) will either operate in a reduced-performance mode or fail to come up.
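The host-side decision can be sketched as a simple comparison between what the module advertises and what the slot can supply. The class-to-wattage table below reflects the QSFP-DD power classes as commonly documented and should be checked against the current hardware specification; the function names and the slot budget are hypothetical, and real hosts read these values from the module's CMIS memory map.

```python
from typing import Optional

# Assumed maximum power per QSFP-DD power class, in watts (classes 1-7).
# Class 8 means "more than 14 W", with the real maximum declared separately.
POWER_CLASS_MAX_W = {1: 1.5, 2: 3.5, 3: 7.0, 4: 8.0, 5: 10.0, 6: 12.0, 7: 14.0}


def module_max_power_w(power_class: int, declared_max_w: Optional[float] = None) -> float:
    """Return the worst-case power a module may draw for its advertised class."""
    if power_class == 8:
        if declared_max_w is None:
            raise ValueError("class 8 modules must declare their maximum power")
        return declared_max_w
    return POWER_CLASS_MAX_W[power_class]


def can_enable_high_power(power_class: int, slot_budget_w: float,
                          declared_max_w: Optional[float] = None) -> bool:
    """True if the slot budget covers the module's advertised maximum."""
    return module_max_power_w(power_class, declared_max_w) <= slot_budget_w


# A class 8 module declaring 20 W does not fit a slot limited to 12 W.
print(can_enable_high_power(8, slot_budget_w=12.0, declared_max_w=20.0))
```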
|
|
||||||
|
|
||||||
The practical recommendation: if you're planning to deploy coherent ZR pluggables in an existing switch fleet, check the per-port and per-system power budget explicitly against the chassis specification. Don't assume that because QSFP-DD is physically compatible, the thermal design supports high-power coherent modules.
|
|
||||||
|
|
||||||
## CFP2-DCO vs. QSFP-DD-ZR: The Right Tool
|
|
||||||
|
|
||||||
Before QSFP-DD 400G ZR became viable, the standard coherent pluggable form factor was CFP2-DCO. CFP2 is larger (approximately 41.5 mm wide × 12.4 mm tall × 107 mm long vs. QSFP-DD's roughly 18.4 × 8.5 × 78 mm) and supports higher power, up to 28W or more. CFP2-DCO modules achieve longer reach and higher baud rates precisely because they have more thermal headroom.
|
|
||||||
|
|
||||||
For ultra-long-haul applications (trans-continental or submarine links, metro rings with total span loss above 25 dB), CFP2-DCO remains the appropriate form factor. The Nokia PSI-M and Ciena WaveLogic 5 Nano both offer CFP2-DCO options for these use cases. The QSFP-DD physical constraints limit the DSP design space, and for the most demanding optical paths, that matters.
|
|
||||||
|
|
||||||
For DCI at metro distances (50–500 km), QSFP-DD coherent pluggables are now the standard choice: plain 400ZR covers amplified spans up to roughly 120 km, and ZR+ covers the rest of that range. The reach is sufficient, the form factor allows standard router integration without separate transponder hardware, and the economics are compelling. A Flexoptix or Acacia 400G ZR QSFP-DD module is $3,000–5,000 versus $15,000–25,000 for a proprietary coherent transponder card achieving similar performance. On a network with 40 wavelengths, that's $480,000–$800,000 in hardware savings, counting one module per wavelength.
|
|
||||||
|
|
||||||
The open ZR specifications (the OIF 400ZR Implementation Agreement for 400ZR, the OpenZR+ MSA for enhanced reach) have created genuine multi-vendor interoperability for the first time in coherent optical transport. Two 400ZR modules from different vendors should interoperate without vendor-specific configuration; this has been demonstrated at plugfests and is deployed in production. The ZR+ extended profiles are less well standardized and may require vendor-matching for the extended-reach variants.
|
|
||||||
|
|
||||||
## The Case for Line-Side Amplification
|
|
||||||
|
|
||||||
One approach to managing fiber count and power at the system level: use a DWDM line system with amplification to carry many coherent wavelengths over a single fiber pair.
|
|
||||||
|
|
||||||
In a 400G ZR design without multiplexing or amplification, each router port needs its own coherent pluggable and its own fiber pair, so capacity is limited to a single wavelength per fiber pair. With an OLS (Open Line System) providing DWDM multiplexing and amplification, 40–96 wavelengths share the same fiber pair. Each wavelength still terminates on a coherent pluggable at both ends; what shrinks is the number of fiber pairs, not the number of pluggables.
|
|
||||||
|
|
||||||
The tradeoff: DWDM OLS equipment (ROADMs, amplifiers, multiplexers) costs more than dark fiber plus ZR pluggables at low channel counts. The crossover point where DWDM becomes economically favorable is typically around 8–10 wavelengths on the same fiber pair. Below that, P2P ZR pluggables on individual fiber pairs are cheaper. Above that, DWDM equipment pays for itself in fiber cost savings.
|
|
||||||
|
|
||||||
This calculation also shows how little the line system adds to total power consumption. A 40-wavelength DWDM system using two EDFA amplifiers per site (8W each) consumes 16W for amplification per site, shared across all 40 wavelengths. That's 0.4W per wavelength for the amplification function, a small overhead next to the roughly 17W that each ZR pluggable draws in either design. The power case for DWDM is therefore modest; the main saving above 8–10 channels is in fiber.
|
|
||||||
|
|
||||||
Coherent optics power consumption is not a reason to avoid them — the value they deliver in spectral efficiency and reach makes them indispensable for DCI. But the power numbers are real and need to be incorporated into facility planning, not discovered after installation.
|
|
||||||
@ -1,62 +0,0 @@
|
|||||||
---
|
|
||||||
title: "OTDR for Optical Network Engineers: Reading Traces and Knowing the Limits"
|
|
||||||
slug: "fiber-optic-testing-otdr-basics"
|
|
||||||
type: tutorial
|
|
||||||
category: "Fiber Testing"
|
|
||||||
tags: [OTDR, fiber-testing, optical-reflectometer, splice-loss, connector-loss, dead-zone, troubleshooting]
|
|
||||||
seo_focus_keyword: "OTDR fiber optic testing"
|
|
||||||
---
|
|
||||||
|
|
||||||
An OTDR (Optical Time-Domain Reflectometer) is the most powerful tool for characterizing a fiber span, and it's also probably the most commonly misapplied tool in optical networking. Understanding what an OTDR trace actually shows, what the different event types look like, and — critically — what an OTDR cannot tell you prevents a class of expensive false-positive troubleshooting.
|
|
||||||
|
|
||||||
## How an OTDR Works
|
|
||||||
|
|
||||||
An OTDR launches a series of short optical pulses into one end of a fiber and measures the backscattered light returning to that end over time. The physics: as each pulse propagates down the fiber, it interacts with the glass molecules via Rayleigh scattering, and a small fraction of the light is scattered backward at every point along the fiber. The OTDR measures this backscatter intensity as a function of time, which translates to distance using the speed of light in the fiber: c divided by the fiber's group refractive index (approximately 1.468 for standard SMF at 1550 nm), or about 2×10^8 m/s, with the round-trip time halved because the light travels out and back.
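The time-to-distance conversion is a one-line formula, distance = c · t / (2 · n). The sketch below assumes a group index of 1.468; the function name is illustrative, and a real instrument uses the index configured for the specific fiber.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def event_distance_m(round_trip_time_s: float, group_index: float = 1.468) -> float:
    """Distance to an event from the round-trip time of its backscatter.

    The factor of 2 accounts for the light travelling to the event and back.
    """
    return C_VACUUM_M_PER_S * round_trip_time_s / (2.0 * group_index)


# Backscatter arriving 10 microseconds after launch comes from about 1 km away.
print(f"{event_distance_m(10e-6):.1f} m")
```

An incorrect group index setting shifts every event location proportionally, which is why it matters when locating a fault on a long span.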
|
|
||||||
|
|
||||||
The result is a plot of backscattered power (dB) versus distance (meters or km). A perfectly uniform fiber shows a steady downward slope: the backscatter level decreases with distance as the pulse attenuates. Events such as splices, connectors, bends, and breaks show up as changes in the slope or as discrete reflections.
|
|
||||||
|
|
||||||
The OTDR's time resolution determines its spatial resolution. Modern OTDRs can resolve events separated by 1–5 meters, depending on the pulse width used. Shorter pulses give better spatial resolution but less dynamic range (shorter measurement distance). Longer pulses give more dynamic range at the expense of spatial resolution. You select the pulse width based on the span length: for a 1 km in-building run, use a short pulse (1–10 ns). For a 100 km terrestrial span, use a longer pulse (1–10 µs).
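The pulse-width trade-off follows from the length of fiber a pulse occupies. As a rough sketch, the pulse-limited resolution is c · τ / (2 · n); the function name and the 1.468 group index are assumptions, and real dead zones are longer than this lower bound because of receiver recovery time.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def pulse_limited_resolution_m(pulse_width_s: float, group_index: float = 1.468) -> float:
    """Approximate two-event resolution set by the pulse width alone."""
    return C_VACUUM_M_PER_S * pulse_width_s / (2.0 * group_index)


for label, width_s in [("10 ns", 10e-9), ("100 ns", 100e-9), ("1 us", 1e-6)]:
    print(f"{label}: about {pulse_limited_resolution_m(width_s):.1f} m")
```

A 10 ns pulse works out to about 1 m and a 1 µs pulse to about 100 m, which is why long-span settings cannot separate closely spaced connectors.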
|
|
||||||
|
|
||||||
## Reading the Trace: What the Events Look Like
|
|
||||||
|
|
||||||
A splice (fusion or mechanical) appears as a discrete step downward in the backscatter trace. A good fusion splice with loss below 0.1 dB will show as a very small step, sometimes barely visible against the measurement noise floor. A bad splice at 0.5 dB shows as a clearly visible step. The step should normally be downward (a lower backscatter level after the splice). If you see a step upward, an apparent gain, at a splice location, this is a measurement artifact called a "gainer." It occurs when the two fibers have different backscatter coefficients, typically from a mode field diameter mismatch, so the fiber after the splice scatters more light back than the fiber before it. A single-direction trace cannot measure the actual splice loss in this case; you need to measure from both ends and average the two readings.
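The bidirectional correction is a plain average of the two single-direction readings for the same splice. In this sketch the function name and the example readings are hypothetical; a negative reading represents an apparent gain.

```python
def bidirectional_splice_loss_db(loss_a_to_b_db: float, loss_b_to_a_db: float) -> float:
    """Average the readings taken from each end to cancel the backscatter
    coefficient mismatch between the two fibers."""
    return (loss_a_to_b_db + loss_b_to_a_db) / 2.0


# One direction shows a 0.15 dB "gainer", the other an exaggerated 0.35 dB loss.
true_loss = bidirectional_splice_loss_db(-0.15, 0.35)
print(f"{true_loss:.2f} dB")
```

The mismatch adds to the reading in one direction and subtracts in the other, so the average leaves only the real splice loss.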
|
|
||||||
|
|
||||||
A connector pair (two mating connectors) shows as a strong reflection spike followed by a step downward. The reflection spike arises because any air gap or index discontinuity at the connector interface creates a Fresnel reflection, which is much larger than the distributed Rayleigh backscatter around it and which a fusion splice does not produce. A PC connector pair reflects approximately -35 to -40 dB (return loss = 35–40 dB), which appears as a large, visible spike on the trace; a UPC pair is typically around -50 dB. An APC connector pair reflects approximately -60 dB or less, which appears as a much smaller spike or may not be visible above the noise floor.
|
|
||||||
|
|
||||||
The step loss associated with a connector pair — the actual insertion loss — is read as the difference between the backscatter level just before the reflection spike and just after. A good connector pair at 0.1–0.2 dB loss shows a modest step. A contaminated connector at 1.0 dB loss shows a much larger step.
|
|
||||||
|
|
||||||
A fiber break or sharp bend shows as an abrupt step down to the noise floor, with or without a reflection spike depending on whether the break is clean (Fresnel reflection present) or diffuse (crushed or tight-bend, no Fresnel reflection). A clean cleave at the far end of the fiber appears as a large Fresnel reflection followed by a rapid drop to noise — this is the normal "end of fiber" signature.
|
|
||||||
|
|
||||||
## The Dead Zone Problem
|
|
||||||
|
|
||||||
The dead zone is the OTDR's most significant practical limitation for short-link testing. After launching each pulse, the OTDR receiver is saturated by the injection signal (and by large Fresnel reflections from nearby connectors). It takes a recovery period — the dead zone — before the receiver can accurately measure the next event.
|
|
||||||
|
|
||||||
The event dead zone is defined as the minimum distance between two events for the second event to be detectable. The attenuation dead zone is the minimum distance from an event to where loss measurements are accurate again. For a typical OTDR with a 10 ns pulse, event dead zone is approximately 1.5–3 m and attenuation dead zone is 10–30 m.
|
|
||||||
|
|
||||||
For in-building runs of 50–200 m, the dead zone means that the first connector directly at the OTDR launch port is invisible. The first 10–30 m of the fiber run cannot be accurately characterized. This is a fundamental limitation: the connector at the OTDR end (the launch connector) is the most important one to characterize, and it's the one the OTDR cannot see.
|
|
||||||
|
|
||||||
The standard workaround is a launch cable (also called a launch reel or dead zone eliminator): a spool of fiber, typically 50–100 m long, inserted between the OTDR and the fiber under test. The launch cable moves the first connection of the fiber under test (at the far end of the launch cable) outside the dead zone, so that connection can be characterized. Characterizing the last connector of the link requires a matching receive (tail) cable at the far end. The attenuation of the launch cable is excluded from the measurements.
|
|
||||||
|
|
||||||
## When OTDR Is the Wrong Tool
|
|
||||||
|
|
||||||
OTDR is excellent for: locating fault positions in a span (broken fibers, high-loss splices, damaged connectors), characterizing the loss distribution along a span, verifying splice quality during construction, and accepting a new fiber plant.
|
|
||||||
|
|
||||||
OTDR is the wrong tool for: verifying that a link meets its insertion loss budget for a specific application, characterizing end-to-end performance for transceiver compatibility, and testing short patch cords.
|
|
||||||
|
|
||||||
For verifying that a link will support a transceiver, use an optical power meter and light source (OPM/OLS set). The measurement is simple: connect the light source at one end, the power meter at the other, and read the end-to-end insertion loss at the operating wavelength. This directly tells you whether the link meets the transceiver's loss budget. OTDR tells you where the loss is distributed, but it doesn't give you an accurate end-to-end insertion loss number directly. OTDR measurements are affected by measurement direction, backscatter artifacts, and dead zone effects in ways that make them unsuitable for absolute link budget verification.
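The power meter method reduces to subtracting two dBm readings and comparing the result with the budget. The following is a sketch under stated assumptions: the function names, the example readings, the 3.0 dB budget, and the optional margin are all illustrative, not values from any standard.

```python
def insertion_loss_db(reference_dbm: float, received_dbm: float) -> float:
    """End-to-end loss: source power through reference cords minus power
    measured at the far end of the link, both in dBm."""
    return reference_dbm - received_dbm


def meets_budget(loss_db: float, max_channel_loss_db: float, margin_db: float = 0.0) -> bool:
    """True if the measured loss plus a safety margin fits the loss budget."""
    return loss_db + margin_db <= max_channel_loss_db


loss = insertion_loss_db(reference_dbm=-3.0, received_dbm=-5.1)
print(f"measured loss: {loss:.1f} dB")
print("within budget:", meets_budget(loss, max_channel_loss_db=3.0, margin_db=0.5))
```

The budget itself should come from the PMD specification or the transceiver datasheet for the optic being deployed.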
|
|
||||||
|
|
||||||
For patch cords, OTDR is nearly useless. A 2 m patch cord is entirely within the dead zone. Use an insertion loss meter (ILM) with an appropriate reference cord and mandrel to characterize patch cords.
|
|
||||||
|
|
||||||
## Practical OTDR Use: A Checklist
|
|
||||||
|
|
||||||
Before making an OTDR measurement: clean both connectors — the OTDR port connector and the launch cable connector. A dirty OTDR port connector will produce a large, broad Fresnel reflection at the launch point that masks the first 50–100 m of the measurement.
|
|
||||||
|
|
||||||
Set the measurement wavelength to match the operating wavelength of your transceivers. A span characterized at 1310 nm will show different loss distribution than the same span at 1550 nm, because attenuation and splice behavior differ across wavelengths.
|
|
||||||
|
|
||||||
Set the pulse width and averaging time based on span length. For spans under 5 km, use 100 ns or less. For spans of 10–100 km, use 1–10 µs. More averaging (more pulses averaged) improves noise floor and dynamic range at the cost of measurement time.
|
|
||||||
|
|
||||||
Bidirectional measurement is more accurate than single-direction. The OTDR reads splice losses asymmetrically due to backscatter coefficient differences. Average the readings from both directions for the most accurate per-splice loss values.
|
|
||||||
|
|
||||||
Document baseline measurements during installation or commissioning. A trace taken when the fiber plant was new is invaluable when troubleshooting degradation months or years later — you can directly compare the current trace to the baseline and identify which event has changed.
|
|
||||||
|
|
||||||
OTDR is a diagnostic tool with specific strengths and specific blind spots. Used correctly for the right problems, it's irreplaceable. Used for the wrong problems — particularly verifying transceiver link budgets on short links — it produces misleading data that leads to incorrect conclusions.
|
|
||||||
@ -1,66 +0,0 @@
|
|||||||
---
|
|
||||||
title: "IEEE 802.3 Transceiver Standards Reference: Reading the Spec"
|
|
||||||
slug: "ieee-802.3-standards-transceiver-reference"
|
|
||||||
type: guide
|
|
||||||
category: "Standards & Compatibility"
|
|
||||||
tags: [IEEE-802.3, 400GbE, 802.3bs, 802.3cd, PMD, transceiver-standards, Ethernet-standards]
|
|
||||||
seo_focus_keyword: "IEEE 802.3 transceiver standards"
|
|
||||||
---
|
|
||||||
|
|
||||||
The IEEE 802.3 standard is a vast document — over 5,000 pages in current editions — and the portions relevant to transceiver selection and compatibility are spread across dozens of clauses. Knowing how to navigate it, and specifically what the clause numbers mean for practical optic selection, saves significant time when you're trying to determine whether a specific module actually conforms to the application it's labeled for.
|
|
||||||
|
|
||||||
## How the Standard Is Organized
|
|
||||||
|
|
||||||
IEEE 802.3 is divided into clauses, each addressing a specific topic area. The relevant structure for transceiver engineers:
|
|
||||||
|
|
||||||
Clauses 1–39 cover the foundational MAC layer, CSMA/CD (largely historical), and lower-speed interface definitions. For 1GbE fiber transceivers, Clause 36 (the 1000BASE-X PCS/PMA) and Clause 38 (the 1000BASE-SX and 1000BASE-LX PMDs) are the relevant sections. Clause 40 is 1000BASE-T copper.
|
|
||||||
|
|
||||||
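For 10G: Clause 49 is the 10GBASE-R PCS (LAN PHY) and Clause 50 is the WIS used by 10GBASE-W (WAN PHY). Clause 52 covers the serial optical PMDs (10GBASE-SR, -LR, -ER and their -W counterparts). Clause 53 is 10GBASE-LX4, Clause 54 is 10GBASE-CX4, and Clause 55 is 10GBASE-T. Clause 68 is the 10GBASE-LRM specification for legacy multimode fiber.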
|
|
||||||
|
|
||||||
For 40G: Clause 82 covers the 40GBASE-R PCS shared by the 40G PMDs. 40GBASE-SR4 is Clause 86 (which also defines 100GBASE-SR10). 40GBASE-LR4 is Clause 87.
|
|
||||||
|
|
||||||
For 100G: Clause 82 (the 100GBASE-R PCS), Clause 91 (RS-FEC for 100GBASE-R), and Clause 92 (100GBASE-CR4), with Clause 86 covering 100GBASE-SR10. 100GBASE-SR4 is in Clause 95. 100GBASE-LR4 and 100GBASE-ER4 are both in Clause 88.
|
|
||||||
|
|
||||||
The important 400G clauses: 802.3bs (Clauses 116–124) covers 200GbE and 400GbE, with 400GBASE-FR8 and -LR8 in Clause 122, 400GBASE-SR16 in Clause 123, and 400GBASE-DR4 in Clause 124. 802.3cd (Clauses 131–140) covers the 50GBASE-R, 100GBASE-R, and 200GBASE-R PMDs, including 100GBASE-DR in Clause 140. 802.3ck (Clauses 162 and 163 for copper cable and backplane) covers 100GBASE-CR1, -KR1, and the 400G variants that use 100G per lane SerDes.
|
|
||||||
|
|
||||||
## How to Read a PMD Specification
|
|
||||||
|
|
||||||
Within each clause, the Physical Medium Dependent (PMD) specification defines the transceiver's optical characteristics. Learning to read one of these sections directly answers questions that vendor datasheets often leave ambiguous.
|
|
||||||
|
|
||||||
The PMD spec covers, in order: the normative scope (what the clause applies to), the functional description, the optical specifications in a table, and the test procedures.
|
|
||||||
|
|
||||||
The optical specifications table is the most useful part for transceiver selection. It typically lists:
|
|
||||||
|
|
||||||
**Operating wavelength range**: for a single-mode DFB-based transceiver, this is a narrow window per lane. LR4's four LAN-WDM lanes are each roughly 2 nm wide and sit between about 1295 and 1310 nm. For a VCSEL-based multimode transceiver, it's wider: 840–860 nm for SR4.
|
|
||||||
|
|
||||||
**Transmitter characteristics**: minimum and maximum launch power (in dBm), minimum extinction ratio (in dB), maximum transmitter and dispersion penalty (TDP), and eye mask definition.
|
|
||||||
|
|
||||||
**Receiver characteristics**: minimum receive sensitivity (dBm, typically specified at BER = 1×10^-12 for PMDs that do not rely on FEC, or at a pre-FEC BER such as 2.4×10^-4 for PMDs that do), maximum input power (the saturation point), and maximum stressed receiver sensitivity.
|
|
||||||
|
|
||||||
**Channel insertion loss budget**: the maximum total loss between the transmitter and receiver, which defines the reach when combined with the fiber attenuation per km and connector budget.
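The relationship between the channel insertion loss budget and reach can be sketched as a simple sum of loss contributions. All numbers below (a 4.0 dB budget, 0.4 dB/km fiber, 0.5 dB per connector pair) are illustrative assumptions rather than values from a specific clause, and the function names are hypothetical.

```python
def channel_insertion_loss_db(length_km: float, fiber_db_per_km: float,
                              connector_pairs: int, connector_loss_db: float,
                              splices: int = 0, splice_loss_db: float = 0.0) -> float:
    """Total passive loss of a link: fiber attenuation plus discrete events."""
    return (length_km * fiber_db_per_km
            + connector_pairs * connector_loss_db
            + splices * splice_loss_db)


def max_reach_km(budget_db: float, fiber_db_per_km: float,
                 connector_pairs: int, connector_loss_db: float) -> float:
    """Longest fiber run that still fits the budget after connector losses."""
    remaining_db = budget_db - connector_pairs * connector_loss_db
    return max(remaining_db, 0.0) / fiber_db_per_km


print(f"{channel_insertion_loss_db(2.0, 0.4, 2, 0.5):.1f} dB for a 2 km link")
print(f"{max_reach_km(4.0, 0.4, 2, 0.5):.1f} km maximum reach")
```

Adding connector pairs eats directly into reach, which is why structured cabling with several patch panels can fail a budget that the raw fiber length would pass.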
|
|
||||||
|
|
||||||
The extinction ratio specification is worth understanding explicitly. Extinction ratio is the ratio of optical power representing a "1" to optical power representing a "0," expressed in dB. Higher extinction ratio means the laser turns off more completely for a "0," which improves receiver sensitivity. The IEEE specs set a minimum extinction ratio, typically in the 3–4 dB range for NRZ PMDs, with PAM4 PMDs specifying a similar minimum for the outer extinction ratio (between the highest and lowest levels). Transceivers running below minimum extinction ratio will show higher BER even with adequate received power.
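Expressed as a formula, extinction ratio in dB is 10 · log10(P1 / P0). The short sketch below uses hypothetical power levels and an illustrative function name to show how the numbers behave.

```python
import math


def extinction_ratio_db(p1_mw: float, p0_mw: float) -> float:
    """Extinction ratio in dB from the optical power of the "1" and "0" levels."""
    if p1_mw <= 0 or p0_mw <= 0:
        raise ValueError("power levels must be positive")
    return 10.0 * math.log10(p1_mw / p0_mw)


# A "0" level at half the "1" level is only about 3 dB of extinction.
print(f"{extinction_ratio_db(1.0, 0.5):.2f} dB")
# A "0" level at one tenth of the "1" level is 10 dB.
print(f"{extinction_ratio_db(1.0, 0.1):.2f} dB")
```

A 3 dB minimum therefore still allows the "0" level to carry half the power of the "1" level.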
|
|
||||||
|
|
||||||
## Why 802.3bs and 802.3cd Matter
|
|
||||||
|
|
||||||
Both of these amendments addressed 400G, but from different architectural angles, and the press coverage at the time underrepresented how significant the underlying differences were.
|
|
||||||
|
|
||||||
802.3bs (approved December 2017) defined 200GbE and 400GbE around 50G PAM4 electrical lanes (400GAUI-8), with a 16 × 25G NRZ option. The 400G PMDs defined: 400GBASE-SR16 (16 lanes over multimode, 100 m), 400GBASE-DR4 (4 lanes single-mode, 500 m, at 100G PAM4 per lane), 400GBASE-FR8 (8 lanes single-mode to 2 km), and 400GBASE-LR8 (8 lanes single-mode to 10 km). 400GBASE-SR8 came later, in 802.3cm. The 8-lane electrical interface was chosen to match the first-generation 400G ASIC SerDes at 56G PAM4; for the DR4 variant, the module multiplexes pairs of 50G electrical lanes onto each 100G optical lane.
|
|
||||||
|
|
||||||
802.3cd (approved December 2018) defined 50GBASE-R, 100GBASE-R, and 200GBASE-R using 1, 2, and 4 lanes of 50G respectively. This amendment also introduced the single-lane 100G optical interface, 100GBASE-DR (Clause 140), which uses the same 100G PAM4 optical lane as 400GBASE-DR4 and is what makes 4 × 100G breakout from a DR4 port possible. The single-lane 100G electrical interfaces (100GBASE-KR1, 100GBASE-CR1, and the 400GAUI-4 module interface) followed in 802.3ck, and they are what let 400GBASE-DR4 map cleanly onto 4x100G ASIC SerDes in later switch silicon.
|
|
||||||
|
|
||||||
The practical implication of this layered history: "400G DR4" refers to the Clause 124 PMD (802.3bs) running 4x100G optically, and it's one of the most important and widely-deployed 400G interface types. The optical interface is the same in every case, but the host-side electrical interface is not: older modules present 8 × 50G (400GAUI-8) while newer ones present 4 × 100G (400GAUI-4). Verifying which electrical interface a specific transceiver uses, especially when ordering from vendors whose datasheets say "400GBASE-DR4 compliant" without further detail, matters for interoperability with specific ASIC implementations.
|
|
||||||
|
|
||||||
## Where to Find the Actual Specifications
|
|
||||||
|
|
||||||
IEEE 802.3 is a paid standard — the current edition costs $435 from IEEE. However:
|
|
||||||
|
|
||||||
The IEEE GET Program makes published IEEE 802 standards, including 802.3 and its amendments, available as free PDF downloads six months after publication. Search the IEEE GET Program page for the specific standard or amendment number.
|
|
||||||
|
|
||||||
The SFF Committee (now part of the SNIA) publishes companion technical specifications (such as SFF-8024 and SFF-8436) that reference IEEE 802.3 clauses and add implementation detail for module manufacturers. These are freely downloadable from snia.org. CMIS, the management interface used by QSFP-DD and OSFP modules, is published separately by the OIF.
|
|
||||||
|
|
||||||
The MSAs (Multi-Source Agreements) for specific form factors — QSFP-DD MSA, OSFP MSA, SFP-DD MSA — incorporate the relevant IEEE 802.3 PMD requirements by reference and add mechanical and electrical interface specifications. These are also freely available from the respective MSA websites.
|
|
||||||
|
|
||||||
For most practical transceiver selection questions, the combination of the IEEE 802.3 PMD table (for optical specs), the SFF specification (for EEPROM fields), and the MSA (for form factor details) covers everything you need. The full 5,000-page standard is useful for deep interoperability questions and for understanding the test procedures used in compliance qualification.
|
|
||||||
|
|
||||||
The IEEE 802.3 standard is not reading-for-pleasure material, but knowing the clause structure and what each table contains transforms it from an intimidating wall of text into a reference tool that directly answers questions about whether a transceiver will work in your application.
|
|
||||||
@ -1,56 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Optics for AI/ML Inference Clusters: What Actually Works and Why"
|
|
||||||
slug: "ai-inference-cluster-optics-requirements"
|
|
||||||
type: guide
|
|
||||||
category: "AI & HPC Networking"
|
|
||||||
tags: [AI-networking, GPU-cluster, 400G-SR4, InfiniBand, 800G, spine-leaf, inference, training]
|
|
||||||
seo_focus_keyword: "AI inference cluster optics networking"
|
|
||||||
---
|
|
||||||
|
|
||||||
AI infrastructure has driven more high-speed optics adoption in the last three years than any other market segment. The optics requirements for GPU clusters are specific, driven by the density and traffic patterns of accelerator hardware, and differ meaningfully from general datacenter networking. Engineers who understand these requirements can avoid over-engineering some links while recognizing where spending more on connectivity is justified.
|
|
||||||
|
|
||||||
## The Standard Stack: Why 400G SR4 at the ToR
|
|
||||||
|
|
||||||
For GPU-to-ToR (Top-of-Rack) connectivity, 400G SR4 over OM4 multimode fiber has emerged as the near-universal choice in 2025–2026 deployments, and the reasons are worth stating explicitly rather than accepting as given.
|
|
||||||
|
|
||||||
GPU servers connecting to the network use either NVIDIA ConnectX-7, ConnectX-8 (for InfiniBand/Ethernet dual-mode), or Broadcom Thor-2 NICs. The NICs use QSFP-DD or OSFP host connectors, and at the 400G generation, 400G SR4 covers the ToR-to-server distance in any realistic rack configuration — 1 m to 100 m. A server NIC to the ToR switch is typically under 10 m, comfortably within the 100 m SR4 reach on OM4.
|
|
||||||
|
|
||||||
The cost point for 400G SR4 from compatible vendors has dropped to $150–250 per module in 2025. Given that a 64-GPU training cluster at 400G per GPU requires 128 modules (64 server-side, 64 switch-side), the total NIC-to-ToR optics cost is roughly $19,000–32,000, a small fraction of the overall cluster cost where each H100 GPU costs $30,000–40,000.
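The module count and cost range scale linearly with GPU count, so the estimate is easy to rerun for other cluster sizes. This sketch uses the per-module prices quoted above; the function name is hypothetical and the two-modules-per-link assumption covers one server-side and one switch-side optic.

```python
def nic_to_tor_optics_cost(gpus: int, price_low: int, price_high: int,
                           modules_per_link: int = 2) -> tuple:
    """Return (module count, low cost, high cost) for GPU-to-ToR links."""
    modules = gpus * modules_per_link
    return modules, modules * price_low, modules * price_high


modules, low, high = nic_to_tor_optics_cost(64, 150, 250)
print(f"{modules} modules, ${low:,} to ${high:,}")
```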
|
|
||||||
|
|
||||||
Active Optical Cables (AOCs) at 400G are an alternative for fixed-length runs: an AOC integrates the transceivers into the cable ends, eliminating the separable optical connector interface. AOCs are slightly cheaper than transceiver-plus-passive-cable for the same length, but they're not field-repairable if one end fails. For short in-rack runs to ToR switches in production clusters, the preference has shifted toward passive direct-attach copper (DAC) at 1–3 m (no optical components, lowest latency, lowest cost) and 400G SR4 active optics for runs beyond 3–5 m.
|
|
||||||
|
|
||||||
## When DR4 Makes Sense for Spine
|
|
||||||
|
|
||||||
The spine layer in an AI cluster — the switches connecting ToR switches to each other in the leaf-spine fabric — typically uses single-mode optics because the inter-rack cabling distances exceed OM4 multimode reach.
|
|
||||||
|
|
||||||
400G DR4 (single-mode, 4 lanes at 100G PAM4, up to 500 m on OS2) is the standard spine optic for medium to large clusters. The 500 m reach covers any reasonable datacenter floor layout and many runs between adjacent buildings. DR4 uses a parallel single-mode fiber array (a PSM4-style optical interface with 4 transmit and 4 receive fibers in an MPO-12 connector), which means the fiber infrastructure between leaf and spine switches uses 8-fiber MPO trunk cables.
|
|
||||||
|
|
||||||
FR4 (single-mode, 4-wavelength CWDM, up to 2 km) is an option for clusters spread across wider geographies — campus interconnects or edge AI deployments where the compute nodes are distributed. FR4 costs roughly 40–60% more than DR4 for the same 400G capacity, so the additional cost needs to be justified by the actual distance requirement.
|
|
||||||
|
|
||||||
For clusters using an all-NVLink design (NVSwitch-based all-to-all connectivity for training), the GPU-to-NVSwitch fabric is handled by NVIDIA's proprietary NVLink cables, not standard Ethernet optics. The Ethernet fabric in these configurations handles the "north-south" traffic (storage, user connections, parameter servers) rather than the all-reduce gradient traffic that dominates AI training bandwidth. The optics requirements for the management/external fabric are therefore less demanding than for the training fabric.
|
|
||||||
|
|
||||||
## InfiniBand vs. Ethernet from an Optics Perspective
|
|
||||||
|
|
||||||
The InfiniBand versus Ethernet debate for AI cluster networking involves many considerations — latency, software stack, operational complexity — but from a pure optics perspective, the differences are modest.
|
|
||||||
|
|
||||||
HDR InfiniBand (200G) uses QSFP56 interfaces, with HDR100 splitting a port into 2x100G. The optics for InfiniBand at these speeds are physically identical to Ethernet optics (same form factors, same fiber types, same wavelengths). The distinction is in how they're programmed: an InfiniBand HCA uses the same SR4 optic as an Ethernet NIC, but the module EEPROM may advertise InfiniBand support in its compliance fields.
|
|
||||||
|
|
||||||
NDR InfiniBand (400G) and XDR InfiniBand (800G) use OSFP or QSFP-DD form factors. The physical optics market has largely converged for both protocols at these speeds.
|
|
||||||
|
|
||||||
The practical OpEx difference: InfiniBand switches (Mellanox/NVIDIA QM9700, for example) are more restrictive about optic compatibility than Ethernet switches. NVIDIA requires Mellanox-qualified or NVIDIA-tested optics for supported configurations, and the list of approved compatible vendors is shorter than for Ethernet. Engineers planning InfiniBand-based clusters should verify optic compatibility against the specific switch model before procurement.
|
|
||||||
|
|
||||||
## What 800G Changes at the Rack Level
|
|
||||||
|
|
||||||
800G is starting to appear in production AI clusters, primarily in hyperscale training deployments. The transition from 400G to 800G at the ToR level has specific fiber infrastructure implications.
|
|
||||||
|
|
||||||
800G SR8 requires MPO-16 or dual MPO-12 per port, compared to 400G SR4's single MPO-12. In a fully-wired 64-port 800G ToR switch, the fiber count entering the switch increases proportionally. A 400G ToR switch with 64 ports requires 64 MPO-12 fiber connectors; the same chassis running 800G SR8 requires 128 MPO-12 (or 64 MPO-16). This doubles the active fiber count at the top of the rack, from 8 fibers per port to 16, and requires pre-wiring the floor with 16-fiber (MPO-16) trunks rather than 12-fiber (MPO-12) trunks.
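
The connector and fiber counts above can be reproduced with a short calculation. This is a sketch under the assumption that each parallel-optics lane uses one transmit and one receive fiber; the `OPTICS` table and `faceplate_totals` helper are illustrative names.

```python
# Lanes and connector options per optic type, as described in the text.
OPTICS = {
    "400G-SR4": {"lanes": 4, "connectors": {"MPO-12": 1}},
    "800G-SR8": {"lanes": 8, "connectors": {"MPO-12": 2, "MPO-16": 1}},
}


def faceplate_totals(optic, ports, connector):
    """Return (connector_count, active_fiber_count) for a fully populated switch."""
    spec = OPTICS[optic]
    active_fibers = spec["lanes"] * 2 * ports  # one Tx and one Rx fiber per lane
    connectors = spec["connectors"][connector] * ports
    return connectors, active_fibers


for optic, connector in [("400G-SR4", "MPO-12"), ("800G-SR8", "MPO-12"), ("800G-SR8", "MPO-16")]:
    count, fibers = faceplate_totals(optic, ports=64, connector=connector)
    print(f"{optic} on {connector}: {count} connectors, {fibers} active fibers")
```

The active fiber count doubles either way; the connector choice only decides whether that shows up as more connectors or as denser ones.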
|
|
||||||
|
|
||||||
For clusters being built from scratch in 2026, designing the fiber plant for 800G while deploying 400G today is the sensible approach. The incremental cost of running 16-fiber trunk cables versus 12-fiber trunks is modest at installation time, and avoiding a complete re-cabling at the 800G upgrade repays the upfront investment.
|
|
||||||
|
|
||||||
The GPU NIC side of 800G is also advancing. NVIDIA's B100 and B200 GPU servers use ConnectX-8 NICs at 400G Ethernet per port (two ports per NIC = 800G per GPU), not single 800G ports. The GPU fabric bandwidth is achieved by port bonding rather than single 800G pluggables, which means the current generation of AI servers still maps well to 400G switch ports and 400G SR4 optics.
|
|
||||||
|
|
||||||
## Practical Procurement Guidance
|
|
||||||
|
|
||||||
For AI cluster procurement in 2026, the practical recommendations are straightforward: use 400G SR4 over OM4 for server-to-ToR connections beyond DAC reach, use 400G DR4 over OS2 for ToR-to-spine connections, plan the fiber plant for 16-fiber (MPO-16) trunk capacity even if today's optics only need 12-fiber MPO-12 connectivity, and verify InfiniBand optic compatibility against the switch model if using InfiniBand fabric.
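
The reach figures quoted in this article can be folded into a simple selection rule. The sketch below is one possible encoding, assuming the thresholds stated earlier (DAC up to about 3 m, SR4 to 100 m on OM4, DR4 to 500 m, FR4 to 2 km); `pick_400g_link` is a hypothetical helper, and real designs also need to check loss budgets and platform support.

```python
def pick_400g_link(distance_m):
    """Suggest a 400G link type for a given cable run length in metres."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= 3:
        return "passive DAC"
    if distance_m <= 100:
        return "400G SR4 (OM4)"
    if distance_m <= 500:
        return "400G DR4 (OS2)"
    if distance_m <= 2000:
        return "400G FR4 (OS2)"
    return "beyond FR4 reach"


for run in (2, 8, 150, 1200):
    print(f"{run} m -> {pick_400g_link(run)}")
```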
|
|
||||||
|
|
||||||
The compatible transceiver market is well-established for 400G SR4 and DR4. Multiple vendors (Innolight, Eoptolink, Coherent, Flexoptix) supply these in large quantities with competitive pricing and good technical documentation. Total optic cost for a 1,000 GPU cluster in a standard leaf-spine architecture is approximately $400,000–600,000 — budget accordingly, and verify pricing before locking into a BOM with only OEM optics.
|
|
||||||
@ -1,50 +0,0 @@
|
|||||||
---
|
|
||||||
title: "How Hyperscalers Buy Optics: A Playbook the Enterprise Will Never See"
|
|
||||||
slug: "hyperscale-optics-purchasing-strategy"
|
|
||||||
type: analysis
|
|
||||||
category: "Market & Procurement"
|
|
||||||
tags: [hyperscale, procurement, compatible transceivers, white-box optics, operator-qualified, vendor-qualified, 400G, CWDM4]
|
|
||||||
seo_focus_keyword: "hyperscale optics procurement"
|
|
||||||
---
|
|
||||||
|
|
||||||
There is a persistent myth in enterprise networking that if you wait long enough, hyperscale pricing will trickle down. The reasoning sounds logical: Google buys millions of 400G QSFP-DD modules, volume drives cost down, and eventually you'll pay something close to that. This is not how it works. The mechanisms that produce hyperscale unit economics are structural, and most of them are simply not available to anyone outside the top five cloud operators. Understanding why requires looking at how the buying actually happens.
|
|
||||||
|
|
||||||
## Qualification Timelines: The Hidden Moat
|
|
||||||
|
|
||||||
When AWS or Microsoft qualifies a new optical transceiver family, the process takes 12 to 18 months and involves a level of engineering scrutiny that most equipment vendors apply only to line cards. A hyperscaler qualification lab will run temperature cycling between -5°C and 70°C across a population of 500 or more units, measure BER at every corner of the operating envelope, validate EEPROM data against internal specifications rather than SFF standards, and run multi-week continuous burn-in at elevated case temperature. The reject rate during qualification can exceed 15%.
|
|
||||||
|
|
||||||
This is not paranoia. When you're deploying 200,000 ports in a single data center build, a 0.1% infant mortality rate means 200 dead transceivers in the first 90 days. That's a maintenance burden with real operational cost. The qualification rigor is economic, not academic.
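
The failure arithmetic can be made explicit. The sketch below treats each port as failing independently with the same probability, which is a simplifying assumption rather than anything the qualification process guarantees; the function name is illustrative.

```python
import math


def early_failure_estimate(ports, infant_mortality_rate):
    """Return (expected_failures, standard_deviation) under a binomial model."""
    expected = ports * infant_mortality_rate
    spread = math.sqrt(ports * infant_mortality_rate * (1 - infant_mortality_rate))
    return expected, spread


# Figures from the text: 200,000 ports at a 0.1% infant mortality rate.
expected, spread = early_failure_estimate(200_000, 0.001)
print(f"expected failures: {expected:.0f}, typical spread: +/- {spread:.0f}")
```

Halving the rate through stricter qualification halves the expected truck rolls, which is the economic argument in one line.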
|
|
||||||
|
|
||||||
The consequence is that hyperscalers maintain short lists of approved vendors that change slowly. II-VI (now Coherent), Innolight, Oclaro (now Lumentum), and Hisense Broadband appear on most of these lists for 100G and 400G. New entrants spend years in evaluation before touching production. This stability keeps prices low because approved vendors can commit to multi-year production forecasts, amortize tooling across guaranteed volume, and run fabs at high utilization rates.
|
|
||||||
|
|
||||||
## Vendor-Qualified vs. Operator-Qualified: A Meaningful Distinction
|
|
||||||
|
|
||||||
The enterprise market operates almost entirely on vendor-qualified optics. Cisco qualifies what works in Cisco gear. Juniper qualifies what works in Juniper gear. The transceiver vendor gets a "Compatible with Cisco Nexus 93360YC-FX2" listing, ships accordingly, and everyone moves on. The equipment OEM holds the qualification authority.
|
|
||||||
|
|
||||||
Hyperscalers have inverted this. They run operator-qualified programs where the cloud operator defines the acceptance specification and the transceiver manufacturer builds to it. Google's internal optical module specification is more detailed than most equipment vendor specs. It covers not just optical performance but mechanical tolerances on the bail latch, the thermal interface material between the module heat spreader and the host cage, and the acceptable variation in EEPROM field formats.
|
|
||||||
|
|
||||||
The practical effect is that hyperscale operators are buying commodity optics to their own spec, not to a vendor's spec. This creates leverage the enterprise buyer simply doesn't have. If an equipment vendor changes the way their NOS validates EEPROM fields, a hyperscaler can push back and demand that the validation logic not break their installed base. An enterprise customer calling their Cisco account manager to complain about a firmware update that rejects third-party optics gets significantly less traction.
|
|
||||||
|
|
||||||
## Volume Commitments and Their Structural Effects
|
|
||||||
|
|
||||||
A midsize hyperscaler deploying a new availability zone might contract for 500,000 to 800,000 400G modules over 18 months. This is not a purchase order; it is a capacity reservation. The transceiver manufacturer allocates wafer starts, reserves assembly line time, and prices the unit accordingly. The manufacturer's overhead is spread across guaranteed volume. Yield loss is predictable. Inventory risk is borne by the buyer, not the seller.
|
|
||||||
|
|
||||||
Contrast this with an enterprise buying 2,000 modules on a project-by-project basis, usually expecting a 6 to 8 week lead time and offering no multi-year commitment. The manufacturer prices this through distribution, adding margin at the transceiver vendor, the distributor, and the VAR. The enterprise unit price can be three to five times the hyperscale unit price for identical hardware at comparable performance specifications.
|
|
||||||
|
|
||||||
The 400G QSFP-DD SR4 module is a useful example. Hyperscale operators pay under $40 per unit at current pricing. Enterprise customers sourcing through Cisco or Arista as vendor-branded optics pay $250 to $400 per port. Compatible transceiver vendors like Flexoptix can close part of that gap — typically delivering validated modules in the $60 to $90 range — but cannot fully replicate hyperscale economics because the volume commitment and qualification overhead structures are different.
|
|
||||||
|
|
||||||
## The Compatible Market's Actual Position
|
|
||||||
|
|
||||||
What the compatible transceiver market captures is not hyperscale pricing. It captures the manufacturing efficiency of high-volume production that has now diffused to second-tier manufacturers. A transceiver built on InnoLight's production line for a hyperscale customer and a transceiver built on a similar line for the compatible market are using comparable component costs and similar assembly processes. The compatible vendor's advantage is eliminating the equipment OEM markup, which can be 300% to 500% on optical modules.
|
|
||||||
|
|
||||||
This is a meaningful advantage, but it exists in a different space than hyperscale procurement. The compatible market serves enterprises, service providers, and telcos that need cost discipline but cannot negotiate operator-qualified programs. The qualification standard shifts from "does it meet our internal spec" to "does it meet the equipment vendor's NOS acceptance criteria" — which is what Flexoptix's compatibility testing actually validates.
|
|
||||||
|
|
||||||
The segment where hyperscale procurement practices most directly benefit the broader market is in driving standardization. CWDM4 MSA for 100G is the clearest example. Hyperscalers were unhappy with the cost trajectory of 100G LR4, whose four closely spaced LAN-WDM wavelengths required precise wavelength control and costly temperature-controlled lasers. They co-authored the CWDM4 MSA in 2014, specifying a simpler approach using four CWDM wavelengths (1271, 1291, 1311, 1331 nm) with relaxed wavelength accuracy requirements. The result was a significant BOM cost reduction that eventually propagated into enterprise pricing for 100G 2 km reach modules.
|
|
||||||
|
|
||||||
## Why Hyperscale Pricing Never Reaches Enterprise
|
|
||||||
|
|
||||||
Even when the underlying manufacturing cost converges, the delivery mechanism diverges. Hyperscalers buy from manufacturers directly, absorb logistics, and accept more quality risk in exchange for price. Enterprises buy from distributors, require pre-sales support, need post-sales warranty coverage, and expect the equipment vendor to own compatibility problems. Each of those services has a cost.
|
|
||||||
|
|
||||||
There's also a timing asymmetry. Hyperscalers lock in pricing early in the product lifecycle, when manufacturer margins are higher, but the guaranteed volume offsets this. By the time a new generation reaches enterprise catalog pricing, the hyperscaler is already two generations ahead and negotiating the next round. The gap is structural, not temporary.
|
|
||||||
|
|
||||||
The practical upshot for enterprise procurement teams is that chasing hyperscale pricing directly is not a productive exercise. The more useful question is where in the supply chain margin is being added without corresponding value. Equipment vendor optical surcharges are the primary target. The compatible transceiver market exists precisely because those surcharges are large and the underlying technical barrier to qualification is manageable.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "SFP Copper vs. Built-in RJ45: When the Penalty Is Worth Paying"
|
|
||||||
slug: "rj45-vs-sfp-copper-1g-switches"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Transceiver Selection"
|
|
||||||
tags: [SFP, 1000BASE-T, copper SFP, RJ45, switch design, power consumption, ASIC, 1G copper]
|
|
||||||
seo_focus_keyword: "SFP copper 1000BASE-T vs RJ45"
|
|
||||||
---
|
|
||||||
|
|
||||||
The 1000BASE-T SFP — a copper transceiver that fits in an SFP cage and terminates to an RJ45 connector — occupies a peculiar position in the market. It costs more than the switch port it occupies costs to build. It draws more power than a native copper port. It adds complexity to the signal path that wasn't there before. And yet there are real scenarios where using one is the correct engineering decision. The key is being clear about which scenarios those are, because there are also plenty of cases where people reach for a copper SFP out of habit or confusion.
|
|
||||||
|
|
||||||
## What a 1000BASE-T SFP Actually Contains
|
|
||||||
|
|
||||||
A native RJ45 port on a switch integrates a PHY chip (typically a Marvell 88E1111 or similar) directly onto the switch motherboard or linecard. The PHY handles 1000BASE-T encoding, echo cancellation, and auto-negotiation in silicon that's optimized for low power on a mature process node. Total power consumption for a Marvell 88E1111 is in the range of 0.5W per port at 1G.
|
|
||||||
|
|
||||||
An SFP copper module contains its own PHY chip inside the module housing. The signal path becomes: switch ASIC → SFP electrical interface (SGMII or 1000BASE-X over the SFP cage pins) → PHY inside the module → RJ45 connector → cable. You've added a MAC-to-PHY interface and a second piece of silicon. Power consumption for a copper SFP is typically 0.8W to 1.5W per port, and some older designs draw up to 2.5W. The SFF-8431 spec sets the maximum SFP power at 1W, but copper SFPs often qualify under the extended power provisions.
|
|
||||||
|
|
||||||
The cost difference is significant. A native copper port on a 48-port switch adds roughly $2 to $4 to the BOM when built at volume. A copper SFP module, even sourced from a compatible vendor, costs $15 to $40 per port in reasonable quantities. You are paying a premium on the order of 10x over the native solution.
|
|
||||||
|
|
||||||
## What Switch ASICs Treat Differently
|
|
||||||
|
|
||||||
This is where the technical picture gets interesting. A Broadcom Trident 4 or Tomahawk 4 ASIC handles all switching, forwarding, and QoS in silicon. The ASIC connects to optical transceivers using SERDES lanes running at speeds from 10G to 112G. When you plug a fiber SFP into an SFP+ port, the ASIC's SERDES talks directly to the transceiver's CDR. Simple.
|
|
||||||
|
|
||||||
When you plug a copper SFP into the same port, the ASIC's SERDES is running at 1.25G (1000BASE-X encoding) and talking to a PHY inside the module. That PHY then runs a completely different physical layer (1000BASE-T with four pairs, PAM-5 encoding, echo cancellation) out to the copper cable. The ASIC itself doesn't "know" it's talking to copper — it sees the same 1000BASE-X signal it would see from any fiber SFP.
|
|
||||||
|
|
||||||
This indirection creates a behavioral difference that matters for two things: auto-negotiation and latency.
|
|
||||||
|
|
||||||
For auto-negotiation, native copper ports run the full 1000BASE-T negotiation handshake on the wire. The PHY on the linecard talks to the PHY on the remote device and they negotiate speed and duplex through a well-defined Clause 28 exchange. With a copper SFP, the negotiation visible to the switch ASIC is always 1000BASE-X Clause 37 (or SGMII, depending on implementation), and the PHY inside the module runs a separate 1000BASE-T negotiation on the copper side. These two negotiation states are effectively decoupled. Some implementations handle the decoupling cleanly. Some don't, particularly when you mix copper SFP vendors with specific switch platform firmware versions.
|
|
||||||
|
|
||||||
The copper SFP path adds roughly 1 to 2 microseconds of latency compared to a native copper path, due to the additional serialization/deserialization stage inside the module. For most applications this is irrelevant. For high-frequency trading connections running over copper, which is the use case that actually drives some copper SFP deployments, it can matter.
|
|
||||||
|
|
||||||
## The Cisco Warning Problem
|
|
||||||
|
|
||||||
On Cisco Catalyst and Nexus platforms, a copper SFP in an SFP+ port will frequently generate a console log along the lines of: "SFP-1000T type is not supported on this port" or "unsupported transceiver." This is a NOS validation check comparing the transceiver's EEPROM identification fields against a whitelist of supported module types. A copper SFP identifies itself differently from a fiber module (in the SFF-8472 A0h page it sets the 1000BASE-T bit in the Ethernet compliance codes and typically reports an RJ45 connector type), and some platforms handle that correctly while others don't.
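
To make the EEPROM check concrete, here is a minimal sketch of how a host could recognise a 1000BASE-T SFP from the first bytes of the SFF-8472 A0h page. The byte offsets and code values follow SFF-8472 and SFF-8024 as commonly documented; the function name and sample byte strings are made up for illustration, and this is not how any particular NOS implements its whitelist.

```python
SFP_IDENTIFIER = 0x03         # byte 0: SFP or SFP+ module
CONNECTOR_RJ45 = 0x22         # byte 2: RJ45 connector
COMPLIANCE_1000BASE_T = 0x08  # byte 6, bit 3: 1000BASE-T


def describe_sfp(a0_page: bytes) -> dict:
    """Decode the few A0h fields relevant to spotting a copper SFP."""
    if len(a0_page) < 7:
        raise ValueError("need at least the first 7 bytes of the A0h page")
    return {
        "is_sfp": a0_page[0] == SFP_IDENTIFIER,
        "rj45_connector": a0_page[2] == CONNECTOR_RJ45,
        "is_1000base_t": bool(a0_page[6] & COMPLIANCE_1000BASE_T),
    }


# Hypothetical sample data: a copper SFP and a 1000BASE-SX fiber SFP (LC connector).
copper = bytes([0x03, 0x04, 0x22, 0x00, 0x00, 0x00, 0x08])
fiber = bytes([0x03, 0x04, 0x07, 0x00, 0x00, 0x00, 0x01])
print(describe_sfp(copper))
print(describe_sfp(fiber))
```

A platform whose whitelist only anticipates optical compliance codes will treat the first module as unknown, which is where the warning comes from.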
|
|
||||||
|
|
||||||
The solution is usually not hardware — the port will often pass traffic regardless of the warning. It's a compatibility matrix problem. Cisco's supported media list for a given IOS-XE version and platform SKU determines whether the warning appears. A copper SFP with Cisco-compatible EEPROM programming will suppress the warning. This is a place where EEPROM customization by a compatible vendor makes a real practical difference.
|
|
||||||
|
|
||||||
Juniper's NOS generally handles copper SFPs more gracefully on EX series hardware. The EX2300, EX3400, and EX4300 platforms all have documented support for 1000BASE-T SFPs in their combo SFP ports. Arista's EOS similarly accepts them on combo ports without drama. The problematic cases tend to be older Cisco platforms and any platform where the SFP+ ports were designed with the assumption that they would only see fiber.
|
|
||||||
|
|
||||||
## The Use Cases That Actually Justify the Cost
|
|
||||||
|
|
||||||
The scenario where copper SFP makes clear economic sense is a switch with SFP-only uplink ports that needs to connect to a copper-only device over an existing Cat6 run that you don't want to pull new fiber to. Examples include small switches used at the edge of enterprise wiring closets, aggregation switches in industrial environments where fiber is impractical, and cable head-end equipment where the patch panel infrastructure is copper.
|
|
||||||
|
|
||||||
A second valid scenario is flexibility at the port level. A switch with 24 combo ports (each usable as either SFP or RJ45 native copper) gives you hardware flexibility at no transceiver cost. But a switch with 24 SFP-only ports and no built-in copper gives you the same flexibility via copper SFPs — at the cost of buying the modules. If you're deploying a mix of fiber and copper connections and the switch SKU you want for other reasons happens to be SFP-only, copper SFPs are a reasonable operational solution.
|
|
||||||
|
|
||||||
The third scenario, less common, concerns extended copper reach. Standard 1000BASE-T reach is 100 m over Cat5e or better cabling. Some proprietary copper SFPs claim longer reach using active electronics, but the standard 1000BASE-T spec doesn't change. If your structured cabling exceeds 100 m, you're generally looking at fiber regardless.
|
|
||||||
|
|
||||||
## What Not to Do
|
|
||||||
|
|
||||||
Don't use copper SFPs to save money on a switch where native copper ports are available. Don't use them in high-density deployments where the power overhead adds up: 48 copper SFPs versus 48 native copper ports could be a 15W to 50W difference at typical module power, and closer to 100W with older 2.5W designs, which is not trivial in a large wiring closet. Don't assume they're plug-and-play across all platforms without checking the compatibility matrix first.
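
The power overhead is straightforward to estimate from the per-port figures quoted earlier in this article (about 0.5W for a native PHY, 0.8W to 1.5W for a typical copper SFP, up to 2.5W for older designs). The sketch below uses those numbers as assumptions; the constant and function names are illustrative.

```python
NATIVE_PORT_W = 0.5            # native 1000BASE-T PHY, per port
SFP_TYPICAL_W = (0.8, 1.5)     # typical copper SFP range, per port
SFP_OLDER_DESIGN_W = 2.5       # older copper SFP designs, per port


def extra_power_w(ports, sfp_w, native_w=NATIVE_PORT_W):
    """Additional watts drawn by copper SFPs compared to native copper ports."""
    return round(ports * (sfp_w - native_w), 1)


low = extra_power_w(48, SFP_TYPICAL_W[0])
high = extra_power_w(48, SFP_TYPICAL_W[1])
worst = extra_power_w(48, SFP_OLDER_DESIGN_W)
print(f"48 ports: {low} W to {high} W typical, {worst} W with older modules")
```

Multiply by the number of switches in a closet and by cooling overhead to see why this matters at density.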
|
|
||||||
|
|
||||||
The copper SFP is a useful tool for specific connectivity problems, not a general-purpose alternative to native copper. The power penalty is real, the cost premium is real, and the compatibility surface area is larger than with fiber SFPs. Used for the right reasons, it solves genuine problems. Used as a default, it adds cost and complexity without justification.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "How Transceiver Standards Get Made: Inside the SFF Committee"
|
|
||||||
slug: "transceiver-sff-committee-history"
|
|
||||||
type: deep-dive
|
|
||||||
category: "Standards & Industry"
|
|
||||||
tags: [SFF Committee, MSA, QSFP-DD, OSFP, standards, IEEE, Finisar, Lumentum, Cisco, 400G form factors]
|
|
||||||
seo_focus_keyword: "SFF Committee transceiver standards MSA"
|
|
||||||
---
|
|
||||||
|
|
||||||
If you've ever wondered why the 400G transceiver market launched with two competing form factors — QSFP-DD and OSFP — and why both became de-facto standards before any IEEE ratification, the answer lies in understanding how transceiver standards actually get made. It's less tidy than the official process documents suggest, and the political dynamics explain a lot of the product decisions you'll encounter when specifying high-speed optics.
|
|
||||||
|
|
||||||
## The SFF Committee: Not IEEE, Not IETF
|
|
||||||
|
|
||||||
The Small Form Factor Committee is an industry working group, not a formal standards body. It operates under SNIA (the Storage Networking Industry Association) as an accreditation umbrella but functions largely through voluntary participation by member companies. Attendance is open to anyone who pays the membership fee, but the organizations that actually shape specifications are the usual suspects: Cisco, Intel, Broadcom, Finisar (now II-VI, now Coherent), Lumentum, Inphi (now Marvell), Acacia (now Cisco), and a handful of others.
|
|
||||||
|
|
||||||
The SFF Committee publishes specifications with SFF-8xxx numbering, alongside INF-8xxx documents that record specifications developed outside the committee. These are multi-source agreements in everything but name, created through a process where member companies draft specifications, circulate drafts for comment, and iterate until enough participants are willing to sign off. The resulting document is not mandatory for anyone. It becomes a market standard only if enough equipment vendors and transceiver manufacturers choose to implement it.
|
|
||||||
|
|
||||||
This is where the distinction from IEEE becomes important. An IEEE 802.3 standard defines the electrical and optical parameters for a technology like 1000BASE-LX or 100GBASE-SR4 in a form that becomes part of the official standards corpus, often referenced by regulatory bodies and procurement specifications. SFF documents define the mechanical and electrical interface of the host cage and connector — the physical form factor — rather than the optical technology itself. You need both: IEEE tells you what the optics must do; SFF tells you what shape the module must be.
|
|
||||||
|
|
||||||
## How MSAs Precede Ratification
|
|
||||||
|
|
||||||
Multi-source agreements (MSAs) are essentially pre-competitive agreements where competing manufacturers agree on a common form factor specification so that their products can interoperate at the host interface level. The QSFP28 MSA, which defined the physical interface for 100G quad small form factor pluggable modules, was signed and published in 2013. IEEE 802.3bm, which standardized 100GBASE-SR4 and the four-lane CAUI-4 electrical interface that QSFP28 modules typically use, was ratified in 2015. Equipment manufacturers were designing QSFP28 ports into switch platforms before that standard existed in final form. This is the normal sequence.
|
|
||||||
|
|
||||||
The reason it works this way is industrial pragmatism. Chip design cycles for a 400G ASIC are three to four years. Switch ASICs need to incorporate the physical cage and connector interface before optical standards are finalized, because the form factor decision affects PCB routing, thermal design, and front-panel density. The MSA provides enough specification stability for ASIC tape-out while the optical standards group is still debating dispersion limits.
|
|
||||||
|
|
||||||
The political implication is that whoever controls the MSA drafting process has significant influence over which products succeed in the market. If a large equipment vendor commits to a particular form factor early in the design cycle, it creates a gravitational pull: transceiver manufacturers who want their modules designed into that equipment follow, which creates availability, which makes other equipment vendors more likely to adopt the same form factor.
|
|
||||||
|
|
||||||
## The QSFP-DD vs. OSFP Schism
|
|
||||||
|
|
||||||
The 400G form factor competition is the most visible recent example of how these political dynamics play out. QSFP-DD (Quad Small Form-factor Pluggable Double Density) was developed by a consortium that included Cisco, Juniper, and several major transceiver manufacturers. The key selling point was backward compatibility with QSFP28: a QSFP-DD port can accept a QSFP28 module, which meant switch vendors could deploy QSFP-DD ports and maintain a migration path for customers still using 100G.
|
|
||||||
|
|
||||||
OSFP (Octal Small Form Factor Pluggable) was developed by a separate consortium led by Arista, with Google among its early backers. OSFP is physically larger (the module is taller, wider, and slightly deeper than QSFP-DD), which allows more room for optical components and thermal dissipation. The design target was 400G initially but with a cleaner path to 800G and 1.6T, since the larger form factor provides better thermal headroom for higher-power coherent and silicon photonics implementations.
|
|
||||||
|
|
||||||
There are two honest engineering perspectives here. The QSFP-DD camp is correct that backward compatibility has real operational value, particularly for large enterprise and service provider deployments where a mixed 100G/400G environment will persist for years. The OSFP camp is correct that the QSFP-DD form factor is pushing thermal limits at 400G with high-power coherent implementations, and that the larger module envelope makes the next generation of silicon photonics transceivers more tractable.
|
|
||||||
|
|
||||||
Both are now mature MSAs with broad vendor support. QSFP-DD dominates switching platforms. OSFP has established a stronger position in coherent line-system applications and in the hyperscale co-packaged optics transition path. The market split largely along the lines you'd expect given the initial consortium membership.
|
|
||||||
|
|
||||||
## Who's Actually in the Room
|
|
||||||
|
|
||||||
The SFF Committee working sessions — held as in-person meetings several times per year with video participation — include engineers from transceiver manufacturers, equipment OEMs, and hyperscalers. The hyperscalers have become more active participants since the 400G generation, because at their scale, form factor decisions have direct operational implications for data center density and thermal planning.
|
|
||||||
|
|
||||||
Finisar (now part of Coherent) has historically been one of the most active technical contributors, reflecting their position as a component supplier to the entire industry. When Finisar engineers proposed draft specifications, they carried weight because every significant transceiver manufacturer and many equipment vendors were Finisar customers or competitors who needed to understand their roadmap. The II-VI acquisition of Finisar and subsequent merger with Coherent has restructured some of this, as the combined entity now supplies to an even broader base.
|
|
||||||
|
|
||||||
Intel's photonics group participates heavily, particularly on specifications related to silicon photonics integration. Intel's silicon photonics business has driven interest in form factors that accommodate co-packaged optics, which is effectively a post-pluggable architecture where the optical engine is integrated with the ASIC package rather than sitting in a separate cage.
|
|
||||||
|
|
||||||
## Why This Matters for Procurement
|
|
||||||
|
|
||||||
Understanding the standards process explains several practical realities. First, "compliant with SFF-8636" (QSFP28) is a weaker statement than it appears, because the spec has multiple revisions and optional feature sets. A transceiver can be SFF-8636 compliant in ways that still fail NOS compatibility checks on specific platforms if the optional fields aren't implemented correctly.
|
|
||||||
|
|
||||||
Second, the timing gap between MSA publication and IEEE ratification means there are often early-generation modules in the market built to pre-final specifications. This is more common with new high-speed form factors. 100G CWDM4 modules from 2016 may behave differently from 2019 production in ways that matter for specific use cases.
|
|
||||||
|
|
||||||
Third, the political dynamics of the SFF Committee mean that a major equipment vendor can effectively delay or constrain a competing form factor by withholding their host cage specification from the MSA process. This has happened, and it's one reason why the competitive landscape in 400G form factors took several years to clarify.
|
|
||||||
|
|
||||||
The SFF Committee process is imperfect, driven by competitive interests as much as technical merit, and produces standards that are voluntary in adoption. It is also faster and more pragmatic than any formal standards body would allow, and the optical industry's pace of innovation would not be possible with a slower process. The resulting complexity in compatibility matrices is the tax you pay for that speed.
|
|
||||||
@ -1,50 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Metro DWDM: The Case For and Against Going Open"
|
|
||||||
slug: "metro-dwdm-open-vs-proprietary"
|
|
||||||
type: analysis
|
|
||||||
category: "DWDM & Coherent"
|
|
||||||
tags: [metro DWDM, OpenROADM, coherent pluggables, 400G ZR, disaggregation, ROADM, transponder, open line system]
|
|
||||||
seo_focus_keyword: "metro DWDM open vs proprietary OpenROADM"
|
|
||||||
---
|
|
||||||
|
|
||||||
The traditional metro DWDM architecture looks like this: a proprietary ROADM platform from Ciena, Infinera, or Fujitsu handles the optical layer, transponder cards convert between client grey optics and tunable colored wavelengths, and the whole system operates under a single vendor's network management system. It works reliably. It's also expensive, slow to provision, and vendor-locked in ways that become more uncomfortable as network capacity demands accelerate.
|
|
||||||
|
|
||||||
The alternative — disaggregated metro DWDM with coherent pluggable transceivers — has moved from architecture concept to deployable reality over the past three years, driven primarily by the 400G ZR and 400G ZR+ standards. Understanding where the disaggregated model genuinely works and where vendor integration still wins requires being clear about the technical tradeoffs.
|
|
||||||
|
|
||||||
## What 400G ZR Actually Is
|
|
||||||
|
|
||||||
The 400G ZR specification (OIF Implementation Agreement OIF-400ZR-01.0) defines a single-carrier 400G coherent interface using DP-16QAM modulation. It targets amplified DWDM links of up to 120km on standard G.652 SMF, plus unamplified single-channel links that are limited by loss budget rather than dispersion, typically around 40km. Module launch power is low, on the order of -10 dBm, which is why the 120km figure assumes amplification in the line system. The specification was developed by the Optical Internetworking Forum and published in 2020. Unlike previous coherent interfaces that required 19-inch rack transponder equipment, 400G ZR fits in a QSFP-DD form factor, the same module type used for 400G grey optics.
|
|
||||||
|
|
||||||
The implication is significant: a switch with QSFP-DD ports can, in principle, terminate a 400G coherent wavelength directly without a separate transponder shelf. Arista introduced this capability in the 7280R3 and 7800R3 series. Cisco implemented it in Nexus 9000 with appropriate line cards. The router or switch becomes its own transponder.
|
|
||||||
|
|
||||||
400G ZR+ extends this concept with a family of enhanced coherent implementations from vendors including Ciena (WaveLogic 5 Nano), Infinera (ICE-X), and Acacia/Cisco's variants. ZR+ modules typically support adaptive modulation (stepping between DP-16QAM, DP-8QAM, and DP-QPSK) to trade capacity for reach. A 400G ZR+ module might operate at 400G for 120km spans or back down to 200G to traverse a 1000km path. The tradeoff is power consumption: ZR+ QSFP-DD modules run at 15-20W, compared to roughly 10-12W for a grey 400G SR8. In exchange, you eliminate an entire transponder shelf.
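The capacity-for-reach trade can be sketched with simple arithmetic. The snippet below is illustrative only: it assumes the symbol rate stays fixed while the modulation format changes, so capacity scales with bits per symbol. Real ZR+ modules may also change baud rate or FEC, and the function name is hypothetical.

```python
# Bits carried per symbol on one polarization for each format.
BITS_PER_SYMBOL = {"DP-QPSK": 2, "DP-8QAM": 3, "DP-16QAM": 4}


def relative_capacity_gbps(modulation: str, full_rate_gbps: float = 400.0) -> float:
    """Capacity at a fixed symbol rate, scaled from the DP-16QAM rate.

    Assumption: only the modulation order changes; baud rate and FEC
    overhead are held constant, which real modules do not guarantee.
    """
    return full_rate_gbps * BITS_PER_SYMBOL[modulation] / BITS_PER_SYMBOL["DP-16QAM"]


for mod in ("DP-16QAM", "DP-8QAM", "DP-QPSK"):
    print(f"{mod}: {relative_capacity_gbps(mod):.0f}G")
```

Under that assumption the three formats land at 400G, 300G, and 200G, which matches the 400G-to-200G step-down described above.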
|
|
||||||
|
|
||||||
## The OpenROADM Promise and Delivery
|
|
||||||
|
|
||||||
OpenROADM is a multi-source agreement (MSA), run as an operator-led industry initiative, that defines vendor-neutral YANG data models and optical interoperability specifications for ROADM configuration and management. The stated goal is to allow operators to mix ROADM hardware from different vendors and manage the whole through a common interface. AT&T has been the primary driver since the initiative started in 2016.
|
|
||||||
|
|
||||||
In practice, OpenROADM has delivered meaningful value in two specific areas: wavelength provisioning automation and multi-vendor NMS integration. The YANG models are detailed enough to enable programmatic control of amplifier gain settings, ROADM port attenuation, and wavelength routing matrices without vendor-specific CLI. Operators who have deployed OpenROADM-compliant ROADMs from vendors including Ciena and Fujitsu report meaningful improvements in provisioning time — from days to hours for wavelength turn-up.
|
|
||||||
|
|
||||||
What OpenROADM has not delivered is true vendor interoperability at the optical layer. The ROADM hardware itself remains vendor-specific. The colorless/directionless/contentionless (CDC) ROADM architecture from Ciena is not interchangeable with Infinera's spatial switching implementation at a physical level. You can manage them with the same north-bound API, but you cannot mix ROADM chassis from different vendors in the same optical span and expect the transponders to be agnostic about what they're sending through.
|
|
||||||
|
|
||||||
The amplifier chain is the critical constraint. Coherent DSPs (like those used in ZR+ modules) perform electronic dispersion compensation and can adapt to impairments in the fiber path, but they need accurate optical power management from the ROADM to function correctly. Different ROADM vendors implement power equalization algorithms differently, and a ZR+ module optimized for a Ciena ROADM chain may not behave identically on an Infinera platform without re-validation.
|
|
||||||
|
|
||||||
## Where Disaggregation Works
|
|
||||||
|
|
||||||
The cleanest case for coherent pluggable disaggregation is the point-to-point metro ring where span lengths are 80km or less, chromatic dispersion is manageable, and the application is capacity expansion on existing dark fiber. A telco or cable operator running a 5-node ring with 40 to 80km between nodes can deploy 400G ZR modules in IP routers and eliminate transponder shelves entirely. The operational model simplifies: the IP layer directly drives the optical layer, reducing the number of devices to manage and monitor.
|
|
||||||
|
|
||||||
This scenario is exactly the use case that has driven actual deployment. At least a dozen Tier 2 and Tier 3 operators in North America and Europe have deployed 400G ZR in this configuration since 2021. The Flexoptix-compatible 400G ZR QSFP-DD portfolio covers this use case at significantly lower cost than single-vendor transponder solutions.
|
|
||||||
|
|
||||||
## Where Vendor Integration Still Wins
|
|
||||||
|
|
||||||
Long-haul and ultra-long-haul are the clearest counter-examples. Spans exceeding 1000km require Raman amplification, careful optical power budget management, and DSP algorithms tuned for the specific chromatic dispersion and polarization mode dispersion characteristics of the fiber plant. These requirements are still best addressed by integrated transponder/ROADM solutions from vendors who have co-engineered the DSP and the line system. Mixing a ZR+ pluggable with a Ciena 6500 line system on a 2000km path is theoretically possible but practically fraught — the DSP operating point assumptions in the pluggable may not match the amplifier gain tilt the ROADM produces.
|
|
||||||
|
|
||||||
High-channel-count metro core networks are another case where integration advantages persist. A 96-channel C-band deployment with high power channels, mixed modulation formats, and tight channel spacing (50GHz or 37.5GHz) benefits from ROADM-integrated optical power control that understands the full channel loading. The open line system model here requires accurate optical modeling of the entire span, which is achievable but requires sophisticated controller software that most operators don't run internally.
|
|
||||||
|
|
||||||
## Lock-in That Remains
|
|
||||||
|
|
||||||
Even in fully disaggregated deployments, some lock-in is unavoidable. The coherent DSP inside a ZR+ module is proprietary: Acacia's Greylock, Marvell's Canopus and Deneb, and Ciena's WaveLogic 5 Nano each have different performance characteristics, operational interfaces, and tuning parameters. You can swap optical form factors (QSFP-DD to CFP2 to OSFP) more easily than you can swap DSP vendors without performance regression.
|
|
||||||
|
|
||||||
The network management layer also concentrates lock-in. The domain controller that manages a disaggregated optical layer — performing topology discovery, route computation, and optical impairment modeling — is typically proprietary software from a systems integrator or equipment vendor even when the hardware itself is multi-vendor. OpenROADM addresses the south-bound device interface but doesn't solve the optical path computation problem, which requires physics-aware software that carriers typically don't develop themselves.
|
|
||||||
|
|
||||||
The honest assessment is that metro DWDM disaggregation has delivered real value for the use cases it was designed for, reduced costs significantly in point-to-point and simple ring topologies, and created a healthy coherent pluggable market. It has not eliminated the need for integrated vendor solutions where optical span engineering complexity is high. Both architectures will coexist for at least the next decade.
|
|
||||||
@ -1,54 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Optics for 5G Fronthaul and Midhaul: The Bandwidth Math and What It Means"
|
|
||||||
slug: "optics-for-5g-fronthaul-midhaul"
|
|
||||||
type: tutorial
|
|
||||||
category: "5G & Telecom"
|
|
||||||
tags: [5G, fronthaul, midhaul, eCPRI, 25G SR, CRAN, WDM fronthaul, optical latency, 50G, 100G]
|
|
||||||
seo_focus_keyword: "5G fronthaul optics eCPRI 25G SR"
|
|
||||||
---
|
|
||||||
|
|
||||||
The optics question in 5G transport gets treated as a straightforward capacity problem — more antenna bandwidth, more fiber, more ports. The reality is more constrained. Fronthaul in particular imposes latency requirements that eliminate certain transceiver types from consideration regardless of their data rate capability, and the bandwidth math for a realistic 5G NR deployment produces numbers that many network planners underestimate until they're deep into a deployment.
|
|
||||||
|
|
||||||
## The eCPRI Bandwidth Math
|
|
||||||
|
|
||||||
The evolved Common Public Radio Interface (eCPRI) specification defines the fronthaul split between a Remote Radio Unit (RRU/RRH) and a Distributed Unit (DU). The bandwidth requirement per sector depends on carrier bandwidth, numerology (subcarrier spacing), MIMO layers, and the compression scheme used.
|
|
||||||
|
|
||||||
A 5G NR carrier at 100MHz channel bandwidth with 64 antenna ports (64T64R massive MIMO), using the O-RAN 7-2x functional split with IQ compression, requires approximately 25 Gbps of fronthaul capacity per carrier per sector. A three-sector gNodeB with two 100MHz carriers per sector needs 150 Gbps (3 × 2 × 25 Gbps) of aggregate fronthaul to the DU. This is why 25G SR is the fronthaul default, not 10G: a single 100MHz 64T64R carrier already exceeds 10G uncompressed, and most deployments use multiple carriers.
|
|
||||||
|
|
||||||
The underlying calculation is: Required_bps = num_ports × bits_per_sample × sample_rate × IQ_factor × overhead. For 64T64R at 100MHz 5G NR, with 15-bit I and 15-bit Q samples and a 122.88 Msps sample rate (a 4096-point FFT at 30 kHz subcarrier spacing), the raw time-domain IQ data rate is approximately 236 Gbps. The 7-2x split, which carries frequency-domain samples per spatial layer instead of per antenna, combined with IQ compression, reduces this by roughly an order of magnitude, to a rate that fits on a 25G link. With eCPRI overhead and timing messages, 25G links run at around 75% utilization for a single carrier.
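The raw-rate arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the sample rate is taken as 122.88 Msps for a 100MHz NR carrier (4096-point FFT at 30 kHz subcarrier spacing), the overhead term is left out, and the function name is hypothetical.

```python
def raw_iq_rate_bps(antenna_ports: int, bits_per_component: int,
                    sample_rate_sps: float) -> float:
    """Uncompressed time-domain IQ rate: ports x (I + Q bits) x samples/s."""
    iq_factor = 2  # one I and one Q component per sample
    return antenna_ports * bits_per_component * iq_factor * sample_rate_sps


# Assumed inputs: 64 ports, 15-bit I and Q, 122.88 Msps sample rate.
raw = raw_iq_rate_bps(64, 15, 122.88e6)
target = 25e9  # per-carrier fronthaul rate quoted in the text

print(f"raw IQ rate: {raw / 1e9:.1f} Gbps")
print(f"reduction needed to fit 25 Gbps: {raw / target:.1f}:1")
print(f"3 sectors x 2 carriers: {3 * 2 * target / 1e9:.0f} Gbps aggregate")
```

The sample rate dominates the result, so it is worth confirming against the radio vendor's numerology before reusing the figure.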
|
|
||||||
|
|
||||||
At 26 GHz mmWave or mid-band 5G with multiple carriers stacked, this pushes toward 50G and 100G fronthaul requirements even for a single macro site. This is why Nokia and Ericsson have both specified 25G and 100G fronthaul interfaces on their latest generation RRU products.
|
|
||||||
|
|
||||||
## Why 25G SR Is the Fronthaul Default
|
|
||||||
|
|
||||||
The IEEE 802.3by 25GBASE-SR standard specifies multi-mode fiber operation at 850nm with reach up to 70m on OM3 or 100m on OM4/OM5. For fronthaul this means very short links between street-level cabinets or rooftop equipment and the nearby DU equipment. The 25G SFP28 SR module is the standard choice because: the reach is sufficient for most fronthaul topologies, the module cost is substantially lower than 25G LR or ER, and the power consumption (under 1W for a typical SFP28 SR) is manageable in antenna-side equipment with tight power budgets.
|
|
||||||
|
|
||||||
The critical constraint for fronthaul optics is not bandwidth but latency. The eCPRI transport requirements, mirrored in IEEE 802.1CM, target a one-way transport latency of 100 microseconds or less for the most time-critical fronthaul traffic so that the HARQ process works correctly. This 100 µs budget covers all sources of delay: propagation delay on the fiber, serialization delay at 25G, and any switching or processing in the transport network. Propagation delay on fiber is approximately 5 µs/km, so fiber alone consumes the entire budget at 20km. A 25G serial link has a serialization delay of roughly 0.04 µs per 125-byte frame, which is negligible at this link rate.
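The budget arithmetic above can be sketched as follows. The 5 µs/km and 100 µs figures come from the text; the 20 µs switching allowance in the last line is an illustrative assumption, and the helper names are hypothetical.

```python
FIBER_DELAY_US_PER_KM = 5.0  # propagation delay quoted in the text
BUDGET_US = 100.0            # one-way fronthaul latency budget


def serialization_delay_us(frame_bytes: int, line_rate_bps: float) -> float:
    """Time to clock one frame onto the wire, in microseconds."""
    return frame_bytes * 8 / line_rate_bps * 1e6


def max_fiber_km(budget_us: float, other_delays_us: float) -> float:
    """Fiber length that uses up whatever budget the other delays leave."""
    return max(0.0, (budget_us - other_delays_us) / FIBER_DELAY_US_PER_KM)


ser = serialization_delay_us(125, 25e9)
print(f"serialization: {ser:.2f} us")
print(f"max fiber, no switching: {max_fiber_km(BUDGET_US, ser):.1f} km")
print(f"max fiber, 20 us of switching: {max_fiber_km(BUDGET_US, ser + 20.0):.1f} km")
```

Every microsecond of switching or processing delay removes about 200m of reachable fiber, which is why passive transport is attractive here.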
|
|
||||||
|
|
||||||
What this latency constraint rules out is any transport element that adds buffering or retiming. TDM-PON and active aggregation schemes (muxponders or packet switches placed in front of the WDM layer) introduce queuing delays that can push the fronthaul latency above the HARQ deadline. For this reason, passive point-to-point fiber or passive WDM (using fixed-wavelength SFP28 modules) is preferred over any active switching layer between RRU and DU.
|
|
||||||
|
|
||||||
## 50G and 100G in Midhaul
|
|
||||||
|
|
||||||
Midhaul connects the DU to the Centralized Unit (CU), which handles RRC and PDCP protocol layers. The midhaul bandwidth requirement aggregates multiple DU sites and is therefore higher in total but more tolerant in latency. Midhaul latency targets are on the order of milliseconds, up to about 10 milliseconds, which opens up more transport options.
|
|
||||||
|
|
||||||
50G SFP56 SR (IEEE 802.3cd 50GBASE-SR) has emerged as the midhaul interface for medium-aggregation scenarios: 4 to 8 DU sites converging at a CU. The 50G rate provides headroom for the aggregated fronthaul traffic plus signaling overhead. 100G QSFP28 SR4 or CWDM4 handles larger aggregation nodes where 16 to 32 sectors converge.
|
|
||||||
|
|
||||||
For midhaul over longer distances — 10km to 40km between DU aggregation sites and metro CU locations — 25G LR (10km, SMF, 1310nm) and 25G ER (40km, SMF) are widely deployed. The 25G LR SFP28 module draws around 1.5W and is available from compatible vendors at competitive cost. For 100G midhaul over 10km, 100GBASE-LR4 (four-lambda LWDM at 1295-1310nm) is the standard choice.
|
|
||||||
|
|
||||||
## WDM's Role in CRAN
|
|
||||||
|
|
||||||
Centralized RAN (CRAN) architectures that aggregate many RRU sites through passive WDM before reaching the DU pool create specific transceiver selection challenges. Passive CWDM muxes typically support 8 or 18 channels, with channels spaced at 20nm intervals from 1271nm to 1611nm. Each channel uses a fixed-wavelength SFP28 module built for its CWDM wavelength.
|
|
||||||
|
|
||||||
The CWDM grid for fronthaul is standardized in ITU-T G.694.2. The commonly used fronthaul window spans 1271nm to 1371nm (O-band), supporting 6 channels at 20nm spacing with insertion loss below 1.5 dB per channel for passive mux/demux. This fits 5G NR fronthaul requirements because G.652 SMF has its zero-dispersion wavelength near 1310nm, so chromatic dispersion across the window stays within a few ps/nm/km, minimizing dispersion penalty at 25G per channel.
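The channel plan follows directly from the 20nm spacing. A minimal sketch, assuming the G.694.2 nominal center wavelengths run from 1271nm to 1611nm; the function name is hypothetical.

```python
def cwdm_grid_nm(first_nm: int = 1271, last_nm: int = 1611,
                 spacing_nm: int = 20) -> list[int]:
    """Nominal CWDM center wavelengths at a fixed spacing."""
    return list(range(first_nm, last_nm + 1, spacing_nm))


full_grid = cwdm_grid_nm()
# Fronthaul window from the text: 1271nm to 1371nm.
fronthaul_window = [w for w in full_grid if 1271 <= w <= 1371]

print(f"{len(full_grid)} channels in the full grid")
print(f"{len(fronthaul_window)} fronthaul channels: {fronthaul_window}")
```

The six wavelengths it produces for the fronthaul window are the ones a mux port plan would be built around.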
|
|
||||||
|
|
||||||
A typical CWDM fronthaul installation uses a passive 1×6 or 1×8 CWDM mux at the antenna site, fixed-wavelength 25G SFP28 modules (1271nm, 1291nm, 1311nm, 1331nm, 1351nm, 1371nm) at each RRU interface, and a corresponding demux at the DU aggregation point. Each 25G channel carries one sector's fronthaul traffic. Eight CWDM channels on two fibers (one transmit, one receive) support an 8-sector cell site on a single fiber pair.
|
|
||||||
|
|
||||||
The limitation of passive CWDM is fixed channel assignment. If an RRU is moved or reconfigured, the wavelength assignment must be coordinated with the mux port. For dynamic CRAN deployments that expect frequent reconfiguration, tunable DWDM SFP28 modules (typically based on EML or VCSEL designs with thermal tuning) offer wavelength flexibility at higher cost. Tunable 25G DWDM SFP28 modules supporting the full C-band ITU-T 50GHz grid are available from several vendors including ADVA (now Adtran), Lumentum, and compatible suppliers, at roughly 3 to 4 times the price of fixed-wavelength CWDM modules.
|
|
||||||
|
|
||||||
## Transceiver Selection Checklist for 5G Fronthaul
|
|
||||||
|
|
||||||
The practical decision tree for fronthaul optics starts with distance. Under 100m: 25G SR (OM4) or 25G SR (OM3, derated reach). 100m to 500m: 25G BiDi SFP28 (1270/1330nm over single SMF strand, useful where fiber is scarce). 500m to 10km: 25G LR (SMF, 1310nm). Beyond 10km: 25G ER (SMF, 1310nm, up to 40km) or a CWDM/DWDM wavelength-multiplexed approach.
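The distance-based selection above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and treating each upper bound as inclusive is an assumption, since the text does not say which side of a boundary a link falls on.

```python
def fronthaul_optic(distance_m: float) -> str:
    """Map a link distance in meters to the optic family suggested in the text."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    if distance_m <= 100:
        return "25G SR (OM4; OM3 reach is derated)"
    if distance_m <= 500:
        return "25G BiDi SFP28 (single SMF strand)"
    if distance_m <= 10_000:
        return "25G LR (SMF, 1310nm)"
    return "25G ER or CWDM/DWDM wavelength multiplexing"


for d in (60, 300, 5_000, 18_000):
    print(f"{d} m -> {fronthaul_optic(d)}")
```

Distance is only the first filter; the latency budget and fiber availability still need to be checked for the chosen option.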
|
|
||||||
|
|
||||||
For all fronthaul applications, avoid any transceiver that introduces buffering or Forward Error Correction (FEC) with significant latency overhead. The 25G SR and LR families in the SFP28 form factor meet this requirement. Some 25G links run Reed-Solomon FEC (RS-FEC, IEEE 802.3 Clause 108), which adds latency on the order of a few hundred nanoseconds per hop: a 5,280-bit RS(528,514) codeword alone takes about 205ns to arrive at 25.78 Gbps. That is acceptable against a 100 µs budget. Modules advertising "FEC-enhanced sensitivity" with higher latency FEC codes should be validated against the 100 µs fronthaul budget before deployment.
|
|
||||||
|
|
||||||
The transceiver question in 5G fronthaul has a clear answer for the dominant deployment scenarios, but the answer changes with scale. A single 100MHz carrier sector uses 25G comfortably. Twenty sectors of 100MHz 64T64R mmWave push the midhaul into 100G territory, and the aggregation point needs 400G. Planning the full capacity cascade before specifying transceivers avoids the upgrade cycle problem.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "Wavelength Selective Switches: The Component That Defines Your Metro Ring"
|
|
||||||
slug: "wavelength-selective-switch-wss-explainer"
|
|
||||||
type: deep-dive
|
|
||||||
category: "DWDM & Coherent"
|
|
||||||
tags: [WSS, ROADM, wavelength selective switch, CDC ROADM, flex-grid, colorless directionless contentionless, MEMS, LCoS]
|
|
||||||
seo_focus_keyword: "wavelength selective switch WSS ROADM CDC"
|
|
||||||
---
|
|
||||||
|
|
||||||
The Wavelength Selective Switch is the optical component that makes a modern ROADM function, and understanding its properties — particularly its degree count and switching architecture — is what determines whether a metro ring design will have the flexibility you actually need or the flexibility that sounded good in a vendor presentation. The gap between those two things can be significant.
|
|
||||||
|
|
||||||
## What a WSS Does at the Component Level
|
|
||||||
|
|
||||||
A Wavelength Selective Switch is an optical cross-connect element that can route individual wavelengths independently between its input and output ports. The "1x9 WSS" designation means one common port (typically connected to the fiber line) and nine wavelength ports. The WSS can route any of the 96 C-band channels (at 50GHz spacing) to any of its nine ports, including routing different wavelengths to different output ports simultaneously, and can also perform per-wavelength attenuation.
|
|
||||||
|
|
||||||
The physical implementation is typically either MEMS-based (micro-electromechanical mirrors that steer wavelengths optically) or LCoS-based (Liquid Crystal on Silicon, which uses a diffraction grating and programmable liquid crystal cell array). LCoS implementations dominate in modern ROADM equipment because they support programmable wavelength bandwidth — the "flex-grid" capability — while MEMS approaches are typically fixed to the ITU grid.
|
|
||||||
|
|
||||||
Commercially, Lumentum (spun out of JDSU in 2015, inheriting its optical components business) and Coherent (formerly II-VI, which acquired Finisar and its WSS product line) supply the majority of WSS modules to ROADM equipment manufacturers. Domestic suppliers, including Huawei's in-house component operations, serve the Chinese market. The WSS subsystem is essentially a commodity component that ROADM OEMs (Ciena, Infinera, Nokia, Fujitsu, Huawei) integrate into their platforms. When a Ciena 6500 node has different WSS degree options than a Fujitsu FLASHWAVE, it's usually reflecting different WSS module selections rather than fundamentally different optical architectures.
|
|
||||||
|
|
||||||
## Colorless, Directionless, Contentionless: What Each Actually Means
|
|
||||||
|
|
||||||
These three attributes are frequently listed together as CDC-ROADM but they're independent capabilities that add cost incrementally. It's worth understanding what each one buys operationally.
|
|
||||||
|
|
||||||
Colorless means a local add/drop port can accept or originate any wavelength. Without colorless capability, an add/drop port is fixed to a specific ITU channel, which means a transponder plugged into that port can only transmit on that pre-assigned wavelength. With colorless ports, the transponder's tunable transmitter can be assigned any wavelength in the C-band, and the ROADM routes it appropriately. This is the fundamental requirement for automation-friendly metro deployments and is now standard on any modern ROADM equipment. The WSS provides colorless add/drop by connecting the add/drop modules to the WSS common port side.
|
|
||||||
|
|
||||||
Directionless means a local add/drop port can connect to any line direction without being pre-assigned to a specific fiber pair. In a 4-degree node (where four fiber routes converge), a non-directionless architecture has specific transponder slots pre-cabled to specific degrees. A directionless architecture adds an optical switch fabric between the add/drop modules and the directional WSS ports, allowing any transponder to connect to any degree. This is expensive — the optical switch fabric is additional hardware — but essential for automated wavelength restoration, where a failed route needs to be re-routed through a different degree without physical recabling.
|
|
||||||
|
|
||||||
Contentionless means multiple add/drop ports can be assigned the same wavelength simultaneously. This is the most commonly misunderstood attribute. In a colorless and directionless ROADM without contentionless capability, only one add/drop port in a given add/drop structure can use wavelength λ1 at a node, even if the ROADM could technically route it toward multiple degrees. Contentionless capability, implemented through multicast switches (MCS) or M×N WSS stages, allows multiple transponders at the same node to use the same wavelength on different routes. This matters for high-capacity nodes that are provisioning many 100G or 400G wavelengths toward the same destinations.
|
|
||||||
|
|
||||||
## 1x9 vs. 1x20 WSS Degree
|
|
||||||
|
|
||||||
The "degree" of a WSS, as used here, is shorthand for its port count. A 1x9 WSS can connect its common port to any of nine wavelength ports; a 1x20 WSS can connect to any of twenty. Strictly, degree is a property of the ROADM node (the number of fiber directions it serves), and the WSS port count sets the upper limit on how many network directions the node can connect to.
|
|
||||||
|
|
||||||
A 4-degree ROADM node — common in ring topologies — can use 1x9 WSS modules and have plenty of ports to spare. A large mesh node with 8 or 12 network directions requires 1x9 WSS modules deployed in multi-stage configurations or, more commonly, 1x20 WSS modules to achieve the necessary port count without increasing stage count (which adds insertion loss).
|
|
||||||
|
|
||||||
Each additional WSS stage adds approximately 4 to 6 dB of insertion loss. For a metro network where power budgets are often running within 3 to 5 dB of the sensitivity threshold, adding a WSS stage to achieve higher degree count can force a decision between adding an optical amplifier (cost and complexity) or reducing the span lengths (not always possible). This is the practical reason why metro ring topologies are often limited to 4 or 6 degrees even when more fiber routes exist — the optical power budget constraint makes 8+ degree nodes expensive.
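The power-budget squeeze can be made concrete with the ranges quoted above. A minimal sketch: the margin and loss values are the endpoints from the text, and the function name is hypothetical.

```python
def margin_after_extra_stages(margin_db: float, extra_stages: int,
                              loss_per_stage_db: float) -> float:
    """Remaining margin to receiver sensitivity after adding WSS stages."""
    return margin_db - extra_stages * loss_per_stage_db


# Endpoints from the text: 3 to 5 dB of margin, 4 to 6 dB of loss per stage.
best_case = margin_after_extra_stages(5.0, 1, 4.0)
worst_case = margin_after_extra_stages(3.0, 1, 6.0)

for label, margin in (("best case", best_case), ("worst case", worst_case)):
    verdict = "link closes" if margin >= 0 else "needs an amplifier or shorter span"
    print(f"{label}: {margin:+.1f} dB -> {verdict}")
```

Even the best case leaves only 1 dB of margin, which is rarely enough once aging and repair splices are accounted for.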
|
|
||||||
|
|
||||||
## Flex-Grid: What It Costs in Practice
|
|
||||||
|
|
||||||
Traditional DWDM uses the ITU-T 50GHz fixed grid: 96 channels spaced at exactly 50GHz intervals across the C-band, each 50GHz wide. Flex-grid extends this to variable channel widths: channels can be assigned widths of 12.5GHz, 25GHz, 37.5GHz, 50GHz, 75GHz, 100GHz, or more in multiples of 12.5GHz.
|
|
||||||
|
|
||||||
The motivation for flex-grid is accommodating super-channels — wide coherent signals produced by multi-carrier transponders that span 150GHz or 200GHz. An Infinera ICE6 super-channel might span 750GHz of optical bandwidth. On a fixed 50GHz grid, you can't allocate this efficiently; on a flex-grid system, you allocate exactly the bandwidth needed.
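The allocation difference can be sketched by rounding a signal's occupied bandwidth up to each grid's granularity. This is illustrative: the 64GHz signal width is an assumed example, the function names are hypothetical, and the fixed-grid figure assumes adjacent 50GHz channels could be bonded, which many fixed-grid filters do not allow.

```python
import math

SLOT_GHZ = 12.5           # flex-grid width granularity from the text
FIXED_CHANNEL_GHZ = 50.0  # fixed-grid channel width from the text


def flex_allocation_ghz(signal_ghz: float) -> float:
    """Spectrum reserved on a flex-grid: round up to 12.5GHz slots."""
    return math.ceil(signal_ghz / SLOT_GHZ) * SLOT_GHZ


def fixed_allocation_ghz(signal_ghz: float) -> float:
    """Spectrum consumed on a fixed grid: round up to whole 50GHz channels."""
    return math.ceil(signal_ghz / FIXED_CHANNEL_GHZ) * FIXED_CHANNEL_GHZ


for signal in (64.0, 150.0, 200.0):
    print(f"{signal:.0f} GHz signal: flex {flex_allocation_ghz(signal):.1f} GHz, "
          f"fixed {fixed_allocation_ghz(signal):.1f} GHz")
```

Signals that are already a multiple of 50GHz gain nothing; the saving shows up for widths in between, such as the 64GHz case.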
|
|
||||||
|
|
||||||
In practice, flex-grid deployment requires LCoS-based WSS (which is now universal in modern ROADMs), network management software that understands variable spectral assignments, and coherent modems that can operate correctly at non-standard channel spacings. All of these are available from major vendors. The cost overhead is not in the WSS hardware itself but in the planning and management complexity: a flex-grid spectrum assignment database is more complex to manage than a simple 50GHz channel number, and wavelength conflict resolution in dynamic wavelength assignment algorithms becomes harder when channels are variable width.
|
|
||||||
|
|
||||||
## Why Port Count Constrains Your Metro Ring
|
|
||||||
|
|
||||||
The critical operational consequence of WSS port count is that it limits how many circuits you can add/drop at a node simultaneously. A 1x9 WSS with 4 ports used for network degrees (in a 4-degree node) has 5 ports remaining for local add/drop. Each add/drop port can handle one wavelength (or wavelength band, if branching stages are added). With 5 add/drop ports and 96 channels possible across the C-band, you cannot add/drop more than 5 wavelengths at this node unless you cascade additional WSS stages.
|
|
||||||
|
|
||||||
This sounds abstract until you're planning a 400G coherent deployment where a single customer circuit is one wavelength and you have 15 customers to add/drop at the same node. Suddenly the WSS port budget is your primary design constraint, more than fiber capacity or optical power. The upgrade path is a new node design with higher-degree WSS — which typically means replacing the WSS modules, redesigning the optical cabling within the chassis, and repricing the node.
|
|
||||||
|
|
||||||
The WSS degree and port count decisions made in the initial ROADM deployment are difficult to reverse without hardware replacement. This is the constraint that deserves more attention in metro ring planning discussions than it typically receives.
|
|
||||||
@ -1,52 +0,0 @@
|
|||||||
---
|
|
||||||
title: "SFP vs. SFP+: The Backward Compatibility That Isn't Always Compatible"
|
|
||||||
slug: "sfp-sfp-plus-backward-compatibility"
|
|
||||||
type: tutorial
|
|
||||||
category: "Transceiver Selection"
|
|
||||||
tags: [SFP, SFP+, backward compatibility, 1G SFP, 10G SFP+, Cisco, Juniper, auto-negotiation, BiDi SFP, EEPROM]
|
|
||||||
seo_focus_keyword: "SFP SFP+ backward compatibility 1G 10G"
|
|
||||||
---
|
|
||||||
|
|
||||||
The claim that SFP and SFP+ are backward compatible is technically correct at the mechanical and electrical hardware level and functionally misleading in practice. The same physical connector, the same cage dimensions, the same gold-contact pin interface — and yet inserting a 1G SFP module into an SFP+ port on a Cisco Nexus will frequently generate error messages, and the behavior depends on software version as much as hardware. Understanding exactly where the compatibility breaks down, and why, is useful knowledge for anyone managing mixed-speed deployments.
|
|
||||||
|
|
||||||
## What the Electrical Interface Shares
|
|
||||||
|
|
||||||
SFP and SFP+ both use the same 20-contact card-edge connector, and the SFP+ module and cage mechanics defined in SFF-8432 remain compatible with the original SFP MSA (INF-8074i). The mechanical housing is identical; an SFP module will physically lock into an SFP+ cage and vice versa. The management interface, a two-wire I2C bus over pins 4 and 5, is the same in both standards, which means the host switch's management plane can read the EEPROM contents of any module using the same register map defined in SFF-8472.
|
|
||||||
|
|
||||||
The signaling interface is also similar: both use AC-coupled, 100-ohm differential pairs for the transmit and receive data lanes. The fundamental SERDES protocol running over those lanes is where the divergence begins. SFP+ was designed to carry 10G NRZ data, which requires a serial data stream at 10.3125 Gbps (10 Gbps of payload plus 64b/66b encoding overhead, for 10GBASE-SR/LR) or 10.51875 Gbps for 10G Fibre Channel. A 1G SFP module expects 1.25 Gbps (1000BASE-X, 8b/10b encoded) or 1.0625 Gbps (Fibre Channel 1GFC).
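The Ethernet line rates quoted above follow from the payload rate and the line code. A short sketch, using exact fractions to avoid rounding noise; the function name is hypothetical.

```python
from fractions import Fraction


def line_rate_gbps(payload_gbps: int, payload_bits: int, coded_bits: int) -> float:
    """Serial line rate after block coding, e.g. 64b/66b or 8b/10b."""
    return float(Fraction(payload_gbps * coded_bits, payload_bits))


rate_10g = line_rate_gbps(10, 64, 66)  # 10GBASE-R, 64b/66b
rate_1g = line_rate_gbps(1, 8, 10)     # 1000BASE-X, 8b/10b

print(f"10GBASE-R:  {rate_10g} Gbps")
print(f"1000BASE-X: {rate_1g} Gbps")
print(f"ratio: {rate_10g / rate_1g}x")
```

The two rates differ by a factor of 8.25, not a clean 10, which is one reason a SERDES needs an explicit low-rate mode to serve a 1G module.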
|
|
||||||
|
|
||||||
A switch ASIC with an SFP+ port has a SERDES lane designed to operate at 10G. Some ASIC designs allow that SERDES lane to run at a reduced rate to accommodate 1G SFP modules. The SFF-8431 specification for SFP+ explicitly states that host hardware "may" support 1G SFP operation in SFP+ slots. "May" is doing significant work in that sentence.
|
|
||||||
|
|
||||||
## The NOS Validation Layer
|
|
||||||
|
|
||||||
Even when the ASIC hardware supports 1G mode, the software determines whether the module is accepted. Modern NOS platforms perform a qualification check on every inserted module by reading the A0h EEPROM page and comparing its contents against a platform-specific acceptance list. The identifier field (byte 0) is the same value for SFP and SFP+, so the check relies on the compliance codes (bytes 3 to 10), the nominal bit rate (byte 12), and the vendor name and part number fields. An SFP+ port expecting 10G-class modules has an acceptance list that may or may not include 1G SFP entries.
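To make the EEPROM check concrete, the sketch below decodes a few base-ID fields from the A0h page. Byte offsets follow the SFF-8472 layout as commonly documented; the demo page contents (vendor name, part number) are invented for illustration, and no real NOS acceptance logic is reproduced here.

```python
def parse_a0h(page: bytes) -> dict:
    """Decode selected base-ID fields from the first 64 bytes of page A0h."""
    if len(page) < 64:
        raise ValueError("need at least 64 bytes of the A0h page")
    return {
        "identifier": page[0],                # 0x03 for both SFP and SFP+
        "eth_10g_codes": page[3] >> 4,        # byte 3, bits 7-4
        "eth_1g_codes": page[6] & 0x0F,       # byte 6, bits 3-0
        "nominal_rate_mbd": page[12] * 100,   # byte 12, units of 100 MBd
        "vendor_name": page[20:36].decode("ascii", errors="replace").strip(),
        "vendor_pn": page[40:56].decode("ascii", errors="replace").strip(),
        "checksum_ok": (sum(page[0:63]) & 0xFF) == page[63],
    }


def build_demo_page() -> bytes:
    """Hypothetical 1000BASE-SX module image, for illustration only."""
    page = bytearray(64)
    page[0] = 0x03                               # SFP/SFP+ identifier
    page[6] = 0x01                               # 1000BASE-SX compliance bit
    page[12] = 13                                # 1300 MBd nominal rate
    page[20:36] = b"EXAMPLE VENDOR".ljust(16)    # made-up vendor name
    page[40:56] = b"DEMO-1G-SX".ljust(16)        # made-up part number
    page[63] = sum(page[0:63]) & 0xFF            # CC_BASE checksum
    return bytes(page)


info = parse_a0h(build_demo_page())
print(info)
```

A host could use the nominal rate and compliance fields to decide whether to drop the SERDES to 1G, and the vendor fields for any whitelist comparison.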
|
|
||||||
|
|
||||||
On Cisco Nexus platforms, the validation is strict. A 1G SFP in an SFP+ port on a Nexus 93180YC-FX will typically log "unsupported transceiver" and the port may not come up. The resolution requires either using Cisco-branded 1G SFP modules that have been whitelisted, or enabling the "service unsupported-transceiver" global configuration command, which bypasses the EEPROM whitelist check. Without that command, even a perfectly functional 1G SFP from a reputable compatible vendor will be blocked.
|
|
||||||
|
|
||||||
Juniper's EX and QFX platforms take a different approach. EX3400 and EX4300 series explicitly document 1G SFP support in their SFP+ ports, and Junos does not block 1G modules in 10G slots by default. The port autonegotiates to the module's speed. You may see a warning in show chassis hardware about a non-standard configuration, but traffic flows.
|
|
||||||
|
|
||||||
Arista's EOS is generally permissive. The 7280 and 7050 series accept 1G SFPs in SFP+ slots and bring up the port at 1G without special configuration. The interface speed is reported correctly in show interfaces.
|
|
||||||
|
|
||||||
This variability is platform and software version dependent. A Cisco Catalyst 9300 may behave differently from a Nexus 9000 on the same firmware branch. Before deploying 1G SFPs in SFP+ ports at scale, test on the specific platform with the specific software version you're running.
|
|
||||||
|
|
||||||
## The Speed Auto-Negotiation Problem
|
|
||||||
|
|
||||||
Even on platforms that accept 1G SFPs in SFP+ ports, there's a subtle failure mode around auto-negotiation. 1000BASE-T SFP modules (copper SFPs — see the separate article on this topic) perform 1000BASE-T electrical negotiation on the copper side and present a fixed 1000BASE-X signal to the SFP host. Fiber 1G SFPs (1000BASE-SX, 1000BASE-LX) do not perform electrical auto-negotiation; they transmit at 1.25 Gbps continuously and expect the far end to match.
|
|
||||||
|
|
||||||
The problem arises when a 1G fiber SFP is in an SFP+ port and the NOS tries to run speed auto-negotiation on the electrical interface between the ASIC and the module. SFP+ ports configured for 10G do not autoneg on the electrical SERDES — they lock at 10.3125 Gbps. If the port needs to drop to 1.25 Gbps for the SFP, the ASIC must be explicitly told to do this through port configuration ("speed 1000" or equivalent). In some NOS implementations this works cleanly. In others, the port will cycle through link-up/link-down as the ASIC and module fail to agree on a common rate.
|
|
||||||
|
|
||||||
## The BiDi Scenario That Breaks Everything
|
|
||||||
|
|
||||||
The failure scenario that causes the most operational confusion is attempting to use a BiDi SFP pair in an SFP+ port. BiDi (Bidirectional) SFPs run TX and RX over a single fiber using a wavelength-division duplex scheme: typically 1310nm TX / 1490nm RX and 1490nm TX / 1310nm RX for a complementary pair.
|
|
||||||
|
|
||||||
Most BiDi SFP deployments are 1G (1000BASE-BX). When deployed in SFP+ ports, BiDi SFPs face all the standard SFP-in-SFP+ compatibility challenges plus one more: the port's electrical rate must be set correctly before any fiber-layer link establishment can happen. If the port stays in its 10G electrical state, the module receives 10G signaling on its SERDES pins and will not initialize correctly. That means no optical output and therefore no fiber link, which leaves no evidence to help diagnose whether the problem is the module, the wavelength pairing, or the port speed.
|
|
||||||
|
|
||||||
This diagnostic opacity is why BiDi SFP deployments in SFP+ ports generate a disproportionate share of support tickets. Before concluding that the modules or the fiber are at fault, work through a systematic check: confirm that the port speed is forced to 1000 in the NOS configuration, confirm that the module's EEPROM is accepted (no "unsupported transceiver" messages), and confirm that the complementary wavelength pair is correctly oriented.
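The ordering of that check matters, because a port left at 10G hides every later symptom. Below is a minimal sketch that encodes the three checks in that order and reports the first one that fails. The `PortObservation` structure and its field names are hypothetical; in practice an operator would fill them in from NOS show output and module labels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortObservation:
    """Hypothetical summary of what was observed on one link."""
    configured_speed_mbps: Optional[int]  # None means speed was not forced
    unsupported_transceiver_logged: bool
    local_tx_nm: int
    local_rx_nm: int
    remote_tx_nm: int
    remote_rx_nm: int


def first_failed_check(obs: PortObservation) -> Optional[str]:
    """Return a description of the first failing check, or None if all pass."""
    if obs.configured_speed_mbps != 1000:
        return "port speed is not forced to 1000"
    if obs.unsupported_transceiver_logged:
        return "NOS rejected the module EEPROM"
    if obs.local_tx_nm != obs.remote_rx_nm or obs.local_rx_nm != obs.remote_tx_nm:
        return "BiDi wavelength pair is not complementary"
    return None


obs = PortObservation(
    configured_speed_mbps=None,
    unsupported_transceiver_logged=False,
    local_tx_nm=1310, local_rx_nm=1490,
    remote_tx_nm=1490, remote_rx_nm=1310,
)
print(first_failed_check(obs))
```

Only when all three checks return clean does it make sense to move on to suspecting the optics or the fiber plant.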
|
|
||||||
|
|
||||||
## Practical Guidance
|
|
||||||
|
|
||||||
For deliberate 1G deployment in SFP+ ports, the cleanest approach has three parts: use SFP modules that have been tested on the specific platform and NOS version; force port speed to 1000 explicitly in configuration rather than relying on auto-negotiation; and, on Cisco platforms, either use Cisco-branded optics or enable `service unsupported-transceiver` globally.
|
|
||||||
|
|
||||||
The compatible transceiver market complicates this because compatible vendors typically program EEPROM to match a specific equipment vendor's expected field values. A compatible 1G SFP with Cisco-compatible EEPROM programming suppresses the "unsupported transceiver" warning on Cisco platforms even when `service unsupported-transceiver` is not enabled. This is one of the concrete operational benefits of EEPROM customization. It's not about fooling anyone, since the optical performance is the same; it's about clearing the NOS validation hurdle so the port behaves predictably.
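To make "EEPROM field values" concrete, here is a minimal sketch that reads a few identification fields from an SFP's A0h page. The byte offsets follow the SFF-8472 layout as I understand it (identifier at byte 0, nominal signalling rate at byte 12 in units of 100 MBd, vendor name at bytes 20 to 35, vendor part number at bytes 40 to 55), so verify them against the specification before relying on them. The sample page contents are invented for illustration, and the code does not attempt to reproduce any vendor's actual validation logic, which is proprietary.

```python
def parse_sfp_id(a0_page: bytes) -> dict:
    """Extract a few ID fields from an SFP A0h EEPROM page (assumed layout)."""
    if len(a0_page) < 56:
        raise ValueError("need at least the first 56 bytes of the A0h page")
    return {
        "identifier": a0_page[0],                 # 0x03 denotes SFP/SFP+
        "nominal_rate_mbd": a0_page[12] * 100,    # units of 100 MBd
        "vendor_name": a0_page[20:36].decode("ascii", errors="replace").strip(),
        "vendor_pn": a0_page[40:56].decode("ascii", errors="replace").strip(),
    }


# Build a hypothetical 1G module image; all values here are made up.
page = bytearray(96)
page[0] = 0x03
page[12] = 13  # 1.25 GBd rounded to the nearest 100 MBd
page[20:36] = b"EXAMPLE VENDOR".ljust(16)
page[40:56] = b"EX-SFP-1G-LX".ljust(16)

info = parse_sfp_id(bytes(page))
print(info)
```

A host that validates optics typically compares fields like these, often alongside vendor-specific bytes, against what it expects. That is why reprogramming them changes how the port behaves without changing the optics.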
|
|
||||||
|
|
||||||
The mechanical interoperability of SFP and SFP+ is real. The operational interoperability depends on hardware generation, NOS policy, EEPROM configuration, and sometimes firmware version. Treating it as fully automatic is optimistic.
|
|
||||||
Some files were not shown because too many files have changed in this diff.