transceiver-db/blog-training-data/blog-006-dom-diagnostics.md
Rene Fichtmueller 80aa85961b feat: add 7 gold-standard blog training articles for BlogLLM
Reference quality articles covering: 400G DR4 pricing, vendor lock-in,
silicon photonics, fiber plant readiness, 400ZR reality check,
DOM diagnostics, 800G readiness. All follow strict FO Blog Pipeline
rules — no markdown headers, no spec dumps, one thesis per article.
2026-04-06 01:58:05 +02:00

---
title: "Reading DOM Data Correctly: What the Numbers Are Actually Telling You"
type: tutorial
audience: network_engineers_noc_operators
quality_score: 9
generated_by: claude-sonnet-4-20250514
generated_at: 2026-04-06
training_data: true
---
The DOM readout is on every transceiver in your network. Most engineers look at it when something's broken. The ones who look at it before something's broken find things earlier and fix them for less money.
Digital Optical Monitoring gives you five parameters: transmit power, receive power, supply voltage, bias current, and temperature. That's the base set. Coherent modules add more: OSNR, laser frequency, pre-FEC BER. This article focuses on the base five, because those are what you have on every port, and what most teams systematically underuse.
The CLI for getting DOM data varies by platform. On Junos, `show interfaces diagnostics optics xe-0/0/0` gives you the full picture including alarm and warning thresholds. On EOS (Arista), `show interfaces transceiver detail` is the equivalent. On IOS-XE, use `show interface GigabitEthernet1/0/1 transceiver detail`. Every platform has it. The output format differs, but the parameters are the same.
Here's what each one means operationally.
Transmit power is the output of the laser. It's specified in dBm and it has a valid range that's in the module spec. For an SFP+ LR module, the range is typically -8.2 to +0.5 dBm. For a QSFP28 LR4, the Tx spec per lane is -4.3 to +4.5 dBm. The absolute values matter less than the trend. A module installed eighteen months ago with Tx at -1.2 dBm that now reads -4.8 dBm is showing laser degradation. It's slow and it's real. The module may not be failing today, but it's showing you the trajectory.
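To put a rough number on that trajectory, here is a minimal sketch that extrapolates the decline in a straight line. The function name and the choice of -8.2 dBm as the floor are assumptions for illustration, and real laser aging is rarely linear, so treat the result as a planning hint rather than a prediction.

```python
def months_until_limit(start_dbm, now_dbm, months_elapsed, low_limit_dbm):
    """Linearly extrapolate a Tx power decline down to a lower limit.

    Returns the months remaining at the current rate, or None when the
    reading is flat or improving.
    """
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    rate = (now_dbm - start_dbm) / months_elapsed  # dB per month
    if rate >= 0:
        return None  # nothing to project
    if now_dbm <= low_limit_dbm:
        return 0.0  # already at or below the floor
    return (low_limit_dbm - now_dbm) / rate


# Readings from the example above; the -8.2 dBm floor is illustrative.
remaining = months_until_limit(-1.2, -4.8, 18, -8.2)
print(f"about {remaining:.0f} months to the floor at the current rate")
```

With those readings the module is losing about 0.2 dB per month, which leaves roughly 17 months before it reaches the floor if nothing changes.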
Receive power is what's arriving at the photodetector after traveling through the fiber. This is the number that tells you about your fiber plant, not about your transceiver. If Tx looks normal but Rx is low, the problem is between the ports. Dirty connectors. High-loss splice. Wrong fiber type. A cable that was pulled too hard around a tight bend radius. When Rx drops suddenly and Tx hasn't changed, something physical happened.
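That reasoning can be written down as a small check. The sketch below compares far-end Tx and near-end Rx across two readings and attributes the change to the fiber path or to the transmitter. The function name and the 1 dB step are assumptions, not values from any standard.

```python
def classify_rx_drop(tx_before, tx_after, rx_before, rx_after, step_db=1.0):
    """Attribute an Rx drop to the fiber path or the far-end transmitter.

    tx_* are far-end Tx readings and rx_* are near-end Rx readings, in dBm.
    step_db is an assumed minimum change worth acting on.
    """
    rx_drop = rx_before - rx_after
    tx_drop = tx_before - tx_after
    # Loss added between the two ports since the earlier reading.
    path_loss_increase = rx_drop - tx_drop
    if rx_drop < step_db:
        return "no significant Rx drop"
    if path_loss_increase >= step_db:
        return "fiber path"
    return "far-end transmitter"


# Tx barely moved, Rx fell 4.5 dB: something physical happened.
print(classify_rx_drop(-1.0, -1.1, -3.0, -7.5))
```

Pairing each port with its far end takes some inventory work, but it is what lets you separate a transceiver problem from a plant problem without a truck roll.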
Bias current is how hard the laser is being driven to maintain its output. As a laser ages, the control circuit increases bias current to compensate for declining efficiency. A module with Tx power in spec but bias current at 80-90% of the maximum range is a module that's compensating. Tx looks fine, bias tells you it won't last. This is the parameter most teams ignore and the one that gives the earliest warning of laser end-of-life.
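A check for that condition is short. This sketch flags a module whose Tx is in spec while bias sits at or above 80% of its maximum. The function name, the parameter names, and the sample values are hypothetical, and the maximum should come from the module's own bias high-alarm threshold.

```python
def is_compensating(tx_dbm, tx_min_dbm, tx_max_dbm,
                    bias_ma, bias_max_ma, ratio=0.8):
    """True when Tx is in spec but bias current is near its maximum.

    ratio=0.8 mirrors the 80% figure in the text; tune it per module type.
    """
    if bias_max_ma <= 0:
        raise ValueError("bias_max_ma must be positive")
    tx_in_spec = tx_min_dbm <= tx_dbm <= tx_max_dbm
    return tx_in_spec and (bias_ma / bias_max_ma) >= ratio


# Hypothetical module: Tx looks healthy, bias is at 85% of maximum.
print(is_compensating(-2.0, -8.2, 0.5, bias_ma=68.0, bias_max_ma=80.0))
```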
Temperature matters more than most teams account for. Transceivers have operating ranges — COM grade (0-70°C) and Industrial grade (-40 to +85°C) are the main ones. Most data center optics are COM grade. At sustained temperatures above 65°C, you start seeing performance degradation and accelerated aging. The temperature alarm threshold is usually 75°C for COM modules — when you hit an alarm, you're already well into reduced-lifespan territory.
Voltage is usually boring. Power supply instability causes voltage anomalies, but well-maintained infrastructure rarely shows voltage deviations. If you're seeing voltage alarms, look at the switch power supply first.
The threshold values in the DOM output — high alarm, high warning, low warning, low alarm — come from the module itself. They're programmed by the manufacturer and they reflect what the module is designed to tolerate. A high alarm on Rx power doesn't mean the link is about to fail; it means the input power is above what the photodetector was calibrated for, which can cause receiver saturation. For LR4 in a short patch context — somebody put an LR4 in a rack-to-rack run that's effectively 3 meters — this is a real scenario. Add an attenuator, don't replace the module.
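If you are scripting against DOM output, the four thresholds map onto a simple classification. The sketch below assumes you have already parsed the thresholds out of the platform output. The `Thresholds` type and the numbers in the example are illustrative, not taken from any particular module.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    high_alarm: float
    high_warning: float
    low_warning: float
    low_alarm: float


def classify(value, t):
    """Place a reading relative to the module's own four thresholds."""
    if value >= t.high_alarm:
        return "high alarm"
    if value >= t.high_warning:
        return "high warning"
    if value <= t.low_alarm:
        return "low alarm"
    if value <= t.low_warning:
        return "low warning"
    return "normal"


# Illustrative Rx thresholds in dBm; read the real ones from the module.
rx = Thresholds(high_alarm=5.5, high_warning=4.5,
                low_warning=-10.6, low_alarm=-13.6)
print(classify(4.8, rx))   # a long-reach optic on a very short patch
print(classify(-6.0, rx))
```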
The most useful thing you can do with DOM data isn't checking it reactively. It's baseline logging. Record the DOM values for every module at installation. For Tx power, Rx power, and bias current, record the reading once a month. Three months of data shows you trends. Six months of data shows you which modules in your deployment are degrading faster than others, and it shows you before those modules cause outages.
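Once the monthly readings exist, ranking modules by how fast they are drifting takes a few lines. This is a sketch assuming the log is a mapping from port name to (month index, Tx dBm) pairs. The data layout, port names, and readings are made up for illustration.

```python
def slope_per_month(readings):
    """Least-squares slope of (month_index, value) pairs, in units per month."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(x for x, _ in readings) / n
    mean_y = sum(y for _, y in readings) / n
    den = sum((x - mean_x) ** 2 for x, _ in readings)
    if den == 0:
        raise ValueError("readings must span more than one month")
    num = sum((x - mean_x) * (y - mean_y) for x, y in readings)
    return num / den


def rank_by_decline(history):
    """Ports sorted so the fastest-declining module comes first."""
    return sorted(history, key=lambda port: slope_per_month(history[port]))


# Hypothetical four-month Tx log for two ports.
history = {
    "xe-0/0/0": [(0, -1.2), (1, -1.3), (2, -1.3), (3, -1.4)],
    "xe-0/0/1": [(0, -1.5), (1, -1.9), (2, -2.4), (3, -2.8)],
}
for port in rank_by_decline(history):
    print(port, f"{slope_per_month(history[port]):+.2f} dB/month")
```

The same ranking works for Rx power and bias current. For bias, sort in the opposite direction, since rising bias is the warning sign.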
This is routine in carrier and hyperscale environments. In enterprise and service provider environments below a certain size, it's often not done because it requires tooling and someone to look at the output. The tooling options are simpler than they used to be — LibreNMS, Netdisco, and several commercial NMS platforms will poll and graph DOM data automatically if you configure them to. The cost of not doing it is a Tx power alarm at 2 AM that would have been a planned maintenance window if you'd been watching the trend.
One practical trap: DOM data from a module is only as useful as the calibration of that module's internal sensors. Most well-made transceivers have sensor accuracy within 2-3 dB on power readings and within 3-5°C on temperature. Generic or extremely low-cost modules sometimes have wider tolerance. If you're seeing DOM readings that don't match an external power meter measurement, the module sensor may be the issue — it's a calibration problem with the module itself, not a fiber plant problem.
When DOM data and physical measurements disagree, trust the power meter on the fiber, not the module readout. The fiber doesn't lie. The module sensor calibration occasionally does.
For coherent 400ZR modules, pre-FEC BER is the additional parameter that matters most. What counts as normal depends on the FEC in use. The commonly quoted limit of 2.4×10^-4 belongs to KP4 FEC, which is what PAM4 client optics such as 400G DR4 and FR4 run. 400ZR uses a stronger concatenated FEC (CFEC) that is specified to correct up to roughly 1.25×10^-2. Close to the limit, the FEC is correcting errors that it may not be able to keep up with under degraded conditions. A stable pre-FEC BER of 1×10^-4 is fine. A pre-FEC BER that swings between 10^-5 and 10^-3 over time is a span with marginal OSNR. That's a different problem than a dirty connector, and it requires a different fix.
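The difference between a stable and a swinging pre-FEC BER is easiest to see on a log scale. The sketch below measures the spread of a set of samples in decades. The function names and the one-decade cutoff are assumptions chosen for illustration, not values from a standard.

```python
import math


def ber_spread_decades(samples):
    """Spread between the highest and lowest non-zero BER sample, in decades."""
    positive = [s for s in samples if s > 0]
    if len(positive) < 2:
        raise ValueError("need at least two non-zero samples")
    return math.log10(max(positive) / min(positive))


def is_stable(samples, max_decades=1.0):
    """True when the samples stay within max_decades of each other."""
    return ber_spread_decades(samples) <= max_decades


print(is_stable([1.0e-4, 1.2e-4, 0.9e-4]))  # steady readings
print(is_stable([1.0e-5, 1.0e-3]))          # swings across two decades
```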
DOM data doesn't replace physical inspection and fiber characterization. What it does is tell you where to start.