transceiver-db/blog-training-data/blog-038-cpo-pluggable-future.md
Rene Fichtmueller 99fca6b531 feat(training): add blog-031 through blog-040 — 10 expert articles
Topics: CWDM4/PSM4, MSA compliance, DAC/AOC TCO, grey vs DWDM,
ESD damage, tunable DWDM, FEC deep-dive, CPO hype cycle,
CMIS 4.0, vendor evaluation. Ø 1,180 words each.
2026-04-06 18:15:46 +02:00

| title | type | target_audience | score |
| --- | --- | --- | --- |
| Co-Packaged Optics: What CPO Actually Means for the Pluggable Transceiver Market | hype_cycle | technical | 9/10 |

The CPO narrative that dominated networking conferences from 2022 through 2024 was built on a genuine engineering insight wrapped in a chronically optimistic timeline. The insight is real: the fundamental constraint limiting I/O efficiency in switch ASICs at 51.2 Tbps and beyond is the electrical interface between the ASIC die and the optical transceiver, specifically the package substrate, PCB traces, and electrical connectors that collectively introduce 8-14 dB of electrical insertion loss at 56 Gbaud PAM4 signaling rates, and the SerDes front-end circuitry that must compensate for it. Co-Packaged Optics addresses this constraint by integrating the optical I/O directly into the switch ASIC package, eliminating most of that electrical path. The timeline claims ("CPO will displace pluggable by 2025") were engineering theater, not engineering analysis.

The physics problem CPO solves is concrete. At 51.2 Tbps switching capacity, a merchant silicon ASIC (Broadcom Tomahawk 5 or equivalent) drives 512 SerDes lanes at 100 Gbps each to reach total fabric capacity. Each SerDes lane drives a signal from the die through the ASIC package substrate, across PCB traces of 5-15 cm, through an electrical connector (SFP, QSFP, or OSFP cage), and into the pluggable transceiver. The total electrical insertion loss at 56 Gbaud on a typical route is 8-14 dB, which the SerDes must overcome with pre-emphasis and equalization. This equalization consumes power: roughly 20-30 pJ per bit for the SerDes on the ASIC die, which at 51.2 Tbps becomes 1.0-1.5 kW of SerDes power alone. By moving the optical engine to within a few millimeters of the ASIC die (co-packaged in the same flip-chip BGA package or on an adjacent silicon bridge die), the electrical path shrinks to 3-5 mm of silicon interposer, reducing insertion loss to 1-2 dB. This reduces SerDes power by an estimated 60-75%, from roughly 25 pJ/bit to 6-10 pJ/bit.
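The power figures above come from multiplying fabric throughput by energy per bit. A minimal Python sketch of that arithmetic follows, using only the numbers quoted in the paragraph. The function names are illustrative and do not come from any library.

```python
def serdes_power_watts(throughput_tbps: float, energy_pj_per_bit: float) -> float:
    """I/O power in watts: (Tbit/s) x (pJ/bit) = W, because 1e12 * 1e-12 cancels."""
    return throughput_tbps * energy_pj_per_bit


def power_reduction(before_pj_per_bit: float, after_pj_per_bit: float) -> float:
    """Fractional saving when energy per bit drops from `before` to `after`."""
    return 1.0 - after_pj_per_bit / before_pj_per_bit


FABRIC_TBPS = 51.2  # 512 lanes x 100 Gbps, as stated in the text

# Pluggable path: 20-30 pJ/bit (range quoted in the text)
for pj in (20, 25, 30):
    print(f"pluggable @ {pj} pJ/bit: {serdes_power_watts(FABRIC_TBPS, pj):.0f} W")

# Co-packaged path: 6-10 pJ/bit, compared against the 25 pJ/bit midpoint
for pj in (6, 10):
    saving = power_reduction(25, pj)
    print(f"CPO @ {pj} pJ/bit: {serdes_power_watts(FABRIC_TBPS, pj):.0f} W "
          f"({saving:.0%} below 25 pJ/bit)")
```

The per-bit energies are the article's estimates, not measured values, so the output is only as good as those inputs.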

Broadcom has publicly shown its "Bailly" CPO platform, built around the 51.2 Tbps Tomahawk 5, with 102.4 Tbps implementations on its roadmap, and Intel has demonstrated optical engines co-packaged with its Tofino 2 switch silicon. The claimed system-level power reduction is 3-4x for the I/O subsystem, which at hyperscale volumes translates to tens of megawatts of avoided data center power consumption. This is why Google, Meta, and Amazon have been funding CPO research: not because they care about per-unit optics cost, but because their power bills for switching I/O infrastructure are measured in hundreds of megawatts.

The manufacturing problem that makes the timeline claims unrealistic is multi-die package integration yield. A co-packaged optical ASIC combines the switch fabric die (approximately 900 mm² in TSMC 5nm for a Tomahawk 5 equivalent), silicon photonics transceiver dies (typically one per port group), and the package substrate routing them together. If die defects are independent and a single bad die scraps the assembly, the overall package yield is the product of the individual die yields: with a fabric die yield of 85% and eight optical dies at 90% each, the assembled package yield is 0.85 × (0.90)⁸ = 0.85 × 0.43 ≈ 0.366, or roughly 37%. A 37% package yield on a package that costs $5,000-8,000 in materials makes per-unit economics catastrophic during ramp. Pluggable transceivers can fail a manufacturing test and be discarded individually; in a CPO package, a failed optical die means a $5,000+ assembly goes to scrap. This is the yield calculus that silicon photonics manufacturers must solve before CPO reaches production economics, and it is why IBM's and Intel's own presentations at OFC 2023 showed first-production-volume targets of 2027-2029, not 2025.
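The yield arithmetic above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: die defects are independent, there is no known-good-die screening or rework, and one bad die scraps the whole assembly. The function names are hypothetical.

```python
def package_yield(fabric_yield: float, optical_yield: float, n_optical_dies: int) -> float:
    """Compound yield of one fabric die plus n optical dies, assuming
    independent defects and that any bad die scraps the package."""
    return fabric_yield * optical_yield ** n_optical_dies


def cost_per_good_package(assembly_cost: float, yield_fraction: float) -> float:
    """Effective materials cost per shippable unit when scrap is unrecoverable."""
    return assembly_cost / yield_fraction


y = package_yield(fabric_yield=0.85, optical_yield=0.90, n_optical_dies=8)
print(f"package yield: {y:.1%}")

# Materials cost range quoted in the text: $5,000-8,000 per assembly
for cost in (5_000, 8_000):
    print(f"${cost:,} assembly -> ${cost_per_good_package(cost, y):,.0f} per good package")

# Sensitivity: how much optical die yield matters once eight dies are compounded
for oy in (0.90, 0.95, 0.99):
    print(f"optical die yield {oy:.0%} -> package yield {package_yield(0.85, oy, 8):.1%}")
```

The sensitivity loop shows why per-die optical yield dominates the outcome: with eight dies compounded, each point of die yield moves the package yield by several points.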

Field replaceability is the operational argument that keeps CPO off the procurement roadmap for most enterprise and carrier deployments through at least 2030. A pluggable transceiver failure — MTBF typically 500,000-1,000,000 hours for quality 400G modules — is resolved by a field technician removing the failed module (30-second operation) and inserting a replacement. A CPO switch failure is a board replacement or system swap: the optical I/O is permanently integrated, so a single failed optical port group requires maintenance of the entire chassis. MTTR for a pluggable failure in a managed environment is typically 2-4 hours including parts dispatch. MTTR for a CPO system failure requiring chassis swap is 8-24 hours minimum, plus the cost of maintaining a full-system hot spare. For carrier-grade infrastructure with 99.999% availability requirements, this MTTR difference disqualifies CPO entirely until on-site optical repair and testing capabilities develop to the point where individual photonic die replacement becomes feasible — a capability that doesn't exist in field maintenance practice anywhere today.
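To see how the MTTR gap interacts with a 99.999% target, the steady-state availability formula A = MTBF / (MTBF + MTTR) can be applied to the figures above. The Python sketch below treats a single repairable unit with no redundancy, which is a deliberate simplification because real networks add redundant paths. The helper names are illustrative.

```python
HOURS_PER_YEAR = 8760
FIVE_NINES = 0.99999


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected annual downtime implied by an availability figure."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60


# MTBF and MTTR values taken from the ranges quoted in the text
scenarios = [
    ("pluggable, MTTR 4 h", 500_000, 4),
    ("CPO chassis swap, MTTR 8 h", 500_000, 8),
    ("CPO chassis swap, MTTR 24 h", 500_000, 24),
]

print(f"five-nines budget: {downtime_minutes_per_year(FIVE_NINES):.2f} min/year")
for label, mtbf, mttr in scenarios:
    a = availability(mtbf, mttr)
    verdict = "meets" if a >= FIVE_NINES else "misses"
    print(f"{label}: {a:.6f} ({downtime_minutes_per_year(a):.1f} min/year, {verdict} 99.999%)")
```

Under these assumptions, at the low end of the quoted MTBF range, the 4-hour repair stays inside the five-nines budget and both chassis-swap cases fall outside it.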

The burn-in testing problem deserves mention as a secondary manufacturing challenge. Standard pluggable transceiver manufacturing includes 24-168 hours of burn-in at elevated temperature under electrical stress, a process that screens for infant mortality failures before the module leaves the factory. In a CPO package, you cannot burn in the optical dies independently after they're co-packaged with the ASIC — the burn-in temperature required to screen optical components (85°C, 168 hours per Telcordia GR-468) would degrade the CMOS gate oxides in the switch fabric die. This forces CPO manufacturers to either burn in optical dies before assembly (limiting the screen effectiveness) or accept higher field infant mortality rates on deployed systems. Neither is an acceptable answer for infrastructure with 7-10 year operational life expectations.

The practical impact on 800G infrastructure buying decisions today is precisely zero. Pluggable 800G QSFP-DD and OSFP modules (800GBASE-DR8, standardized in IEEE 802.3df) are in production from InnoLight, Coherent, Lumentum, and Earing. These modules will be in service in data center deployments through 2033-2035. CPO will begin appearing in hyperscale pilot deployments around 2028 at 51.2T or 102.4T fabric capacity points where the power economics justify the operational trade-offs. The pluggable market will expand to 1.6T (8 lanes of 200 Gbps PAM4 per module, roughly 106-113 Gbaud per lane on 224 Gbps-class SerDes) before CPO reaches commercial maturity. Anyone presenting CPO as a near-term threat to pluggable investments in 2024-2027 infrastructure is projecting technology roadmap aspirations, not product availability.

The correct framing for CPO in 2026 is: a genuine long-term architectural evolution for hyperscale switching fabrics with compelling power economics, currently in the late R&D and early pilot phase, with no production deployments at commercial volume, and with three unsolved engineering problems (yield, burn-in, replaceability) that prevent economically rational deployment at enterprise or carrier scale before approximately 2028-2030. Pluggable transceivers at 400G, 800G, and eventually 1.6T will remain the dominant form factor for all foreseeable purchasing decisions. The investment in 800G pluggable infrastructure today faces zero technological obsolescence risk from CPO within its expected service life.