AI Introduces Varied Infrastructure Considerations for Data Centers

By Ethernet Alliance

“Artificial intelligence is transforming every industry,” reads Evolving Infrastructure Requirements for AI Data Centers. “No other technology since the advent of the internet has brought such a wave of change in how business is done and how data centers need to be built.”

A recent Ethernet Alliance white paper—co-authored by Bob Wagner, Sr. Business Development Manager, Panduit; Jose Castro, Distinguished Manager, Optical Research, Panduit; Brian Kelly, Sr. Solutions Architect, Panduit; and Justin Blumling, Technical Sales Engineer, Panduit—explores the contemporary landscape of data-center build requirements. AI is driving change and new choices across a wide range of considerations:

  • Power Density—“The sudden increase in demand for AI has brought substantial stress to data-center owners over the past two years, as they had to pivot to new building designs while also looking at retrofitting even newly built data centers. Large-scale AI data centers often require hundreds of megawatts and scale to several gigawatts in the most advanced facilities. To put this in perspective, these power demands are comparable to small- or mid-sized cities. This new normal is challenging the way data center buildings get their power.” Indeed, many data centers are generating their own power because of multi-year lead times to access more power from the grid. Table stakes for any new data-center building are 415V to the rack (the first sketch after this list gives a rough sense of what that delivers per rack), and innovation in specialized rack-level cooling technologies and techniques is rampant.
  • Cooling Approaches—“Today, most data center companies are spending a lot of time researching the best way to supply liquid cooling. (…) There continues to be disagreements about whether water or dielectric refrigerants are best, but nearly all customers are looking to add liquid cooling.” Data centers must weigh the strengths and limitations of techniques such as direct-to-chip (DTC) and rear-door heat exchangers (RDHX) against power requirements per rack; often, a combination of techniques is employed, depending on the specific needs of a given facility (the second sketch after this list illustrates the underlying heat-removal math). Immersion-cooling technologies, for example, promise tremendous efficiencies, but that benefit must be weighed against the infrastructure changes and other complexities they require.
  • Network Design and Cabling—“Once a peripheral consideration, the network is now central to data-center performance, and its needs can be met with physical infrastructure capable of supporting high power, dense fiber cabling, and scalable growth for even the largest AI facilities.” Data centers are under pressure to carve out larger pathways and adopt more efficient cable-management strategies to accommodate the higher data rates, flat rail-optimized network topologies, and multiple networks within AI systems. The back-end compute network, which connects the graphics processing units (GPUs) to one another, demands the highest data rates (400G → 800G), while the storage (200G → 400G), front-end/in-band management (100G), and out-of-band management (<10G) networks deliver specialized functionality at their own rates (the third sketch after this list tallies what this implies per server).
  • Protocol Support—Though front-end communication, storage, and management networks have successfully relied on Ethernet, the first back-end networks—with their legacy rooted in High-Performance Computing (HPC)—typically favored InfiniBand. The most latency-sensitive environments, such as HPC, historically used InfiniBand because one of that protocol’s core features, Remote Direct Memory Access (RDMA), lets one computer read or write directly into another’s memory without involving the remote central processing unit (CPU), reducing latency (the final sketch after this list shows why that matters). The global Ethernet community has pursued innovation through various activities—RDMA over Converged Ethernet (RoCE), the Ultra Ethernet Consortium (UEC), and IEEE 802.3—to better serve these data-center needs, especially since the emergence of AI, and industry consensus is mounting that Ethernet is now the protocol best suited even for the highest-performance deployments.
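
As a rough illustration of why 415V three-phase distribution to the rack has become table stakes, the sketch below estimates deliverable power per rack and scales it up. The current rating, power factor, facility size, and PUE are hypothetical values chosen for illustration; they are not figures from the white paper.

```python
import math

def three_phase_power_kw(line_voltage_v: float, current_a: float,
                         power_factor: float = 1.0) -> float:
    """Deliverable power (kW) on a three-phase feed: P = sqrt(3) * V * I * PF."""
    return math.sqrt(3) * line_voltage_v * current_a * power_factor / 1000.0

# Hypothetical example: a 415V / 60A three-phase whip at 0.95 power factor.
per_rack_kw = three_phase_power_kw(415, 60, 0.95)
print(f"Per-rack capacity: {per_rack_kw:.1f} kW")  # ~41.0 kW

# Scaling up: racks supported by a hypothetical 100 MW facility,
# assuming an illustrative PUE of 1.2 for cooling and other overhead.
facility_mw, pue = 100, 1.2
it_load_kw = facility_mw * 1000 / pue
print(f"Racks supported: {it_load_kw / per_rack_kw:,.0f}")
```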
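
To make the liquid-cooling tradeoff concrete, this sketch estimates the coolant flow a direct-to-chip loop would need to carry a given rack load, using the standard heat-transfer relation Q = ṁ · c_p · ΔT. The rack power and temperature rise are illustrative assumptions, not values from the paper.

```python
def required_flow_lpm(heat_kw: float, delta_t_c: float,
                      cp_j_per_kg_c: float = 4186.0,
                      density_kg_per_l: float = 1.0) -> float:
    """Coolant flow (liters/minute) needed to remove heat_kw at a delta_t_c rise.

    From Q = m_dot * c_p * dT  =>  m_dot = Q / (c_p * dT); defaults model water.
    """
    m_dot_kg_s = heat_kw * 1000.0 / (cp_j_per_kg_c * delta_t_c)
    return m_dot_kg_s / density_kg_per_l * 60.0

# Hypothetical 80 kW rack with a 10 °C coolant temperature rise:
print(f"{required_flow_lpm(80, 10):.1f} L/min")  # ~114.7 L/min of water
```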
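
The next sketch tallies the per-server bandwidth implied by the multiple AI networks described above. The data rates come from the white paper; the ports-per-server counts are hypothetical, chosen only to show why pathways and cable management come under pressure.

```python
# Rates per network as cited above; port counts are illustrative assumptions.
networks = {
    "back-end compute (GPU-GPU)": {"rate_gbps": 800, "ports": 8},
    "storage":                    {"rate_gbps": 400, "ports": 2},
    "front-end / in-band mgmt":   {"rate_gbps": 100, "ports": 2},
    "out-of-band management":     {"rate_gbps": 1,   "ports": 1},
}

total = 0
for name, cfg in networks.items():
    bw = cfg["rate_gbps"] * cfg["ports"]
    total += bw
    print(f"{name:28s} {cfg['ports']} x {cfg['rate_gbps']}G = {bw:,} Gb/s")
print(f"{'aggregate per server':28s} {total:,} Gb/s")
```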
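
Finally, a minimal way to see why RDMA matters for back-end latency is a toy accounting of the per-message path: a conventional kernel TCP transfer pays for system calls, buffer copies, and interrupt handling, while an RDMA write bypasses the remote CPU entirely. Every component latency below is a hypothetical, order-of-magnitude placeholder, not a measurement.

```python
# Toy latency model contrasting a kernel TCP path with an RDMA write.
# All figures are hypothetical placeholders for illustration only.
tcp_path_us = {
    "syscall + kernel stack (sender)": 5.0,
    "copy user -> kernel buffer":      2.0,
    "wire + switch":                   1.0,
    "interrupt + kernel stack (recv)": 5.0,
    "copy kernel -> user buffer":      2.0,
}
rdma_path_us = {
    "NIC doorbell + DMA read (send)":  1.0,
    "wire + switch":                   1.0,
    "NIC DMA write into app memory":   1.0,  # no remote CPU involvement
}

for name, path in [("TCP", tcp_path_us), ("RDMA", rdma_path_us)]:
    print(f"{name}: {sum(path.values()):.1f} us per message")
```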

AI is dramatically changing the data-center landscape through its demand for far higher power and new, rapidly evolving network architectures. The new white paper, Evolving Infrastructure Requirements for AI Data Centers, is one way the Ethernet Alliance is helping smooth the transition through industry education.

To learn more about considerations and tradeoffs in areas such as power density, cooling approaches, network design and cabling, and protocol support, download the Evolving Infrastructure Requirements for AI Data Centers white paper from the Ethernet Alliance.
