Ethernet in the Age of AI: Key Takeaways from TEF 2024, Part II
The rapid advancement of AI is driving an urgent need for faster, more efficient Ethernet networks. At the Ethernet Alliance’s Technology Exploration Forum (TEF 2024): Ethernet in the Age of AI, experts explored the shift from 200G to 400G Ethernet, emphasizing how this technology must evolve to meet AI’s growing demands.
TEF 2024 underscored the crucial need for collaboration across the Ethernet ecosystem, focusing on tackling technical challenges and fostering innovation. As Ethernet adapts to AI’s needs, industry leaders are working together to ensure scalable, high-performance networks for the future.
Racing Toward 400G: Navigating the Future of Ethernet in AI
“The main takeaway from TEF 2024 is the swift growth of the industry. While IEEE is just now finalizing its 200G/lane specification, the AI community is already clamoring for 400G/lane. Despite the complexities of faster signal speeds, the Ethernet ecosystem is fearlessly and enthusiastically embracing both current challenges and those yet to come. The test and measurement (T&M) community faces increasingly sophisticated obstacles, necessitating robust cooperation between the T&M, interconnect, and IP sectors to adequately support validation and ensure interoperability as we journey forward.” – Ethernet Alliance Events and Conferences Chair, David J. Rodgers, EXFO
“The best in the industry were present for two full days on one important topic – 400G for AI. The event had the right balance of technical deep-dive presentations and informal interactions. The conversational format was very helpful due to the range of system considerations and related complexity. While 400G is challenging, across the various sessions, I felt that the technology is in great hands and a good trade-off is possible with a focused effort. This was a special effort by TEF 2024 Chair John D’Ambrosia and the Ethernet Alliance in pulling the event together.” – AI-Centric Data Centers and their Diverse Network Requirements keynote speaker, Ram Huggahalli, Microsoft
“The 2024 Technology Exploration Forum (TEF 2024) exceeded expectations, making history as the first event to explore 400G per lane discussions with a focus on AI applications. TEF 2024 brought together key stakeholders from hyperscalers, vendors and standardization organizations. This collaboration showcased the power of unity and emphasized the importance of collective effort. The forum highlights the need for industry collaboration and ongoing dialogue. TEF 2024 demonstrated that shared expertise and cooperation are vital for driving innovation and advancement in the technology sector.” – Exploration of AI Interconnect panelist, Halil Cirit, Meta
“I thoroughly enjoyed attending the TEF 2024 conference. The panels were informative, and the speakers provided valuable insights into the different hurdles and innovative approaches for achieving 400Gbps per lane. I particularly appreciated the interactive approach of all the panels with enough time for questions and the opportunity to network with professionals from diverse backgrounds. The event was well-organized, and the range of topics covered allowed me to deepen my knowledge and gain practical takeaways. I look forward to attending future editions of the conference and continuing to engage with the community.” – Exploration of AI Interconnect panelist, Ashika Pandankeril Shaji, TE Connectivity
“As a system vendor and ASIC provider, we prioritize delivering seamless connectivity solutions for diverse needs, from intra-rack to inter-data center connections reaching up to 120km. To optimize ROI and scalability, we see optical interconnect as the preferred choice when lane rates reach 400Gbps. For distances up to 2km, 400Gbps IMDD technology using PAM4 modulation offers an efficient solution, while beyond 2km, coherent optical technology becomes essential to support longer reach connections while interfacing with 400Gbps electrical links. Although 200G bidirectional (Bidi) optics may serve as an interim solution, it faces significant challenges related to cost, power efficiency, and packaging, which impact its feasibility. Furthermore, as electrical link reach is limited, we anticipate Near-Packaged Optics (NPO) and Co-Packaged Optics (CPO) as integral to future solutions, with NPO requiring further investment to build the ecosystem.” – Exploration of AI Interconnect panelist, Guangcan Mi, Huawei
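Mi's reach-based breakdown can be read as a simple selection rule. The sketch below is purely illustrative: the ~2 km and ~120 km thresholds come from the quote, but the function name and category strings are invented for this example.

```python
def interconnect_choice(reach_km: float) -> str:
    """Illustrative pick of interconnect technology for a 400Gbps/lane link,
    following the reach thresholds mentioned in the quote (not a vendor
    recommendation)."""
    if reach_km <= 2:
        return "IMDD with PAM4"   # described as efficient up to ~2 km
    if reach_km <= 120:
        return "coherent optics"  # longer reach, up to inter-data center ~120 km
    return "outside the reaches discussed"

print(interconnect_choice(0.5))   # IMDD with PAM4
print(interconnect_choice(40))    # coherent optics
```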
Shaping the Future of AI Networks: Collaboration, Scalability, and Latency Solutions
“Attendees’ high level of engagement highlighted a tremendous interest in scalable AI connectivity solutions. Industry-wide collaboration is essential to establish the role of standards in rapidly evolving AI networks. Early efforts at Alphawave Semi and other industry leaders are pinpointing the most promising technologies to meet emerging demands.” – The Role of Optics in Future AI Applications panelist, Tony Chan Carusone, Alphawave Semi
“The conference was exceptionally rich, both in terms of content and the number of industry leaders in attendance. The strong presence of major hyperscalers underscored the critical importance of AI and the challenges they face in meeting the growing demand for AI back-end networks. There was broad consensus on the necessity of scalable, reliable, high-bandwidth, and low-latency networks that are also low-power, cost-effective, and quick to market. However, significant debate arose over prioritization and the best path forward for enabling next-generation speeds, particularly with 400G SerDes lanes. While PAM6 and PAM8 appear to be leading candidates, the final decision remains uncertain.” – The Future of Ethernet, Networks and AI panel moderator, Sameh Boujelbene, Dell’Oro Group
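The PAM6/PAM8 debate Boujelbene mentions ultimately comes down to bits per symbol: higher-order PAM lowers the symbol (baud) rate a 400 Gb/s lane requires, at the cost of tighter eye margins. A back-of-the-envelope sketch that ignores FEC and coding overhead (note that practical PAM6 schemes typically carry fewer than the theoretical log2(6) bits per symbol):

```python
import math

def required_baud_rate(bit_rate_gbps: float, pam_levels: int) -> float:
    """Symbol rate (GBd) needed to carry bit_rate_gbps with PAM-N signaling,
    ignoring FEC and coding overhead."""
    bits_per_symbol = math.log2(pam_levels)
    return bit_rate_gbps / bits_per_symbol

for levels in (4, 6, 8):
    print(f"PAM{levels}: {required_baud_rate(400, levels):.1f} GBd")
```

This shows why the higher-order formats are attractive: moving from PAM4 to PAM8 cuts the required symbol rate from 200 GBd to about 133 GBd for the same 400 Gb/s lane.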
“The traditional boundaries and definition of Ethernet will need to stretch and morph once again to support the AI networks of the future. For example, AI “scale-up” and “scale-out” networks are quite different from the front-end data center networks in terms of expected reliability, performance, reach, power and latency. Furthermore, the “AI scale-out network” itself is diverse in definition as there are four distinct cases with competing optimizations to consider: GPU-GPU remote, GPU-GPU local, GPU-CPU, GPU-memory. Industry standard body participants must collaborate and innovate at a new level to deliver Ethernet solutions that will meet the demands of the AI operators.” – The Future of Electrical Signaling panelist, Kent Lusted, IEEE
“Latency has long been a common criticism when discussing Ethernet’s suitability for data centers. Yet, no specific standard has been proposed for how low latency should be, particularly at the PHY layer. As port speeds climb to 1.6 Tb/s with the rapid rise of AI, we conducted an in-depth review of how latency impacts AI applications – specifically, large language model (LLM) training and inference. Surprisingly, PHY layer latency is not the critical factor. We are dealing with latencies in the microsecond range, while Ethernet’s PHY layer latency is around 100 nanoseconds. Ethernet is entirely suitable for use in AI data centers, whether for scale-out or scale-up architectures.” – The Future of Ethernet, Networks and AI panelist, Xiang He, Huawei
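Xiang He's argument can be checked with simple arithmetic: if application-level latencies sit in the microsecond range while the PHY contributes on the order of 100 ns, the PHY is only a small fraction of the total. In the sketch below, only the ~100 ns PHY figure comes from the quote; the 10 µs end-to-end figure is an assumed round number for illustration.

```python
phy_latency_ns = 100            # approximate Ethernet PHY latency, per the quote
end_to_end_latency_ns = 10_000  # assumed 10 µs end-to-end latency (illustrative)

phy_share = phy_latency_ns / end_to_end_latency_ns
print(f"PHY share of end-to-end latency: {phy_share:.1%}")  # 1.0%
```

Under these assumptions the PHY accounts for about 1% of the end-to-end latency, which supports the conclusion that PHY-layer latency is not the critical factor for LLM training and inference traffic.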
The Voice of Ethernet: Leading the Charge in Network Innovation
TEF 2024 demonstrated the Ethernet Alliance’s role as the Voice of Ethernet, uniting key industry stakeholders to tackle Ethernet’s rapidly evolving challenges and opportunities. This collaboration fostered valuable discussions about Ethernet’s future, driving innovation and shaping its role in the networking ecosystem.
For a deeper dive into TEF 2024, explore our on-demand TEF Video Showcase to hear insights from our speakers.