Ultra Ethernet Consortium: Advancing Network Technology for AI Workloads

Backed by the Linux Foundation, the Ultra Ethernet Consortium (UEC) has taken a decisive step towards enhancing Ethernet technology to meet the unprecedented performance and capacity demands brought on by AI workloads. With the exponential growth of AI, networking vendors have banded together to develop a transport protocol that can scale, stabilize, and improve the reliability of Ethernet networks, catering to AI’s high-performance networking requirements.

The Need for Enhanced Ethernet Technology for AI Workloads

AI workloads are expected to put immense strain on networks, which calls for more advanced Ethernet capabilities. The UEC recognizes these demands and is working to optimize Ethernet technology for the scale and speed that AI requires.

The Development of a Transport Protocol Leveraging Proven Techniques

To that end, the UEC aims to develop a transport protocol that borrows efficient session management, authentication, and confidentiality techniques from established security protocols such as IPsec and SSL/TLS. By integrating these proven core techniques, the UEC seeks to enhance the performance and reliability of Ethernet networks.

Key Management Mechanisms for Efficient Sharing of Keys

Efficient sharing of keys among a large number of computing nodes participating in a job is crucial for enabling seamless operations in AI workloads. The UEC plans to incorporate new key management mechanisms to facilitate efficient key sharing, minimizing bottlenecks while maintaining data security.
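To see why key sharing can become a bottleneck at this scale, consider the arithmetic in the minimal sketch below. It is not the UEC's mechanism, which the article does not describe. The function names and the HMAC-based derivation are assumptions chosen purely for illustration: the sketch compares the number of keys needed if every pair of nodes held its own key with a scheme in which each node derives its key locally from one shared job key.

```python
import hashlib
import hmac


def pairwise_key_count(nodes: int) -> int:
    """Number of distinct keys if every pair of nodes shares its own key."""
    return nodes * (nodes - 1) // 2


def derive_node_key(job_key: bytes, node_id: int) -> bytes:
    """Hypothetical derivation: one job-wide secret yields a per-node key.

    Uses HMAC-SHA256 as a keyed pseudorandom function. This is an
    illustrative assumption, not a mechanism taken from the UEC.
    """
    label = f"node-{node_id}".encode()
    return hmac.new(job_key, label, hashlib.sha256).digest()


if __name__ == "__main__":
    nodes = 1024
    print(f"pairwise keys for {nodes} nodes: {pairwise_key_count(nodes)}")
    job_key = b"example-job-secret"  # placeholder value for the demo only
    print(derive_node_key(job_key, 0).hex())
```

With pairwise keys the count grows quadratically with the number of nodes, whereas a derivation of this kind only requires distributing a single secret per job.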

Dell’Oro Group’s Forecast on AI Workloads and Ethernet Data Center Switch Ports

The recent “Data Center 5-Year July 2023 Forecast Report” by the Dell’Oro Group projects that by 2027, 20% of Ethernet data center switch ports will be connected to accelerated servers supporting AI workloads. This statistic highlights the growing demand for enhanced AI connectivity technology.

Generative AI Applications and Growth in the Data Center Switch Market

The increasing popularity of generative AI applications is expected to fuel significant growth in the data center switch market. According to Sameh Boujelbene, Vice President at Dell’Oro, the market is projected to surpass $100 billion in cumulative sales over the next five years. This growth reinforces the importance of optimizing Ethernet infrastructures for AI workloads.

Limitations of Interconnects for AI Workload Requirements

For many years, interconnects such as InfiniBand, PCI Express, and RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) have been the primary options for connecting processor cores and memory. However, these interconnects have limitations when it comes to meeting the specific requirements of AI workloads. The UEC aims to address these limitations by fine-tuning Ethernet to enhance efficiency and performance at scale.

Ethernet’s Anniversary and Its Role in Supporting AI Infrastructures

Now marking its 50th anniversary, Ethernet has proven remarkably versatile and adaptable. As AI continues to grow in prominence, Ethernet is expected to play a critical role in supporting the infrastructure needed for AI workloads.

Core Technologies and Capabilities in the Ethernet Specification by UEC

The UEC is actively working on an Ethernet specification that encompasses various core technologies and capabilities, including multi-pathing and packet spraying, flexible delivery order, modern congestion-control mechanisms, and end-to-end telemetry. These advancements will enable Ethernet networks to deliver improved performance and efficiency for AI workloads.
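Two of these ideas, packet spraying and flexible delivery order, can be illustrated with a toy model. The sketch below is an assumption-laden illustration rather than anything from the UEC specification: the path count, the flow identifier, the round-robin spraying policy, and all function names are hypothetical. It contrasts per-flow hashing, where every packet of a flow follows one path, with per-packet spraying, and then shows a receiver that places each payload at its byte offset so that arrival order does not matter.

```python
import zlib
from collections import Counter

NUM_PATHS = 4  # hypothetical number of equal-cost paths


def flow_hash_path(flow_id: str, num_paths: int = NUM_PATHS) -> int:
    """Per-flow hashing: every packet of the same flow maps to one path."""
    return zlib.crc32(flow_id.encode()) % num_paths


def spray_path(seq: int, num_paths: int = NUM_PATHS) -> int:
    """Per-packet spraying (simple round-robin): packets rotate across paths."""
    return seq % num_paths


def place_by_offset(arrivals, size: int) -> bytes:
    """Write each (offset, payload) directly into the buffer.

    Because every packet carries its own offset, the result is the same
    regardless of the order in which packets arrive.
    """
    buf = bytearray(size)
    for offset, payload in arrivals:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    seqs = range(8)
    hashed = Counter(flow_hash_path("gpu0->gpu1") for _ in seqs)
    sprayed = Counter(spray_path(s) for s in seqs)
    print("paths used with flow hashing:", len(hashed))
    print("paths used with spraying:", len(sprayed))
    # Packets arrive out of order but still land in the right place.
    print(place_by_offset([(4, b"data"), (0, b"test")], 8))
```

In this toy model a single large flow uses one path under hashing but all four under spraying, which is the load-balancing benefit. The cost is that packets may arrive out of order, which is why spraying is paired with a delivery model that tolerates reordering.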

The Ultra Ethernet Consortium’s mission to enhance Ethernet networks for AI workloads reflects the pressing need for advanced connectivity technology. By leveraging proven techniques, incorporating efficient key management mechanisms, and fine-tuning Ethernet from the physical to software layers, the UEC aims to meet the challenges posed by AI’s unprecedented performance demands. As Ethernet continues to evolve and adapt, it will remain an integral component in supporting the growth and development of AI infrastructures.
