With the rapid advancement of artificial intelligence (AI), the power and cooling demands of AI processing have surpassed what standard hardware configurations can deliver. Infrastructure designed for traditional server-side applications is insufficient for the unique requirements of AI workloads. In this article, we explore the need for specialized infrastructure for AI and the key considerations and recommendations put forth by Schneider Electric, a leading provider of energy management and automation solutions.
The Need for Specialized Infrastructure for AI
AI workloads differ significantly from traditional server-side applications such as databases, and conventional data center designs no longer suffice. AI processing demands power, cooling, and bandwidth on an unprecedented scale. To ensure optimal performance and efficiency, data centers must address all three requirements.
The Three Key Requirements for AI
The first requirement is power. AI processing is computationally intensive, and that computation translates directly into electrical load. Standard server configurations are ill-equipped to handle the power demands of AI workloads, so data centers need specialized power distribution systems that can deliver the energy AI processing requires.
The second requirement is cooling. The heat generated by AI servers is substantial, surpassing what conventional air cooling can effectively handle. Air cooling through heat sinks and fans was sufficient for rack densities of up to 10kW to 20kW. For racks exceeding 30kW, however, alternative methods such as liquid cooling become necessary to maintain optimal operating temperatures.
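The density thresholds above can be sketched as a simple lookup. This is an illustrative sketch only: the function name `cooling_method` is hypothetical, and treating the range between 20kW and 30kW as a case-by-case zone is an assumption, since the figures quoted cover only racks up to 20kW and racks above 30kW.

```python
def cooling_method(rack_kw: float) -> str:
    """Suggest a cooling approach for a rack of the given density in kW.

    Thresholds follow the figures quoted in the text. The label for the
    20-30 kW band is an assumption, not a Schneider Electric rule.
    """
    if rack_kw < 0:
        raise ValueError("rack density cannot be negative")
    if rack_kw <= 20:
        return "air"      # heat sinks and fans were sufficient in this range
    if rack_kw <= 30:
        return "assess"   # assumed gray zone: evaluate case by case
    return "liquid"       # racks exceeding 30 kW call for liquid cooling


for kw in (8, 25, 40):
    print(f"{kw} kW rack -> {cooling_method(kw)}")
```

A real selection would also depend on room design, airflow containment, and the servers themselves, so the result is best read as a first-pass screen.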
The third requirement is bandwidth. For AI training, each GPU requires its own high-throughput network port. However, GPU capabilities have advanced faster than network port speeds. The resulting bottleneck hampers the efficiency of AI training and calls for a networking infrastructure that can keep up with the demands of AI processing.
Projected Global Data Center Power Consumption
According to Schneider Electric’s projections, total worldwide data center power consumption is expected to reach 54GW this year and to rise to 90GW by 2028. With the increasing adoption of AI technologies, existing data center infrastructure must be revamped to meet these growing power demands.
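To put the projection in perspective, the implied growth can be worked out directly. In the sketch below, the 54GW and 90GW figures come from the projection above, while the five-year span and the helper name `growth_summary` are assumptions made for illustration.

```python
def growth_summary(start_gw: float, end_gw: float, years: int) -> tuple[float, float]:
    """Return (total percent increase, compound annual growth rate in percent)."""
    ratio = end_gw / start_gw
    total_pct = (ratio - 1) * 100
    cagr_pct = (ratio ** (1 / years) - 1) * 100
    return total_pct, cagr_pct


# 54 GW and 90 GW are the projected figures; the 5-year span is an assumption.
total, cagr = growth_summary(54, 90, years=5)
print(f"total increase: {total:.1f}%, implied annual growth: {cagr:.1f}%")
```

Under that five-year assumption, the projection corresponds to an increase of roughly two thirds overall, or a little under 11% per year.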
Challenges of GPU Networking for AI Training
GPU capabilities have grown so quickly that network port development has struggled to keep pace. To avoid bottlenecks during AI training, data centers must equip each GPU with its own high-throughput network port, fast enough that the GPU is not left waiting for data.
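The effect of a mismatched port can be illustrated with a rough calculation. The sketch below uses a deliberately simple model in which a GPU's useful data rate is capped by its network port. The rates are hypothetical numbers chosen for illustration, not figures from Schneider Electric, and `network_limited_fraction` is a made-up helper name.

```python
def network_limited_fraction(gpu_gbps: float, port_gbps: float) -> float:
    """Fraction of the GPU's data rate the network port can sustain, capped at 1.0."""
    if gpu_gbps <= 0 or port_gbps <= 0:
        raise ValueError("rates must be positive")
    return min(1.0, port_gbps / gpu_gbps)


# Hypothetical rates for illustration only.
gpu_rate = 900.0
for port in (100.0, 400.0, 800.0):
    frac = network_limited_fraction(gpu_rate, port)
    print(f"{port:.0f} Gbps port feeds {frac:.0%} of a {gpu_rate:.0f} Gbps GPU")
```

Real training jobs overlap computation with communication, so actual slowdowns differ from this simple ratio, but the sketch shows why a slow port leaves an expensive GPU underused.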
Schneider’s Recommendations for AI Infrastructure
Schneider Electric offers recommendations to address the challenges posed by AI processing, including the following for power and cooling.
1. Power Distribution: Replace traditional 120/208V power distribution systems with higher-voltage alternatives like 240/415V systems. At the higher voltage, the same power is delivered with less current, which makes power delivery more efficient and reduces energy waste.
2. Cooling Solutions: Implement liquid cooling for high-density racks. While different forms of liquid cooling exist, direct liquid cooling is advocated for its superior efficiency and ability to handle the extreme heat generated by AI servers.
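The benefit of the higher-voltage recommendation can be shown with the standard formula for a balanced three-phase load, I = P / (√3 × V × PF), where V is the line-to-line voltage (208V and 415V in the two systems named above). The 30kW rack, the unity power factor, and the helper name `line_current_amps` are assumptions made for this sketch.

```python
import math


def line_current_amps(load_kw: float, line_to_line_volts: float,
                      power_factor: float = 1.0) -> float:
    """Per-phase line current for a balanced three-phase load.

    I = P / (sqrt(3) * V_LL * PF)
    """
    return load_kw * 1000 / (math.sqrt(3) * line_to_line_volts * power_factor)


rack_kw = 30  # assumed example rack, not a figure tied to the recommendation
i_208 = line_current_amps(rack_kw, 208)  # 120/208V system
i_415 = line_current_amps(rack_kw, 415)  # 240/415V system
loss_ratio = (i_208 / i_415) ** 2        # resistive loss scales with current squared

print(f"208V: {i_208:.1f} A per phase")
print(f"415V: {i_415:.1f} A per phase")
print(f"I^2R loss ratio for identical conductors: {loss_ratio:.2f}x")
```

Doubling the voltage roughly halves the current for the same load, so losses in identical conductors fall to about a quarter. In practice conductor and breaker sizes would also change, so this is an upper-level comparison and not a wiring design.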
Importance of Infrastructure Assessment
Because liquid cooling technologies are not standardized, a thorough infrastructure assessment is essential before deployment. Such an assessment ensures that the liquid cooling design is tailored to the specific needs and demands of the data center, which supports performance and reliability.
Integration of Liquid Cooling During Data Center Construction
Where data centers use liquid cooling, the infrastructure is in most cases installed during the initial construction phase. Retrofitting liquid cooling into an existing facility can be challenging and disruptive. Careful planning during the data center design phase can therefore significantly streamline the implementation of liquid cooling for AI workloads.
AI processing demands specialized infrastructure solutions that go beyond the capabilities of traditional hardware configurations. Power, cooling, and bandwidth must each be adequately addressed to ensure optimal performance and efficiency. By embracing Schneider Electric’s recommendations, data centers can meet the ever-increasing demands of AI processing and pave the way for a future powered by artificial intelligence.