The investments required to upgrade data center infrastructure to power and cool AI hardware are substantial. This transition for African enterprises will not happen quickly, and data center administrators must look for ways to make power and cooling future-ready, explain Wojtek Piorko and Jonathan Duncan at Vertiv. AI is already transforming people’s everyday lives, with local use of technology like ChatGPT, virtual assistants, navigation applications, and chatbots on the upswing. And just as it is transforming every single industry, it is also beginning to fundamentally change data center infrastructure, driving significant changes in how high-performance computing is powered and cooled.
AI applications are increasingly demanding, requiring vast computational power and, accordingly, enhanced cooling systems to ensure optimal performance and longevity of data center hardware. Traditional cooling methods struggle to manage the heat generated by AI-specific hardware like GPUs, necessitating an upgrade to liquid cooling technologies. This article outlines a strategic approach for African data centers to efficiently manage power and cooling in the context of rising AI workloads.
Assess Current and Future Needs
The first step in adapting data centers to AI is for IT and facility teams to assess how much space will be needed to support new workloads. As the demand for AI and high-performance computing (HPC) grows, it is critical to ensure that there is adequate space to meet current needs while also accommodating future expansion. This evaluation might entail converting a small number of existing racks or even dedicating entire rooms to handle these intensive workloads. Deciding how much space to allocate is vital for integrating the liquid cooling systems needed to handle the high-density heat loads produced by AI hardware.
AI chips typically require around five times as much power and cooling capacity as traditional servers. This significant increase creates challenges across the entire power infrastructure, from the electricity grid supplying the data center to each individual chip. Facility teams must devise plans that detail how new power and cooling systems can be added incrementally to manage both current and future loads. Proper planning ensures that as AI usage grows, the necessary cooling infrastructure is deployed in step with it, optimizing performance and efficiency at every stage.
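To make the incremental planning concrete, here is a minimal sketch of how a team might estimate the extra power and cooling capacity needed after each conversion phase. It assumes the "around five times" figure can be applied per rack, and the rack density and phase sizes are hypothetical values chosen only for illustration.

```python
# Assumption: the "around five times" figure from the text is applied per rack.
AI_POWER_MULTIPLIER = 5.0


def cumulative_added_load_kw(baseline_rack_kw, racks_per_phase,
                             multiplier=AI_POWER_MULTIPLIER):
    """Return the cumulative extra load (kW) after each conversion phase."""
    # Extra load is the AI-rack load minus the load the rack already drew.
    extra_per_rack = baseline_rack_kw * (multiplier - 1.0)
    totals = []
    running = 0.0
    for racks in racks_per_phase:
        running += racks * extra_per_rack
        totals.append(running)
    return totals


# Hypothetical plan: 8 kW racks, converted in three phases of 4, 8 and 12 racks.
plan = cumulative_added_load_kw(8.0, [4, 8, 12])
for phase, added_kw in enumerate(plan, start=1):
    print(f"Phase {phase}: +{added_kw:.0f} kW of power and cooling capacity")
```

A real plan would replace the single multiplier with measured or vendor-supplied rack densities, but the cumulative view is what shows when upstream power and cooling upgrades have to land.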
Site Inspection
Conducting a thorough site inspection is crucial before committing to retrofitting existing infrastructure with advanced liquid cooling systems. Detailed inspections, performed in collaboration with technology partners, will ensure that adding these systems is both technically feasible and economically viable. One essential task involves conducting a computational fluid dynamics (CFD) study to analyze existing airflow patterns within the facility. This study identifies any inefficiencies in current air-cooling systems and determines where improvements can be made or if existing systems can be incorporated into hybrid cooling solutions.
Existing air-cooling equipment must be assessed to see if it can be integrated into new liquid cooling systems. This process includes analyzing the capacity of current piping to determine if it can support the increased coolant flow rates needed for high-density AI workloads. Flow network modeling studies provide insights into whether the proposed liquid cooling systems can effectively manage the thermal demands of advanced server setups. Additionally, teams should conduct water usage effectiveness (WUE) and power usage effectiveness (PUE) analyses to evaluate how effectively resources are currently utilized and where improvements can be made.
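A first-pass check on piping capacity can be done with the basic heat balance Q = ṁ · c_p · ΔT before any detailed flow network modeling. The sketch below assumes plain water as the coolant (real loops often use a water-glycol mix with different properties), and the heat load, temperature rise, and pipe capacity figures are hypothetical.

```python
# Approximate properties of plain water; a glycol mix would differ.
WATER_DENSITY_KG_M3 = 997.0
WATER_SPECIFIC_HEAT_J_KG_K = 4186.0


def required_flow_l_per_min(heat_load_kw, delta_t_k):
    """Water flow (L/min) needed to remove heat_load_kw at a given
    supply/return temperature difference delta_t_k (in kelvin)."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    # Q = m_dot * c_p * delta_T  ->  m_dot = Q / (c_p * delta_T)
    mass_flow_kg_s = (heat_load_kw * 1000.0) / (
        WATER_SPECIFIC_HEAT_J_KG_K * delta_t_k)
    volume_flow_m3_s = mass_flow_kg_s / WATER_DENSITY_KG_M3
    return volume_flow_m3_s * 1000.0 * 60.0  # m^3/s -> L/min


def piping_is_sufficient(heat_load_kw, delta_t_k, pipe_capacity_l_per_min):
    """True if the existing loop can carry the required flow."""
    return required_flow_l_per_min(heat_load_kw, delta_t_k) <= pipe_capacity_l_per_min


# Hypothetical case: 100 kW of liquid-cooled load, 10 K temperature rise,
# existing loop rated for 120 L/min.
flow = required_flow_l_per_min(100.0, 10.0)
print(f"Required flow: {flow:.1f} L/min")
print("Existing loop sufficient:", piping_is_sufficient(100.0, 10.0, 120.0))
```

This ignores pressure drop, pump curves, and pipe velocity limits, which is exactly what the flow network modeling study is for. It is only meant to show early on whether the existing piping is in the right range.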
Modeling the Preferred Configuration
With comprehensive data from site inspections, IT and facility teams can proceed to create a model of the desired hybrid cooling setup. This modeling phase is critical for identifying potential obstacles before physical implementation begins. The data gathered will highlight concerns such as weight limitations on raised floors, the availability of on-site water resources, and the need for additional piping installations. Addressing these issues in the modeling phase reduces the risk of unexpected challenges during actual deployment.
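One of the obstacles named above, raised-floor weight limits, lends itself to a quick screening calculation. The sketch below compares a rack's distributed load against a floor rating. It is a deliberate simplification (real raised-floor ratings also cover point and rolling loads), and the rack masses, footprint, and floor rating are hypothetical.

```python
def distributed_load_kg_m2(rack_mass_kg, footprint_width_m, footprint_depth_m):
    """Rack mass spread evenly over its footprint, in kg per square metre."""
    area_m2 = footprint_width_m * footprint_depth_m
    if area_m2 <= 0:
        raise ValueError("footprint dimensions must be positive")
    return rack_mass_kg / area_m2


def floor_can_support(rack_mass_kg, footprint_width_m, footprint_depth_m,
                      floor_rating_kg_m2):
    """True if the distributed load is within the floor's rated capacity."""
    load = distributed_load_kg_m2(rack_mass_kg, footprint_width_m,
                                  footprint_depth_m)
    return load <= floor_rating_kg_m2


# Hypothetical liquid-cooled rack: 1500 kg including coolant, on a
# 0.6 m x 1.2 m footprint, over a floor rated at 1200 kg/m^2.
load = distributed_load_kg_m2(1500.0, 0.6, 1.2)
print(f"Distributed load: {load:.0f} kg/m^2")
print("Floor OK:", floor_can_support(1500.0, 0.6, 1.2, 1200.0))
```

A failed check like this one is a prompt to involve a structural engineer, not a final verdict.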
Partnering with specialists, teams can develop a digital twin of the proposed cooling infrastructure. This 3D model aids in visualizing new systems and processes, allowing for a detailed exploration of potential configurations. Digital twin technology enables teams to simulate various scenarios and optimize the design for both current and future cooling needs. Any logistical concerns, such as access route constraints or new piping needs, can be resolved in this virtual planning stage, streamlining the eventual physical installation.
Operational Impact Analysis
The next step involves analyzing how the implementation of liquid cooling systems will impact current operations within the data center. The insights gained from modeling and site inspections will guide the development of a robust business case to present to executives for approval. This analysis should consider how on-site construction activities might disrupt ongoing workflows. The added AI hardware and its supporting power systems will also introduce additional heat loads, which can affect existing workloads and may require revisiting service-level agreements.
Understanding the full scope of disruption helps in creating mitigation strategies that minimize operational downtime. Teams should plan for phased installations that coincide with low-activity periods to ensure minimal impact on data center operations. Additionally, assessing the impact of new cooling loads on existing power and cooling infrastructure helps in creating realistic timelines and cost estimates that consider potential complications and operational pauses.
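Finding low-activity periods for phased installation work can start from utilization data the facility already collects. The sketch below picks the quietest contiguous window from a series of hourly utilization readings using a sliding-window sum. The readings and the three-hour window are hypothetical, and a real schedule would also weigh SLAs and change-freeze calendars.

```python
def quietest_window(hourly_utilization, window_hours):
    """Return (start_index, average_utilization) of the contiguous window
    of window_hours readings with the lowest total utilization."""
    n = len(hourly_utilization)
    if not 0 < window_hours <= n:
        raise ValueError("window_hours must be between 1 and the data length")
    window_sum = sum(hourly_utilization[:window_hours])
    best_sum, best_start = window_sum, 0
    for start in range(1, n - window_hours + 1):
        # Slide the window: add the new reading, drop the one that left.
        window_sum += (hourly_utilization[start + window_hours - 1]
                       - hourly_utilization[start - 1])
        if window_sum < best_sum:
            best_sum, best_start = window_sum, start
    return best_start, best_sum / window_hours


# Hypothetical utilization (fraction of capacity) for seven consecutive hours.
readings = [0.7, 0.6, 0.3, 0.2, 0.25, 0.5, 0.8]
start, average = quietest_window(readings, 3)
print(f"Quietest 3-hour window starts at hour index {start}, "
      f"average utilization {average:.2f}")
```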
Efficiency and Sustainability Considerations
While the primary goal is to manage the thermal demands of AI workloads, the efficiency and sustainability of the cooling solutions are equally important. Liquid cooling technologies, by removing heat directly at the source, can be significantly more efficient than traditional air-cooling methods alone. This efficiency translates into a lower power usage effectiveness (PUE) for the facility, the ratio of total facility energy to the energy used by IT equipment. Efficient cooling reduces the overall energy consumption of the data center, helping to minimize operational costs and environmental impact.
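PUE is the ratio of total facility energy to IT equipment energy, so the effect of a more efficient cooling system is easy to illustrate. The sketch below uses average power draw as a stand-in for energy over a period, and all of the load figures are hypothetical rather than measurements from any real facility.

```python
def pue(it_kw, cooling_kw, other_overhead_kw):
    """Power usage effectiveness: total facility power / IT power.
    Uses average power as a proxy for energy over the same period."""
    if it_kw <= 0:
        raise ValueError("it_kw must be positive")
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


# Hypothetical facility with 1000 kW of IT load and 100 kW of other overhead.
air_only = pue(1000.0, 450.0, 100.0)   # air cooling drawing 450 kW
hybrid = pue(1000.0, 250.0, 100.0)     # hybrid liquid/air drawing 250 kW
print(f"Air-cooled PUE: {air_only:.2f}")
print(f"Hybrid PUE:     {hybrid:.2f}")
```

A value closer to 1.0 means less energy is spent on anything other than the IT load itself.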
One notable advantage of liquid cooling is the ability to recapture and reuse heat. Waste heat from cooling systems can be repurposed to warm nearby buildings, offices, or even agricultural installations, supporting the circular economy. By reducing energy waste and allowing for heat reuse, liquid cooling systems contribute to the sustainability goals of enterprises. These improvements can lower energy-related emissions, making them an integral part of a company’s green initiatives.
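A rough sense of how much heat might be available for reuse can be had from simple arithmetic. The sketch below assumes that nearly all electrical power drawn by IT equipment ends up as heat, and that only a fraction of it is captured by the liquid loop at a usable temperature. That capture fraction and the load figure are hypothetical assumptions, not vendor data.

```python
HOURS_PER_YEAR = 8760


def annual_reusable_heat_mwh(liquid_cooled_it_kw, capture_fraction,
                             hours=HOURS_PER_YEAR):
    """Estimate heat (MWh per year) available for reuse, assuming IT power
    is rejected as heat and capture_fraction of it reaches the liquid loop."""
    if not 0.0 <= capture_fraction <= 1.0:
        raise ValueError("capture_fraction must be between 0 and 1")
    return liquid_cooled_it_kw * capture_fraction * hours / 1000.0


# Hypothetical: 500 kW of liquid-cooled IT load, 60% of heat captured.
reusable = annual_reusable_heat_mwh(500.0, 0.6)
print(f"Estimated reusable heat: {reusable:.0f} MWh per year")
```

Whether that heat is actually useful depends on its temperature and on having a consumer nearby, which this estimate does not address.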
RFP and RFQ Creation
Finally, with a thorough understanding of site-specific needs and a detailed plan in place, teams can draft Requests for Proposal (RFP) and Requests for Quotation (RFQ). These documents will outline the design, bill of materials, and services required for the new cooling solution. Issuing these requests helps in identifying and selecting manufacturers equipped to build and integrate the tailored liquid cooling system.
The RFP and RFQ processes will solicit detailed quotes and project timelines from multiple vendors. Selecting the right partners ensures that the cooling systems will be implemented effectively and efficiently, leveraging the newest technologies to meet both current and future demands. Comprehensive planning and execution of these steps will enable African data centers to handle the increasing thermal loads of AI workloads, providing the required infrastructure to support advanced computational needs while aligning with sustainability objectives.