Liquid Cooling Adoption: Roadblocks and Key Learnings

Authors: David Martinez (Sandia National Laboratories), Chris DePrater (Lawrence Livermore National Laboratory), Susan Coghlan (Argonne National Laboratory (ANL)), Karsten Kutzer (Lenovo), Aaron Anderson (National Renewable Energy Laboratory (NREL)), Michael Ott (Leibniz Supercomputing Centre)

Abstract: Liquid cooling is key to dealing with heat density, reducing energy consumption and increasing performance. Despite more than a decade of experience with liquid cooling in large-scale supercomputing centers, many data centers still face challenges with adoption. This BoF will bring together people who are knowledgeable in liquid cooling from supercomputing sites, system integrators, liquid cooling vendors, and engineering design companies to identify common roadblocks and the key learnings that help resolve them.

Long Description: This BoF is about identifying roadblocks to liquid cooling adoption and the key learnings for resolving them. It will bring together people who are knowledgeable in liquid cooling and provide a real-time forum for those experiencing roadblocks to seek feedback and advice from others with more experience or different perspectives.

The main goal is to gather experiences of roadblocks and lessons learned from the community, both from those who will give lightning talks (invited participants) and from the audience (general participants). These experiences will provide content for a white paper to be published by the Energy Efficient HPC Working Group [ ].

This topic is very relevant to the HPC community. The increase in compute density in supercomputers creates a growing need to cool power-dense equipment more effectively and to improve energy efficiency.

The EE HPC Working Group has held five Birds of a Feather sessions on various aspects of liquid cooling at SC since 2013. All of them have been well attended, with strong audience participation. Three of the five have been 90-minute evening sessions. An evening slot is ideal, especially on Tuesday night, because this is generally the only session that evening with a facilities focus.

A 90-minute BoF at SC21, “Liquid Cooling Challenges and Facility Experiences: What's Next?”, focused on exascale-class installations and the unique challenges of scale. The discussion at the SC21 BoF brought to light difficulties in adopting liquid cooling at all HPC sites, not just exascale-class sites. Many sites have not yet deployed any liquid cooling technologies, or have minimal experience with them. This BoF will focus on roadblocks and lessons learned for all sites, independent of their size and level of experience.

Prior BoFs: SC18 “The Facility Perspective on Liquid Cooling: Experiences and Proposed Open Specification”; SC15 “Dynamic Liquid Cooling, Telemetry and Controls: Opportunity for Improved TCO?”; SC14 “Design, Commissioning and Controls for Liquid Cooling Infrastructure”; SC13 “Best Practices for Commissioning Liquid Cooling Infrastructure”.

Audience interactivity, community building and audience draw: We will do outreach to the 850+ members of the Energy Efficient HPC Working Group [ ] via announcements and email. The EE HPC WG has a booth at SC22, and this BoF will be highlighted in that booth. Besides operations managers from HPC sites, this BoF will attract many participants from the vendor community: system integrators, liquid cooling suppliers, and architecture and engineering firms. We expect strong participation from large-scale HPC facilities in the United States, Europe and Japan via the Energy Efficient HPC Working Group.

The EE HPC WG actively collaborates with ASHRAE and the Open Compute Project (OCP). Work developed by the EE HPC WG and vetted with the HPC community at SC has been adopted by these broader enterprise- and cloud-focused organizations. For example, the Transfer Fluid Chapter of the Proposed Open Specification described in the SC18 BoF listed above is being actively reviewed and revised by OCP and ASHRAE.


