Authors: Ryan Grant (Queen's University, Belfast), Siddhartha Jana (Intel Corporation), Tapasya Patki (Lawrence Livermore National Laboratory), Natalie Bates (Energy Efficient HPC Working Group)
Abstract: This BoF will bring together academia, government research laboratories, and industry to discuss and contribute to the two active community-driven, vendor-neutral forums focusing on energy efficiency in HPC software stacks. For more than 7 years, these two complementary forums, HPC-PowerStack and PowerAPI, have led efforts to identify and build software solutions across the software stack.
This highly interactive BoF will enable the community to discuss ongoing challenges in designing cost-effective, cohesive, portable, and interoperable implementations of HPC software that enable monitoring and control of system efficiency. Attendees will help brainstorm solutions for addressing imminent exascale power challenges.
Long Description: Relevance:
Despite recent advances at exascale, power and energy are still of great concern to HPC sites of all sizes. The importance of the topic has given rise to several parallel R&D efforts in energy-efficient solutions and to a community built around solving future energy problems. The majority of techniques developed so far have been designed in accordance with vendor- or site-specific restrictions. State-of-the-art specifications stop short of defining which software components should actually interoperate in a unified stack. System-wide coordination is critical for avoiding the underutilization of system FLOPS/Watt.
For 7+ years, two complementary forums, HPC-PowerStack and PowerAPI, have been addressing power challenges from within the software stack. The efforts have focused on: (A) identifying the critical software actors needed in a system stack; (B) reaching consensus on their roles and responsibilities; (C) designing protocols for bidirectional control and feedback signals among them to enable scalable coordination at multiple granularities; (D) establishing unified hierarchical communication models/APIs to access power monitoring and control knobs in hardware and software; and (E) leveraging existing prototypes and building a community that actively participates in open development and engineering efforts.
Pre-SC22: Within the PowerStack consortium, 40+ representatives from industry, labs, and academia have convened twice a year (SC/ISC timeframe) for knowledge transfer and collaboration on community-wide standardization efforts for designing a power-management stack. Likewise, the PowerAPI community has convened monthly to focus on the design of the API specification that enables interoperability between the stack components. Over the past ~7 years, the community has arrived at a consensus that (1) job/application awareness is going to be critical for boosting system-wide optimization, which implies the need to drive interoperation between a job-level runtime and the job scheduler; and (2) hierarchical control systems are good models for scalable global optimization across the system, so the power stack should be a multi-tiered system with bidirectional control and feedback signals flowing between the layers. Today’s systems are inefficiently designed in that they break this hierarchical model, and we as a community need to work toward fixing this. These conclusions align with the attendee feedback from our ISC19, SC19, and ISC21 BoFs.
BoF-Goals: Although the community is making progress toward engineering power management solutions, questions about the stack’s design remain unresolved. Community-driven efforts to design prototypes and share experiments will help tackle these. Since designing an entire stack from the ground up is a gargantuan effort, it is extremely important that the entire global HPC community is made aware of, and is willing to contribute to, this effort. Hence, this BoF.
The goals are to: (1) make attendees aware of the emerging community effort to design a common power stack and discuss the lessons learned during the past seminar; (2) provide updates on the current and future prototyping efforts that have begun; and (3) align efforts across the community so that the SC22 BoF attendees reach a consensus with regard to sharing R&D resources, avoid duplicating effort, agree on standard interfaces, and reap the rewards together as a community.