Authors: Jesus Carretero (University Carlos III of Madrid, Spain), Estela Suarez (Forschungszentrum Jülich, University of Bonn), Martin Schulz (Technical University Munich), Michela Taufer (University of Tennessee), Michèle Weiland (Edinburgh Parallel Computing Centre (EPCC)), Martin Schreiber (Grenoble Alpes University, France)
Abstract: The traditional focus on increasing parallelism for individual jobs in HPC systems is increasingly constrained by the variety and dynamicity of jobs' resource demands at runtime. Malleability techniques can help to adapt resource usage dynamically to achieve maximum efficiency. Malleable HPC systems, however, face a series of fundamental research challenges, such as resource management, scheduling, malleability control, application co-design, and data movement. All of these issues will be addressed in the proposed Birds of a Feather session, which aims at building a community of developers and users around the topic of malleability in High Performance Computing, Networking, and Storage.
Long Description: The traditional focus on increasing parallelism for individual jobs in HPC systems has been affected by the variety and dynamicity of the resource demands of jobs, both applications and workflows, at runtime. Malleability techniques can help to adapt resource usage dynamically to achieve maximum efficiency by adjusting the computation and storage needs of applications, on the one hand, and the allocation of hardware resources to them, on the other, when applications enter execution phases that require fewer or more resources than those currently allocated. Malleable HPC systems, however, face a series of fundamental research challenges, such as resource management, scheduling, malleability control, application co-design, and data movement. All of these issues will be addressed in the proposed Birds of a Feather session, which aims at building a community of developers and users around the topic of malleability in High Performance Computing, Networking, and Storage.
Goal:
The goal of this BoF session is to discuss malleability techniques and their impact on applications and systems. We will use the BoF to solicit input from interested parties and to identify challenges and opportunities that can drive the development of future academic and commercial solutions supporting malleable computing and I/O, with the final objective of including them in standards such as MPI or PMIx.
Topics:
Malleable systems face a series of fundamental research challenges, including: Who initiates changes? How are changes communicated to applications? How can the optimal usage of the available resources be determined? How can applications cope with dynamically changing resources? What should malleable programming models and abstractions look like? How should scalable resource management frameworks for malleable systems be designed? Which resources may benefit from malleability, and which (if any) should still be managed statically?
To advance solutions to these challenges, the BoF session will focus on the following topics of discussion:
System architecture considerations to enable efficient implementation of malleability.
Runtimes, parallel programming models and techniques, and libraries supporting malleability.
Malleable scheduling and load distribution considering multicriteria aspects, such as computing, I/O, fault tolerance, and energy efficiency.
Potential usage of AI techniques to steer malleability in systems and applications.
Support for malleable applications in performance, debugging, and correctness tools.
Expected HPC audience:
To address these challenges, this BoF session will bring together researchers from diverse areas of HPC, AI, and data processing who are affected by or actively pursuing malleability concepts, from application developers to system software researchers and system architects. The BoF will provide a forum for lively discussion among researchers and vendors working in HPC and pursuing concepts related to malleability. Experiences and use cases applying malleability to HPC applications and runtimes are especially welcome in the discussion.
Expected outcome:
Identify challenges and future perspectives for supporting I/O and computation malleability in HPC, and propose solutions to them. Strengthen existing and establish new international research collaborations on malleability topics. Push the definition of a roadmap for the adoption of standards specific to malleability. Raise awareness of this topic in the HPC community.
URL: https://www.admire-eurohpc.eu/sc22-bof-enabling-i-o-and-computation-malleability-in-high-performance-computing/