Authors: Estela Suarez (Forschungszentrum Jülich, University of Bonn), Philippe Deniel (Atomic Energy and Alternative Energies Commission (CEA)), Kengo Nakajima (University of Tokyo, RIKEN Center for Computational Science (R-CCS)), Anshu Dubey (Argonne National Laboratory (ANL), University of Chicago), Pierre Axel Lagadec (Bull Atos Technologies), Nick Wright (Lawrence Berkeley National Laboratory (LBNL), National Energy Research Scientific Computing Center (NERSC))

Abstract: This BoF will be a forum to discuss the most recent research topics around disaggregated heterogeneous architectures, their operation, and their use. “Disaggregated”, also known as “modular supercomputing”, refers to a system-level architecture in which heterogeneous resources are organised in partitions or modules, each with a different type of node configuration. This approach is gaining traction in the HPC landscape, with Perlmutter, LUMI, JUWELS, and MeluXina as just some examples. This BoF discusses the challenges faced by operators, vendors, developers of system software, programming models and tools, as well as by application developers when adapting their codes to make use of such machines.

Long Description: The goal of this BoF is to build a community of providers, administrators, and users of large-scale modular HPC systems, in which hardware elements are “disaggregated”. This concept refers to a system-level architecture in which heterogeneous resources are organised in partitions or modules, each with a different type of node configuration: CPU-only, CPU+GPU, CPU+FPGA, etc. Examples of such systems are Perlmutter at LBNL/NERSC (USA), Wisteria/BDEC-01 at the University of Tokyo (Japan), and JUWELS at JSC (Germany).

This BoF will be the forum to discuss the most recent topics of research in disaggregated heterogeneous architectures, identify their key drivers, and establish links between them, leading to a long-term forum for the exchange of ideas and experience.

Today’s HPC systems are highly heterogeneous machines combining different processor, network, memory, and storage technologies. This diversification is expected to grow further with the integration of disruptive technologies, such as AI accelerators, neuromorphic devices, or even quantum computers. Orchestrating and using this hardware zoo poses enormous challenges. System developers and operators require scalable ways to interconnect the different technologies, advanced scheduling and management techniques, and I/O and data management mechanisms to deal with increasingly data-intensive workflows. Users, for their part, need methods to efficiently transfer data between compute, memory, and storage elements, and strategies for programming thousands of devices with partially different instruction set architectures and vendor-specific environments.

The exact manifestation of the above challenges depends on how the hardware resources are organised at system level. Some experts advocate for a monolithic approach in which all nodes are equal, each node containing a variety of computing elements. Others go in exactly the opposite direction and segregate the resources at system level, grouping the different types into partitions or modules. This latter category is the focus of this BoF.

The target audience comprises HPC centres operating or planning to deploy modular/disaggregated supercomputers; vendors building them, including storage and network administrators; developers of system software, programming models, and tools that address system-level heterogeneity; and application developers who are adapting their codes to make use of such machines. The panel of speakers represents these sectors and will raise their respective challenges.

This BoF is conceived as a forum for open discussion and community building, so reproducibility does not apply. In order to attract a diverse audience, the event will be announced by the organisers in their forums, communities, and geographies, as well as on mailing lists and distribution channels such as HiPEAC and Women in HPC. The organising committee and the speaker selection are diverse in gender and geography. When moderating audience participation, priority to speak will be given to women and minority groups. Online posting of questions will be enabled to include participants who are hesitant to pose questions openly.

The main topics and outcomes of the discussion will be summarised in a blog entry to be published after the event on the workshop website, which shall serve as a hub to continue interaction and follow-up activities on the topic of disaggregated heterogeneous architectures.


