Workshop: ESPM2 2022: Seventh International Workshop on Extreme Scale Programming Models and Middleware
Authors: Adrian Jackson (Edinburgh Parallel Computing Centre (EPCC))
Abstract: The move to larger, more powerful compute nodes on large-scale HPC systems has been significant in recent years. It is now not uncommon for nodes to have 128 or more computational cores and a significant amount of GPU resources. This provides potential scope for active middleware to run on these nodes, managing anything from storage and I/O to compute kernels and network traffic. However, there needs to be a stronger understanding of the impact of on-node workloads on application performance, especially when we are aiming to scale to exascale systems with many millions of workers. I will discuss work we are doing to evaluate and characterize the impact of on-node workloads, and explore some of the active middleware that could enable scaling up to very large node and system sizes without requiring significant user application changes.