Workshop: PMBS22: The 13th International Workshop on Performance Modeling, Benchmarking, and Simulation of High-Performance Computer Systems
Authors: Sascha Hunold and Sebastian Steiner (TU Wien (Vienna University of Technology))
Abstract: Collective communication operations are fundamental building blocks of many high-performance applications. MPI libraries typically implement a selection logic that attempts to choose a good algorithm for a specific collective communication problem. It has been shown in the literature that the hard-coded algorithm selection logic found in MPI libraries can be improved by prior offline tuning.
We take a fundamentally different approach to improving the algorithm selection for MPI collectives: we integrate the probing of different algorithms directly into the MPI library. Whenever an MPI application is started, the tuner, rather than the default selection logic, chooses the next algorithm to complete an issued MPI collective call and records its runtime. From the recorded performance data, the tuner builds a performance model that allows it to select an efficient algorithm.
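The probe-then-select idea can be illustrated with a small stand-alone sketch. This is not the authors' implementation (which lives inside the MPI library) — the class name `OnlineTuner`, the problem key, and the two stand-in "algorithms" are all hypothetical; the sketch only models the control flow: for each collective problem, try each candidate algorithm once, record its runtime, and afterwards commit to the fastest one observed.

```python
import time
from collections import defaultdict

class OnlineTuner:
    """Hypothetical sketch of online collective-algorithm tuning:
    probe each candidate once per problem key, record runtimes,
    then keep selecting the fastest observed candidate."""

    def __init__(self, algorithms):
        self.algorithms = algorithms        # name -> callable
        self.timings = defaultdict(dict)    # key -> {name: runtime in s}
        self.next_probe = defaultdict(int)  # key -> index of next candidate

    def run(self, key, *args):
        names = list(self.algorithms)
        probed = self.timings[key]
        if len(probed) < len(names):
            # Probing phase: pick the next algorithm not yet timed
            # for this problem key.
            name = names[self.next_probe[key]]
            self.next_probe[key] += 1
        else:
            # Selection phase: pick the fastest algorithm seen so far.
            name = min(probed, key=probed.get)
        start = time.perf_counter()
        result = self.algorithms[name](*args)
        probed.setdefault(name, time.perf_counter() - start)
        return name, result

# Two stand-in "algorithms" for a reduction (placeholders, not real
# MPI algorithms); the second deliberately does extra work.
algs = {
    "recursive_doubling": lambda xs: sum(xs),
    "ring": lambda xs: sum(sorted(xs)),
}
tuner = OnlineTuner(algs)
key = ("allreduce", 1024)  # e.g. (collective, message size) as model key
for _ in range(4):
    name, result = tuner.run(key, list(range(1024)))
```

In a real MPI library the key would also include the communicator size, and the timings would feed a proper performance model rather than a simple minimum; the sketch only shows how probing can replace the default selection logic transparently for the application.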
We show in a case study using miniAMR that our approach can effectively tune the performance of Allreduce.