OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning
Description
Collective communication operations are a fundamental cornerstone of many high-performance applications. MPI libraries typically implement a selection logic that attempts to make a good algorithmic choice for a specific collective communication problem. It has been shown in the literature that the hard-coded algorithm selection logic found in MPI libraries can be improved by prior offline tuning.

We take a fundamentally different approach to improving the algorithm selection for MPI collectives: we integrate the probing of different algorithms directly into the MPI library. Whenever an MPI application is started, the tuner, instead of the default selection logic, picks the next algorithm to complete an issued MPI collective call and records its runtime. From the recorded performance data, the tuner builds a performance model that allows it to select an efficient algorithm.
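The probe-then-select loop described above can be sketched as follows. This is an illustrative toy model, not the actual OMPICollTune implementation: class and method names (`OnlineCollectiveTuner`, `select`, `record`) and the simple probe-each-candidate-once, then pick-the-best-mean-runtime policy are all assumptions made for the sketch.

```python
# Hypothetical sketch of online collective tuning: cycle through candidate
# algorithms, record each observed runtime, then switch to the fastest one.
# Names and policy are illustrative assumptions, not the OMPICollTune code.
import time
from collections import defaultdict

class OnlineCollectiveTuner:
    def __init__(self, algorithms):
        self.algorithms = algorithms      # candidate collective implementations
        self.samples = defaultdict(list)  # algorithm name -> observed runtimes
        self.next_probe = 0               # round-robin index over candidates

    def select(self):
        # Exploration phase: probe each candidate at least once.
        if self.next_probe < len(self.algorithms):
            algo = self.algorithms[self.next_probe]
            self.next_probe += 1
            return algo
        # Exploitation phase: pick the candidate with the best mean runtime.
        return min(
            self.algorithms,
            key=lambda a: sum(self.samples[a.__name__]) / len(self.samples[a.__name__]),
        )

    def record(self, algo, runtime):
        self.samples[algo.__name__].append(runtime)

    def run(self, *args):
        # Complete one issued collective call with the chosen algorithm
        # and feed its measured runtime back into the model.
        algo = self.select()
        start = time.perf_counter()
        result = algo(*args)
        self.record(algo, time.perf_counter() - start)
        return result
```

A real tuner would additionally key its measurements on the collective operation, message size, and communicator size, so that the learned model distinguishes the problem instances the abstract refers to.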

We show in a case study using miniAMR that our approach can effectively tune the performance of Allreduce.
Time
Monday, 14 November 2022, 9:45am - 10am CST
Registration Categories
Exascale Computing
Modeling and Simulation
Performance Portability