Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs

SC22 Proceedings

Workshop: 2022 International Workshop on Performance Portability and Productivity (P3HPC)

Authors: Khaled Ibrahim and Chao Yang (Lawrence Berkeley National Laboratory (LBNL)) and Pieter Maris (Iowa State University)

Abstract: The emergence of multiple accelerator-based computer architectures and programming models makes it challenging to achieve performance portability for large-scale scientific simulation software. In this paper, we focus on a sparse block diagonal matrix multiple vector (SpMM) computational kernel and discuss techniques that can be used to achieve performance portability on NVIDIA- and AMD-based accelerators using CUDA, HIP, OpenACC, and Kokkos. We show that performance portability can vary significantly across programming models, GPU architectures, and problem settings, by up to 52x across the problems explored. Our study examines the performance portability aggregation metric to guide the development and selection of performance-portable variants.
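For readers unfamiliar with the kernel, the following is a minimal reference sketch of a block diagonal matrix times multiple vectors product, Y = A X. It is not the authors' implementation: the function name `block_diag_spmm`, the choice of square blocks stored as nested lists, and the serial loop order are all assumptions made for illustration. The GPU variants discussed in the paper would parallelize this work and use a different storage layout.

```python
def block_diag_spmm(blocks, X):
    """Compute Y = A @ X, where A is block diagonal.

    blocks: list of square blocks, each a list of rows (hypothetical layout).
    X: dense matrix whose columns are the multiple vectors (list of rows).
    """
    n = sum(len(b) for b in blocks)
    if len(X) != n:
        raise ValueError("X must have one row per row of A")
    k = len(X[0]) if X else 0
    Y = [[0.0] * k for _ in range(n)]

    offset = 0  # first row/column of the current diagonal block
    for block in blocks:
        m = len(block)
        for i in range(m):
            for j in range(m):
                a = block[i][j]
                if a == 0:
                    continue  # skip structural zeros inside the block
                # One matrix entry is reused across all k vectors.
                for c in range(k):
                    Y[offset + i][c] += a * X[offset + j][c]
        offset += m
    return Y


# Usage: a 2x2 block and a 1x1 block applied to two vectors at once.
blocks = [[[1, 2], [3, 4]], [[5]]]
X = [[1, 0], [0, 1], [2, 2]]
Y = block_diag_spmm(blocks, X)
```

Each block touches only its own slice of X and Y, so blocks can be processed independently, and each matrix entry is reused across all columns of X.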
