SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Piper: Pipelining OpenMP Offloading Execution through Compiler Optimization for Performance

Workshop: 2022 International Workshop on Performance Portability and Productivity (P3HPC)

Authors: Konstantinos Parasyris and Giorgis Georgakoudis (Lawrence Livermore National Laboratory), Johannes Doerfert (Argonne National Laboratory (ANL)), and Ignacio Laguna and Tom Scogland (Lawrence Livermore National Laboratory)

Abstract: OpenMP offload reduces the development complexity of HPC GPU codes and provides portability. A common source of poor performance is the lockstep execution of data transfers and computation. Overlapping these operations can provide significant performance gains. However, the developer must manually slice data transfers and kernel execution and schedule the resulting operations efficiently, which is a hard and error-prone task.

We propose Piper, an automatic mechanism for OpenMP offload that overlaps data transfers with computation. Piper statically analyzes offload kernels and associates computations with memory locations. The extended runtime system exploits this analysis information, divides a kernel into independent sub-tasks, and schedules them for pipelined execution so that transfers and computation overlap. At any point in time, Piper also controls the coarseness and number of sub-tasks executed. By doing so, Piper allows the execution of kernels whose memory requirements exceed the GPU device memory. Piper speeds up execution by up to 2.67× compared to OpenMP-offload execution.
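The idea of splitting a kernel into sub-tasks and pipelining them can be illustrated with a small cost model. The sketch below is not Piper's algorithm or API; it is a simplified illustration under stated assumptions: every iteration touches a fixed number of bytes, copy-in, kernel, and copy-out each run on their own engine, costs are uniform per iteration, and at most `in_flight` chunks hold device buffers at once. The names `plan_chunks`, `lockstep_time`, and `pipelined_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # first iteration of the sub-task (inclusive)
    stop: int   # last iteration of the sub-task (exclusive)


def plan_chunks(n_iters, bytes_per_iter, device_bytes, in_flight=2):
    """Split [0, n_iters) so that `in_flight` chunks fit in device memory.

    Assumption: each iteration needs `bytes_per_iter` bytes on the device.
    """
    per_chunk_budget = device_bytes // in_flight
    chunk_iters = per_chunk_budget // bytes_per_iter
    if chunk_iters < 1:
        raise ValueError("a single iteration does not fit in the budget")
    return [Chunk(s, min(s + chunk_iters, n_iters))
            for s in range(0, n_iters, chunk_iters)]


def lockstep_time(chunks, h2d, kernel, d2h):
    """Copy in, compute, copy out, strictly one after another (no overlap)."""
    return sum((c.stop - c.start) * (h2d + kernel + d2h) for c in chunks)


def pipelined_time(chunks, h2d, kernel, d2h, in_flight=2):
    """Makespan when the three stages of different chunks may overlap.

    Each stage processes one chunk at a time, and chunk i may only start
    its copy-in once chunk i - in_flight has released its device buffer.
    """
    done = []  # done[i] = finish times of (copy-in, kernel, copy-out)
    for i, c in enumerate(chunks):
        n = c.stop - c.start
        prev = done[i - 1] if i > 0 else (0.0, 0.0, 0.0)
        buffer_free = done[i - in_flight][2] if i >= in_flight else 0.0
        t_in = max(prev[0], buffer_free) + n * h2d
        t_kernel = max(t_in, prev[1]) + n * kernel
        t_out = max(t_kernel, prev[2]) + n * d2h
        done.append((t_in, t_kernel, t_out))
    return done[-1][2] if done else 0.0
```

For example, with 8 iterations of 1 byte each, a 4-byte device budget, and two chunks in flight, `plan_chunks` yields four chunks of two iterations. With unit cost for every stage, this model gives 24 time units in lockstep and 14 when pipelined. Coarser chunks lower per-chunk overhead in practice but leave less room for overlap, which is the trade-off the abstract refers to as controlling sub-task coarseness.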
