Workshop: LLVM-HPC2022: The Eighth Workshop on the LLVM Compiler Infrastructure in HPC
Authors: Rafael Andres Herrera Guaitero (University of Delaware, Computer Architecture and Parallel System Laboratory (CAPSL)); Jose Manuel Monsalve Diaz and Thomas Applencourt (Argonne National Laboratory (ANL)); Xiaoming Li (University of Delaware); and Johannes Doerfert (Argonne National Laboratory (ANL))
Abstract: The use of heterogeneous architectures has steadily increased over the past decade. However, heterogeneous systems pose a challenge for programming models because the execution models of the CPU and the accelerator can differ considerably. Since version 4.0, OpenMP has sought to bridge this gap by allowing a code block to be offloaded to a target device. Among the additions to the OpenMP offloading API since then, the most notable is probably asynchronous execution between device and host. By default, offloaded regions execute synchronously, so the host thread blocks until they complete. The nowait clause allows work on the host and the target device to overlap. However, the user must add nowait manually, along with the tasks' data dependencies and appropriate synchronization to avoid race conditions, which increases program complexity and developer burden.
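To make the manual burden described in the abstract concrete, the following is a minimal sketch in C of what a user writes by hand today. It is not taken from the paper. The function name offload_async and its parameters are hypothetical, chosen only for illustration. Two target regions carry nowait, depend clauses order them, and a taskwait keeps the host from reading the data too early.

```c
#include <assert.h>

/* Hypothetical helper (not from the paper): updates x on the target device
   in two asynchronous steps, overlaps an independent host-side sum of y,
   then synchronizes and returns sum(x) + sum(y). */
double offload_async(double *x, const double *y, int n, double a)
{
    assert(x != 0 && y != 0 && n > 0);

    /* nowait makes the target region a deferred task, so the host thread
       continues instead of blocking until the region completes. */
    #pragma omp target teams distribute parallel for nowait \
            map(tofrom: x[0:n]) depend(inout: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] *= a;

    /* The matching depend clause orders this task after the previous one;
       without it the two asynchronous regions could race on x. */
    #pragma omp target teams distribute parallel for nowait \
            map(tofrom: x[0:n]) depend(inout: x[0:n])
    for (int i = 0; i < n; ++i)
        x[i] += 1.0;

    /* Independent host work that may overlap with the offloaded regions. */
    double host_sum = 0.0;
    for (int i = 0; i < n; ++i)
        host_sum += y[i];

    /* Explicit synchronization: wait for both target tasks before the host
       reads x again. */
    #pragma omp taskwait

    double dev_sum = 0.0;
    for (int i = 0; i < n; ++i)
        dev_sum += x[i];

    return host_sum + dev_sum;
}
```

Removing either depend clause or the taskwait in this sketch would introduce a race on x, which is exactly the kind of error the manual approach invites. When compiled without OpenMP support the pragmas are ignored and the function runs sequentially on the host with the same result.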