Authors: Geng Liu (Argonne National Laboratory (ANL)); Amanda Randles (Duke University); Joseph Insley (Argonne National Laboratory (ANL), Northern Illinois University); and Saumil Patel, Silvio Rizzi, and Victor Mateevitsi (Argonne National Laboratory (ANL))
Abstract: In this work, we study the performance portability of offloaded lattice Boltzmann kernels and the trade-off between portability and efficiency. The study is based on a proxy application for the lattice Boltzmann method (LBM). The Kokkos performance-portability programming framework (with a CUDA or SYCL backend) is used and compared with the native CUDA and native SYCL programming models. The Kokkos library supports the mainstream GPU products on the market. The performance of the code can vary with the acceleration model, the number of GPUs, the problem size, the propagation pattern, and the architecture. Both the Kokkos library and the CUDA toolkit are studied on the ThetaGPU supercomputer (Argonne Leadership Computing Facility). We find that Kokkos (CUDA) has almost the same performance as native CUDA. The automatic data and kernel management in Kokkos may sacrifice some efficiency, but the parallelization parameters can also be tuned through Kokkos to optimize performance.
Best Poster Finalist (BP): no
Poster summary: PDF