Authors: Philip Munksgaard and Troels Henriksen (University of Copenhagen), Ponnuswamy Sadayappan (University of Utah), and Cosmin Oancea (University of Copenhagen)
Abstract: We present a technique for introducing and optimizing the use of memory in a functional array language, aimed at GPU execution, that supports correct-by-construction parallelism. Using linear memory access descriptors as building blocks, we define a notion of memory in the compiler IR that enables cost-free change-of-layout transformations (e.g., slicing, transposition), whose results can even be carried across control flow such as ifs/loops without manifestation in memory. The memory notion allows a graceful transition to an unsafe IR that is automatically optimized (1) to mix reads and writes to the same array inside a parallel construct, and (2) to map semantically different arrays to the same memory buffer. The result is code similar to what imperative users would write. Our evaluation shows that the proposed optimizations offer significant speedups (1.1x-2x) and result in performance competitive with hand-written code on challenging public benchmarks, such as Rodinia's NW, LUD, and Hotspot.
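To make the idea of a cost-free change of layout concrete, here is a minimal Python sketch of a strided access descriptor. It is an illustration only and not the authors' IR or implementation: the names `Lmad`, `row_major`, `transpose`, `slice_dim`, and `read_all` are hypothetical, and the descriptor is assumed to be the simple form "offset plus one stride per dimension". Under that assumption, transposition and slicing only rewrite the descriptor, and the underlying buffer is never copied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lmad:
    """Hypothetical descriptor: flat index = offset + sum(i_k * stride_k)."""
    offset: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]

    def index(self, *idx: int) -> int:
        # Map a logical multi-dimensional index to a flat buffer position.
        assert len(idx) == len(self.shape)
        return self.offset + sum(i * s for i, s in zip(idx, self.strides))

    def transpose(self) -> "Lmad":
        # Reverse the dimension order; only metadata changes, no data moves.
        return Lmad(self.offset, self.shape[::-1], self.strides[::-1])

    def slice_dim(self, dim: int, start: int, length: int, step: int = 1) -> "Lmad":
        # Keep elements start, start+step, ... along one dimension.
        shape = list(self.shape)
        strides = list(self.strides)
        offset = self.offset + start * strides[dim]
        shape[dim] = length
        strides[dim] *= step
        return Lmad(offset, tuple(shape), tuple(strides))


def row_major(rows: int, cols: int) -> Lmad:
    # Descriptor for a dense 2-D array stored row by row.
    return Lmad(0, (rows, cols), (cols, 1))


def read_all(buf: List[int], d: Lmad) -> List[List[int]]:
    # Materialize a 2-D view, purely for inspection in this sketch.
    return [[buf[d.index(i, j)] for j in range(d.shape[1])]
            for i in range(d.shape[0])]


buf = list(range(6))              # one shared buffer
a = row_major(2, 3)               # logical 2x3 array
t = a.transpose()                 # logical 3x2 array over the same buffer
s = a.slice_dim(1, 1, 2)          # columns 1..2 of a, same buffer
```

In this sketch `a`, `t`, and `s` all refer to `buf`; an actual copy would only be needed if some consumer required a particular layout to be manifested in memory.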