Authors: Maxim Moraru and Mina Warnet (University of Reims Champagne-Ardenne (URCA)) and Julien Loiseau, Vinay Ramakrishnaiah, Nirmal Prajapati, Hyun Lim, Sumathi Lakshmiranganatha, Jamal Mohd-Yusof, Karen Tsai, Richard Berger, and Patrick McCormick (Los Alamos National Laboratory (LANL))
Abstract: GPU matrix chain multiplication serves as a basis for a wide range of scientific domains, such as computer graphics, physics, and machine learning. While its runtime performance has been studied for years, significantly less effort has gone into optimizing its energy efficiency. GPU power consumption is heavily affected by the number of data transfers performed: a data transfer from global memory requires roughly a thousand times more energy than a double-precision arithmetic operation. Minimizing data transfers is therefore key to reducing energy consumption. We present an energy-efficient solution for matrix chain multiplication on GPUs that minimizes both computation and off-chip data transfers. To this end, we provide optimizations at three levels. For a single matrix multiplication, we use a large-tile blocking strategy. We then extend our approach to three matrices. Finally, we propose a solution for a sequence of matrices.
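For context, choosing how to parenthesize a matrix chain is the classic matrix-chain-ordering problem. The sketch below is the textbook O(n³) dynamic program that minimizes scalar multiplications only; it is not the authors' method, which additionally accounts for off-chip data transfers, and the function name and cost model here are illustrative assumptions.

```python
def matrix_chain_order(dims):
    """Classic DP for the matrix-chain-ordering problem.
    dims has length n+1; matrix i (1-indexed) has shape dims[i-1] x dims[i].
    Returns the minimal number of scalar multiplications for the chain.
    Note: this models arithmetic cost only, not data transfers."""
    n = len(dims) - 1
    # cost[i][j] = min scalar multiplications to compute the product of matrices i..j
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):              # length of the sub-chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            # try every split point k between i and j
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)
            )
    return cost[1][n]

# Example: A1 (10x30), A2 (30x5), A3 (5x60).
# (A1 A2) A3 costs 10*30*5 + 10*5*60 = 4500, the cheaper parenthesization.
print(matrix_chain_order([10, 30, 5, 60]))
```

An energy-aware variant, as the poster suggests, would replace the scalar-multiplication count in the recurrence with a cost term that also charges for global-memory traffic.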
Best Poster Finalist (BP): no