BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T171524Z
LOCATION:C1-2-3
DTSTART;TZID=America/Chicago:20221117T083000
DTEND;TZID=America/Chicago:20221117T170000
UID:submissions.supercomputing.org_SC22_sess275_rpost172@linklings.com
SUMMARY:Transformations for Energy Efficient Accelerated Chain Matrix Mult
 iplication (TEE-ACM2)
DESCRIPTION:Posters, Research Posters\n\nTransformations for Energy Effici
 ent Accelerated Chain Matrix Multiplication (TEE-ACM2)\n\nMoraru, Warnet, 
 Loiseau, Ramakrishnaiah, Prajapati...\n\nGPU matrix chain multiplication s
 erves as a basis for a wide range of scientific domains like computer grap
 hics, physics, and machine learning. While its time performance has been s
 tudied for years, there has been significantly less effort in optimizing i
 ts en
 ergy efficiency. GPU power consumption is heavily impacted by the number o
 f data transfers performed. In fact, a data transfer from global memory ne
 eds a thousand times more energy than a double-precision arithmetic operat
 ion. Thus, minimizing data transfers is key to reducing energy consum
 ption. We present an energy-efficient solution for Matrix Chain Multiplica
 tion on GPUs that minimizes computation as well as off-chip data transfers
 . For this, optimizations at three different levels are provided. For a si
 ngle matrix multiplication, we use a large tile blocking strategy. Then, w
 e extend our approach to three matrices. Finally, we propose a solution fo
 r a sequence of matrices.\n\nRegistration Category: Tech Program Reg Pass,
  Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
