BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T171527Z
LOCATION:C155
DTSTART;TZID=America/Chicago:20221113T105200
DTEND;TZID=America/Chicago:20221113T111500
UID:submissions.supercomputing.org_SC22_sess428_ws_p3hpc116@linklings.com
SUMMARY:Portable and Efficient Dense Linear Algebra in the Beginning of th
 e Exascale Era
DESCRIPTION:Workshop\n\nPortable and Efficient Dense Linear Algebra in the
  Beginning of the Exascale Era\n\nGates, YarKhan, Sukkari, Akbudak, Cayrol
 s...\n\nThe SLATE project is implementing a distributed dense linear algeb
 ra library for highly scalable distributed-memory accelerator-based comput
 er systems. The goal is to provide a library that can easily be ported to 
 different hardware (CPUs, GPUs, accelerators) and will provide high perfor
 mance for machines into the future. Current ports include CPUs, CUDA, ROCm
 , and oneAPI. We achieve both performance and portability by leveraging se
 veral layers and abstractions, including OpenMP tasks to track data depend
 encies, MPI for distributed communication, and the BLAS++ and LAPACK++ lib
 raries developed as a portable layer across vendor-optimized CPU and GPU B
 LAS and LAPACK functionality. We rely on the C++ standard library and tem
 plating to reduce code duplication for better maintainability. The few ke
 rnels not present in BLAS are implemented in CUDA, HIP, and OpenMP target 
 offload, and are easily ported to new platforms.\n\nSession Format: Record
 ed\n\nTag: Performance Portability\n\nRegistration Category: Workshop Reg 
 Pass
END:VEVENT
END:VCALENDAR
