BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T171525Z
LOCATION:C140-142
DTSTART;TZID=America/Chicago:20221116T103000
DTEND;TZID=America/Chicago:20221116T110000
UID:submissions.supercomputing.org_SC22_sess154_pap265@linklings.com
SUMMARY:Efficient Quantized Sparse Matrix Operations on Tensor Cores
DESCRIPTION:Paper\n\nEfficient Quantized Sparse Matrix Operations on Tenso
 r Cores\n\nLi, Osawa, Hoefler\n\nThe exponentially growing model size driv
 es the continued success of deep learning, but it brings prohibitive compu
 tation and memory cost. From the algorithm perspective, model sparsificati
 on and quantization have been studied to alleviate the problem. From the a
 rchitecture perspective, hardware vendors provide Tensor cores for acceler
 ation. However, it is very challenging to gain practical speedups from spa
 rse, low-precision matrix operations on Tensor cores, because of the stric
 t requirements for data layout and lack of support for efficiently manipul
 ating the low-precision integers. We propose Magicube, a high-performance
  sparse-matrix library for low-precision integers on Tensor cores.
  Magicube supports SpMM and SDDMM, two major sparse operations in deep
  learning with mixed precision.
  Experimental results on an NVIDIA A100 GPU show that Ma
 gicube achieves on average 1.44x (up to 2.37x) speedup over the vendor-opt
 imized library for sparse kernels, and 1.43x speedup over the state-of-the
 -art with a comparable accuracy for end-to-end sparse Transformer inferenc
 e.\n\nSession Format: Recorded\n\nTag: Machine Learning and Artificial Int
 elligence\n\nRegistration Category: Tech Program Reg Pass\n\nAward Finalis
 t: Best Paper Finalist\n\nReproducibility Badges: Artifact Available, Arti
 fact Functional, Results Reproduced
END:VEVENT
END:VCALENDAR