SUMMARY:Deinsum: Practically I/O Optimal Multi-Linear Algebra
DESCRIPTION:Paper\n\nDeinsum: Practically I/O Optimal Multi-Linear Algebra\n\nZiogas, Kwasniewski, Ben-Nun, Schneider, Hoefler\n\nMulti-linear algebra kernel performance on modern massively parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules for programs with many high-dimensional inputs is a notoriously hard problem.\n\nState-of-the-art libraries rely on heuristics and often fall back to suboptimal tensor folding and BLAS calls. To address this problem, we present Deinsum, an automated framework for distributed multi-linear algebra computations expressed in Einstein notation and based on rigorous mathematical tools. Our framework automatically derives data movement-optimal tiling and generates the corresponding distributed schedules, further optimizing the performance of local computations by increasing their arithmetic intensity.\n\nTo show the benefits of our approach, we test it on two important classes of tensor kernels: Matricized Tensor Times Khatri-Rao Products and Tensor Times Matrix chains. We show performance results and scaling on the Piz Daint supercomputer, with up to 19x speedup over state-of-the-art solutions on 512 nodes.\n\nSession Format: Recorded\n\nTag: Applications, Numerical Algorithms, Security\n\nRegistration Category: Tech Program Reg Pass\n\nReproducibility Badges: Artifact Available, Artifact Functional
