Authors: Alexander Breuer (Friedrich Schiller University Jena, Germany); Alexander Heinecke (Intel Corporation); Antonio Noack (Friedrich Schiller University Jena, Germany); and Evangelos Georganas and Kirill Voronin (Intel Corporation)
Abstract: Many HPC applications, and certainly AI/DL applications, are composed at their core of small linear algebra operations that are then used to build larger and more complicated tensor operations. Especially in AI/DL, portability across hardware platforms is essential due to the extensive reliance on Python and the high-level nature of many frontends. However, scientists are often faced with the challenge of running their codes in vastly different environments. They therefore have to restrict themselves to high-level languages and hope for good compiler optimizations. For complicated linear algebra operators, such as those arising in high-order methods in the computational sciences, this is a huge leap of faith. In this work we demonstrate how Tensor Processing Primitives, a low-dimensional SIMD abstraction for various CPU architectures, can be used to obtain very high fractions of floating-point peak performance on seven different CPU micro-architectures offering four different ISAs.
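The core idea above, composing a larger tensor operation from a small linear algebra primitive, can be sketched in a few lines. This is a conceptual illustration only (plain NumPy, not the actual TPP/LIBXSMM API); the function names, sizes, and the batch-reduce pattern are assumptions chosen for the example:

```python
import numpy as np

def small_gemm(A, B, C):
    """A tiny fixed-size GEMM 'primitive' (C += A @ B). In the TPP setting
    such a kernel would be a hardware-specific SIMD building block; here it
    is a plain NumPy stand-in for illustration."""
    C += A @ B

def batch_reduce_gemm(As, Bs):
    """Compose a larger tensor operation (a batch-reduce GEMM) by repeatedly
    applying the small primitive and accumulating into one output tile."""
    M, N = As.shape[1], Bs.shape[2]
    C = np.zeros((M, N), dtype=np.float32)
    for A, B in zip(As, Bs):
        small_gemm(A, B, C)
    return C

# hypothetical data: 8 batches of 4x4 tiles (identity times all-ones)
As = np.stack([np.eye(4, dtype=np.float32)] * 8)
Bs = np.ones((8, 4, 4), dtype=np.float32)
C = batch_reduce_gemm(As, Bs)
print(C[0, 0])  # each entry accumulates 8 identity @ ones products -> 8.0
```

The point of such an abstraction is that only the small primitive needs to be specialized per ISA; the composing loops stay portable.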
Best Poster Finalist (BP): no