Authors: Olivier Beaumont (French Institute for Research in Computer Science and Automation (INRIA)); Philippe Duchon (LaBRI, France); Lionel Eyraud-Dubois (French Institute for Research in Computer Science and Automation (INRIA)); Julien Langou (University of Colorado, Denver); and Mathieu Verite (French Institute for Research in Computer Science and Automation (INRIA))
Abstract: We consider the distributed Cholesky factorization on homogeneous nodes. Inspired by recent progress on asymptotic lower bounds on the total communication volume required to perform Cholesky factorization, we present an original data distribution, Symmetric Block Cyclic (SBC), designed to take advantage of the symmetry of the matrix. We prove that SBC reduces the overall communication volume between nodes by a factor of √2 compared to the standard 2D block-cyclic distribution. SBC is straightforward to implement within the paradigm of task-based runtime systems. Experiments using the Chameleon library over the StarPU runtime system demonstrate that the SBC distribution reduces the communication volume as expected, and also achieves better performance and scalability than the classical 2D block-cyclic distribution in all tested configurations. We also propose a 2.5D variant of SBC and prove that it further increases both the communication and the performance benefits.
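To make the √2 claim more concrete, the Python sketch below counts tile transfers in a simple model of right-looking tiled Cholesky under two distributions. It is an illustration under stated assumptions, not the authors' SBC construction or the Chameleon/StarPU implementation. The symmetric pattern (one node per unordered pair of indices of an r×r pattern, so P = r(r-1)/2 nodes, with same-residue tiles sent to a fallback node), the communication model (each tile is sent once per step to each distinct remote node that uses it), and the function names `block_cyclic_2d`, `symmetric_pattern` and `cholesky_volume` are all hypothetical choices made for this example.

```python
from itertools import combinations


def block_cyclic_2d(p):
    """Owner of tile (i, j) on a p x p process grid (standard 2D block-cyclic)."""
    def owner(i, j):
        return (i % p) * p + (j % p)
    return owner


def symmetric_pattern(r):
    """Hypothetical symmetric pattern with P = r(r-1)/2 nodes (an assumption).

    Each node is an unordered pair {a, b} of pattern indices. Tile (i, j) with
    i % r != j % r goes to node {i % r, j % r}, so (i, j) and (j, i) share an
    owner. Tiles whose indices share a residue s (including diagonal tiles)
    go to node {s, (s + 1) % r}: a simplification chosen for this sketch only.
    """
    node_id = {pair: n for n, pair in enumerate(combinations(range(r), 2))}

    def owner(i, j):
        a, b = i % r, j % r
        if a == b:
            b = (a + 1) % r  # hypothetical fallback for same-residue tiles
        return node_id[(min(a, b), max(a, b))]
    return owner


def cholesky_volume(n_tiles, owner):
    """Count tile transfers in a right-looking tiled Cholesky (lower triangle).

    Step k: L(k,k) is sent to the owners of the panel tiles (i,k), i > k.
    Each panel tile L(i,k) is sent to every owner of a trailing tile in
    row i or column i, i.e. (i,j) with k < j <= i and (j,i) with j > i.
    Each receiver other than the sender is counted once per tile.
    """
    n = n_tiles
    total = 0
    for k in range(n):
        receivers = {owner(i, k) for i in range(k + 1, n)}
        receivers.discard(owner(k, k))
        total += len(receivers)
        for i in range(k + 1, n):
            receivers = {owner(i, j) for j in range(k + 1, i + 1)}
            receivers |= {owner(j, i) for j in range(i + 1, n)}
            receivers.discard(owner(i, k))
            total += len(receivers)
    return total


if __name__ == "__main__":
    n = 60  # matrix size in tiles (illustrative)
    bc = cholesky_volume(n, block_cyclic_2d(6))     # 6 x 6 grid: 36 nodes
    sym = cholesky_volume(n, symmetric_pattern(9))  # 9 * 8 / 2 = 36 nodes
    print(f"2D block-cyclic: {bc} tiles, symmetric: {sym} tiles, "
          f"ratio {bc / sym:.2f}")
```

With 36 nodes on both sides, a panel tile far from the matrix edge reaches 2p - 2 = 10 remote nodes under the 2D block-cyclic grid but only r - 2 = 7 under the symmetric pattern, because every tile in row and column i then belongs to a node whose pair contains i mod r. At this small matrix size the overall ratio stays below √2, since the diagonal broadcasts and the trailing edge of the matrix dilute the gain; as P grows, roughly 2√P receivers against roughly √(2P) gives the √2 factor.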