Toward Increasing Trust in Exascale Simulations
Description: In recent decades, high-performance computing (HPC) and simulation have become essential in many areas of engineering and science. Because many HPC applications rely extensively on floating-point arithmetic, numerical errors of many kinds can be introduced during program execution, leading to instability or reproducibility problems. One such error source is cancellation, which produces inaccurate results when two nearby numbers are subtracted. In this article, we present Candy, a new dynamic library that detects cancellations in numerical codes. Our method computes the number of significant bits of each floating-point number by attaching a higher-precision shadow value to it. This makes it possible to detect accurately whether a program suffers from cancellation problems, and thus to increase trust in large-scale HPC applications and exascale simulations. We evaluate Candy on a set of real-world numerical applications and compare it against the state-of-the-art tool FPChecker.
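The shadow-value idea described above can be illustrated with a minimal Python sketch. This is not Candy's implementation or API: the class name `Shadowed`, the method `significant_bits`, and the use of an exact `fractions.Fraction` as a stand-in for a higher-precision shadow are all assumptions made for illustration. Each double-precision value carries a shadow, and the number of significant bits is estimated from the relative error between the two.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 53  # significand width of an IEEE 754 double


class Shadowed:
    """Hypothetical wrapper: a double plus an exact rational shadow value."""

    def __init__(self, value, shadow=None):
        self.value = float(value)
        # Assumption: an exact Fraction stands in for a higher-precision shadow.
        self.shadow = Fraction(self.value) if shadow is None else shadow

    def __add__(self, other):
        return Shadowed(self.value + other.value, self.shadow + other.shadow)

    def __sub__(self, other):
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def significant_bits(self):
        """Estimate how many leading bits of the double agree with the shadow."""
        if self.shadow == 0:
            return float(MANTISSA_BITS) if self.value == 0.0 else 0.0
        rel_err = abs(Fraction(self.value) - self.shadow) / abs(self.shadow)
        if rel_err == 0:
            return float(MANTISSA_BITS)
        bits = -math.log2(float(rel_err))
        return min(float(MANTISSA_BITS), max(0.0, bits))


# Subtracting two nearby numbers: (1 + 1e-8) - 1.
one = Shadowed(1.0)
tiny = Shadowed(1e-8)
result = (one + tiny) - one
bits = result.significant_bits()
print(f"double result: {result.value!r}, significant bits: {bits:.1f}")
```

In this sketch the subtraction itself is exact; it is the rounding error of the preceding addition that the cancellation exposes, so the result keeps only about half of the 53 available bits. A detector along these lines could flag any operation whose result falls below a chosen bit threshold.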
Time: Monday, 14 November 2022, 2:40pm - 3pm CST
