SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Toward Increasing Trust in Exascale Simulations

Workshop: The 4th Annual Workshop on Extreme-Scale Experiment-in-the-Loop Computing

Authors: Dorra Ben Khalifa (University of Perpignan, France; LAMPS Laboratory); Xinyi Li (University of Utah); Ignacio Laguna (Lawrence Livermore National Laboratory); Matthieu Martel (University of Perpignan, France; LAMPS Laboratory); and Ganesh Gopalakrishnan (University of Utah)

Abstract: In recent decades, High Performance Computing (HPC) and simulation have become essential in many areas of engineering and science. Because many HPC applications rely heavily on floating-point arithmetic, numerical errors of many kinds can be introduced during program execution, leading to instability or reproducibility problems. One such source of error is cancellation, which produces inaccurate results when two nearby numbers are subtracted. In this article, we present Candy, a new dynamic library that detects cancellations in numerical codes. Our method computes the number of significant bits of each floating-point number by attaching to it a shadow value in higher precision. This makes it possible to detect accurately whether a program suffers from cancellation problems, and thus to increase trust in large-scale HPC applications and exascale simulations. We evaluate Candy on a set of real-world numerical applications and compare it against the state-of-the-art tool FPChecker.
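The shadow-value idea described in the abstract can be illustrated with a minimal sketch. This is not Candy's implementation or API, neither of which is shown on this page. It assumes Python, uses exact rational arithmetic (`fractions.Fraction`) as a stand-in for the higher-precision shadow, and introduces the hypothetical names `Shadowed`, `from_decimal`, and `significant_bits` purely for illustration.

```python
import math
from fractions import Fraction

MANTISSA_BITS = 53  # precision of a Python float (IEEE 754 binary64)


class Shadowed:
    """Hypothetical helper: a float paired with a more precise shadow value."""

    def __init__(self, value, shadow):
        self.value = float(value)       # working-precision result
        self.shadow = Fraction(shadow)  # stand-in for the higher-precision shadow

    @classmethod
    def from_decimal(cls, text):
        # The float is rounded to binary64; the shadow keeps the exact decimal.
        return cls(float(text), Fraction(text))

    def __sub__(self, other):
        # The same operation is applied to the float and to its shadow.
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def significant_bits(self):
        """Estimate how many leading bits of value agree with the shadow."""
        if self.shadow == 0:
            return MANTISSA_BITS if self.value == 0.0 else 0
        rel_err = float(abs(Fraction(self.value) - self.shadow) / abs(self.shadow))
        if rel_err == 0.0:
            return MANTISSA_BITS
        bits = math.floor(-math.log2(rel_err))
        return max(0, min(MANTISSA_BITS, bits))


# Subtracting two nearby numbers: each operand is accurate on its own,
# but the difference keeps only a few significant bits.
a = Shadowed.from_decimal("1.000000000000001")
b = Shadowed.from_decimal("1.0")
diff = a - b
print(a.significant_bits(), b.significant_bits(), diff.significant_bits())
```

In this sketch a cancellation would be reported whenever the result of a subtraction has far fewer significant bits than its operands; the threshold for "far fewer" is a design choice left open here.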
