Third International Symposium on Checkpointing for Supercomputing (SuperCheck-SC22)
Session Chairs
Event Type: Workshop
Recorded
Reliability and Resiliency
Time: Monday, 14 November 2022, 8:30am - 12pm CST
Location: C143-149
Description: As a primary approach to fault-tolerant computing, Checkpoint/Restart (C/R) is essential to a wide range of HPC communities. While there has been much C/R research and tools development, continued C/R research is indispensable to keep pace with ever-changing HPC architectures, technologies, and workloads. More effort is also needed to narrow the gap between proof-of-concept C/R research codes and production-quality codes capable of deployment in real-world workloads. In this workshop, we will bring together C/R researchers and tools developers, practitioners, application developers, and end users to focus on C/R research and successes in production use, motivating the development of usable C/R tools, the closing of the gap between state-of-the-art research and production, and the harnessing of the full benefits of C/R for the HPC community.
Workshop Website
Archive
Presentations