SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather Archive

The Impact of Data Management on HPC Workloads: How Well Do You Know Your Data?

Authors: Matt Starr (Spectra Logic Corporation), Jason Lohrey (Arcitecta), Allan Williams (National Computational Infrastructure, Australian National University), Mike Martinez (Sandia National Laboratories)

Abstract: Effective data and storage management are crucial for efficient HPC workflows and can accelerate research while achieving reproducibility and preserving data for future reference. Join us as we discuss the impact of data management on HPC workflows and explore real-world use cases and best practices from organizations optimizing their data management in support of breakthrough research. The session will explore data management for immediate computational needs as well as alternatives for long-term data access, management, and preservation. This is an interactive session where we invite the audience to share best practices.

Long Description: Groundbreaking discoveries often come with their own unique data management challenges. The datasets associated with these research initiatives are ever-growing, yet they still require effective data and storage management: management that balances the interests of researchers with those of infrastructure managers mindful of long-term data access, security, and storage costs. Getting the data management strategy right becomes paramount as HPC datasets continue their inexorable growth.

Finding the right data and getting it to the right place at the right time is essential to HPC workflow orchestration. Embedded metadata and custom tagging make today’s datasets easily searchable, yet to realize the full value of their data, HPC storage administrators must ensure that scratch space remains available and currently needed data is rapidly accessible, while data not currently needed does not occupy fast storage. As research computing becomes increasingly distributed and collaborative, organizations also face data management challenges such as siloed or missing data; unprotected data that can be jeopardized by ransomware; overburdened environments that lose performance as they scale to exabytes and beyond; and primary storage that becomes overloaded with inactive datasets, making storage costs prohibitive.

Join our panel as we discuss the benefits of data management for HPC workflows. The session will address the challenge of “wrangling” data from persistent to scratch storage and then back out for long-term archival, preservation, and later reuse. We will cover real-world use cases and best practices from organizations optimizing their data management strategies in support of breakthrough research. The session will feature a dialog between the session leaders and attendees from the HPC and analytics communities in academia, government, and industry who face data management challenges.

