Authors: Suren Byna (Lawrence Berkeley National Laboratory (LBNL)), Quincey Koziol (Amazon Web Services), Dana Robinson (HDF Group)
Abstract: HDF5 is a pivotal I/O library for scientific applications. In this BoF, we will present new features that target exascale and “cloud HPC” environments, HDF5’s role in the ECP project, and the HDF5 roadmap. We will moderate a panel with representatives from research, commercial, and government organizations who will present case studies on how they use HDF5 for both cloud and exascale systems. This will provide a forum for users to discuss their experiences with HDF5, including new features to access data in object stores and the cloud. Session leaders will moderate open discussion with attendees and solicit feedback.
Long Description: HDF5 is a unique, open-source, high-performance technology suite that consists of an abstract data model, library, and file format used for storing and managing extremely large and/or complex data collections. The technology is used worldwide by government, industry, and academia in a wide range of science, engineering, and business disciplines. The HDF5 suite is included by every major HPC system vendor as part of their core software due to its broad adoption in science applications and ability to improve I/O performance and data organization within HPC environments.
There are more than 1,000 projects on GitHub utilizing HDF5 due to its (1) versatile, self-describing data model that can represent very complex data objects, relationships between objects, and object metadata; (2) portable binary file format with no limits on the number or size of data objects; (3) software library optimized for efficient I/O; and (4) tools for managing, manipulating, viewing, and analyzing HDF5 data.
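The self-describing data model above can be sketched in a few lines. This is a minimal, hypothetical example using the widely used h5py Python bindings (the file, group, dataset, and attribute names are illustrative, not from any particular application):

```python
import h5py
import numpy as np

# Write: groups give hierarchy, datasets hold arrays,
# attributes attach self-describing metadata.
with h5py.File("demo.h5", "w") as f:
    grp = f.create_group("simulation")
    dset = grp.create_dataset("pressure", data=np.arange(4, dtype="f8"))
    dset.attrs["units"] = "Pa"

# Read back: the file carries its own structure and metadata.
with h5py.File("demo.h5", "r") as f:
    vals = f["simulation/pressure"][...]
    units = f["simulation/pressure"].attrs["units"]
```

Because the file format is portable and self-describing, the same file can be opened by the C, Fortran, or Java APIs, or by analysis tools, without any external schema.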
Recently, the HDF5 development team has added features to access data in object and cloud storage, as well as to exploit the storage systems being deployed on today's exascale systems. These features take advantage of the new storage paradigms and require minimal changes to current HDF5 applications. In addition, for more than two decades the HDF5 development team has been working with researchers around the globe to help capture, store, and analyze experimental and observational data (EOD) in HDF5, e.g., data collected at light sources and particle accelerators. In the past decade, the amount of simulation, modeling, experimental, and observational data stored in HDF5, and the rate at which this data is collected, have created new challenges for scientists and triggered requests for support of these new storage paradigms.
The HDF Group, Lawrence Berkeley Lab, and Amazon AWS HPC teams have been working on enhancing HDF5 to address these challenges. We are excited to present new HDF5 capabilities that will help applications run on exascale systems, exploit object storage, and migrate to the cloud. These features include Single-Writer/Multiple-Reader (SWMR) concurrent access to HDF5 containers, versioned updates to HDF5 files, and multi-dataset collective I/O. HDF5 Virtual Object Layer (VOL) connectors improve performance and enable access to non-POSIX storage systems; we will describe implementing and using connectors for Intel's Distributed Asynchronous Object Storage (DAOS), asynchronous I/O using background threads, caching in non-volatile memory (NVM), and object stores in the cloud.
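Of the features above, SWMR is the one most easily shown in a few lines: one writer appends and flushes data while any number of readers poll the same file concurrently, with no inter-process coordination. The sketch below uses the h5py bindings; the file and dataset names are illustrative, and the reader is shown inline rather than in a separate process for brevity:

```python
import h5py

# Writer: SWMR requires the latest file format and is enabled
# after the file's objects have been created.
with h5py.File("swmr_demo.h5", "w", libver="latest") as f:
    dset = f.create_dataset("temps", shape=(0,), maxshape=(None,), dtype="f8")
    f.swmr_mode = True
    for i in range(3):
        dset.resize((i + 1,))
        dset[i] = i * 1.5
        dset.flush()  # makes the new elements visible to concurrent readers

# Reader: opens read-only with swmr=True; in a real deployment this
# runs in a separate process while the writer is still active.
with h5py.File("swmr_demo.h5", "r", swmr=True) as f:
    data = f["temps"][...]
```

A real reader would call `dset.refresh()` in a loop to pick up newly flushed elements as the writer produces them.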
We encourage HDF5 users to share their experiences applying the latest HDF5 features to real-world problems, and we will solicit feedback on HDF5 improvements. The BoF session format includes time for HDF5 community members to discuss the challenges they face when using HDF5. The discussion will help prioritize features on the HDF5 roadmap and encourage conversation on how the community can contribute to the maintenance and future development of HDF5.