Authors: Michael Hennecke (Intel Corporation), Kevin Harms (Argonne National Laboratory (ANL)), Dean Hildebrand (Google Cloud), Panagiotis Adamidis (German Climate Computing Centre (DKRZ)), Kelsey Prantis (Intel Corporation)
Abstract: DAOS (https://docs.daos.io/) is an open-source scale-out object store that delivers extremely high performance to the most data-intensive HPC/AI workloads. With growing adoption, DAOS has seen significant community contributions such as domain-specific container types, additional hardware support beyond x86_64 (e.g. ARM), and enablement of DAOS in the cloud.
This BoF brings together the DAOS community to discuss, share experiences, and brainstorm on future enhancements of DAOS. Topics include practical experiences with on-prem and cloud deployments, application use cases, and the software roadmap. This session targets end users, HPC/AI middleware developers, system administrators, DAOS core software developers, and vendors of DAOS-based hardware/software/cloud offerings.
Long Description: The primary goal of this BoF is to gather DAOS community members to share their experiences with DAOS and brainstorm on future developments. Socializing with other community members is an important secondary goal. The session leaders will present short lightning talks, which will be used to spark the discussion among the participants:
Intel: The current community roadmap will be presented, including an overview of the major features under development (functional enhancements, OS and networking support, support for next-gen Xeon hardware accelerators, ARM support, ...). The talk will also discuss evolving the CI infrastructure to accommodate broader testing on different platforms and Linux distributions.
ANL: DAOS will be the primary storage of ALCF’s Aurora system, with a 230 PB DAOS installation and 25 TB/s of aggregate bandwidth. The talk will give an update on the Aurora DAOS installation and discuss the operational aspects of running a system at this scale.
Google Cloud: The talk will discuss DAOS usage in the cloud, building on the lessons learned from the collaboration between Intel and Google to automate DAOS deployments on GCP.
DKRZ: DAOS for weather/climate applications has been a focus area of customer PoC activities, and DKRZ will report on first experiences running full ICON simulations with DAOS as the backend storage.
These topics are particularly relevant in 2022: Several large DAOS production deployments are underway, and the technology has demonstrated its performance capabilities with both synthetic workloads (five of the top 10, and 11 of the top 22 entries in the IO500 run DAOS) and full HPC/AI applications. DAOS is also being evaluated for the next round of Exascale and post-Exascale system procurements in major HPC institutions across the US, Europe and Asia.
Expected outcome: Users (and prospective users) of DAOS will get an update on the features, performance, and usability aspects of DAOS from practitioners in the field, and will have the opportunity to share their own work and experiences. The feedback from these conversations will also help the development teams in prioritizing future development efforts.
Background:
The DAOS project started in 2012 through the DOE Fast Forward Storage and I/O program. It aims to support next-generation HPC workflows that combine simulation, big data and AI in a single storage tier. DAOS presents a rich and scalable storage interface for both structured and unstructured data. DAOS v1.0 was released in 2020; v2.0 has been available since December 2021. DAOS supports multiple application interfaces, including a parallel filesystem, MPI-IO and HDF5 backends, native key/value APIs with C and Python bindings, a Hadoop/Spark connector, TensorFlow-IO, and several domain-specific data models.
DAOS is completely open-source (see https://docs.daos.io/ and https://github.com/daos-stack/daos/). It is organized around an active community of partners (e.g. Lenovo, HPE, Google, Croit, RSC) and end users (e.g. ANL, ZIB, CERN, ECMWF, DKRZ) who collaborate on GitHub, Slack, and the public mailing list. An in-person DAOS BoF was held at ISC’22 with around 45 attendees. The community meets virtually once a year for the DAOS User Group (https://dug.daos.io/), which gathered more than 150 engineers last year.