SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather Archive

Supporting Distributed Research Teams and Their Data Sharing Needs through Federated, Performant, On-Prem Object Storage: The Open Storage Network

Authors: John Goodhue (Massachusetts Green High Performance Computing Center (MGHPCC)), Christine Kirkpatrick (San Diego Supercomputer Center (SDSC)), Melissa Cragin (San Diego Supercomputer Center (SDSC))

Abstract: The Open Storage Network (OSN) provides a performant and cost-efficient distributed data sharing and transfer service for active scientific data sets, offering easy access and high-bandwidth delivery of large data sets to researchers and compute resources. Following its inception in 2017, the OSN transitioned to a production-level pilot and welcomed others to utilize the network. Today, users can request storage allocations of 1TB-50TB via the ACCESS allocation process. This BoF provides an update to the community and an open discussion to address questions on joining the network and to gather input on user requirements.

Long Description: The Open Storage Network (OSN) is an NSF-funded distributed data sharing service intended to facilitate exchanges of active scientific data sets between research organizations, communities, and projects, providing easy access and high-bandwidth delivery of large data sets to researchers. The OSN serves two principal purposes: (1) enable the smooth flow of large data sets between resources such as instruments, campus data centers, national supercomputing centers, and cloud providers; and (2) facilitate access to long-tail data sets by the scientific community.

Examples of data currently available on the OSN include synthetic data from ocean models; the widely used Extracted Features Set from the HathiTrust Digital Library; open-access earth sciences data from Pangeo; and geophysical data from BCO-DMO. These data sets are being used by researchers to train machine learning models, validate simulations, and perform statistical analysis of live data. The target OSN user community is well represented by SC attendees.

OSN data is housed in storage pods, each providing up to a petabyte of storage, and interconnected by national high-performance networks. The result is well-connected, cloud-like storage that is easily accessible at data transfer rates comparable to or exceeding those of public cloud storage providers, where users can temporarily park data for retrieval by a collaborator or create a repository of active research data. OSN leverages Ceph, commodity hardware, Ansible, and other open source methodologies and technologies. OSN pods are typically located in Science DMZs at the host institution, and secure access is provided via the InCommon Federation.

Since its inception in 2017, the OSN has been prototyping its service offerings with a community of friendly researchers while building out the network of pods. The Schmidt Foundation provided funding for the first prototypes, followed by NSF.
In fall 2020, OSN transitioned to a production-level pilot and began marketing both its services and the opportunity to participate in the network through the purchase of storage pods to the research computing community, beginning with a four-part webinar series that culminated in April 2021 with a session that attracted over 450 registrations. In January 2021, OSN also became a resource allocatable through the XSEDE XRAS process, where users can request startup allocations of 1TB-10TB and production allocations of 1TB-50TB. Allocations greater than 50TB and up to 300TB can be requested by contacting the OSN team directly.

This BoF will begin with a brief update on trends and the state of the art in research storage and lessons learned from building the OSN network, followed by details about the OSN, how to use it, and how to get involved. Two users will then give brief presentations. Finally, we will solicit feedback from BoF participants about their research storage needs, the suitability of OSN to address these needs, and any barriers they see to its use. The intended outcome of the BoF is to build community and identify new areas of needed innovation in federated object storage.


