SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather Archive

Continuum Computing

Authors: Neena Imam (NVIDIA Corporation), Nagi Rao (Oak Ridge National Laboratory (ORNL)), Richard Carlson (DOE Office of Advanced Scientific Computing Research), Ewa Deelman (University of Southern California (USC))

Abstract: High-Performance Computing systems that have traditionally been deployed at a single site are expected to significantly expand their reach to include a variety of remote edge systems. These edge systems include computing platforms located near instruments as well as the instruments themselves. Examples range from interconnected ecosystems of large science instruments and supercomputers to vehicle networks orchestrated by large-scale AI. These interconnected systems form a continuum wherein computation is distributed in various stages from the edge to the core. This BoF will address the challenges and best practices associated with designing, implementing, and operating such complex computing ecosystems.

Long Description: Computing ecosystems are poised for significant transformations to keep pace with the growth and expansion of geographically distributed science infrastructure that includes IoT edge devices, large-scale instruments, upgraded networks, datacenters, and exascale computing platforms. Experimental science is also evolving as new approaches are adopted for the effective operation of, and collaboration among, science instruments that are reaching unprecedented scales and complexities. For example, within the next decade, U.S. Department of Energy (DOE) assets will produce enormous data sets with unparalleled depth and resolution. The global high-energy physics community will deploy Artificial Intelligence (AI)-controlled, city-size scientific instruments (particle accelerators and particle detectors) within the next decade that will produce zettabytes of data. These observational datasets will be combined with exascale-enabled simulations to support major scientific advances. As a result, the continuum computing (also known as the digital continuum) paradigm is emerging to enable in-situ AI for predictive analytics, control, and orchestration of instruments that are distributed over distances of hundreds of kilometers. In the continuum paradigm, computation is distributed in various stages from the edge to the core to optimize data movement and response times. Optimizing end-to-end performance in such a complex continuum is challenging. It is necessary to develop new approaches that seamlessly combine resources and services at the edge and along the data paths between multiple facilities and computing resources as needed. Methods based on AI/ML also need to be developed for the convergence of experimental and simulation data and for the autonomous steering of experiments.

To realize this new paradigm, the scientific community needs to work across multiple disciplines to develop technologies that enable access to distributed data and computing resources globally. Novel solutions are needed for system software, libraries and frameworks, data-driven dynamic workflows that can respond to unpredictable data sizes, multisite governance policies, and the gathering of actionable experimental metrics. This BoF addresses four areas in this context: 1) infrastructure and software needed to form and operate these continuum systems, 2) analysis tools for performance monitoring and optimization, 3) the role of AI methods in designing and operating the continuum, and 4) community best practices for seamless integration of computing and instruments.

A BoF session on continuum computing has not been held at SC before. However, this is a very germane topic, as the need for continuum computing is emerging from many science domains, as evidenced by recent DOE workshops. Given the cross-disciplinary nature of this BoF, we believe the session will be well-attended and enthusiastically received. SC presents a rare opportunity for cross-disciplinary teams to meet and discuss how to create a roadmap for continuum computing. The target audience includes computer and computational scientists, instrument scientists, experts in ML/AI, system architects, as well as end users. The session leaders will provide a diverse set of perspectives from industry (Imam), federal government (Carlson), a national lab (Rao), and academia (Deelman). Input will be collected from the audience to develop a guide for best practices in continuum computing.
