Workshop: PDSW22: 7th International Parallel Data Systems Workshop
Authors: Hyungro Lee, Jesun Firoz, and Nathan R. Tallent (Pacific Northwest National Laboratory (PNNL)) and Meng Tang, Anthony Kougkas, and Xian-He Sun (Illinois Institute of Technology)
Abstract: Scientific exploration increasingly depends on the convergence of scientific modeling, data analytics, and machine learning. The result is data-intensive workflows composed of multiple stages of computation and communication across distributed, heterogeneous computing resources. Data movement through storage systems is frequently the most significant bottleneck, a problem compounded by ever-larger data volumes and rates. To identify opportunities for optimizing data movement, we are developing novel workflow telemetry that highlights the dynamic flow, reuse, lifetime, and locality of data objects. Our objective is to enable modeling of and reasoning about task-data locality, especially in comparison with default data placement and exchange. We also aim to support the scheduling of anticipatory data movement that selects which data should be staged in memory, and when.