SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Dask-Enabled External Tasks for In Transit Analytics

Workshop: PDSW22: 7th International Parallel Data Systems Workshop

Authors: Amal Gueroudji and Julien Bigot (Atomic Energy and Alternative Energies Commission (CEA)) and Bruno Raffin (French Institute for Research in Computer Science and Automation (INRIA))

Abstract: In situ models are a relevant alternative to classical post hoc workflows because they bypass disk accesses, thus reducing the I/O bottleneck. However, as most in situ data analytics tools are based on MPI, they are complicated to use, especially for parallelizing irregular algorithms. Deisa, a task-based in situ analytics tool, couples MPI with Dask, providing a higher-level and easier way to write in situ analytics. In this work, we improve Deisa's design by introducing three main concepts: deisa virtual arrays, contracts, and external tasks in Dask distributed. These refinements reduce the load on Dask's centralized scheduler and transparently integrate selected simulation data into Dask task graphs, improving Deisa's performance and productivity.
