SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Workshops Archive

Task Fusion in Distributed Runtimes


Workshop: The 5th Annual Parallel Applications Workshop, Alternatives to MPI+X (PAW-ATM)

Authors: Shiv Sundram and Alex Aiken (Stanford University) and Wonchan Lee (NVIDIA Corporation)


Abstract: We present distributed task fusion, a run-time optimization for task-based runtimes operating on parallel and heterogeneous systems. Distributed task fusion dynamically and efficiently buffers, analyzes, and fuses asynchronously evaluated distributed operations, reducing the overheads inherent in scheduling distributed tasks in implicitly parallel frameworks and runtimes. We identify the constraints under which distributed task fusion is permissible and describe an implementation in Legate, a domain-agnostic library for constructing portable and scalable task-based libraries. We present performance results using cuNumeric, a Legate library that enables scalable execution of NumPy pipelines on parallel and heterogeneous systems. With task fusion enabled, we realize speedups of up to 1.5x on up to 32 P100 GPUs, demonstrating efficient execution of pipelines involving many successive fine-grained tasks. Finally, we discuss potential future work, including complementary optimizations that could yield additional performance improvements.
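
Illustrative sketch: The buffer-then-fuse idea described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions and does not reflect the Legate or cuNumeric APIs: the class ToyRuntime, its window parameter, and the launch counter are hypothetical names introduced here. The sketch also assumes every buffered task is a pure elementwise operation over identically partitioned data, a case in which fusion is trivially legal; the general constraints on when fusion is permissible are more involved than what is modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyRuntime:
    """Hypothetical stand-in for a task-based runtime (not Legate's API)."""

    partitions: List[List[float]]   # data split into per-"processor" chunks
    window: int = 4                 # max buffered tasks before a forced flush
    launches: int = 0               # number of tasks actually scheduled
    _buffer: List[Callable[[float], float]] = field(default_factory=list)

    def submit(self, op: Callable[[float], float]) -> None:
        """Buffer an elementwise op instead of launching it immediately."""
        self._buffer.append(op)
        if len(self._buffer) >= self.window:
            self.flush()

    def flush(self) -> None:
        """Fuse all buffered ops into a single task and launch it once."""
        if not self._buffer:
            return
        ops, self._buffer = self._buffer, []

        def fused(x: float) -> float:
            # Apply the buffered ops in their original submission order.
            for op in ops:
                x = op(x)
            return x

        self._launch(fused)

    def _launch(self, task: Callable[[float], float]) -> None:
        """Model one scheduled task: one pass over every partition."""
        self.launches += 1
        self.partitions = [[task(x) for x in part] for part in self.partitions]


def run(window: int):
    """Run the same four-op pipeline with a given fusion window."""
    rt = ToyRuntime(partitions=[[1.0, 2.0], [3.0, 4.0]], window=window)
    pipeline = (
        lambda x: x + 1.0,
        lambda x: x * 2.0,
        lambda x: x - 3.0,
        lambda x: x * x,
    )
    for op in pipeline:
        rt.submit(op)
    rt.flush()  # drain anything still buffered
    return rt.partitions, rt.launches


unfused_result, unfused_launches = run(window=1)  # every op is its own task
fused_result, fused_launches = run(window=4)      # four ops become one task
```

In this toy model both runs produce the same values, but the window of 1 schedules four tasks while the window of 4 schedules one. The launch count stands in for the per-task scheduling overhead that fusion aims to reduce; it is not a performance measurement.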

