The sequential task flow (STF) model is the mainstream approach for interacting with task-based runtime systems. Compared with other ways of submitting tasks to a runtime system, STF offers a notable advantage: an easy-to-use API that lets users express algorithms as a sequence of tasks, while the runtime automatically infers the dependencies between tasks and schedules them.
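To make the STF idea of automatic dependency inference concrete, the following is a minimal sketch. It assumes each task declares, when it is inserted, which data items it reads or writes. The class `STFGraph`, its `insert_task` method, and the `READ`/`WRITE` access modes are hypothetical illustrations, not PaRSEC's DTD API. Because tasks arrive in sequential program order, tracking the last writer and the readers of each data item is enough to derive read-after-write, write-after-read, and write-after-write dependencies.

```python
from collections import defaultdict

# Hypothetical access modes; WRITE stands for any access that modifies the data (including read-write).
READ, WRITE = "R", "W"


class STFGraph:
    """Sketch of sequential-task-flow dependency inference (illustrative, not a real runtime API)."""

    def __init__(self):
        self.last_writer = {}             # data item -> most recent task that wrote it
        self.readers = defaultdict(list)  # data item -> tasks that read it since the last write
        self.deps = {}                    # task name -> set of predecessor task names

    def insert_task(self, name, accesses):
        """Insert a task in program order; `accesses` is a list of (data, mode) pairs."""
        preds = set()
        for data, mode in accesses:
            writer = self.last_writer.get(data)
            if mode == READ:
                if writer is not None:
                    preds.add(writer)                # read-after-write
                self.readers[data].append(name)
            else:
                if writer is not None:
                    preds.add(writer)                # write-after-write
                preds.update(self.readers[data])     # write-after-read
                self.readers[data] = []              # a new write starts a new epoch
                self.last_writer[data] = name
        preds.discard(name)  # a task never depends on itself
        self.deps[name] = preds
        return preds


if __name__ == "__main__":
    # Hypothetical tile names mirroring the first steps of a 2x2 tiled Cholesky factorization.
    g = STFGraph()
    g.insert_task("potrf0", [("A00", WRITE)])
    g.insert_task("trsm10", [("A00", READ), ("A10", WRITE)])
    g.insert_task("syrk11", [("A10", READ), ("A11", WRITE)])
    g.insert_task("potrf1", [("A11", WRITE)])
    for task, preds in g.deps.items():
        print(task, sorted(preds))
```

The user only writes the sequential list of `insert_task` calls; the dependency graph emerges from the declared accesses, which is the ease-of-use property the STF model is valued for.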
We focus on the Dynamic Task Discovery (DTD) interface in PaRSEC, highlight some of its lesser-known limitations, and implement two optimization techniques for DTD: support for user-level graph trimming, and a new API for broadcasting read-only data to remote tasks. We then analyze the benefits and limitations of these optimizations with benchmarks, as well as with Cholesky and QR matrix factorizations, on two different systems, Shaheen-II and Fugaku. Finally, we point out potential for further improvements and provide insights into the strengths and weaknesses of the STF model.