Workshop: The 17th Workshop on Workflows in Support of Large-Scale Science (WORKS22)
Authors: Khairul Alam and Banani Roy (University of Saskatchewan)
Abstract: Scientific workflows are one of the well-established pillars of large-scale computational science and have emerged as a torchbearer for formalizing and structuring massive amounts of complex, heterogeneous data and for accelerating scientific progress. Scientific workflow management systems (SWfMSs) support the automation of repetitive tasks and capture complex analyses as workflows. However, executing workflows is costly and resource-intensive. At different phases of the workflow life cycle, most SWfMSs store provenance information, which enables result reproducibility, sharing, and knowledge reuse in the scientific community. This provenance information, however, can be many times larger than the workflow and its input data, and managing it grows more complex as applications scale. We describe the challenges of managing and reusing provenance in e-science, focusing primarily on scientific workflow approaches, by exploring different SWfMSs and provenance management systems. We also investigate ways to overcome these challenges.