Workflow Anomaly Detection with Graph Neural Networks
DescriptionReliable execution of scientific workflows is a fundamental concern in computational campaigns. Therefore, detecting and diagnosing anomalies are both important and challenging for workflow executions that span complex, distributed computing infrastructures. We model the scientific workflow as a directed acyclic graph and apply graph neural networks (GNNs) to identify the anomalies at both the workflow and individual job levels. In addition, we generalize our GNN model to take into account a set of workflows together for the anomaly detection task rather than a specific workflow. By taking advantage of learning the hidden representation, not only from the job features but also from the topological information of the workflow, our GNN models demonstrate higher accuracy and better runtime efficiency when compared with conventional machine learning models and other convolutional neural network approaches.
TimeMonday, 14 November 20229:48am - 10am CST
Cloud and Distributed Computing