SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Research Posters Archive

Statistical and Causal Analysis of Chimbuko Provenance Database

Authors: Margaret Ajuwon, Serges Love Teutu Talla, and Isabelle Kemajou-Brown (Morgan State University); Christopher Kelly and Kerstin Kleese Van Dam (Brookhaven National Laboratory)

Abstract: Performance data are collected to establish how efficiently exascale applications execute their code or workflows. Chimbuko, a tool for analyzing performance data in real time, examines these data and records the performance anomalies it detects. These anomalies are saved in the Chimbuko Provenance Database, together with as much contextual information as needed. The goal of our work is to perform statistical analysis on the Chimbuko Provenance Database, presenting simple visualizations and determining whether the information collected for each anomaly is sufficient to conduct a causal analysis. Statistical methods such as Theil’s U correlation analysis, logistic regression, and K-Prototypes clustering were used to identify associations between variables. Furthermore, feature selection was conducted with decision tree and random forest models. We identified associations between call_stack and several other variables, which indicates that call_stack is a very important feature of the dataset.
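
For readers unfamiliar with Theil’s U (the uncertainty coefficient), the following is a minimal sketch of how such an association measure between two categorical variables might be computed. It is not the authors' implementation. The function names and the toy values are hypothetical, chosen only for illustration, and the only field name taken from the abstract is call_stack. U(x | y) is the fraction of the entropy of x that is explained by knowing y, so it ranges from 0 (no association) to 1 (y fully determines x) and is asymmetric.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy H(X) of a sequence of categorical values (natural log)."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) for two equal-length categorical sequences."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    # H(X|Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))
    return -sum(
        (c / n) * log(c / y_counts[y_val])
        for (_, y_val), c in xy_counts.items()
    )


def theils_u(x, y):
    """Theil's U of x given y: (H(X) - H(X|Y)) / H(X)."""
    h_x = entropy(x)
    if h_x == 0:
        # x is constant, so there is no uncertainty left to explain.
        return 1.0
    return (h_x - conditional_entropy(x, y)) / h_x


# Hypothetical toy records: a call_stack label and a second categorical field.
call_stack = ["a", "a", "b", "c"]
other_field = ["p", "p", "q", "q"]

u_other_given_stack = theils_u(other_field, call_stack)
u_stack_given_other = theils_u(call_stack, other_field)
print(u_other_given_stack, u_stack_given_other)
```

In this toy data, call_stack fully determines the second field but not the other way around, so the two directions give different scores. That asymmetry is one reason the measure is useful for categorical provenance fields.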

Best Poster Finalist (BP): no

