Statistical and Causal Analysis of Chimbuko Provenance Database
Session: Research Posters Display
Description: Performance data are collected to determine how efficiently exascale applications execute their code or workflows. Chimbuko, a tool focused on analyzing performance data in real time, scans these data and records the performance anomalies it detects. These anomalies are saved to the Chimbuko Provenance Database, together with as much contextual information as is needed. The goal of our work is to perform statistical analysis on the Chimbuko Provenance Database by presenting simple visualizations and determining whether the information collected for each anomaly is sufficient to conduct a causal analysis. We used statistical methods such as Theil’s U correlation analysis, logistic regression, and K-Prototype clustering to identify associations between variables, and we performed feature selection with Decision Tree and Random Forest models. We identified associations between call_stack and several other variables, which indicates that call_stack is a very important feature of the dataset.
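As an illustration of the kind of association measure named above, the following is a minimal sketch of Theil's U (the uncertainty coefficient) for two categorical variables, written with only the Python standard library. It is not the authors' implementation. The sample values and the `func_name` column are hypothetical and are not taken from the Chimbuko Provenance Database; only the `call_stack` name comes from the description.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) of two equal-length categorical sequences."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    # H(X | Y) = -sum over (x, y) of p(x, y) * log(p(x, y) / p(y))
    return -sum(
        (c / n) * log(c / y_counts[y_val])
        for (_, y_val), c in xy_counts.items()
    )


def theils_u(x, y):
    """Theil's U, U(X | Y): the fraction of the uncertainty in X explained by Y.

    Ranges from 0 (Y tells us nothing about X) to 1 (Y fully determines X).
    The measure is asymmetric: U(X | Y) generally differs from U(Y | X).
    """
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # X is constant, so there is no uncertainty left to explain
    return (h_x - conditional_entropy(x, y)) / h_x


# Hypothetical anomaly records (illustrative values only).
call_stack = ["a", "a", "b", "c"]
func_name = ["p", "p", "q", "q"]

u_func_given_stack = theils_u(func_name, call_stack)
u_stack_given_func = theils_u(call_stack, func_name)
print(u_func_given_stack, u_stack_given_func)
```

In this toy data each `call_stack` value maps to exactly one `func_name`, so `call_stack` fully explains `func_name`, while the reverse direction is only partial. This asymmetry is the reason Theil's U is often preferred over symmetric measures when asking which variable carries information about which.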