SUGAR: Speeding Up GPGPU Application Resilience Estimation with Input Sizing
DescriptionAs Graphics Processing Units (GPUs) are becoming a de facto solution for accelerating a wide range of applications, their reliable operation is becoming increasingly important. One major challenge is to accurately measure GPGPU application error resilience. A typical GPGPU application spawns a huge number of threads and utilizes a large amount of potentially unreliable compute and memory resources available on the GPUs. As the number of possible fault locations can be in the billions, evaluating every fault and examining its effect on the application error resilience is impractical. Application resilience is evaluated via extensive fault injection campaigns based on sampling of an extensive fault site space. Typically, the larger the input of the GPGPU application, the longer the experimental campaign. We devise a methodology, SUGAR (Speeding Up GPGPU Application Resilience Estimation with input sizing), that dramatically speeds up the evaluation of GPGPU application error resilience by judicious input sizing.
Event Type
Workshop
TimeSunday, 13 November 20221:32pm - 1:36pm CST
LocationC148
W
Diversity Equity Inclusion (DEI)
Education and Training and Outreach
Recorded