Early Experience with Transformer-Based Similarity Analysis for DataRaceBench
Description
DataRaceBench (DRB) is a dedicated benchmark suite for evaluating tools that aim to find data races in OpenMP programs. Using microbenchmarks with and without data races, DRB can generate standard quality metrics and provide systematic, quantitative assessments of data race detection tools. However, as the number of microbenchmarks grows, it becomes challenging to manually identify similar code patterns in DRB, whether to detect duplicated kernels or to guide the addition of new kernels. In this paper, we experiment with a transformer-based deep learning approach to similarity analysis. A state-of-the-art transformer model, CodeBERT, has been adapted to find similar OpenMP code regions. We explore the challenges of, and solutions for, applying transformer-based similarity analysis to source code that pre-trained transformers have not seen. Using comparative experiments on different variants of similarity analysis, we comment on the strengths and limitations of the transformer-based approach and point out future research directions.
Event Type
Workshop
Time
Friday, 18 November 2022, 11:20am - 11:40am CST
Location
D167
Correctness
Software Engineering
Recorded