Characterization of Transform-Based Lossy Compression for HPC Datasets
Session: The 8th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-8) in Conjunction with SC22
Description: As the scale and complexity of HPC systems keep growing, data compression techniques are often adopted to reduce the data movement bottleneck. Lossy compression is often preferred to lossless compression because it can achieve higher compression ratios, but it is worth the effort only if it strikes a good balance between volume reduction and information loss. The insight of this paper is that quantifying dominant coefficients at the block level reveals the right balance, potentially impacting overall compression ratios. Motivated by this, we characterize three transform-based lossy compression mechanisms at the block level, using statistical features that capture data characteristics. We build several prediction models from the collected features and the characteristics of the dominant coefficients, and evaluate the effectiveness of each model on six HPC datasets. Our results demonstrate that the random forest classifier captures the behavior of dominant coefficients precisely, achieving nearly 99% prediction accuracy.
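To make the idea of block-level dominant coefficients concrete, the following is a minimal sketch and not the paper's implementation. It assumes a 2D orthonormal DCT as the transform, square blocks, and a hypothetical definition of "dominant": the smallest number of coefficients that together hold a chosen fraction of the block's energy. The function names, the 8x8 block size, the 0.99 energy threshold, and the three statistical features are all illustrative choices; the abstract does not specify any of them.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale
    return m

def dominant_count(block, energy=0.99):
    """Assumed definition: fewest coefficients holding `energy` of the block energy."""
    d = dct_matrix(block.shape[0])
    coeffs = d @ block @ d.T                      # separable 2D DCT of a square block
    e = np.sort(coeffs.ravel() ** 2)[::-1]        # squared magnitudes, largest first
    total = e.sum()
    if total == 0.0:
        return 0                                  # all-zero block has no dominant terms
    cum = np.cumsum(e) / total
    return int(min(np.searchsorted(cum, energy) + 1, e.size))

def block_features(data, b=8):
    """Per-block (mean, std, range) features and dominant-coefficient counts."""
    feats, counts = [], []
    rows, cols = data.shape
    for r in range(0, rows - b + 1, b):           # incomplete edge blocks are skipped
        for c in range(0, cols - b + 1, b):
            blk = data[r:r + b, c:c + b].astype(float)
            feats.append((blk.mean(), blk.std(), blk.max() - blk.min()))
            counts.append(dominant_count(blk))
    return np.array(feats), np.array(counts)
```

Under these assumptions, the feature array and the counts (or classes derived from them) could serve as inputs and labels for a classifier such as scikit-learn's `RandomForestClassifier`. A smooth block yields a small count and a noisy block a large one, which is the kind of per-block variation the abstract proposes to predict.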