· Contributors · Organizations · Search
An Approach for Large-Scale Distributed FFT Framework on GPUs
SessionResearch Posters Display
DescriptionThe fast Fourier Transforms (FFT), a reduced-complexity formulation of the Discrete Fourier Transform (DFT), dominate the computational cost in many areas of science and engineering. Due to the large-scale data, multi-node heterogeneous systems aspire to meet the increasing demands from parallel computing FFT in the field of High-Performance Computing (HPC). In this work, we present a highly efficient GPU-based distributed FFT framework by adapting the Cooley-Tukey recursive FFT algorithm. Two major types of optimizations, including automatic low-dimensional FFT kernel generation and asynchronous strategy for multi-GPUs, are presented to enhance the performance of our approach for large-scale distributed FFT, and numerical experiments demonstrate that our work achieves more than 40x speedup over CPU FFT libraries and about 2x speedup over heFFTe, currently available state-of-art research, on GPUs.