BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230124T171525Z
LOCATION:C1-2-3
DTSTART;TZID=America/Chicago:20221117T083000
DTEND;TZID=America/Chicago:20221117T170000
UID:submissions.supercomputing.org_SC22_sess275_rpost112@linklings.com
SUMMARY:An Approach for Large-Scale Distributed FFT Framework on GPUs
DESCRIPTION:Posters, Research Posters\n\nAn Approach for Large-Scale Distr
 ibuted FFT Framework on GPUs\n\nHu, Zhou, Lu\n\nThe Fast Fourier Transfor
 m (FFT), a reduced-complexity formulation of the Discrete Fourier Transfor
 m (DFT), dominates the computational cost in many areas of science and eng
 ineering. As data sizes grow, multi-node heterogeneous systems are expecte
 d to meet the increasing demand for parallel FFT computation in High-Perfo
 rmance Computing (HPC). In this work, we present a highly efficient GPU-ba
 sed distributed FFT framework that adapts the Cooley-Tukey recursive FFT a
 lgorithm. Two major optimizations (automatic generation of low-dimensiona
 l FFT kernels and an asynchronous strategy for multiple GPUs) enhance th
 e performance of our approach for large-scale distributed FFT. Numerica
 l experiments demonstrate that our framework achieves, on GPUs, more tha
 n 40x speedup over CPU FFT libraries and about 2x speedup over heFFT
 e, a currently available state-of-the-art distributed FFT library.\n\n
 Registration Category: Tech Program Reg Pass, Exhibits Reg Pass
END:VEVENT
END:VCALENDAR
