FFT blitz: The tensor cores strike back

Sultan Durrani, Muhammad Saad Chughtai, Abdul Dakkak, Wen Mei Hwu, Lawrence Rauchwerger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Fast Fourier Transform (FFT), a reduced-complexity formulation of the Discrete Fourier Transform (DFT), is an important tool in many areas of science and engineering. FFTW is a well-known package that follows this approach and is currently one of the fastest available implementations of the FFT. NVIDIA introduced its own counterpart to FFTW, called cuFFT, which achieves high performance on GPUs. In this work we present a novel way to map the FFT algorithm onto the newly introduced Tensor Cores by adapting the Cooley-Tukey recursive FFT algorithm. We present four major types of optimizations that enhance the performance of our approach for varying FFT sizes and show that the approach consistently outperforms cuFFT with a speedup of about 15% to 250% on average.
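
The abstract does not say how the mapping is done, so the following is only an illustrative sketch and not the authors' implementation. It shows one level of the Cooley-Tukey decomposition written as two dense matrix products with a twiddle-factor scaling in between; dense matrix multiply-accumulate is the kind of operation Tensor Cores accelerate. The function names (`dft_matrix`, `matmul`, `fft_as_matmul`, `naive_dft`) and the factorization `n = n1 * n2` are assumptions introduced for this example, and plain Python complex arithmetic stands in for the GPU hardware.

```python
import cmath


def dft_matrix(n):
    """Dense n x n DFT matrix with entries exp(-2*pi*i*j*k/n)."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)]
            for j in range(n)]


def matmul(a, b):
    """Plain dense matrix product (hypothetical stand-in for a Tensor Core GEMM)."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
            for row in a]


def fft_as_matmul(x, n1, n2):
    """One Cooley-Tukey step for len(x) == n1 * n2, phrased as matrix products."""
    n = n1 * n2
    assert len(x) == n
    # View the input as an n1 x n2 matrix in row-major order.
    a = [[x[n2 * r + c] for c in range(n2)] for r in range(n1)]
    # Length-n1 DFTs down every column: one matrix product.
    b = matmul(dft_matrix(n1), a)
    # Element-wise twiddle factors exp(-2*pi*i*k1*c/n).
    c = [[b[k1][col] * cmath.exp(-2j * cmath.pi * k1 * col / n)
          for col in range(n2)] for k1 in range(n1)]
    # Length-n2 DFTs along every row: a second matrix product.
    d = matmul(c, dft_matrix(n2))
    # Output index k = k1 + n1 * k2 (a transposed read-out).
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = d[k1][k2]
    return out


def naive_dft(x):
    """Direct O(n^2) DFT used only as a reference for checking."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]
```

For example, `fft_as_matmul(x, 3, 4)` on a length-12 input agrees with `naive_dft(x)` up to floating-point rounding. A full recursive scheme would apply the same split again to the length-`n1` and length-`n2` sub-transforms instead of using dense DFT matrices for them.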

Original language: English (US)
Title of host publication: PPoPP 2021 - Proceedings of the 2021 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Publisher: Association for Computing Machinery
Pages: 488-489
Number of pages: 2
ISBN (Electronic): 9781450382946
State: Published - Feb 17 2021
Event: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021 - Virtual, Online, Korea, Republic of
Duration: Feb 27 2021 – Mar 3 2021

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021
Country: Korea, Republic of
City: Virtual, Online
Period: 2/27/21 – 3/3/21

Keywords

  • DFT
  • FFT
  • GPU
  • tensor cores

ASJC Scopus subject areas

  • Software
