Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles

Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen Mei Hwu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The discrete Fourier transform (DFT) and its specialized case, the number theoretic transform (NTT), are two important mathematical tools having applications in several areas of science and engineering. However, despite their usefulness and utility, their adoption continues to be a challenge as computing the DFT of a signal can be a time-consuming and expensive operation. To speed things up, fast Fourier transform (FFT) algorithms, which are reduced-complexity formulations for computing the DFT of a sequence, have been proposed and implemented for traditional processors and their corresponding instruction sets. With the rise of GPUs, NVIDIA introduced its own FFT computation library called cuFFT, which leverages the power of GPUs to compute the DFT. However, as this paper demonstrates, there is a lot of room for improvement to accelerate the FFT and NTT algorithms on modern GPUs by utilizing specialized operations and architectural advancements. In particular, we present four major types of optimizations that leverage tensor cores and the warp-shuffle instruction. Through extensive evaluations, we show that our approach consistently outperforms existing GPU-based implementations with a speedup of up to 4× for NTT and a speed of up to 1.5× for FFT.

Original languageEnglish (US)
Title of host publicationProceedings - 30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
EditorsJaejin Lee, Albert Cohen
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages345-355
Number of pages11
ISBN (Electronic)9781665442787
DOIs
StatePublished - 2021
Event30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021 - Virtual, Onliine, United States
Duration: Sep 26 2021Sep 29 2021

Publication series

NameParallel Architectures and Compilation Techniques - Conference Proceedings, PACT
Volume2021-September
ISSN (Print)1089-795X

Conference

Conference30th International Conference on Parallel Architectures and Compilation Techniques, PACT 2021
Country/TerritoryUnited States
CityVirtual, Onliine
Period9/26/219/29/21

Keywords

  • Cyphertext
  • DFT
  • FFT
  • GPU
  • Homomorphic encryption
  • NTT
  • Tensor cores

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Fingerprint

Dive into the research topics of 'Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles'. Together they form a unique fingerprint.

Cite this