High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs

William S. Moses, Ivan R. Ivanov, Jens Domke, Toshio Endo, Johannes Doerfert, Oleksandr Zinenko

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA), into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 58% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7×.

Original languageEnglish (US)
Title of host publicationPPoPP 2023 - Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming
PublisherAssociation for Computing Machinery
Pages119-134
Number of pages16
ISBN (Electronic)9798400700156
DOIs
StatePublished - Feb 25 2023
Externally publishedYes
Event28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023 - Montreal, Canada
Duration: Feb 25 2023Mar 1 2023

Publication series

NameProceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Conference

Conference28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023
Country/TerritoryCanada
CityMontreal
Period2/25/233/1/23

Keywords

  • barrier synchronization
  • CUDA
  • MLIR
  • polygeist

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs'. Together they form a unique fingerprint.

Cite this