End-to-end performance modeling of distributed GPU applications

Jaemin Choi, David F. Richards, Laxmikant V. Kale, Abhinav Bhatele

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)


With the growing number of GPU-based supercomputing platforms and GPU-enabled applications, the ability to accurately model the performance of such applications is becoming increasingly important. Most current performance models for GPU-enabled applications are limited to single-node performance. In this work, we propose a methodology for end-to-end performance modeling of distributed GPU applications. Our work strives to create performance models that are both accurate and easily applicable to any distributed GPU application. We combine trace-driven simulation of MPI communication using the TraceR-CODES framework with a profiling-based roofline model for GPU kernels. We make substantial modifications to these models to capture the complex effects of both on-node and off-node networks in today's multi-GPU supercomputers. We validate our model against empirical data from GPU platforms and also vary tunable parameters of our model to observe how they might affect application performance.
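For readers unfamiliar with the roofline model mentioned above, the following is a minimal sketch of the textbook roofline bound for a single GPU kernel. It is not the authors' modified, profiling-based model and does not cover the TraceR-CODES communication simulation. The `GpuSpec` class, the `roofline_time` function, and the hardware numbers in the usage example are hypothetical and chosen for illustration only. In a profiling-based setting, the FLOP and byte counts would come from measured kernel counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical per-GPU hardware ceilings used by the roofline bound."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s


def roofline_time(flops: float, bytes_moved: float, gpu: GpuSpec) -> float:
    """Estimate a kernel's runtime in seconds with the classic roofline bound.

    Attainable throughput is min(peak_flops, AI * mem_bandwidth), where the
    arithmetic intensity AI = flops / bytes_moved. Dividing the kernel's work
    by that throughput is equivalent to taking the larger of the
    compute-bound time and the memory-bound time, which is what we return.
    """
    if flops < 0 or bytes_moved <= 0:
        raise ValueError("flops must be >= 0 and bytes_moved must be > 0")
    compute_time = flops / gpu.peak_flops        # time if purely compute-bound
    memory_time = bytes_moved / gpu.mem_bandwidth  # time if purely memory-bound
    return max(compute_time, memory_time)


if __name__ == "__main__":
    # Illustrative ceilings only; substitute measured values for a real GPU.
    gpu = GpuSpec(peak_flops=7.0e12, mem_bandwidth=9.0e11)
    # A low-intensity kernel (AI = 0.125 FLOP/byte) lands on the memory roof.
    t = roofline_time(flops=1.0e9, bytes_moved=8.0e9, gpu=gpu)
    print(f"estimated kernel time: {t:.6f} s")
```

Taking the maximum of the two times is the simplest way to express the roofline; an end-to-end model would then combine such per-kernel estimates with communication times obtained separately, which is where the trace-driven simulation comes in.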

Original language: English (US)
Title of host publication: Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450379830
State: Published - Jun 29 2020
Event: 34th ACM International Conference on Supercomputing, ICS 2020 - Barcelona, Spain
Duration: Jun 29 2020 to Jul 2 2020

Publication series

Name: Proceedings of the International Conference on Supercomputing


Conference: 34th ACM International Conference on Supercomputing, ICS 2020


Keywords

  • GPU computing
  • communication
  • performance modeling
  • trace-driven simulation

ASJC Scopus subject areas

  • Computer Science (all)

