End-to-end performance modeling of distributed GPU applications

Jaemin Choi, David F. Richards, Laxmikant V. Kale, Abhinav Bhatele

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the growing number of GPU-based supercomputing platforms and GPU-enabled applications, the ability to accurately model the performance of such applications is becoming increasingly important. Most current performance models for GPU-enabled applications are limited to single-node performance. In this work, we propose a methodology for end-to-end performance modeling of distributed GPU applications. Our work strives to create performance models that are both accurate and easily applicable to any distributed GPU application. We combine trace-driven simulation of MPI communication using the TraceR-CODES framework with a profiling-based roofline model for GPU kernels. We make substantial modifications to these models to capture the complex effects of both on-node and off-node networks in today's multi-GPU supercomputers. We validate our model against empirical data from GPU platforms and also vary tunable parameters of our model to observe how they might affect application performance.
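
For readers unfamiliar with the roofline idea mentioned in the abstract, the following is a minimal sketch of the classic, unmodified roofline estimate: a kernel's time is bounded below by either its arithmetic work or its memory traffic, whichever takes longer. It is not the authors' modified model. The function name, parameter names, and all numeric values are hypothetical and chosen only for illustration.

```python
def roofline_kernel_time(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Estimate kernel time (seconds) with the classic roofline bound.

    flops          -- floating-point operations performed by the kernel
    bytes_moved    -- bytes transferred to/from device memory
    peak_flops     -- peak compute rate of the device (FLOP/s)
    peak_bandwidth -- peak memory bandwidth of the device (bytes/s)
    """
    compute_time = flops / peak_flops           # time if purely compute-bound
    memory_time = bytes_moved / peak_bandwidth  # time if purely memory-bound
    # The slower of the two limits dominates the estimate.
    return max(compute_time, memory_time)


# Hypothetical device: 1 TFLOP/s peak compute, 100 GB/s peak bandwidth.
PEAK_FLOPS = 1e12
PEAK_BANDWIDTH = 1e11

# Memory-bound example: little arithmetic per byte moved.
memory_bound = roofline_kernel_time(2e9, 8e9, PEAK_FLOPS, PEAK_BANDWIDTH)

# Compute-bound example: a lot of arithmetic per byte moved.
compute_bound = roofline_kernel_time(5e12, 1e9, PEAK_FLOPS, PEAK_BANDWIDTH)

print(f"memory-bound kernel estimate:  {memory_bound:.3f} s")
print(f"compute-bound kernel estimate: {compute_bound:.3f} s")
```

In a profiling-based setting, the flop and byte counts would come from a profiler rather than being entered by hand. The per-kernel estimates would then be combined with simulated communication time to approximate end-to-end runtime, which this sketch does not attempt.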

Original language: English (US)
Title of host publication: Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450379830
State: Published - Jun 29 2020
Event: 34th ACM International Conference on Supercomputing, ICS 2020 - Barcelona, Spain
Duration: Jun 29 2020 → Jul 2 2020

Publication series

Name: Proceedings of the International Conference on Supercomputing

Conference

Conference: 34th ACM International Conference on Supercomputing, ICS 2020
Country/Territory: Spain
City: Barcelona
Period: 6/29/20 → 7/2/20

Keywords

  • GPU computing
  • communication
  • performance modeling
  • trace-driven simulation

ASJC Scopus subject areas

  • General Computer Science
