An adaptive performance modeling tool for GPU architectures

Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay Jeram Patel, William D. Gropp, Wen-Mei W. Hwu

Research output: Contribution to journal › Article

Abstract

This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how the kernel exercises major GPU microarchitecture features. To identify the performance bottlenecks accurately, we introduce an abstract interpretation of a GPU kernel, the work flow graph, based on which we estimate the execution time of a GPU kernel. We validated our performance model on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). For this purpose, we used data-parallel benchmarks that stress different GPU microarchitecture events such as uncoalesced memory accesses, scratch-pad memory bank conflicts, and control flow divergence, which must be accurately modeled but pose challenges to analytical performance models. The proposed model captures full system complexity and shows high accuracy in predicting the performance trends of different optimized kernel implementations. We also describe our approach to extracting the performance model automatically from the kernel code.
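To make the modeled events concrete, the hypothetical CUDA sketch below (not taken from the paper; the kernel name, stride, and tile size are illustrative assumptions) shows how a single kernel can exhibit all three behaviors the abstract names: uncoalesced global memory accesses, scratch-pad (shared) memory bank conflicts, and control flow divergence.

// Illustrative CUDA kernel (hypothetical, not from the paper) that triggers the
// three microarchitecture events the abstract names: uncoalesced global memory
// accesses, scratch-pad (shared) memory bank conflicts, and control flow divergence.
#include <cuda_runtime.h>

#define STRIDE 32  // a 32-float stride both breaks coalescing and folds a warp onto one bank

__global__ void stress_kernel(const float *in, float *out, int n)
{
    __shared__ float tile[32 * 32];
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;        // position within the warp
    int warp = threadIdx.x / 32;        // warp index within the block
    int idx  = lane * STRIDE + warp;    // all lanes of a warp land in the same shared-memory bank

    float v = 0.0f;
    if (tid < n) {
        // Uncoalesced access: consecutive threads read addresses 32 floats apart,
        // so a warp's loads cannot be merged into a few wide memory transactions.
        v = in[(tid * STRIDE) % n];

        // Bank conflict: the 32 lanes of a warp write indices that all map to one
        // bank, so the accesses are serialized into separate shared-memory transactions.
        tile[idx] = v;
    }
    __syncthreads();  // every thread reaches the barrier, including out-of-range ones

    if (tid < n) {
        // Control flow divergence: threads of the same warp take different paths
        // depending on their data, so the warp executes both paths one after the other.
        if (v > 0.0f)
            out[tid] = tile[idx] * 2.0f;
        else
            out[tid] = tile[idx] + 1.0f;
    }
}

An analytical model of the kind described would need to attribute the extra memory transactions and serialized warp execution caused by each of these patterns to the kernel's estimated execution time.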

Original language: English (US)
Pages (from-to): 105-114
Number of pages: 10
Journal: ACM SIGPLAN Notices
Volume: 45
Issue number: 5
DOIs: 10.1145/1837853.1693470
State: Published - May 1, 2010

Fingerprint

  • Flow graphs
  • Data storage equipment
  • Graphics processing unit
  • Flow control
  • Analytical models
  • Tuning

Keywords

  • Analytical model
  • GPU
  • Parallel programming
  • Performance estimation

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

An adaptive performance modeling tool for GPU architectures. / Baghsorkhi, Sara S.; Delahaye, Matthieu; Patel, Sanjay Jeram; Gropp, William D.; Hwu, Wen-Mei W.

In: ACM SIGPLAN Notices, Vol. 45, No. 5, 01.05.2010, p. 105-114.

Research output: Contribution to journal › Article

@article{dff6debfab584adabb69da283567b7b8,
title = "An adaptive performance modeling tool for GPU architectures",
abstract = "This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and assist it in narrowing down the search to the more promising implementations. It can also be incorporated into a tool to help programmers better assess the performance bottlenecks in their code. We analyze each GPU kernel and identify how the kernel exercises major GPU microarchitecture features. To identify the performance bottlenecks accurately, we introduce an abstract interpretation of a GPU kernel, work flow graph, based on which we estimate the execution time of a GPU kernel. We validated our performance model on the NVIDIA GPUs using CUDA (Compute Unified Device Architecture). For this purpose, we used data parallel benchmarks that stress different GPU microarchitecture events such as uncoalesced memory accesses, scratch-pad memory bank conflicts, and control flow divergence, which must be accurately modeled but represent challenges to the analytical performance models. The proposed model captures full system complexity and shows high accuracy in predicting the performance trends of different optimized kernel implementations. We also describe our approach to extracting the performance model automatically from akernel code.",
keywords = "Analytical model, GPU, Parallel programming, Performance estimation",
author = "Baghsorkhi, {Sara S.} and Matthieu Delahaye and Patel, {Sanjay Jeram} and Gropp, {William D} and Hwu, {Wen-Mei W}",
year = "2010",
month = "5",
day = "1",
doi = "10.1145/1837853.1693470",
language = "English (US)",
volume = "45",
pages = "105--114",
journal = "ACM SIGPLAN Notices",
issn = "1523-2867",
publisher = "Association for Computing Machinery (ACM)",
number = "5",

}
