Program optimization carving for GPU computing

Shane Ryoo, Christopher I. Rodrigues, Sam S. Stone, John A. Stratton, Sain Zee Ueng, Sara S. Baghsorkhi, Wen-Mei W Hwu

Research output: Contribution to journalArticle

Abstract

Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a local performance maximum, due to the complexity of the system. We propose program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that is likely to contain the global maximum. The remaining configurations can then be evaluated to determine the one with the best performance. The technique can reduce the number of configurations to be evaluated by as much as 98% and is successful at finding a near-best configuration. For some applications, we show that this approach is significantly superior to random sampling of the search space.

Original languageEnglish (US)
Pages (from-to)1389-1401
Number of pages13
JournalJournal of Parallel and Distributed Computing
Volume68
Issue number10
DOIs
StatePublished - Oct 1 2008

Fingerprint

Configuration
Optimization
Computing
Many-core
Random Sampling
Search Space
Sampling
Parallelism
Likely
Graphics processing unit

Keywords

  • GPU computing
  • Optimization space exploration
  • Parallel computing

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Ryoo, S., Rodrigues, C. I., Stone, S. S., Stratton, J. A., Ueng, S. Z., Baghsorkhi, S. S., & Hwu, W-M. W. (2008). Program optimization carving for GPU computing. Journal of Parallel and Distributed Computing, 68(10), 1389-1401. https://doi.org/10.1016/j.jpdc.2008.05.011

Program optimization carving for GPU computing. / Ryoo, Shane; Rodrigues, Christopher I.; Stone, Sam S.; Stratton, John A.; Ueng, Sain Zee; Baghsorkhi, Sara S.; Hwu, Wen-Mei W.

In: Journal of Parallel and Distributed Computing, Vol. 68, No. 10, 01.10.2008, p. 1389-1401.

Research output: Contribution to journalArticle

Ryoo, S, Rodrigues, CI, Stone, SS, Stratton, JA, Ueng, SZ, Baghsorkhi, SS & Hwu, W-MW 2008, 'Program optimization carving for GPU computing', Journal of Parallel and Distributed Computing, vol. 68, no. 10, pp. 1389-1401. https://doi.org/10.1016/j.jpdc.2008.05.011
Ryoo S, Rodrigues CI, Stone SS, Stratton JA, Ueng SZ, Baghsorkhi SS et al. Program optimization carving for GPU computing. Journal of Parallel and Distributed Computing. 2008 Oct 1;68(10):1389-1401. https://doi.org/10.1016/j.jpdc.2008.05.011
Ryoo, Shane ; Rodrigues, Christopher I. ; Stone, Sam S. ; Stratton, John A. ; Ueng, Sain Zee ; Baghsorkhi, Sara S. ; Hwu, Wen-Mei W. / Program optimization carving for GPU computing. In: Journal of Parallel and Distributed Computing. 2008 ; Vol. 68, No. 10. pp. 1389-1401.
@article{d4d34fcc86444235bcc24fbad2f558d7,
title = "Program optimization carving for GPU computing",
abstract = "Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a local performance maximum, due to the complexity of the system. We propose program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that is likely to contain the global maximum. The remaining configurations can then be evaluated to determine the one with the best performance. The technique can reduce the number of configurations to be evaluated by as much as 98{\%} and is successful at finding a near-best configuration. For some applications, we show that this approach is significantly superior to random sampling of the search space.",
keywords = "GPU computing, Optimization space exploration, Parallel computing",
author = "Shane Ryoo and Rodrigues, {Christopher I.} and Stone, {Sam S.} and Stratton, {John A.} and Ueng, {Sain Zee} and Baghsorkhi, {Sara S.} and Hwu, {Wen-Mei W}",
year = "2008",
month = "10",
day = "1",
doi = "10.1016/j.jpdc.2008.05.011",
language = "English (US)",
volume = "68",
pages = "1389--1401",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "10",

}

TY - JOUR

T1 - Program optimization carving for GPU computing

AU - Ryoo, Shane

AU - Rodrigues, Christopher I.

AU - Stone, Sam S.

AU - Stratton, John A.

AU - Ueng, Sain Zee

AU - Baghsorkhi, Sara S.

AU - Hwu, Wen-Mei W

PY - 2008/10/1

Y1 - 2008/10/1

N2 - Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a local performance maximum, due to the complexity of the system. We propose program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that is likely to contain the global maximum. The remaining configurations can then be evaluated to determine the one with the best performance. The technique can reduce the number of configurations to be evaluated by as much as 98% and is successful at finding a near-best configuration. For some applications, we show that this approach is significantly superior to random sampling of the search space.

AB - Contemporary many-core processors such as the GeForce 8800 GTX enable application developers to utilize various levels of parallelism to enhance the performance of their applications. However, iterative optimization for such a system may lead to a local performance maximum, due to the complexity of the system. We propose program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that is likely to contain the global maximum. The remaining configurations can then be evaluated to determine the one with the best performance. The technique can reduce the number of configurations to be evaluated by as much as 98% and is successful at finding a near-best configuration. For some applications, we show that this approach is significantly superior to random sampling of the search space.

KW - GPU computing

KW - Optimization space exploration

KW - Parallel computing

UR - http://www.scopus.com/inward/record.url?scp=51449112813&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51449112813&partnerID=8YFLogxK

U2 - 10.1016/j.jpdc.2008.05.011

DO - 10.1016/j.jpdc.2008.05.011

M3 - Article

AN - SCOPUS:51449112813

VL - 68

SP - 1389

EP - 1401

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

IS - 10

ER -