Register and thread structure optimization for GPUs

Yun Liang, Zheng Cui, Kyle Rupnow, Deming Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

GPUs are an increasingly popular implementation platform for a variety of general purpose applications from mobile and embedded devices to high performance computing. The CUDA and OpenCL parallel programming models enable easy utilization of the GPU's resources. However, tuning GPU applications' performance is a complex and labor intensive task. Software programmers employ a variety of optimization techniques to explore tradeoffs between the thread parallelism and performance of a single thread. However, prior techniques ignore register allocation, a significant factor in single thread performance and, indirectly affects the number of simultaneously active threads. In this paper, we show that joint optimization of register allocation and thread structure has great potential to significantly improve performance. However, the design space for this joint optimization can be large; therefore, we develop performance metrics appropriate for evaluation within a compiler's inner loop and efficient design space exploration techniques that use the metrics to narrow the search space. Across a range of GPU applications, we achieve average performance speedup of 1.33X (up to 1.73X) with design space exploration 355X faster than the exhaustive search.

Original languageEnglish (US)
Title of host publication2013 18th Asia and South Pacific Design Automation Conference, ASP-DAC 2013
Pages461-466
Number of pages6
DOIs
StatePublished - 2013
Event2013 18th Asia and South Pacific Design Automation Conference, ASP-DAC 2013 - Yokohama, Japan
Duration: Jan 22 2013Jan 25 2013

Publication series

NameProceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC

Other

Other2013 18th Asia and South Pacific Design Automation Conference, ASP-DAC 2013
Country/TerritoryJapan
CityYokohama
Period1/22/131/25/13

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design

Fingerprint

Dive into the research topics of 'Register and thread structure optimization for GPUs'. Together they form a unique fingerprint.

Cite this