Hybrid latency tolerance for robust energy-efficiency on 1000-core data parallel processors

Neal C. Crago, Omid Azizi, Steven S. Lumetta, Sanjay J. Patel

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Currently, GPUs and data parallel processors leverage latency tolerance techniques such as multithreading and prefetching to maximize performance per Watt. However, choosing a technique that provides energy-efficiency on a wide variety of workloads is difficult, as the type of latency to tolerate, the required hardware complexity, and the energy consumption are directly related to application behavior. After qualitatively evaluating five commonly used latency tolerance techniques, we develop a hybrid technique utilizing multithreading and decoupled execution to maximize performance while minimizing hardware complexity and energy consumption across a wide variety of workloads. We compare our hybrid technique with the five commonly used techniques on a 1024-core data parallel processor by performing a comprehensive design space exploration, leveraging detailed performance and physical design models. By intelligently leveraging both decoupled execution and multithreading, our hybrid latency tolerance technique improves energy-efficiency by 28% to 89% over any single technique on data parallel benchmarks. Compared with other combinations of latency tolerance techniques, we find that our hybrid technique provides the highest energy-efficiency, by a margin of over 26%.

Original language: English (US)
Title of host publication: 19th IEEE International Symposium on High Performance Computer Architecture, HPCA 2013
Pages: 294-305
Number of pages: 12
State: Published - Jul 23 2013
Event: 19th IEEE International Symposium on High Performance Computer Architecture, HPCA 2013 - Shenzhen, China
Duration: Feb 23 2013 - Feb 27 2013

Publication series

Name: Proceedings - International Symposium on High-Performance Computer Architecture
ISSN (Print): 1530-0897

Other

Other: 19th IEEE International Symposium on High Performance Computer Architecture, HPCA 2013
Country: China
City: Shenzhen
Period: 2/23/13 - 2/27/13

ASJC Scopus subject areas

  • Hardware and Architecture
