OUTRIDER: Efficient memory latency tolerance with decoupled strands

Neal C. Crago, Sanjay J. Patel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present Outrider, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads. Outrider enables a single thread of execution to be presented to the architecture as multiple decoupled instruction streams that separate memory-accessing and memory-consuming instructions. The key insight is that by decoupling the instruction streams, the processor pipeline can tolerate memory latency in a way similar to out-of-order designs while relying on a low-complexity in-order micro-architecture. Moreover, instead of adding more threads as is done in modern GPUs, Outrider can tolerate memory latency with fewer threads and reduced contention for resources shared among threads. We demonstrate that Outrider can outperform single-threaded cores by 23-131% and a 4-way simultaneous multithreaded core by up to 87% on data-parallel applications in a 1024-core system. Outrider achieves these performance gains without incurring the overhead of additional hardware thread contexts, which results in improved area efficiency compared to a multithreaded core.
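
As a rough illustration of the decoupling idea described in the abstract, the Python toy model below compares a single in-order stream, where every load stalls its dependent computation, with two decoupled strands connected by a queue. It is only a sketch under simplifying assumptions and is not the paper's simulator or micro-architecture: the function names, the fixed memory latency, the one-load-per-cycle issue rate, and the bounded queue between the strands are all hypothetical choices made for illustration.

```python
from collections import deque


def coupled_cycles(n_items, mem_latency):
    """Toy in-order stream: each item costs one cycle to issue the load,
    mem_latency cycles of stall, and one cycle for the dependent compute."""
    return n_items * (1 + mem_latency + 1)


def decoupled_cycles(n_items, mem_latency, queue_depth):
    """Toy decoupled model (assumed, not from the paper): a memory-accessing
    strand issues at most one load per cycle into a bounded queue, and a
    memory-consuming strand retires the oldest load once its data arrives."""
    in_flight = deque()  # arrival cycle of each outstanding load, oldest first
    issued = 0
    consumed = 0
    cycle = 0
    while consumed < n_items:
        cycle += 1
        # Consuming strand: use the oldest value if it has arrived.
        if in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            consumed += 1
        # Accessing strand: run ahead while the queue has room.
        if issued < n_items and len(in_flight) < queue_depth:
            in_flight.append(cycle + mem_latency)
            issued += 1
    return cycle


if __name__ == "__main__":
    n, latency = 4, 10
    print("coupled:", coupled_cycles(n, latency))
    print("decoupled, depth 4:", decoupled_cycles(n, latency, queue_depth=4))
    print("decoupled, depth 1:", decoupled_cycles(n, latency, queue_depth=1))
```

In this model the accessing strand overlaps the latency of several loads, so the total time approaches one latency plus one cycle per item when the queue is deep enough. With a queue depth of 1 the loads serialize again, which hints at why the amount of run-ahead matters.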

Original language: English (US)
Title of host publication: Proceeding of the 38th Annual International Symposium on Computer Architecture, ISCA'11
Pages: 117-128
Number of pages: 12
State: Published - Sep 13 2011
Event: 38th Annual International Symposium on Computer Architecture, ISCA'11 - San Jose, CA, United States
Duration: Jun 4 2011 to Jun 8 2011

Publication series

Name: Proceedings - International Symposium on Computer Architecture
ISSN (Print): 1063-6897

Other

Other: 38th Annual International Symposium on Computer Architecture, ISCA'11
Country/Territory: United States
City: San Jose, CA
Period: 6/4/11 to 6/8/11

Keywords

  • Accelerator
  • Computer architecture
  • Memory latency

ASJC Scopus subject areas

  • Hardware and Architecture
