Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Parthasarathy Ranganathan, Vijay S. Pai, Sarita V. Adve

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper studies techniques to improve the performance of memory consistency models for shared-memory multiprocessors with ILP processors. The first part of this paper extends earlier work by studying the impact of two current hardware optimizations for memory consistency implementations (hardware-controlled non-binding prefetching and speculative load execution) on the performance of the processor consistency (PC) memory model. We find that the optimized implementation of PC performs significantly better than the best implementation of sequential consistency (SC) in some cases because PC relaxes the store-to-load ordering constraint of SC. Nevertheless, release consistency (RC) provides significant benefits over PC in some cases, because PC suffers from the negative effects of premature store prefetches and insufficient memory queue sizes. The second part of the paper proposes and evaluates a new technique, speculative retirement, to improve the performance of SC. Speculative retirement alleviates the impact of the store-to-load constraint of SC by allowing loads and subsequent instructions to speculatively commit or retire, even while a previous store is outstanding. Speculative retirement needs additional hardware support (in the form of a history buffer) to recover from possible consistency violations due to such speculative retirements. With a 64-element history buffer, speculative retirement reduces the execution time gap between SC and PC to within 11% for all our applications on our base architecture; a significant, though reduced, gap still remains between SC and RC. The third part of our paper evaluates the interactions of the various techniques with larger instruction window sizes. When increasing instruction window size, initially, the previous best implementations of all models generally improve in performance due to increased load and store overlap.
With further increases, the performance of PC and RC stabilizes while that of SC often degrades (due to negative effects of previous optimizations), widening the gap between the models. At low base instruction window sizes, speculative retirement is sometimes outperformed by an equivalent increase in instruction window size (because the latter also provides load overlap). However, beyond the point where RC stabilizes, speculative retirement gives comparable or better benefit than an equivalent instruction window increase, with possibly less complexity.
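As a rough illustration of the history-buffer idea described in the abstract, the following toy Python model sketches how loads might retire speculatively past an outstanding store and be rolled back on a possible consistency violation. This is a simplified software sketch, not the paper's hardware design; the class, method names, and rollback policy are assumptions for illustration only.

```python
# Toy model of speculative retirement with a history buffer.
# (Illustrative sketch only; not the hardware mechanism from the paper.)
from collections import deque


class SpeculativeCore:
    def __init__(self, history_size=64):
        self.regs = {}                   # architectural register file
        self.history = deque()           # (reg, old_value, load_addr) undo records
        self.history_size = history_size
        self.store_outstanding = False   # a previous store has not yet completed

    def retire_load(self, reg, addr, value):
        """Retire a load. Plain SC would stall while a prior store is
        outstanding; speculative retirement commits the load anyway,
        logging undo state in the history buffer. Returns False when the
        buffer is full (the core must then stall, as under plain SC)."""
        if self.store_outstanding:
            if len(self.history) == self.history_size:
                return False
            self.history.append((reg, self.regs.get(reg), addr))
        self.regs[reg] = value
        return True

    def store_complete(self):
        """Prior store completed without conflict: speculation succeeds,
        so the undo records can be discarded."""
        self.store_outstanding = False
        self.history.clear()

    def invalidate(self, addr):
        """Another processor wrote addr. If a speculatively retired load
        read it, a consistency violation is possible: restore register
        state from the history buffer. Returns True if a rollback
        occurred (caller would re-execute from the violating load)."""
        if any(a == addr for _, _, a in self.history):
            while self.history:
                reg, old, _ = self.history.pop()
                if old is None:
                    self.regs.pop(reg, None)
                else:
                    self.regs[reg] = old
            return True
        return False
```

In this sketch, the buffer size plays the role of the paper's 64-element history buffer: once it fills, the core falls back to stalling, which is why a larger buffer (or a larger instruction window) recovers more of the overlap that PC and RC get for free.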

Original language: English (US)
Pages: 199-210
Number of pages: 12
DOIs
State: Published - 1997
Externally published: Yes
Event: Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA - Newport, RI, USA
Duration: Jun 22 1997 - Jun 25 1997


ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
