Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Parthasarathy Ranganathan, Vijay S. Pai, Sarita V. Adve

Research output: Contribution to conference › Paper

Abstract

This paper studies techniques to improve the performance of memory consistency models for shared-memory multiprocessors with ILP processors. The first part of the paper extends earlier work by studying the impact of two current hardware optimizations for memory consistency implementations (hardware-controlled non-binding prefetching and speculative load execution) on the performance of the processor consistency (PC) memory model. We find that the optimized implementation of PC performs significantly better than the best implementation of sequential consistency (SC) in some cases because PC relaxes the store-to-load ordering constraint of SC. Nevertheless, release consistency (RC) provides significant benefits over PC in some cases, because PC suffers from the negative effects of premature store prefetches and insufficient memory queue sizes. The second part of the paper proposes and evaluates a new technique, speculative retirement, to improve the performance of SC. Speculative retirement alleviates the impact of the store-to-load constraint of SC by allowing loads and subsequent instructions to speculatively commit or retire even while a previous store is outstanding. Speculative retirement needs additional hardware support (in the form of a history buffer) to recover from possible consistency violations due to such speculative retirement. With a 64-element history buffer, speculative retirement reduces the execution-time gap between SC and PC to within 11% for all our applications on our base architecture; a significant, though reduced, gap still remains between SC and RC. The third part of the paper evaluates the interactions of these techniques with larger instruction window sizes. As the instruction window size increases, the previous best implementations of all models initially improve in performance due to increased load and store overlap. With further increases, the performance of PC and RC stabilizes while that of SC often degrades (due to negative effects of the previous optimizations), widening the gap between the models. At low base instruction window sizes, speculative retirement is sometimes outperformed by an equivalent increase in instruction window size (because the latter also provides load overlap). However, beyond the point where RC stabilizes, speculative retirement gives comparable or better benefit than an equivalent instruction window increase, with possibly less complexity.
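
For readers who want a concrete picture of the mechanism described in the abstract, the following is a minimal, hypothetical sketch in Python, written for this record and not taken from the paper (whose mechanism is hardware, evaluated in simulation): loads retire speculatively past an outstanding store, the old register value and load address are logged in a bounded history buffer, an external invalidation that hits a speculatively retired load triggers rollback, and completion of the store commits the speculation and frees the buffer. All class and method names are illustrative assumptions.

from collections import deque

HISTORY_BUFFER_SIZE = 64  # the paper evaluates a 64-element history buffer


class SpeculativeRetirementModel:
    """Toy model of speculative retirement under sequential consistency."""

    def __init__(self):
        self.regs = {}                 # architectural register file
        self.history = deque()         # entries: (dest_reg, old_value, load_addr)
        self.outstanding_store = None  # address of the pending store, if any

    def issue_store(self, addr):
        # Under plain SC, retirement would stall until this store completes.
        self.outstanding_store = addr

    def retire_load(self, dest_reg, addr, value):
        # Retire a load (and, implicitly, later instructions) past the store.
        if self.outstanding_store is not None:
            if len(self.history) >= HISTORY_BUFFER_SIZE:
                return False  # buffer full: fall back to stalling, as in plain SC
            self.history.append((dest_reg, self.regs.get(dest_reg), addr))
        self.regs[dest_reg] = value
        return True

    def external_invalidation(self, addr):
        # Another processor wrote addr; a speculatively retired load of that
        # address would violate SC, so roll back.
        if any(load_addr == addr for _, _, load_addr in self.history):
            self.rollback()

    def rollback(self):
        # Restore old register values in reverse order; re-execution from the
        # offending load is omitted in this sketch.
        while self.history:
            dest_reg, old_value, _ = self.history.pop()
            if old_value is None:
                self.regs.pop(dest_reg, None)
            else:
                self.regs[dest_reg] = old_value

    def store_complete(self):
        # The store is globally performed: the speculation was safe, so commit.
        self.outstanding_store = None
        self.history.clear()


if __name__ == "__main__":
    cpu = SpeculativeRetirementModel()
    cpu.issue_store(0x100)            # store miss outstanding
    cpu.retire_load("r1", 0x200, 42)  # load retires speculatively past the store
    cpu.external_invalidation(0x200)  # remote write to the load's address: rollback
    assert "r1" not in cpu.regs       # the speculative value was discarded

The sketch keeps only enough state to show why a larger history buffer lets more loads and their dependents retire past an outstanding store, which is the behavior behind the reported narrowing of the gap between SC and PC.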

Original language: English (US)
Pages: 199-210
Number of pages: 12
State: Published - 1997
Externally published: Yes
Event: 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 1997) - Newport, RI, USA
Duration: Jun 22 1997 - Jun 25 1997

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Cite this

Ranganathan, P., Pai, V. S., & Adve, S. V. (1997). Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models (pp. 199-210). Paper presented at the 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 1997), Newport, RI, USA.
