Abstract
This paper studies techniques to improve the performance of memory consistency models for shared-memory multi-processors with ILP processors. The first part of this paper extends earlier work by studying the impact o.f current hardware optimizations to memory consistency implementations, hardware-controlled non-binding prefetching and speculative load execution, on the performance of the processor consistency (PC) memory model. We find that the optimized implementation of PC performs significantly better than the best implementation of sequential consistency (SC) in some cases because PC relaxes the store-to-load ordering constraint of SC. Nevertheless, release consistency (RC) provides significant benefits over PC in some cases, because PC suffers from the negative effects of premature store prefetches and insufficient memory queue sizes. The second part of the paper proposes and evaluates a new technique, speculative retirement, to improve the performance of SC. Speculative retirement alleviates the impact of the store-to-load constraint of SC by allowing loads and subsequent instructions to speculatively commit or retire, even while a previous store is outstanding. Speculative retirement needs additional hardware support (in the form of a history buffer) to recover from possible consistency violations due to such speculative retires. With a 64 element history buffer, speculative retirement reduces the execution time gap between SC and PC to within 11% for all our applications on our base architecture; a significant, though reduced, gap still remains between SC and RC. The third part of our paper evaluates the interactions of the various techniques with larger instruction window sizes. When increasing instruction window size, initially, the previous best implementations of all models generally improve in performance due to increased load and store overlap. With further increases, the performance of PC and RC stabilizes while that of SC often degrades (due to negative effects of previous optimizations), widening the gap between the models. At low base instruction window sizes, speculative retirement is sometimes outperformed by an equivalent increase in instruction window size (because the latter also provides load overlap). However, beyond the point where RC stabilizes, speculative retirement gives comparable or better benefit than an equivalent instruction window increase, with possibly less complexity.
Original language | English (US) |
---|---|
Pages | 199-210 |
Number of pages | 12 |
DOIs | |
State | Published - 1997 |
Externally published | Yes |
Event | Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA - Newport, RI, USA Duration: Jun 22 1997 → Jun 25 1997 |
Other
Other | Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA |
---|---|
City | Newport, RI, USA |
Period | 6/22/97 → 6/25/97 |
ASJC Scopus subject areas
- Software
- Safety, Risk, Reliability and Quality