TY - GEN
T1 - OmniOrder
T2 - 2014 ACM/IEEE 41st International Symposium on Computer Architecture, ISCA 2014
AU - Qian, Xuehai
AU - Sahelices, Benjamin
AU - Torrellas, Josep
PY - 2014
Y1 - 2014
N2 - Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency. In most current designs, when two atomic blocks conflict, one is squashed, a performance loss that is often unnecessary. To avoid this waste, this paper presents OmniOrder, the first design that efficiently executes conflicting atomic blocks concurrently in a directory-based coherence environment. The idea is to keep only non-speculative data in the caches and, when the cache coherence protocol transfers a line, include in the message the history of speculative updates to the line. The coherence protocol transitions are unmodified. We evaluate OmniOrder with 64-core simulations. In a TM environment, OmniOrder reduces the execution time of the STAMP applications by an average of 18.4% over a scheme that squashes on conflict. In an environment that enforces Sequential Consistency (SC) with speculation, we run 11 programs that implement concurrent algorithms. OmniOrder reduces the programs' execution time by an average of 15.3% relative to a scheme that squashes on conflict. Finally, OmniOrder's communication overhead of transferring the history of speculative updates is negligible.
AB - Effective execution of atomic blocks of instructions (also called transactions) can enhance the performance and programmability of multiprocessors. Atomic blocks can be demarcated in software as in Transactional Memory (TM) or dynamically generated by the hardware as in aggressive implementations of strict memory consistency. In most current designs, when two atomic blocks conflict, one is squashed, a performance loss that is often unnecessary. To avoid this waste, this paper presents OmniOrder, the first design that efficiently executes conflicting atomic blocks concurrently in a directory-based coherence environment. The idea is to keep only non-speculative data in the caches and, when the cache coherence protocol transfers a line, include in the message the history of speculative updates to the line. The coherence protocol transitions are unmodified. We evaluate OmniOrder with 64-core simulations. In a TM environment, OmniOrder reduces the execution time of the STAMP applications by an average of 18.4% over a scheme that squashes on conflict. In an environment that enforces Sequential Consistency (SC) with speculation, we run 11 programs that implement concurrent algorithms. OmniOrder reduces the programs' execution time by an average of 15.3% relative to a scheme that squashes on conflict. Finally, OmniOrder's communication overhead of transferring the history of speculative updates is negligible.
UR - http://www.scopus.com/inward/record.url?scp=84905457960&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905457960&partnerID=8YFLogxK
U2 - 10.1109/ISCA.2014.6853223
DO - 10.1109/ISCA.2014.6853223
M3 - Conference contribution
AN - SCOPUS:84905457960
SN - 9781479943968
T3 - Proceedings - International Symposium on Computer Architecture
SP - 421
EP - 432
BT - 41st Annual International Symposium on Computer Architecture, ISCA 2014 - Conference Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 14 June 2014 through 18 June 2014
ER -