TY - GEN
T1 - BulkCommit
T2 - 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013
AU - Qian, Xuehai
AU - Torrellas, Josep
AU - Sahelices, Benjamin
AU - Qian, Depei
PY - 2013
Y1 - 2013
N2 - To improve the programmability and performance of shared-memory multiprocessors, several proposed architectures continuously execute atomic blocks of instructions, also called chunks. To be competitive, these architectures must support chunk operations very efficiently. In particular, in a large manycore with lazy conflict detection, they must support efficient chunk commit. This paper addresses the challenge of providing scalable and fast chunk commit for a large manycore in a lazy environment. To understand the problem, we first present a model of chunk commit in a distributed directory protocol. Then, to attain scalable and fast commit, we propose two general techniques: (1) serialization of the write sets of output-dependent chunks to avoid squashes, and (2) full parallelization of directory module ownership by the committing chunks. Our simulation results with 64-threaded codes show that our combined scheme, called BulkCommit, eliminates most of the squash and commit stall times, speeding up the codes by an average of 40% and 18% compared to previously proposed schemes.
KW - atomic blocks
KW - bulk operation
KW - cache coherence
KW - hardware transactions
KW - shared-memory multiprocessors
UR - http://www.scopus.com/inward/record.url?scp=84892493760&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84892493760&partnerID=8YFLogxK
U2 - 10.1145/2540708.2540740
DO - 10.1145/2540708.2540740
M3 - Conference contribution
AN - SCOPUS:84892493760
SN - 9781450326384
T3 - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SP - 371
EP - 382
BT - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Y2 - 7 December 2013 through 11 December 2013
ER -