TY - GEN
T1 - BulkCompactor
T2 - 18th IEEE International Symposium on High Performance Computer Architecture, HPCA - 18 2012
AU - Duan, Yuelu
AU - Zhou, Xing
AU - Ahn, Wonsun
AU - Torrellas, Josep
PY - 2012
Y1 - 2012
N2 - Recent proposals for determinism-enforcement architectures are able to honor the dependences between threads through a commit step that often becomes a performance bottleneck. As they commit code blocks (or chunks) in a round-robin order, if one chunk gets squashed due to a conflict, its successors also observe a stall. We call this effect transitive squash delay. This paper proposes a novel, high-performance approach to deterministic execution based on Conflict-Aware commit. Rather than committing chunks in strict round-robin order, the idea is to skip those chunks with conflicts and deterministically execute them slightly later. The scheme, called BulkCompactor, largely eliminates transitive squash delay, "compacts" the chunk commits, and substantially speeds up execution. With BulkCompactor, the squash overhead is O(N) rather than O(N^2) as in round-robin. We describe BulkCompactor designs for machines with centralized or distributed commit. Finally, a simulation-based evaluation shows that BulkCompactor delivers performance comparable to nondeterministic systems. For example, for 32 processors, BulkCompactor incurs an average execution overhead of 22% over a nondeterministic system. The round-robin scheme's average overhead is 133%.
AB - Recent proposals for determinism-enforcement architectures are able to honor the dependences between threads through a commit step that often becomes a performance bottleneck. As they commit code blocks (or chunks) in a round-robin order, if one chunk gets squashed due to a conflict, its successors also observe a stall. We call this effect transitive squash delay. This paper proposes a novel, high-performance approach to deterministic execution based on Conflict-Aware commit. Rather than committing chunks in strict round-robin order, the idea is to skip those chunks with conflicts and deterministically execute them slightly later. The scheme, called BulkCompactor, largely eliminates transitive squash delay, "compacts" the chunk commits, and substantially speeds up execution. With BulkCompactor, the squash overhead is O(N) rather than O(N^2) as in round-robin. We describe BulkCompactor designs for machines with centralized or distributed commit. Finally, a simulation-based evaluation shows that BulkCompactor delivers performance comparable to nondeterministic systems. For example, for 32 processors, BulkCompactor incurs an average execution overhead of 22% over a nondeterministic system. The round-robin scheme's average overhead is 133%.
UR - http://www.scopus.com/inward/record.url?scp=84860344951&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84860344951&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2012.6169040
DO - 10.1109/HPCA.2012.6169040
M3 - Conference contribution
AN - SCOPUS:84860344951
SN - 9781467308243
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 361
EP - 372
BT - Proceedings - 18th IEEE International Symposium on High Performance Computer Architecture, HPCA - 18 2012
Y2 - 25 February 2012 through 29 February 2012
ER -