TY - GEN
T1 - Decoupled architectures as a low-complexity alternative to out-of-order execution
AU - Crago, Neal C.
AU - Patel, Sanjay Jeram
PY - 2011
Y1 - 2011
N2 - In this paper we present OUTRIDERHP, a novel implementation of a decoupled architecture that approaches the performance of contemporary out-of-order processors on parallel benchmarks while maintaining low hardware complexity. OUTRIDERHP leverages the compiler to separate a single thread of execution into memory-accessing and memory-consuming streams that can be executed concurrently, which we call strands. We identify loss-of-decoupling events which cripple performance on traditional decoupled architectures, and design OUTRIDERHP to enable extraction of multiple strands and control speculation which provide superior memory and functional unit latency tolerance. OUTRIDERHP outperforms a baseline in-order architecture by 26-220% and Decoupled Access/Execute by 7-172% when executing parallel benchmarks on an 8-core CMP configuration. OUTRIDERHP performs within 15% of higher-complexity out-of-order cores despite not utilizing large physical register files, dynamic scheduling, and register renaming hardware.
AB - In this paper we present OUTRIDERHP, a novel implementation of a decoupled architecture that approaches the performance of contemporary out-of-order processors on parallel benchmarks while maintaining low hardware complexity. OUTRIDERHP leverages the compiler to separate a single thread of execution into memory-accessing and memory-consuming streams that can be executed concurrently, which we call strands. We identify loss-of-decoupling events which cripple performance on traditional decoupled architectures, and design OUTRIDERHP to enable extraction of multiple strands and control speculation which provide superior memory and functional unit latency tolerance. OUTRIDERHP outperforms a baseline in-order architecture by 26-220% and Decoupled Access/Execute by 7-172% when executing parallel benchmarks on an 8-core CMP configuration. OUTRIDERHP performs within 15% of higher-complexity out-of-order cores despite not utilizing large physical register files, dynamic scheduling, and register renaming hardware.
UR - http://www.scopus.com/inward/record.url?scp=84856541275&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856541275&partnerID=8YFLogxK
U2 - 10.1109/PACT.2011.28
DO - 10.1109/PACT.2011.28
M3 - Conference contribution
AN - SCOPUS:84856541275
SN - 9780769545660
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 179
EP - 180
BT - Proceedings - 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
T2 - 20th International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
Y2 - 10 October 2011 through 14 October 2011
ER -