TY - GEN
T1 - Discerning the dominant out-of-order performance advantage: Is it speculation or dynamism?
T2 - 18th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2013
AU - McFarlin, Daniel S.
AU - Tucker, Charles
AU - Zilles, Craig
PY - 2013/4/5
Y1 - 2013/4/5
AB - In this paper, we set out to study the performance advantages of an Out-of-Order (OOO) processor relative to in-order processors with similar execution resources. In particular, we try to tease apart the performance contributions from two sources: the improved schedules enabled by OOO hardware speculation support and its ability to generate different schedules on different occurrences of the same instructions based on operand and functional unit availability. We find that the ability to express good static schedules achieves the bulk of the speedup resulting from OOO. Specifically, of the 53% speedup achieved by OOO relative to a similarly provisioned in-order machine, we find that 88% of that speedup can be achieved by using a single "best" static schedule as suggested by observing an OOO schedule of the code. We discuss the ISA mechanisms that would be required to express these static schedules. Furthermore, we find that the benefits of dynamism largely come from two kinds of events that influence the application's critical path: load instructions that miss in the cache only part of the time and branch mispredictions. We find that much of the benefit of OOO dynamism can be achieved by the potentially simpler task of addressing these two behaviors directly.
KW - Dynamic scheduling
KW - Optimization
KW - Speculation
UR - http://www.scopus.com/inward/record.url?scp=84875684291&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84875684291&partnerID=8YFLogxK
U2 - 10.1145/2451116.2451143
DO - 10.1145/2451116.2451143
M3 - Conference contribution
AN - SCOPUS:84875684291
SN - 9781450318709
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 241
EP - 251
BT - ASPLOS 2013 - 18th International Conference on Architectural Support for Programming Languages and Operating Systems
Y2 - 16 March 2013 through 20 March 2013
ER -