TY - GEN
T1 - Hardware atomicity for reliable software speculation
AU - Neelakantam, Naveen
AU - Rajwar, Ravi
AU - Srinivas, Suresh
AU - Srinivasan, Uma
AU - Zilles, Craig
PY - 2007
Y1 - 2007
N2 - Speculative compiler optimizations are effective in improving both single-thread performance and reducing power consumption, but their implementation introduces significant complexity, which can limit their adoption, limit their optimization scope, and negatively impact the reliability of the compilers that implement them. To eliminate much of this complexity, as well as increase the effectiveness of these optimizations, we propose that microprocessors provide architecturally-visible hardware primitives for atomic execution. These primitives provide to the compiler the ability to optimize the program's hot path in isolation, allowing the use of non-speculative formulations of optimization passes to perform speculative optimizations. Atomic execution guarantees that if a speculation invariant does not hold, the speculative updates are discarded, the register state is restored, and control is transferred to a non-speculative version of the code, thereby relieving the compiler from the responsibility of generating compensation code. We demonstrate the benefit of hardware atomicity in the context of a Java virtual machine. We find incorporating the notion of atomic regions into an existing compiler intermediate representation to be natural, requiring roughly 3,000 lines of code (∼3% of a JVM's optimizing compiler), most of which were for region formation. Its incorporation creates new opportunities for existing optimization passes, as well as greatly simplifying the implementation of additional optimizations (e.g., partial inlining, partial loop unrolling, and speculative lock elision). These optimizations reduce dynamic instruction count by 11% on average and result in a 10-15% average speedup, relative to a baseline compiler with a similar degree of inlining.
AB - Speculative compiler optimizations are effective in improving both single-thread performance and reducing power consumption, but their implementation introduces significant complexity, which can limit their adoption, limit their optimization scope, and negatively impact the reliability of the compilers that implement them. To eliminate much of this complexity, as well as increase the effectiveness of these optimizations, we propose that microprocessors provide architecturally-visible hardware primitives for atomic execution. These primitives provide to the compiler the ability to optimize the program's hot path in isolation, allowing the use of non-speculative formulations of optimization passes to perform speculative optimizations. Atomic execution guarantees that if a speculation invariant does not hold, the speculative updates are discarded, the register state is restored, and control is transferred to a non-speculative version of the code, thereby relieving the compiler from the responsibility of generating compensation code. We demonstrate the benefit of hardware atomicity in the context of a Java virtual machine. We find incorporating the notion of atomic regions into an existing compiler intermediate representation to be natural, requiring roughly 3,000 lines of code (∼3% of a JVM's optimizing compiler), most of which were for region formation. Its incorporation creates new opportunities for existing optimization passes, as well as greatly simplifying the implementation of additional optimizations (e.g., partial inlining, partial loop unrolling, and speculative lock elision). These optimizations reduce dynamic instruction count by 11% on average and result in a 10-15% average speedup, relative to a baseline compiler with a similar degree of inlining.
KW - Atomicity
KW - Checkpoint
KW - Isolation
KW - Java
KW - Optimization
KW - Speculation
UR - http://www.scopus.com/inward/record.url?scp=35348918162&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=35348918162&partnerID=8YFLogxK
U2 - 10.1145/1250662.1250684
DO - 10.1145/1250662.1250684
M3 - Conference contribution
AN - SCOPUS:35348918162
SN - 1595937064
SN - 9781595937063
T3 - Proceedings - International Symposium on Computer Architecture
SP - 174
EP - 185
BT - ISCA'07
T2 - ISCA'07: 34th Annual International Symposium on Computer Architecture
Y2 - 9 June 2007 through 13 June 2007
ER -