TY - GEN
T1 - Bungee jumps
T2 - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
AU - McFarlin, Daniel S.
AU - Zilles, Craig
PY - 2015/12/5
Y1 - 2015/12/5
N2 - Indirect branches have historically been a challenge for microarchitectures and code generators alike. The recent steady increase in indirect branch predictability has translated into continual performance improvements, especially for out-of-order processors, which benefit more readily from improvements in branch prediction. In contrast, in-order processors, which rely on code generators for performance, are still challenged by indirect branches: they are a frequent source of issue stalls, and the large number of indirect branch targets and the unbiased nature of indirect branches complicate the use of traditional branch handling techniques like assert conversion and predication. To address these limitations, we propose an ISA enhancement with associated code transformation and hardware support that collectively enable the current trend of improved indirect branch predictability to be directly leveraged by code generators for in-order processors. By separating the prediction point of an indirect branch from its resolution point, we enable code generators to emit schedules that more readily match those found by an out-of-order processor. Our technique is particularly beneficial to processors that leverage dynamic binary translation and optimization, such as Transmeta's Efficeon and, more recently, Nvidia's Project Denver. On a set of indirect-branch-intensive benchmarks from SPEC 2006, 2000, and 95, we achieve a geomean speedup of 11% on a 4-wide processor. We further demonstrate speedups of 23% and 14% on PHP and Python benchmarks.
AB - Indirect branches have historically been a challenge for microarchitectures and code generators alike. The recent steady increase in indirect branch predictability has translated into continual performance improvements, especially for out-of-order processors, which benefit more readily from improvements in branch prediction. In contrast, in-order processors, which rely on code generators for performance, are still challenged by indirect branches: they are a frequent source of issue stalls, and the large number of indirect branch targets and the unbiased nature of indirect branches complicate the use of traditional branch handling techniques like assert conversion and predication. To address these limitations, we propose an ISA enhancement with associated code transformation and hardware support that collectively enable the current trend of improved indirect branch predictability to be directly leveraged by code generators for in-order processors. By separating the prediction point of an indirect branch from its resolution point, we enable code generators to emit schedules that more readily match those found by an out-of-order processor. Our technique is particularly beneficial to processors that leverage dynamic binary translation and optimization, such as Transmeta's Efficeon and, more recently, Nvidia's Project Denver. On a set of indirect-branch-intensive benchmarks from SPEC 2006, 2000, and 95, we achieve a geomean speedup of 11% on a 4-wide processor. We further demonstrate speedups of 23% and 14% on PHP and Python benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=84959918203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959918203&partnerID=8YFLogxK
U2 - 10.1145/2830772.2830781
DO - 10.1145/2830772.2830781
M3 - Conference contribution
AN - SCOPUS:84959918203
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 370
EP - 382
BT - Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
PB - IEEE Computer Society
Y2 - 5 December 2015 through 9 December 2015
ER -