TY - GEN
T1 - Branch vanguard
T2 - 42nd Annual International Symposium on Computer Architecture, ISCA 2015
AU - McFarlin, Daniel S.
AU - Zilles, Craig
PY - 2015/6/13
Y1 - 2015/6/13
AB - While control speculation is highly effective for generating good schedules in out-of-order processors, it is less effective for in-order processors because compilers have trouble scheduling in the presence of unbiased branches, even when those branches are highly predictable. In this paper, we demonstrate a novel architectural branch decomposition that separates the prediction and deconvergence point of a branch from its resolution, which enables the compiler to profitably schedule across predictable, but unbiased branches. We show that the hardware support for this branch architecture is a trivial extension of existing systems and describe a simple code transformation for exploiting this architectural support. As architectural changes are required, this technique is most compelling for a dynamic binary translation-based system like Project Denver. We evaluate the performance improvements enabled by this transformation for several in-order configurations across the SPEC 2006 benchmark suites. We show that our technique produces a Geomean speedup of 11% for SPEC 2006 Integer, with speedups as large as 35%. As floating point benchmarks contain fewer unbiased, but predictable branches, our Geomean speedup on SPEC 2006 FP is 7%, with a maximum speedup of 26%.
UR - http://www.scopus.com/inward/record.url?scp=84960076025&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84960076025&partnerID=8YFLogxK
U2 - 10.1145/2749469.2750400
DO - 10.1145/2749469.2750400
M3 - Conference contribution
AN - SCOPUS:84960076025
T3 - Proceedings - International Symposium on Computer Architecture
SP - 323
EP - 335
BT - ISCA 2015 - 42nd Annual International Symposium on Computer Architecture, Conference Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 June 2015 through 17 June 2015
ER -