TY - GEN
T1 - LeadOut
T2 - 16th International Symposium on High-Performance Computer Architecture, HPCA-16 2010
AU - Greskamp, Brian
AU - Karpuzcu, Ulya R.
AU - Torrellas, Josep
PY - 2010
Y1 - 2010
AB - Despite the ubiquity of multicores, it is as important as ever to deliver high single-thread performance. An appealing way to accomplish this is by shutting down the idle cores in the chip and running the busy, performance-critical core(s) at higher-than-nominal frequencies. To enable such frequencies, two low-overhead approaches either boost voltage beyond nominal values, or pair cores in leader-checker configurations and let them run beyond safe frequency margins. We observe that, in a large multicore with varying numbers of busy cores, individual application of either of these two techniques is suboptimal. Each alone is often unable to bring the multicore all the way to its power or temperature envelopes due to limitations in supply voltage or error rate. Moreover, we show that the two techniques are complementary, and can be synergistically combined to unlock much higher levels of single-thread performance. Finally, we demonstrate a dynamic controller that optimizes the two techniques. Our data shows that, given a 16-core multicore where half of the cores are already busy, an additional, performance-critical thread now attains 34% higher performance than before, while consuming 220% more power.
UR - http://www.scopus.com/inward/record.url?scp=77952570189&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952570189&partnerID=8YFLogxK
U2 - 10.1109/hpca.2010.5416656
DO - 10.1109/hpca.2010.5416656
M3 - Conference contribution
AN - SCOPUS:77952570189
SN - 9781424456581
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
BT - HPCA-16 2010 - The 16th International Symposium on High-Performance Computer Architecture
PB - IEEE Computer Society
Y2 - 9 January 2010 through 14 January 2010
ER -