TY - GEN
T1 - Variation among processors under Turbo Boost in HPC systems
AU - Acun, Bilge
AU - Miller, Phil
AU - Kale, Laxmikant V.
N1 - Funding Information:
This work was partially supported by grant PHS-5-P41-RR05969 from the National Institutes of Health, and partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. The authors are also grateful to Cisco Systems Inc. for funding support (gift award CG 587589). This research used resources of the National Energy Research Scientific Computing Center (NERSC), which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number OCI 07-25070) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. Finally, this research used computer time on Livermore Computing's high performance computing resources, provided under the M&IC Program.
Publisher Copyright:
© 2016 ACM.
PY - 2016/6/1
Y1 - 2016/6/1
N2 - The design and manufacture of present-day CPUs causes inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. The variation also manifests itself as frequency differences among processors under Turbo Boost dynamic overclocking. This variation can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers: Edison, Cab, and Stampede. There is less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. We analyze measurements from temperature and power instrumentation and find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions to mitigate the variation, such as disabling Turbo Boost, leaving cores idle, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% compared to no load balancing, and performs 6% better than its non-speed-aware counterpart.
AB - The design and manufacture of present-day CPUs causes inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. The variation also manifests itself as frequency differences among processors under Turbo Boost dynamic overclocking. This variation can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers: Edison, Cab, and Stampede. There is less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. We analyze measurements from temperature and power instrumentation and find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions to mitigate the variation, such as disabling Turbo Boost, leaving cores idle, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% compared to no load balancing, and performs 6% better than its non-speed-aware counterpart.
UR - http://www.scopus.com/inward/record.url?scp=84978488528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978488528&partnerID=8YFLogxK
U2 - 10.1145/2925426.2926289
DO - 10.1145/2925426.2926289
M3 - Conference contribution
AN - SCOPUS:84978488528
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 2016 International Conference on Supercomputing, ICS 2016
PB - Association for Computing Machinery
T2 - 30th International Conference on Supercomputing, ICS 2016
Y2 - 1 June 2016 through 3 June 2016
ER -