Variation among processors under Turbo Boost in HPC systems

Bilge Acun, Phil Miller, Laxmikant V. Kale

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The design and manufacture of present-day CPUs cause inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. The variation also manifests itself as frequency differences among processors under Turbo Boost dynamic overclocking. This variation can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers: Edison, Cab, and Stampede. There is less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. We analyze measurements from temperature and power instrumentation and find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions for mitigating the variation, such as disabling Turbo Boost, leaving idle cores, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% compared to no load balancing, and performs 6% better than its non-speed-aware counterpart.
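The abstract does not spell out the algorithm, so what follows is only a minimal sketch of the general idea behind speed-aware load balancing, not the authors' method. It assumes each processor's relative speed has already been measured and each task's cost is known. The function names `greedy_balance` and `makespan`, the largest-task-first greedy heuristic, and the example speeds are all hypothetical illustrations. The sketch contrasts a balancer that treats all cores as equally fast with one that divides predicted work by each core's speed.

```python
from typing import List, Sequence


def makespan(assignment: List[List[float]], speeds: Sequence[float]) -> float:
    """Finish time of the slowest processor: each processor's total work divided by its relative speed."""
    return max(sum(tasks) / speed for tasks, speed in zip(assignment, speeds))


def greedy_balance(
    costs: Sequence[float], speeds: Sequence[float], speed_aware: bool = True
) -> List[List[float]]:
    """Assign task costs to processors, largest task first (hypothetical sketch).

    speed_aware=True: pick the processor whose predicted finish time
    (accumulated work + task cost) / speed is smallest.
    speed_aware=False: treat all processors as equally fast and pick the one
    with the least accumulated work.
    Ties go to the lowest processor index.
    """
    assignment: List[List[float]] = [[] for _ in speeds]
    work = [0.0] * len(speeds)
    for cost in sorted(costs, reverse=True):
        if speed_aware:
            target = min(range(len(speeds)), key=lambda p: (work[p] + cost) / speeds[p])
        else:
            target = min(range(len(speeds)), key=lambda p: work[p])
        assignment[target].append(cost)
        work[target] += cost
    return assignment


if __name__ == "__main__":
    # Hypothetical node: three cores at nominal speed and one chip running at 86% speed,
    # so it needs about 16% longer for the same work.
    speeds = [1.0, 1.0, 1.0, 0.86]
    costs = [1.0] * 40  # 40 equal-cost tasks (made-up workload)
    for aware in (False, True):
        plan = greedy_balance(costs, speeds, speed_aware=aware)
        print(f"speed_aware={aware}: tasks per core={[len(t) for t in plan]}, "
              f"makespan={makespan(plan, speeds):.3f}")
```

With these made-up numbers, the speed-neutral version gives every core 10 tasks, so the slow core finishes last at about 11.63 time units. The speed-aware version shifts work to the faster cores (11, 10, 10, and 9 tasks) and finishes at 11.0, roughly 5% sooner. The largest-task-first greedy rule is chosen here only for brevity; in practice one would likely also weigh task migration cost and re-measure speeds as they drift.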

Original language: English (US)
Title of host publication: Proceedings of the 2016 International Conference on Supercomputing, ICS 2016
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450343619
State: Published - Jun 1 2016
Event: 30th International Conference on Supercomputing, ICS 2016 - Istanbul, Turkey
Duration: Jun 1 2016 to Jun 3 2016

Publication series

Name: Proceedings of the International Conference on Supercomputing
Volume: 01-03-June-2016

Other

Other: 30th International Conference on Supercomputing, ICS 2016
Country/Territory: Turkey
City: Istanbul
Period: 6/1/16 to 6/3/16

ASJC Scopus subject areas

  • Computer Science (all)
