Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems

Osman Sarood, Akhil Langer, Laxmikant V Kale, Barry Rountree, Bronis De Supinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Energy consumption and power draw pose two major challenges to the HPC community in designing larger systems. Present-day HPC systems consume as much as 10 MW of electricity, and this is fast becoming a bottleneck. Although energy bills will increase significantly with machine size, power consumption is a hard constraint that must be addressed. Intel's Running Average Power Limit (RAPL) toolkit is a recent feature that enables power capping of CPU and memory subsystems on modern hardware. In this paper, we use RAPL to evaluate the possibility of improving execution time efficiency of an application by capping power while adding more nodes. We profile the strong scaling of an application using different power caps for both CPU and memory subsystems. Our proposed interpolation scheme uses an application profile to optimize the number of nodes and the distribution of power between CPU and memory subsystems to minimize execution time under a strict power budget. We validate these estimates by running experiments on a 20-node (120 cores) Sandy Bridge cluster. Our experimental results closely match the model estimates and show speedups greater than 1.47X for all applications compared to not capping CPU and memory power. We demonstrate that the quality of the solutions our interpolation scheme provides closely matches the results obtained via exhaustive profiling.
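
The optimization described in the abstract can be illustrated with a minimal sketch. This is not the authors' interpolation scheme, whose details the abstract does not give; it assumes a hypothetical profile of measured execution times, plain piecewise-linear interpolation over the CPU cap, and a brute-force search over candidate configurations. The names `interpolate`, `best_configuration`, `base_w`, and the profile layout are all assumptions made for illustration.

```python
from bisect import bisect_left


def interpolate(points, x):
    """Piecewise-linear interpolation over sorted (x, y) pairs, clamped at both ends."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def best_configuration(profile, budget_w, cpu_caps, base_w=0.0):
    """Pick the configuration with the lowest estimated time within budget_w.

    profile (hypothetical layout) maps (nodes, mem_cap_w) to a list of
    (cpu_cap_w, measured_time_s) pairs sorted by cpu_cap_w.
    cpu_caps lists the per-node CPU caps to consider; caps that were not
    profiled are estimated by interpolation.
    base_w is an assumed per-node power draw outside CPU and memory.
    Returns (time_s, nodes, cpu_cap_w, mem_cap_w), or None if nothing fits.
    """
    best = None
    for (nodes, mem_cap), points in profile.items():
        for cpu_cap in cpu_caps:
            total_power = nodes * (cpu_cap + mem_cap + base_w)
            if total_power > budget_w:
                continue  # violates the strict power budget
            estimate = interpolate(points, cpu_cap)
            candidate = (estimate, nodes, cpu_cap, mem_cap)
            if best is None or candidate < best:
                best = candidate
    return best
```

Under these assumptions, a tight budget can favor more nodes running at a lower CPU cap over fewer nodes at full power, which is the overprovisioning trade-off the abstract refers to. Any numbers fed to the sketch would have to come from an application profile of the kind the paper describes.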

Original language: English (US)
Title of host publication: 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
State: Published - Dec 1 2013
Event: 15th IEEE International Conference on Cluster Computing, CLUSTER 2013 - Indianapolis, IN, United States
Duration: Sep 23 2013 to Sep 27 2013

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
ISSN (Print): 1552-5244

Other

Other: 15th IEEE International Conference on Cluster Computing, CLUSTER 2013
Country/Territory: United States
City: Indianapolis, IN
Period: 9/23/13 to 9/27/13

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
