TY - GEN
T1 - Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems
AU - Sarood, Osman
AU - Langer, Akhil
AU - Kale, Laxmikant V
AU - Rountree, Barry
AU - De Supinski, Bronis
PY - 2013
Y1 - 2013
AB - Energy consumption and power draw pose two major challenges to the HPC community for designing larger systems. Present-day HPC systems consume as much as 10 MW of electricity, and this is fast becoming a bottleneck. Although energy bills will significantly increase with machine size, power consumption is a hard constraint that must be addressed. Intel's Running Average Power Limit (RAPL) toolkit is a recent feature that enables power capping of CPU and memory subsystems on modern hardware. In this paper, we use RAPL to evaluate the possibility of improving the execution time efficiency of an application by capping power while adding more nodes. We profile the strong scaling of an application using different power caps for both CPU and memory subsystems. Our proposed interpolation scheme uses an application profile to optimize the number of nodes and the distribution of power between CPU and memory subsystems to minimize execution time under a strict power budget. We validate these estimates by running experiments on a 20-node (120-core) Sandy Bridge cluster. Our experimental results closely match the model estimates and show speedups greater than 1.47x for all applications compared to not capping CPU and memory power. We demonstrate that the solution quality of our interpolation scheme closely matches the results obtained via exhaustive profiling.
UR - http://www.scopus.com/inward/record.url?scp=84893548567&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893548567&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2013.6702684
DO - 10.1109/CLUSTER.2013.6702684
M3 - Conference contribution
AN - SCOPUS:84893548567
SN - 9781479908981
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
BT - 2013 IEEE International Conference on Cluster Computing, CLUSTER 2013
T2 - 15th IEEE International Conference on Cluster Computing, CLUSTER 2013
Y2 - 23 September 2013 through 27 September 2013
ER -