TY - GEN
T1 - Real time visualization of monitoring data for large scale HPC systems
AU - Showerman, Michael
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/10/26
Y1 - 2015/10/26
AB - High Performance Computing (HPC) system users and administrators are often hampered in their ability to understand application performance and system behavior due to a lack of sufficient information about how resources, such as memory, CPU, networks and filesystems, are being used. While obtaining the related data is a necessary step, it is insufficient without tools that can turn the data into actionable information. Such tools must be able to handle vast amounts of data efficiently and in a timely fashion, present effective and understandable information representations for large node counts, and correlate that data with job and system events. This paper presents visualization approaches and tools that NCSA is developing, combined with the use of freely available web interfaces, to turn the eight billion platform-related data points per day being collected from its 27,648-compute-node Blue Waters platform into actionable information for both system administrators and users. Insights from the visualizations at both the system and the job levels are also presented.
KW - Data visualization
KW - Resource monitoring
UR - http://www.scopus.com/inward/record.url?scp=84959271975&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959271975&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER.2015.122
DO - 10.1109/CLUSTER.2015.122
M3 - Conference contribution
AN - SCOPUS:84959271975
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 706
EP - 709
BT - Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE International Conference on Cluster Computing, CLUSTER 2015
Y2 - 8 September 2015 through 11 September 2015
ER -