Real time visualization of monitoring data for large scale HPC systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High Performance Computing (HPC) system users and administrators are often hampered in their ability to understand application performance and system behavior by a lack of sufficient information about how resources such as memory, CPU, networks, and filesystems are being used. While obtaining the related data is a necessary step, it is insufficient without tools that can turn the data into actionable information. Such tools must handle vast amounts of data efficiently and in a timely fashion, present effective and understandable representations of information for large node counts, and correlate that data with job and system events. This paper presents visualization approaches and tools that NCSA is developing, combined with the use of freely available web interfaces, to turn the eight billion platform-related data points collected per day from its 27,648-compute-node Blue Waters platform into actionable information for both system administrators and users. Insights from the visualizations at both the system and the job levels are also presented.

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Cluster Computing, CLUSTER 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 706-709
Number of pages: 4
ISBN (Electronic): 9781467365987
State: Published - Oct 26 2015
Event: IEEE International Conference on Cluster Computing, CLUSTER 2015 - Chicago, United States
Duration: Sep 8 2015 → Sep 11 2015

Publication series

Name: Proceedings - IEEE International Conference on Cluster Computing, ICCC
Volume: 2015-October
ISSN (Print): 1552-5244

Other

Other: IEEE International Conference on Cluster Computing, CLUSTER 2015
Country/Territory: United States
City: Chicago
Period: 9/8/15 → 9/11/15

Keywords

  • Data visualization
  • Resource monitoring

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Signal Processing
