Challenges of workload analysis on large HPC systems; A case study on NCSA Bluewaters

Joseph P. White, Martins Innus, Matthew D. Jones, Robert L. DeLeon, Nikolay Simakov, Jeffrey T. Palmer, Steven M. Gallo, Thomas R. Furlani, Michael Showerman, Robert J Brunner, Andriy Kot, Gregory H Bauer, Brett Bode, Jeremy James Enos, William T Kramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Blue Waters [4] is a petascale-level supercomputer whose mission is to greatly accelerate insight into the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013 to September 30, 2016). This paper describes the work that was done to collate, process, and analyze the data that was collected on Blue Waters, the design decisions that were made, the tools that we created, and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.

Original language: English (US)
Title of host publication: PEARC 2017 - Practice and Experience in Advanced Research Computing 2017
Subtitle of host publication: Sustainability, Success and Impact
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450352727
DOIs: https://doi.org/10.1145/3093338.3093348
State: Published - Jul 9 2017
Event: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017 - New Orleans, United States
Duration: Jul 9 2017 to Jul 13 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F128771

Other

Other: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017
Country: United States
City: New Orleans
Period: 7/9/17 to 7/13/17

Keywords

  • Availability
  • Measurement techniques
  • Modeling techniques
  • Performance attributes
  • Reliability
  • Serviceability

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

White, J. P., Innus, M., Jones, M. D., DeLeon, R. L., Simakov, N., Palmer, J. T., Gallo, S. M., Furlani, T. R., Showerman, M., Brunner, R. J., Kot, A., Bauer, G. H., Bode, B., Enos, J. J., & Kramer, W. T. (2017). Challenges of workload analysis on large HPC systems; A case study on NCSA Bluewaters. In PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact [a6] (ACM International Conference Proceeding Series; Vol. Part F128771). Association for Computing Machinery. https://doi.org/10.1145/3093338.3093348