Challenges of workload analysis on large HPC systems: A case study on NCSA Blue Waters

Joseph P. White, Martins Innus, Matthew D. Jones, Robert L. DeLeon, Nikolay Simakov, Jeffrey T. Palmer, Steven M. Gallo, Thomas R. Furlani, Michael Showerman, Robert Brunner, Andriy Kot, Gregory Bauer, Brett Bode, Jeremy Enos, William Kramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Blue Waters [4] is a petascale supercomputer whose mission is to greatly accelerate insight into the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node hours to process roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the study period (April 1, 2013 to September 30, 2016). This paper describes the work done to collate, process, and analyze the data collected on Blue Waters, the design decisions that were made, the tools that we created, and the various software engineering problems that we encountered and solved. In particular, we describe the data processing challenges unique to Blue Waters, which stem from the extremely large jobs that it typically executed.

Original language: English (US)
Title of host publication: PEARC 2017 - Practice and Experience in Advanced Research Computing 2017
Subtitle of host publication: Sustainability, Success and Impact
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450352727
State: Published - Jul 9 2017
Event: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017 - New Orleans, United States
Duration: Jul 9 2017 to Jul 13 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F128771

Other

Other: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017
Country/Territory: United States
City: New Orleans
Period: 7/9/17 to 7/13/17

Keywords

  • Availability
  • Measurement techniques
  • Modeling techniques
  • Performance attributes
  • Reliability
  • Serviceability

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
