Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters

Joseph P. White, Martins Innus, Matthew D. Jones, Robert L. DeLeon, Nikolay Simakov, Jeffrey T. Palmer, Steven M. Gallo, Thomas R. Furlani, Michael Showerman, Robert J Brunner, Andriy Kot, Gregory H Bauer, Brett Bode, Jeremy James Enos, William T Kramer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Blue Waters [4] is a petascale-level supercomputer whose mission is to greatly accelerate insight to the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013 - September 30, 2016). This paper describes the work that was done to collate, process and analyze the data that was collected on Blue Waters, the design decisions that were made, tools that we created and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.

Original language: English (US)
Title of host publication: PEARC 2017 - Practice and Experience in Advanced Research Computing 2017
Subtitle of host publication: Sustainability, Success and Impact
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450352727
DOIs: https://doi.org/10.1145/3093338.3093348
State: Published - Jul 9 2017
Event: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017 - New Orleans, United States
Duration: Jul 9 2017 - Jul 13 2017

Publication series

Name: ACM International Conference Proceeding Series
Volume: Part F128771

Other

Other: 2017 Practice and Experience in Advanced Research Computing, PEARC 2017
Country: United States
City: New Orleans
Period: 7/9/17 - 7/13/17

Keywords

  • Availability
  • Measurement techniques
  • Modeling techniques
  • Performance attributes
  • Reliability
  • Serviceability

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

White, J. P., Innus, M., Jones, M. D., DeLeon, R. L., Simakov, N., Palmer, J. T., ... Kramer, W. T. (2017). Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters. In PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact [a6] (ACM International Conference Proceeding Series; Vol. Part F128771). Association for Computing Machinery. https://doi.org/10.1145/3093338.3093348

Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters. / White, Joseph P.; Innus, Martins; Jones, Matthew D.; DeLeon, Robert L.; Simakov, Nikolay; Palmer, Jeffrey T.; Gallo, Steven M.; Furlani, Thomas R.; Showerman, Michael; Brunner, Robert J; Kot, Andriy; Bauer, Gregory H; Bode, Brett; Enos, Jeremy James; Kramer, William T.

PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact. Association for Computing Machinery, 2017. a6 (ACM International Conference Proceeding Series; Vol. Part F128771).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

White, JP, Innus, M, Jones, MD, DeLeon, RL, Simakov, N, Palmer, JT, Gallo, SM, Furlani, TR, Showerman, M, Brunner, RJ, Kot, A, Bauer, GH, Bode, B, Enos, JJ & Kramer, WT 2017, Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters. in PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact., a6, ACM International Conference Proceeding Series, vol. Part F128771, Association for Computing Machinery, 2017 Practice and Experience in Advanced Research Computing, PEARC 2017, New Orleans, United States, 7/9/17. https://doi.org/10.1145/3093338.3093348
White JP, Innus M, Jones MD, DeLeon RL, Simakov N, Palmer JT et al. Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters. In PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact. Association for Computing Machinery. 2017. a6. (ACM International Conference Proceeding Series). https://doi.org/10.1145/3093338.3093348
White, Joseph P. ; Innus, Martins ; Jones, Matthew D. ; DeLeon, Robert L. ; Simakov, Nikolay ; Palmer, Jeffrey T. ; Gallo, Steven M. ; Furlani, Thomas R. ; Showerman, Michael ; Brunner, Robert J ; Kot, Andriy ; Bauer, Gregory H ; Bode, Brett ; Enos, Jeremy James ; Kramer, William T. / Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters. PEARC 2017 - Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact. Association for Computing Machinery, 2017. (ACM International Conference Proceeding Series).
@inproceedings{a97d8099a9f24c61949dc6601bb91386,
title = "Challenges of workload analysis on large HPC systems; A case study on NCSA Bluewaters",
abstract = "BlueWaters [4] is a petascale-level supercomputer whose mission is to greatly accelerate insight to the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. .e analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013-September 30, 2016). .is paper describes the work that was done to collate, process and analyze the data that was collected on Blue Waters, the design decisions that were made, tools that we created and the various so.ware engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to BlueWaters engendered by the extremely large jobs that it typically executed.",
keywords = "Availability, Measurement techniques, Modeling techniques, Performance attributes, Reliability, Serviceability",
author = "White, {Joseph P.} and Martins Innus and Jones, {Mahew D.} and DeLeon, {Robert L.} and Nikolay Simakov and Palmer, {Jerey T.} and Gallo, {Steven M.} and Furlani, {Tomas R.} and Michael Showerman and Brunner, {Robert J} and Andriy Kot and Bauer, {Gregory H} and Brett Bode and Enos, {Jeremy James} and Kramer, {William T}",
year = "2017",
month = "7",
day = "9",
doi = "10.1145/3093338.3093348",
language = "English (US)",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery",
booktitle = "PEARC 2017 - Practice and Experience in Advanced Research Computing 2017",

}

TY - GEN

T1 - Challenges of workload analysis on large HPC systems; A case study on NCSA Blue Waters

AU - White, Joseph P.

AU - Innus, Martins

AU - Jones, Matthew D.

AU - DeLeon, Robert L.

AU - Simakov, Nikolay

AU - Palmer, Jeffrey T.

AU - Gallo, Steven M.

AU - Furlani, Thomas R.

AU - Showerman, Michael

AU - Brunner, Robert J

AU - Kot, Andriy

AU - Bauer, Gregory H

AU - Bode, Brett

AU - Enos, Jeremy James

AU - Kramer, William T

PY - 2017/7/9

Y1 - 2017/7/9

N2 - Blue Waters [4] is a petascale-level supercomputer whose mission is to greatly accelerate insight to the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013 - September 30, 2016). This paper describes the work that was done to collate, process and analyze the data that was collected on Blue Waters, the design decisions that were made, tools that we created and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.

AB - Blue Waters [4] is a petascale-level supercomputer whose mission is to greatly accelerate insight to the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters [8] using Open XDMoD [10]. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013 - September 30, 2016). This paper describes the work that was done to collate, process and analyze the data that was collected on Blue Waters, the design decisions that were made, tools that we created and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.

KW - Availability

KW - Measurement techniques

KW - Modeling techniques

KW - Performance attributes

KW - Reliability

KW - Serviceability

UR - http://www.scopus.com/inward/record.url?scp=85025823609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85025823609&partnerID=8YFLogxK

U2 - 10.1145/3093338.3093348

DO - 10.1145/3093338.3093348

M3 - Conference contribution

AN - SCOPUS:85025823609

T3 - ACM International Conference Proceeding Series

BT - PEARC 2017 - Practice and Experience in Advanced Research Computing 2017

PB - Association for Computing Machinery

ER -