Deploying a large petascale system: The Blue Waters experience

Research output: Contribution to journal › Conference article

Abstract

Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.

Original language: English (US)
Pages (from-to): 198-209
Number of pages: 12
Journal: Procedia Computer Science
Volume: 29
DOI: 10.1016/j.procs.2014.05.018
State: Published - Jan 1 2014
Event: 14th Annual International Conference on Computational Science, ICCS 2014 - Cairns, QLD, Australia
Duration: Jun 10 2014 to Jun 12 2014

Keywords

  • Acceptance testing
  • Large system deployment
  • Petascale performance

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Deploying a large petascale system: The Blue Waters experience. / Mendes, Celso Luiz; Bode, Brett; Bauer, Gregory H; Enos, Jeremy James; Beldica, Cristina; Kramer, William T.

In: Procedia Computer Science, Vol. 29, 01.01.2014, p. 198-209.

@article{fc3a9ce3ab044cabac95102baa590940,
title = "Deploying a large petascale system: The Blue Waters experience",
abstract = "Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.",
keywords = "Acceptance testing, Large system deployment, Petascale performance",
author = "Mendes, {Celso Luiz} and Brett Bode and Bauer, {Gregory H} and Enos, {Jeremy James} and Cristina Beldica and Kramer, {William T}",
year = "2014",
month = "1",
day = "1",
doi = "10.1016/j.procs.2014.05.018",
language = "English (US)",
volume = "29",
pages = "198--209",
journal = "Procedia Computer Science",
issn = "1877-0509",
publisher = "Elsevier BV",

}

TY  - JOUR
T1  - Deploying a large petascale system
T2  - The Blue Waters experience
AU  - Mendes, Celso Luiz
AU  - Bode, Brett
AU  - Bauer, Gregory H
AU  - Enos, Jeremy James
AU  - Beldica, Cristina
AU  - Kramer, William T
PY  - 2014/1/1
Y1  - 2014/1/1
N2  - Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.
AB  - Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.
KW  - Acceptance testing
KW  - Large system deployment
KW  - Petascale performance
UR  - http://www.scopus.com/inward/record.url?scp=84902768261&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84902768261&partnerID=8YFLogxK
U2  - 10.1016/j.procs.2014.05.018
DO  - 10.1016/j.procs.2014.05.018
M3  - Conference article
AN  - SCOPUS:84902768261
VL  - 29
SP  - 198
EP  - 209
JO  - Procedia Computer Science
JF  - Procedia Computer Science
SN  - 1877-0509
ER  - 