TY - JOUR
T1 - Deploying a large petascale system
T2 - 14th Annual International Conference on Computational Science, ICCS 2014
AU - Mendes, Celso L.
AU - Bode, Brett
AU - Bauer, Gregory H.
AU - Enos, Jeremy
AU - Beldica, Cristina
AU - Kramer, William T.
N1 - Funding Information:
Blue Waters is one of the most powerful supercomputers currently available for the open-science community. Sponsored by the US National Science Foundation (NSF) and installed at the National Center for Supercomputing Applications (NCSA) in Illinois, Blue Waters is also the largest machine ever built by Cray. In addition, it has tremendous amounts of memory and persistent storage. Various application groups are achieving the sustained petascale capability of the system, and there is a huge potential for scientific discoveries in the coming years.
This work is part of the Blue Waters sustained-petascale computing project, which is supported by the US National Science Foundation (award number ACI 1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
PY - 2014
Y1 - 2014
N2 - Deployment of a large parallel system is typically a very complex process, involving several steps of preparation, delivery, installation, testing and acceptance. Despite the availability of various petascale machines currently, the steps and lessons from their deployment are rarely described in the literature. This paper presents the experiences observed during the deployment of Blue Waters, the largest supercomputer ever built by Cray and one of the most powerful machines currently available for open science. The presentation is focused on the final deployment steps, where the system was intensively tested and accepted by NCSA. After a brief introduction of the Blue Waters architecture, a detailed description of the set of acceptance tests employed is provided, including many of the obtained results. This is followed by the major lessons learned during the process. Those experiences and lessons should be useful to guide similarly complex deployments in the future.
KW - Acceptance testing
KW - Large system deployment
KW - Petascale performance
UR - http://www.scopus.com/inward/record.url?scp=84902768261&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84902768261&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2014.05.018
DO - 10.1016/j.procs.2014.05.018
M3 - Conference article
AN - SCOPUS:84902768261
SN - 1877-0509
VL - 29
SP - 198
EP - 209
JO - Procedia Computer Science
JF - Procedia Computer Science
Y2 - 10 June 2014 through 12 June 2014
ER -