TY - JOUR
T1 - Transferring a petabyte in a day
AU - Kettimuthu, Rajkumar
AU - Liu, Zhengchun
AU - Wheeler, David
AU - Foster, Ian
AU - Heitmann, Katrin
AU - Cappello, Franck
N1 - Funding Information:
We would like to thank the anonymous FGCS reviewers for their valuable feedback and questions. This work was supported in part by the U.S. Department of Energy under contract number DE-AC02-06CH11357, by National Science Foundation (United States) award 1440761, and by the Blue Waters sustained-petascale computing project (United States), which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois.
Publisher Copyright:
© 2018
PY - 2018/11
Y1 - 2018/11
AB - Extreme-scale simulations and experiments can generate large amounts of data, whose volume can exceed the compute and/or storage capacity at the simulation or experimental facility. With the emergence of ultra-high-speed networks, researchers are considering pipelined approaches in which data are passed to a remote facility for analysis. Here we examine an extreme-scale cosmology simulation that, when run on a large fraction of a leadership computer, generates data at a rate of one petabyte per elapsed day. Writing those data to disk is inefficient and impractical, and in situ analysis poses its own difficulties. Thus we implement a pipeline in which data are generated on one supercomputer and then transferred, as they are generated, to a remote supercomputer for analysis. We use the Swift scripting language to instantiate this pipeline across Argonne National Laboratory and the National Center for Supercomputing Applications, which are connected by a 100 Gb/s network; and we demonstrate that by using the Globus transfer service we can achieve a sustained rate of 93 Gb/s over a 24-hour period, thus attaining our performance goal of one petabyte moved in 24 h. This paper describes the methods used and summarizes the lessons learned in this demonstration.
KW - Cosmology workflow
KW - GridFTP
KW - Large data transfer
KW - Pipeline
KW - Wide area data transfer
UR - http://www.scopus.com/inward/record.url?scp=85047971868&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047971868&partnerID=8YFLogxK
U2 - 10.1016/j.future.2018.05.051
DO - 10.1016/j.future.2018.05.051
M3 - Article
AN - SCOPUS:85047971868
SN - 0167-739X
VL - 88
SP - 191
EP - 198
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -