TY - JOUR
T1 - Optimizing I/O performance of HPC applications with autotuning
AU - Behzad, Babak
AU - Byna, Surendra
AU - Prabhat,
AU - Snir, Marc
N1 - Funding Information:
This work is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center, the Texas Advanced Computing Center, and the Argonne Leadership Computing Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357. It was partly supported by NSF grant 0938064.
N1 - Authors’ addresses: B. Behzad, 66 E 40th Ave, San Mateo, CA 94403; email: babakbehzad@gmail.com; S. Byna, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 50B-3238, Berkeley, CA 94720; email: sbyna@lbl.gov; Prabhat, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mail Stop 59R4010A, Berkeley, CA 94720; email: prabhat@lbl.gov; M. Snir, Dept. of Computer Science, University of Illinois at Urbana-Champaign, 201 N Goodwin Ave, Urbana, IL 61801; email: snir@illinois.edu.
Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/3
Y1 - 2019/3
N2 - Parallel input/output (I/O) is an essential component of modern high-performance computing (HPC). Obtaining good I/O performance for a broad range of applications on diverse HPC platforms is a major challenge, in part because of complex interdependencies between I/O middleware and hardware. The parallel file system and I/O middleware layers all offer optimization parameters that can, in theory, result in better I/O performance. Unfortunately, the right combination of parameters is highly dependent on the application, HPC platform, problem size, and concurrency. Scientific application developers do not have the time or expertise to take on the substantial burden of identifying good parameters for each problem configuration. They resort to using system defaults, a choice that frequently results in poor I/O performance. We expect this problem to be compounded on exascale-class machines, which will likely have a deeper software stack with hierarchically arranged hardware resources. As a solution to this problem, we present an autotuning system for optimizing I/O performance that encompasses I/O performance modeling, I/O tuning, and I/O patterns. We demonstrate the value of this framework across several HPC platforms and applications at scale.
AB - Parallel input/output (I/O) is an essential component of modern high-performance computing (HPC). Obtaining good I/O performance for a broad range of applications on diverse HPC platforms is a major challenge, in part because of complex interdependencies between I/O middleware and hardware. The parallel file system and I/O middleware layers all offer optimization parameters that can, in theory, result in better I/O performance. Unfortunately, the right combination of parameters is highly dependent on the application, HPC platform, problem size, and concurrency. Scientific application developers do not have the time or expertise to take on the substantial burden of identifying good parameters for each problem configuration. They resort to using system defaults, a choice that frequently results in poor I/O performance. We expect this problem to be compounded on exascale-class machines, which will likely have a deeper software stack with hierarchically arranged hardware resources. As a solution to this problem, we present an autotuning system for optimizing I/O performance that encompasses I/O performance modeling, I/O tuning, and I/O patterns. We demonstrate the value of this framework across several HPC platforms and applications at scale.
KW - Autotuning
KW - HPC
KW - I/O
KW - Parallel file systems
KW - Performance optimization
KW - Storage
UR - http://www.scopus.com/inward/record.url?scp=85065757355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065757355&partnerID=8YFLogxK
U2 - 10.1145/3309205
DO - 10.1145/3309205
M3 - Article
AN - SCOPUS:85065757355
SN - 2329-4949
VL - 5
JO - ACM Transactions on Parallel Computing
JF - ACM Transactions on Parallel Computing
IS - 4
M1 - 15
ER -