TY - GEN
T1 - Pattern-driven parallel I/O tuning
AU - Behzad, Babak
AU - Byna, Surendra
AU - Prabhat,
AU - Snir, Marc
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/11/15
Y1 - 2015/11/15
N2 - The contemporary parallel I/O software stack is complex due to a large number of configurations for tuning I/O performance. Without a proper configuration, I/O becomes a performance bottleneck. As high performance computing (HPC) is moving towards exascale, poor I/O performance has a significant impact on the runtime of large-scale simulations producing massive amounts of data. In this paper, we focus on developing a framework for tuning parallel I/O configurations automatically. This auto-tuning framework first traces high-level I/O accesses and analyzes data write patterns. Based on these patterns and historically available tuning parameters for similar patterns, the framework selects the best-performing configurations at runtime. If previous history for a pattern is unavailable, the framework initiates model-based training to acquire an efficient set of tuning parameters. Our framework includes a runtime system to apply the selected configurations using dynamic linking, without the need for changing application source code. In this paper, we describe this framework and evaluate it using multiple I/O kernels extracted from real applications and demonstrate substantial I/O performance improvements.
AB - The contemporary parallel I/O software stack is complex due to a large number of configurations for tuning I/O performance. Without a proper configuration, I/O becomes a performance bottleneck. As high performance computing (HPC) is moving towards exascale, poor I/O performance has a significant impact on the runtime of large-scale simulations producing massive amounts of data. In this paper, we focus on developing a framework for tuning parallel I/O configurations automatically. This auto-tuning framework first traces high-level I/O accesses and analyzes data write patterns. Based on these patterns and historically available tuning parameters for similar patterns, the framework selects the best-performing configurations at runtime. If previous history for a pattern is unavailable, the framework initiates model-based training to acquire an efficient set of tuning parameters. Our framework includes a runtime system to apply the selected configurations using dynamic linking, without the need for changing application source code. In this paper, we describe this framework and evaluate it using multiple I/O kernels extracted from real applications and demonstrate substantial I/O performance improvements.
UR - http://www.scopus.com/inward/record.url?scp=84959421612&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959421612&partnerID=8YFLogxK
U2 - 10.1145/2834976.2834977
DO - 10.1145/2834976.2834977
M3 - Conference contribution
AN - SCOPUS:84959421612
T3 - Proceedings of PDSW 2015: 10th Parallel Data Storage Workshop - Held in conjunction with SC 2015: The International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 43
EP - 48
BT - Proceedings of PDSW 2015
PB - Association for Computing Machinery
T2 - 10th Parallel Data Storage Workshop, PDSW 2015
Y2 - 16 November 2015
ER -