TY - GEN
T1 - Hiding I/O latency with pre-execution prefetching for parallel applications
AU - Chen, Yong
AU - Byna, Surendra
AU - Sun, Xian He
AU - Thakur, Rajeev
AU - Gropp, William
N1 - Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2008
Y1 - 2008
N2 - Parallel applications are usually able to achieve high computational performance but suffer from large latency in I/O accesses. I/O prefetching is an effective solution for masking the latency. Most existing I/O prefetching techniques, however, are conservative, and their effectiveness is limited by low accuracy and coverage. As the processor-I/O performance gap has been increasing rapidly, data-access delay has become a dominant performance bottleneck. We argue that it is time to revisit the "I/O wall" problem and trade excess computing power for data-access speed. We propose a novel pre-execution approach for masking I/O latency. We describe the pre-execution I/O prefetching framework, the pre-execution thread construction methodology, the underlying library support, and the prototype implementation in the ROMIO MPI-IO implementation in MPICH2. Preliminary experiments show that the pre-execution approach is promising in reducing I/O access latency and has real potential.
AB - Parallel applications are usually able to achieve high computational performance but suffer from large latency in I/O accesses. I/O prefetching is an effective solution for masking the latency. Most existing I/O prefetching techniques, however, are conservative, and their effectiveness is limited by low accuracy and coverage. As the processor-I/O performance gap has been increasing rapidly, data-access delay has become a dominant performance bottleneck. We argue that it is time to revisit the "I/O wall" problem and trade excess computing power for data-access speed. We propose a novel pre-execution approach for masking I/O latency. We describe the pre-execution I/O prefetching framework, the pre-execution thread construction methodology, the underlying library support, and the prototype implementation in the ROMIO MPI-IO implementation in MPICH2. Preliminary experiments show that the pre-execution approach is promising in reducing I/O access latency and has real potential.
UR - http://www.scopus.com/inward/record.url?scp=70350757788&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350757788&partnerID=8YFLogxK
U2 - 10.1109/SC.2008.5213209
DO - 10.1109/SC.2008.5213209
M3 - Conference contribution
AN - SCOPUS:70350757788
SN - 9781424428359
T3 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
BT - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
T2 - 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2008
Y2 - 15 November 2008 through 21 November 2008
ER -