TY - GEN
T1 - Data sieving and collective I/O in ROMIO
AU - Thakur, Rajeev
AU - Gropp, William
AU - Lusk, Ewing
N1 - Publisher Copyright: © 1999 IEEE.
PY - 1999
Y1 - 1999
AB - The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with results for three applications: an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC). The results cover five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
UR - http://www.scopus.com/inward/record.url?scp=85029696725&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029696725&partnerID=8YFLogxK
U2 - 10.1109/FMPC.1999.750599
DO - 10.1109/FMPC.1999.750599
M3 - Conference contribution
AN - SCOPUS:85029696725
T3 - Proceedings - Frontiers 1999, 7th Symposium on the Frontiers of Massively Parallel Computation
SP - 182
EP - 189
BT - Proceedings - Frontiers 1999, 7th Symposium on the Frontiers of Massively Parallel Computation
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th Symposium on the Frontiers of Massively Parallel Computation, Frontiers 1999
Y2 - 21 February 1999 through 25 February 1999
ER -