TY - GEN
T1 - Pilgrim
T2 - 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
AU - Wang, Chen
AU - Balaji, Pavan
AU - Snir, Marc
N1 - Funding Information:
This research was supported by NSF OAC grant 19-09144 and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration, and by the U.S. Department of Energy, Office of Science, under Contract DE-AC02-06CH11357. We thank Dr. Yanfei Guo for his gracious help.
Publisher Copyright:
© 2021 IEEE Computer Society. All rights reserved.
PY - 2021/11/14
Y1 - 2021/11/14
N2 - Traces of MPI communications are used by many performance analysis and visualization tools. Storing exhaustive traces of large-scale MPI applications is infeasible due to their large volume. Aggregated or lossy MPI traces are smaller but provide much less information. In this paper, we present Pilgrim, a near-lossless MPI tracing tool that incurs moderate overheads and generates small trace files at large scales by using sophisticated compression techniques. Furthermore, for codes with regular communication patterns, Pilgrim can store their traces in constant space regardless of the problem size, the number of processors, and the number of iterations. In comparison with existing tools, Pilgrim preserves more information in less space for all the programs we tested.
KW - Communication tracing
KW - Lossless MPI tracing
KW - Trace compression
UR - http://www.scopus.com/inward/record.url?scp=85119974570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119974570&partnerID=8YFLogxK
U2 - 10.1145/3458817.3476151
DO - 10.1145/3458817.3476151
M3 - Conference contribution
AN - SCOPUS:85119974570
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2021
PB - IEEE Computer Society
Y2 - 14 November 2021 through 19 November 2021
ER -