TY - GEN
T1 - Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication
AU - Tan, Nigel
AU - Luettgau, Jakob
AU - Marquez, Jack
AU - Teranishi, Keita
AU - Morales, Nicolas
AU - Bhowmick, Sanjukta
AU - Cappello, Franck
AU - Taufer, Michela
AU - Nicolae, Bogdan
N1 - This material is based upon work supported by: the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357; the National Science Foundation under Grants #1900888 and #1900765; and the IBM Shared University Research Award at the University of Tennessee. This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
PY - 2023/8/7
Y1 - 2023/8/7
N2 - Writing large amounts of data concurrently to stable storage is a typical I/O pattern of many HPC workflows. This pattern introduces high I/O overheads and results in increased storage space utilization, especially for workflows that need to capture the evolution of data structures with high frequency as checkpoints. In this context, many applications, such as graph pattern matching, perform sparse updates to large data structures between checkpoints. For these applications, incremental checkpointing techniques that save only the differences from one checkpoint to another can dramatically reduce the checkpoint sizes, I/O bottlenecks, and storage space utilization. However, such techniques are not without challenges: it is non-trivial to transparently determine what data has changed since a previous checkpoint and assemble the differences in a compact fashion that does not result in excessive metadata. State-of-the-art data reduction techniques (e.g., compression and de-duplication) have significant limitations when applied to modern HPC applications that leverage GPUs: they are slow at detecting the differences, generate a large amount of metadata to keep track of the differences, and ignore crucial spatiotemporal redundancy in checkpoint data. This paper addresses these challenges by proposing a Merkle tree-based incremental checkpointing method that exploits GPUs' high memory bandwidth and massive parallelism. Experimental results at scale show a significant reduction of the I/O overhead and space utilization of checkpointing compared with state-of-the-art incremental checkpointing and compression techniques.
AB - Writing large amounts of data concurrently to stable storage is a typical I/O pattern of many HPC workflows. This pattern introduces high I/O overheads and results in increased storage space utilization, especially for workflows that need to capture the evolution of data structures with high frequency as checkpoints. In this context, many applications, such as graph pattern matching, perform sparse updates to large data structures between checkpoints. For these applications, incremental checkpointing techniques that save only the differences from one checkpoint to another can dramatically reduce the checkpoint sizes, I/O bottlenecks, and storage space utilization. However, such techniques are not without challenges: it is non-trivial to transparently determine what data has changed since a previous checkpoint and assemble the differences in a compact fashion that does not result in excessive metadata. State-of-the-art data reduction techniques (e.g., compression and de-duplication) have significant limitations when applied to modern HPC applications that leverage GPUs: they are slow at detecting the differences, generate a large amount of metadata to keep track of the differences, and ignore crucial spatiotemporal redundancy in checkpoint data. This paper addresses these challenges by proposing a Merkle tree-based incremental checkpointing method that exploits GPUs' high memory bandwidth and massive parallelism. Experimental results at scale show a significant reduction of the I/O overhead and space utilization of checkpointing compared with state-of-the-art incremental checkpointing and compression techniques.
KW - Checkpointing
KW - GPU parallelization
KW - data versioning
KW - de-duplication
KW - incremental storage
UR - http://www.scopus.com/inward/record.url?scp=85178157728&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178157728&partnerID=8YFLogxK
U2 - 10.1145/3605573.3605639
DO - 10.1145/3605573.3605639
M3 - Conference contribution
AN - SCOPUS:85178157728
T3 - ACM International Conference Proceeding Series
SP - 665
EP - 674
BT - 52nd International Conference on Parallel Processing, ICPP 2023 - Main Conference Proceedings
PB - Association for Computing Machinery
T2 - 52nd International Conference on Parallel Processing, ICPP 2023
Y2 - 7 August 2023 through 10 August 2023
ER -