TY - GEN
T1 - cuSZ
T2 - 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020
AU - Tian, Jiannan
AU - Di, Sheng
AU - Zhao, Kai
AU - Rivera, Cody
AU - Fulp, Megan Hickman
AU - Underwood, Robert
AU - Jin, Sian
AU - Liang, Xin
AU - Calhoun, Jon
AU - Tao, Dingwen
AU - Cappello, Franck
N1 - Funding Information:
This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations - the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357. This work was also supported by the National Science Foundation under Grants CCF-1619253, OAC-2003709, OAC-1948447/2034169, and OAC-2003624/2042084. We would like to thank The University of Alabama for providing the startup funding for this work.
Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/9/30
Y1 - 2020/9/30
N2 - Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC applications are becoming heterogeneous using accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present an optimized GPU version, cuSZ, for one of the best error-bounded lossy compressors, SZ. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate our cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that our cuSZ improves SZ's compression throughput by up to 370.1× and 13.1× over the production version running on single and multiple CPU cores, respectively, while getting the same quality of reconstructed data. It also improves the compression ratio by up to 3.48× on the tested data compared with another state-of-the-art GPU-supported lossy compressor.
AB - Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC applications are becoming heterogeneous using accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present an optimized GPU version, cuSZ, for one of the best error-bounded lossy compressors, SZ. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate our cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that our cuSZ improves SZ's compression throughput by up to 370.1× and 13.1× over the production version running on single and multiple CPU cores, respectively, while getting the same quality of reconstructed data. It also improves the compression ratio by up to 3.48× on the tested data compared with another state-of-the-art GPU-supported lossy compressor.
UR - http://www.scopus.com/inward/record.url?scp=85094197428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094197428&partnerID=8YFLogxK
U2 - 10.1145/3410463.3414624
DO - 10.1145/3410463.3414624
M3 - Conference contribution
AN - SCOPUS:85094197428
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 3
EP - 15
BT - PACT 2020 - Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 October 2020 through 7 October 2020
ER -