TY - GEN
T1 - Efficient Error-Bounded Lossy Compression for CPU Architectures
AU - Dube, Griffin
AU - Tian, Jiannan
AU - Di, Sheng
AU - Tao, Dingwen
AU - Calhoun, Jon C.
AU - Cappello, Franck
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Modern HPC applications produce increasingly large amounts of data, which limits the performance of current extreme-scale systems. Lossy compression helps to mitigate this issue by decreasing the size of the data these applications generate. SZ, a current state-of-the-art lossy compressor, achieves high compression ratios, but its prediction/quantization step contains read-after-write (RAW) dependencies that prevent it from being parallelized. Recent work proposes a parallel dual prediction/quantization algorithm for GPUs that removes these dependencies. However, some HPC systems and applications do not use GPUs but could still benefit from the fine-grained parallelism of this method. Using the dual-quantization technique, we implement and optimize a SIMD-vectorized CPU version of SZ (vecSZ) and create a heuristic for selecting the optimal block size and vector length. We propose a novel block padding algorithm to decrease the number of unpredictable values along compression block borders and find that it reduces the number of prediction outliers by up to 100%. We compare the performance of vecSZ against pSZ, a CPU version of SZ that uses dual-quantization, as well as against SZ-1.4. Using real-world scientific datasets, we evaluate vecSZ on the Intel Skylake and AMD Rome architectures. vecSZ achieves up to a 32% improvement in rate-distortion and up to a 15× speedup over SZ-1.4, with a prediction and quantization bandwidth exceeding 3.4 GB/s.
KW - big data
KW - compression
KW - lossy compression
KW - program optimization
KW - vectorization
UR - http://www.scopus.com/inward/record.url?scp=85149917789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149917789&partnerID=8YFLogxK
U2 - 10.1109/MASCOTS56607.2022.00020
DO - 10.1109/MASCOTS56607.2022.00020
M3 - Conference contribution
AN - SCOPUS:85149917789
T3 - Proceedings - IEEE Computer Society's Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, MASCOTS
SP - 89
EP - 96
BT - Proceedings - 2022 30th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2022
PB - IEEE Computer Society
T2 - 30th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, MASCOTS 2022
Y2 - 18 October 2022 through 20 October 2022
ER -