TY - GEN
T1 - A Lightweight, Effective Compressibility Estimation Method for Error-bounded Lossy Compression
AU - Ganguli, Arkaprabha
AU - Underwood, Robert
AU - Bessac, Julie
AU - Krasowska, David
AU - Calhoun, Jon C.
AU - Di, Sheng
AU - Cappello, Franck
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Error-bounded lossy compression is becoming increasingly important for data-movement-intensive applications that must handle large datasets efficiently in HPC environments, and these applications often need to know the compressibility of a dataset before compressing it. However, off-the-shelf state-of-the-art lossy compressors are typically driven by error bounds, so the compression ratio cannot be known until compression completes. In this paper, we propose a lightweight, robust, easy-to-train model that accurately estimates the compressibility of datasets for different lossy compressors. Our approach combines novel predictors, which measure various notions of the spatial correlation and smoothness exploited by lossy compressors and are implemented efficiently on the GPU in a framework, with mixture model regression to improve robustness and conformal prediction to provide bounds on the estimates. We then use these models, together with a detailed analysis of speedup, to understand the tradeoffs between high speed, consistent speed, and accuracy of the methods on real applications. We evaluate our approach in the context of three key applications where compression ratio estimation is required.
KW - Compression Estimation
KW - Error Bounded Lossy Compressors
KW - Lossy Compression
KW - Rate Distortion
UR - http://www.scopus.com/inward/record.url?scp=85178163898&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178163898&partnerID=8YFLogxK
U2 - 10.1109/CLUSTER52292.2023.00028
DO - 10.1109/CLUSTER52292.2023.00028
M3 - Conference contribution
AN - SCOPUS:85178163898
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 247
EP - 258
BT - Proceedings - 2023 IEEE International Conference on Cluster Computing, CLUSTER 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Cluster Computing, CLUSTER 2023
Y2 - 31 October 2023 through 3 November 2023
ER -