TY - GEN
T1 - Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling
AU - Jin, Sian
AU - Di, Sheng
AU - Tian, Jiannan
AU - Byna, Suren
AU - Tao, Dingwen
AU - Cappello, Franck
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Error-bounded lossy compression is one of the most effective techniques for reducing scientific data sizes. However, the traditional trial-and-error approach used to configure lossy compressors for finding the optimal trade-off between reconstructed data quality and compression ratio is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework, which can effectively foresee the reduced data quality and compression ratio, as well as the impact of lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves the prediction-based lossy compression in three use-cases: (1) optimization of predictor by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained tuning error-bounds for various data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% accuracy on average) and low computational cost (up to 18.7x lower than the trial-and-error approach) for estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model using different applications across the three use-cases. In addition, our experiment demonstrates that our modeling-based approach reduces the time to store the 3D RTM data with HDF5 by up to 3.4 x with 128 CPU cores over the traditional solution.
AB - Error-bounded lossy compression is one of the most effective techniques for reducing scientific data sizes. However, the traditional trial-and-error approach used to configure lossy compressors for finding the optimal trade-off between reconstructed data quality and compression ratio is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework, which can effectively foresee the reduced data quality and compression ratio, as well as the impact of lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves the prediction-based lossy compression in three use-cases: (1) optimization of predictor by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained tuning error-bounds for various data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% accuracy on average) and low computational cost (up to 18.7x lower than the trial-and-error approach) for estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model using different applications across the three use-cases. In addition, our experiment demonstrates that our modeling-based approach reduces the time to store the 3D RTM data with HDF5 by up to 3.4 x with 128 CPU cores over the traditional solution.
KW - Analytical modeling
KW - Lossy compression
KW - Scientific data
UR - http://www.scopus.com/inward/record.url?scp=85136387050&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136387050&partnerID=8YFLogxK
U2 - 10.1109/ICDE53745.2022.00232
DO - 10.1109/ICDE53745.2022.00232
M3 - Conference contribution
AN - SCOPUS:85136387050
T3 - Proceedings - International Conference on Data Engineering
SP - 2494
EP - 2507
BT - Proceedings - 2022 IEEE 38th International Conference on Data Engineering, ICDE 2022
PB - IEEE Computer Society
T2 - 38th IEEE International Conference on Data Engineering, ICDE 2022
Y2 - 9 May 2022 through 12 May 2022
ER -