TY - GEN
T1 - An Efficient and Accurate Compression Ratio Estimation Model for SZx
AU - Khan, Arham
AU - Di, Sheng
AU - Zhao, Kai
AU - Liu, Jinyang
AU - Chard, Kyle
AU - Foster, Ian
AU - Cappello, Franck
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Modern large-scale HPC applications generate enormous volumes of data for subsequent storage or transfer. Error-controlled lossy compression effectively reduces data sizes while preserving data fidelity within user-defined error bounds, but the compression ratio is unknown until compression has finished. Many use cases, however, require knowing the compression ratio a priori, so that memory for the compressed data can be pre-allocated at runtime and simulation crashes caused by a lack of storage space can be avoided. We propose the Surrogate-based Error-controlled Lossy Compression Ratio Estimation Framework (SECRE), which estimates the true compression ratio via data sampling and a lightweight compression surrogate. Results for SZx on four real-world scientific datasets show extremely low estimation error (~1%) and low execution overhead (~2% estimation cost).
KW - compression ratio estimation
KW - error-controlled lossy compression
KW - sampling
KW - scientific datasets
UR - http://www.scopus.com/inward/record.url?scp=85179620746&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85179620746&partnerID=8YFLogxK
U2 - 10.1109/CLUSTERWorkshops61457.2023.00019
DO - 10.1109/CLUSTERWorkshops61457.2023.00019
M3 - Conference contribution
AN - SCOPUS:85179620746
T3 - Proceedings - IEEE International Conference on Cluster Computing, ICCC
SP - 48
EP - 49
BT - Proceedings - 2023 IEEE International Conference on Cluster Computing Workshops and Posters, CLUSTER Workshops 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Cluster Computing Workshops, CLUSTER Workshops 2023
Y2 - 31 October 2023 through 3 November 2023
ER -