TY - GEN
T1 - Fast Error-Bounded Lossy HPC Data Compression with SZ
AU - Di, Sheng
AU - Cappello, Franck
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/18
Y1 - 2016/7/18
AB - Today's HPC applications produce extremely large amounts of data, so efficient compression is needed before storing them to parallel file systems. In this paper, we optimize error-bounded HPC data compression by proposing a novel compression method that works very effectively on large-scale HPC data sets. The method starts by linearizing multi-dimensional snapshot data. The key idea is to fit/predict successive data points with the best-fit selection of curve-fitting models; data that can be predicted precisely are replaced by the code of the corresponding curve-fitting model. For the unpredictable data that cannot be approximated by curve-fitting models, we perform an optimized lossy compression via a binary representation analysis. We evaluate our solution using 13 real-world HPC applications across different scientific domains and compare it to many state-of-the-art compression methods, including Gzip, FPC, ISABELA, NUMARCK, ZFP, and FPZIP. Experiments show that the compression ratio of our compressor ranges from 3.3:1 to 436:1, exceeding the second-best solution, ZFP, by at least 2x and by up to an order of magnitude in most cases. The compression time of SZ is comparable to that of the other solutions, while its decompression time is 50%-90% lower than that of the second-best solution. In an extreme-scale use case, the compression ratio of SZ exceeds that of ZFP by 80%.
UR - http://www.scopus.com/inward/record.url?scp=84983332081&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983332081&partnerID=8YFLogxK
DO - 10.1109/IPDPS.2016.11
M3 - Conference contribution
AN - SCOPUS:84983332081
T3 - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
SP - 730
EP - 739
BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016
Y2 - 23 May 2016 through 27 May 2016
ER -
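
Note: The abstract above describes SZ's core pipeline: linearize the multi-dimensional snapshot, predict each value with the best of several curve-fitting models, emit a short model code when the prediction meets the error bound, and fall back to lossy binary-representation compression otherwise. The following is a minimal C sketch of only the best-fit prediction step, assuming preceding-value, linear, and quadratic extrapolation as the three candidate models; all names (sz_like_predict, etc.) are hypothetical, and this is not the authors' implementation.

/* Illustrative sketch of best-fit curve-fitting prediction: each value
   in the linearized array is predicted from its preceding reconstructed
   neighbors by three candidate models; if the best prediction meets the
   absolute error bound, the value is replaced by a 2-bit model code,
   otherwise it is flagged for the lossy binary-representation stage. */
#include <math.h>
#include <stddef.h>
#include <stdio.h>

enum { PREV, LINEAR, QUAD, UNPRED };

static void sz_like_predict(const double *data, size_t n, double eb,
                            unsigned char *codes, double *recon)
{
    for (size_t i = 0; i < n; i++) {
        double cand[3];
        /* Predict from already-reconstructed values so that the
           encoder and decoder remain bit-for-bit consistent. */
        cand[PREV]   = (i >= 1) ? recon[i-1] : 0.0;
        cand[LINEAR] = (i >= 2) ? 2.0 * recon[i-1] - recon[i-2]
                                : cand[PREV];
        cand[QUAD]   = (i >= 3) ? 3.0 * recon[i-1] - 3.0 * recon[i-2] + recon[i-3]
                                : cand[LINEAR];
        int best = PREV;
        for (int m = LINEAR; m <= QUAD; m++)
            if (fabs(cand[m] - data[i]) < fabs(cand[best] - data[i]))
                best = m;
        if (fabs(cand[best] - data[i]) <= eb) {
            codes[i] = (unsigned char)best; /* store only the model code */
            recon[i] = cand[best];
        } else {
            codes[i] = UNPRED;  /* kept for binary-representation analysis */
            recon[i] = data[i];
        }
    }
}

int main(void)
{
    double data[] = {1.0, 1.1, 1.2, 1.3, 5.0, 5.1};
    unsigned char codes[6];
    double recon[6];
    sz_like_predict(data, 6, 0.01, codes, recon);
    for (int i = 0; i < 6; i++)
        printf("i=%d code=%u recon=%.3f\n", i, codes[i], recon[i]);
    return 0;
}

On the toy array, the smoothly varying points collapse to model codes while the jump at 5.0 is flagged unpredictable, which is the mechanism behind the wide compression-ratio range the abstract reports.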