TY - GEN
T1 - Resilient error-bounded lossy compressor for data transfer
AU - Li, Sihuan
AU - Di, Sheng
AU - Zhao, Kai
AU - Liang, Xin
AU - Chen, Zizhong
AU - Cappello, Franck
N1 - Publisher Copyright: © 2021 IEEE Computer Society. All rights reserved.
PY - 2021/11/14
Y1 - 2021/11/14
N2 - Today's exa-scale scientific applications or advanced instruments are producing vast volumes of data, which need to be shared/transferred through networks/devices with relatively low bandwidth (e.g., data sharing on WAN or transferring from edge devices to supercomputers). Lossy compression is one of the candidate strategies to address the big data issue. However, little work has been done to make it resilient against silent errors, which may happen during the stage of compression or data transfer. In this paper, we propose a resilient error-bounded lossy compressor based on the SZ compression framework. Specifically, we design a new independent-block-wise model that decomposes the entire dataset into many independent sub-blocks to compress; then, we design and implement a series of error detection/correction strategies for each stage of SZ. Our method is arguably the first algorithm-based fault tolerance (ABFT) solution for lossy compression. Our proposed solution incurs negligible execution overhead in the fault-free situation. When soft errors occur, it ensures that the decompressed data remain strictly bounded within the user's requirement, with very limited degradation of compression ratio and low overhead.
AB - Today's exa-scale scientific applications or advanced instruments are producing vast volumes of data, which need to be shared/transferred through networks/devices with relatively low bandwidth (e.g., data sharing on WAN or transferring from edge devices to supercomputers). Lossy compression is one of the candidate strategies to address the big data issue. However, little work has been done to make it resilient against silent errors, which may happen during the stage of compression or data transfer. In this paper, we propose a resilient error-bounded lossy compressor based on the SZ compression framework. Specifically, we design a new independent-block-wise model that decomposes the entire dataset into many independent sub-blocks to compress; then, we design and implement a series of error detection/correction strategies for each stage of SZ. Our method is arguably the first algorithm-based fault tolerance (ABFT) solution for lossy compression. Our proposed solution incurs negligible execution overhead in the fault-free situation. When soft errors occur, it ensures that the decompressed data remain strictly bounded within the user's requirement, with very limited degradation of compression ratio and low overhead.
KW - Algorithm Based Fault Tolerance
KW - Data transfer
KW - Lossy compression
UR - http://www.scopus.com/inward/record.url?scp=85119977782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119977782&partnerID=8YFLogxK
U2 - 10.1145/3458817.3476195
DO - 10.1145/3458817.3476195
M3 - Conference contribution
AN - SCOPUS:85119977782
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2021
PB - IEEE Computer Society
T2 - 33rd International Conference for High Performance Computing, Networking, Storage and Analysis: Science and Beyond, SC 2021
Y2 - 14 November 2021 through 19 November 2021
ER -