TY - GEN
T1 - Improving floating point compression through binary masks
AU - Bautista Gomez, Leonardo A.
AU - Cappello, Franck
PY - 2013
Y1 - 2013
AB - Modern scientific technologies, such as particle accelerators, telescopes, and supercomputers, are producing extremely large amounts of data, which must be processed on systems with high computational capabilities, such as supercomputers. Because scientific data is growing at an exponential rate, storing and accessing it is becoming expensive in both time and space. Most of this data is stored in floating-point representation, and scientific applications running on supercomputers spend many CPU cycles reading and writing floating-point values, which makes data compression an attractive way to increase computing efficiency. Given the accuracy requirements of scientific computing, we focus only on lossless compression. In this paper, we propose a masking technique that partially decreases the entropy of scientific datasets, allowing for a better compression ratio and higher throughput. We evaluate several data partitioning techniques for selective compression and compare these schemes with several existing compression strategies. Our approach improves the compression ratio by up to 15% and, in some cases, halves the time spent on compression.
UR - http://www.scopus.com/inward/record.url?scp=84893264024&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893264024&partnerID=8YFLogxK
U2 - 10.1109/BigData.2013.6691591
DO - 10.1109/BigData.2013.6691591
M3 - Conference contribution
AN - SCOPUS:84893264024
SN - 9781479912926
T3 - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
SP - 326
EP - 331
BT - Proceedings - 2013 IEEE International Conference on Big Data, Big Data 2013
PB - IEEE Computer Society
T2 - 2013 IEEE International Conference on Big Data, Big Data 2013
Y2 - 6 October 2013 through 9 October 2013
ER -