TY - GEN
T1 - DZip
T2 - 2021 Data Compression Conference, DCC 2021
AU - Goyal, Mohit
AU - Tatwawadi, Kedar
AU - Chandak, Shubham
AU - Ochoa, Idoia
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/3
Y1 - 2021/3
AB - We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types. The proposed compressor outperforms general-purpose compressors such as Gzip (29% size reduction on average) and 7zip (12% size reduction on average) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. While the main limitation of NN-based compressors is generally the encoding/decoding speed, we empirically demonstrate that DZip achieves comparable compression ratio to other NN-based compressors while being several times faster. The source code for DZip and links to the datasets are available at http://github.com/mohit1997/Dzip-torch.
UR - http://www.scopus.com/inward/record.url?scp=85106038791&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106038791&partnerID=8YFLogxK
U2 - 10.1109/DCC50243.2021.00023
DO - 10.1109/DCC50243.2021.00023
M3 - Conference contribution
AN - SCOPUS:85106038791
T3 - Data Compression Conference Proceedings
SP - 153
EP - 162
BT - Proceedings - DCC 2021
A2 - Bilgin, Ali
A2 - Marcellin, Michael W.
A2 - Serra-Sagrista, Joan
A2 - Storer, James A.
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 23 March 2021 through 26 March 2021
ER -