DZip: Improved general-purpose lossless compression based on novel neural network modeling

Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We consider lossless compression based on statistical data modeling followed by prediction-based encoding, where an accurate statistical model for the input data leads to substantial improvements in compression. We propose DZip, a general-purpose compressor for sequential data that exploits the well-known modeling capabilities of neural networks (NNs) for prediction, followed by arithmetic coding. DZip uses a novel hybrid architecture based on adaptive and semi-adaptive training. Unlike most NN-based compressors, DZip does not require additional training data and is not restricted to specific data types. The proposed compressor outperforms general-purpose compressors such as Gzip (29% size reduction on average) and 7zip (12% size reduction on average) on a variety of real datasets, achieves near-optimal compression on synthetic datasets, and performs close to specialized compressors for large sequence lengths, without any human input. While the main limitation of NN-based compressors is generally the encoding/decoding speed, we empirically demonstrate that DZip achieves a compression ratio comparable to that of other NN-based compressors while being several times faster. The source code for DZip and links to the datasets are available at http://github.com/mohit1997/Dzip-torch.
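
The link between prediction quality and compressed size described in the abstract can be illustrated with a small sketch. It is not DZip's model: a simple adaptive order-k count model with add-one smoothing stands in for the neural network, and no arithmetic coder is implemented. The code only sums -log2 p(symbol | context), which is the length in bits an ideal arithmetic coder would approach. The function name `ideal_code_length_bits` and its parameters are hypothetical, chosen for this illustration.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(data: bytes, order: int = 2, alphabet_size: int = 256) -> float:
    """Bits an ideal arithmetic coder would spend on `data` under an
    adaptive order-`order` count model (an assumed stand-in for an NN predictor)."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[context][symbol]
    totals = defaultdict(int)                       # symbols seen per context
    total_bits = 0.0
    for i, symbol in enumerate(data):
        context = data[max(0, i - order):i]
        # Predict before seeing the symbol, so a decoder could mirror this step.
        # Add-one smoothing keeps every probability strictly positive.
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        total_bits += -math.log2(p)
        # Update the model only after the symbol has been "coded" (adaptive training).
        counts[context][symbol] += 1
        totals[context] += 1
    return total_bits


if __name__ == "__main__":
    sample = b"abracadabra" * 200
    bits = ideal_code_length_bits(sample)
    print(f"{bits / len(sample):.3f} bits per symbol (8.0 uncompressed)")
```

Because the model is updated only from symbols already processed, a decoder can rebuild the same probabilities step by step, which is why a purely adaptive scheme needs no separate training data.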

Original language: English (US)
Title of host publication: Proceedings - DCC 2021
Subtitle of host publication: 2021 Data Compression Conference
Editors: Ali Bilgin, Michael W. Marcellin, Joan Serra-Sagrista, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 153-162
Number of pages: 10
ISBN (Electronic): 9780738112275
State: Published - Mar 2021
Event: 2021 Data Compression Conference, DCC 2021 - Snowbird, United States
Duration: Mar 23 2021 to Mar 26 2021

Publication series

Name: Data Compression Conference Proceedings
Volume: 2021-March
ISSN (Print): 1068-0314

Conference

Conference: 2021 Data Compression Conference, DCC 2021
Country/Territory: United States
City: Snowbird
Period: 3/23/21 to 3/26/21

ASJC Scopus subject areas

  • Computer Networks and Communications
