Mass Error-Correction Codes for Polymer-Based Data Storage

Ryan Gabrys, Srilakshmi Pattabiraman, Olgica Milenkovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


We consider the problem of correcting mass readout errors in information encoded in binary polymer strings. Our work builds on results for string reconstruction from composition multisets [1] and on the unique string reconstruction framework proposed in [2]. Binary polymer-based data storage systems [3] use two molecules of significantly different masses to represent the symbols {0,1} and perform readouts through noisy tandem mass spectrometry. Tandem mass spectrometers fragment the strings to be read into shorter substrings and report only their masses, often with errors due to imprecise ionization. Modeling the output of the fragmentation process in terms of composition multisets makes it possible to design asymptotically optimal codes that guarantee unique reconstruction and correct a single mass error [2] through the use of derivatives of Catalan paths. However, no codes capable of correcting multiple mass errors were previously known. Our work addresses this issue by describing the first multiple-error-correcting codes, which use the polynomial factorization approach for the Turnpike problem [4] and the related factorization described in [1]. Adding Reed-Solomon-type coding redundancy to the corresponding polynomials allows for correcting $t$ mass errors in polynomial time using $\mathcal{O}(t^2 \log k)$ redundant bits, where $k$ is the length of the information string. The redundancy can be improved to $\mathcal{O}(t + \log k)$; however, for this improved scheme no decoding algorithm is currently known that runs in time polynomial in both $t$ and $n$, where $n$ is the length of the coded string.
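The link between composition multisets and polynomial factorization can be illustrated with a small sketch. It rests on a standard identity, stated here as background rather than as the paper's construction: if P(x, y) sums x^a y^b over the compositions (a zeros, b ones) of all n+1 prefixes of a binary string, then P(x, y)P(1/x, 1/y) equals n+1 plus, for every substring composition (a, b), the pair of terms x^a y^b + x^-a y^-b. All function names below (`composition_multiset`, `prefix_polynomial`, `times_reciprocal`, `identity_holds`, `substring_masses`) and the integer masses `m0`, `m1` are hypothetical and for illustration only. The sketch computes the error-free readout; it does not implement the authors' Reed-Solomon-type encoder or decoder.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Multiset of (#0s, #1s) over all n(n+1)/2 substrings of the binary string s."""
    comps = Counter()
    for i in range(len(s)):
        zeros = ones = 0
        for ch in s[i:]:  # grow the substring that starts at position i
            if ch == "0":
                zeros += 1
            else:
                ones += 1
            comps[(zeros, ones)] += 1
    return comps


def prefix_polynomial(s: str) -> Counter:
    """P(x, y): one term x^(#0s) * y^(#1s) per prefix, stored as {(exp_x, exp_y): coeff}."""
    poly = Counter({(0, 0): 1})  # the empty prefix
    zeros = ones = 0
    for ch in s:
        if ch == "0":
            zeros += 1
        else:
            ones += 1
        poly[(zeros, ones)] += 1
    return poly


def times_reciprocal(p: Counter) -> Counter:
    """Laurent product P(x, y) * P(1/x, 1/y): exponents subtract, coefficients multiply."""
    out = Counter()
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in p.items():
            out[(a1 - a2, b1 - b2)] += c1 * c2
    return out


def identity_holds(s: str) -> bool:
    """Check P(x,y)P(1/x,1/y) == (n+1) + sum over substring compositions of x^a y^b + x^-a y^-b."""
    rhs = Counter({(0, 0): len(s) + 1})
    for (a, b), mult in composition_multiset(s).items():
        rhs[(a, b)] += mult
        rhs[(-a, -b)] += mult
    return times_reciprocal(prefix_polynomial(s)) == rhs


def substring_masses(s: str, m0: int, m1: int) -> Counter:
    """Error-free mass readout (hypothetical masses): composition (a, b) has mass a*m0 + b*m1."""
    masses = Counter()
    for (a, b), mult in composition_multiset(s).items():
        masses[a * m0 + b * m1] += mult
    return masses


if __name__ == "__main__":
    s = "0110"
    print(sum(composition_multiset(s).values()))  # 10 substrings
    print(identity_holds(s))  # True
    # A string and its reversal share the same composition multiset.
    print(composition_multiset(s) == composition_multiset(s[::-1]))  # True
```

Because the substring compositions appear directly as the exponents of P(x, y)P(1/x, 1/y), reconstructing the string amounts to factoring this product, and a mass error corrupts one of its terms. The abstract describes protecting exactly these polynomials with Reed-Solomon-type redundancy; this sketch only shows the error-free relationship. The last check also illustrates why composition-based reconstruction can at best recover a string up to reversal.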

Original language: English (US)
Title of host publication: 2020 IEEE International Symposium on Information Theory, ISIT 2020 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9781728164328
State: Published - Jun 2020
Event: 2020 IEEE International Symposium on Information Theory, ISIT 2020 - Los Angeles, United States
Duration: Jul 21 2020 to Jul 26 2020

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
ISSN (Print): 2157-8095


Conference: 2020 IEEE International Symposium on Information Theory, ISIT 2020
Country/Territory: United States
City: Los Angeles

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

