Unique Reconstruction of Coded Strings from Multiset Substring Spectra

Ryan Gabrys, Olgica Milenkovic

Research output: Contribution to journalArticlepeer-review


The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. We also consider extensions of the problem to noisy settings in which substrings are compromised by burst and random errors. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.

Original languageEnglish (US)
Article number8805120
Pages (from-to)7682-7696
Number of pages15
JournalIEEE Transactions on Information Theory
Issue number12
StatePublished - Dec 2019


  • Data storage systems
  • biological information theory

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences


Dive into the research topics of 'Unique Reconstruction of Coded Strings from Multiset Substring Spectra'. Together they form a unique fingerprint.

Cite this