Reconstruction of Sets of Strings From Prefix/Suffix Compositions

Ryan Gabrys, Srilakshmi Pattabiraman, Olgica Milenkovic

Research output: Contribution to journalArticlepeer-review

Abstract

The problem of reconstructing strings from substring information has found many applications due to its importance in genomic data sequencing and DNA- and polymer-based data storage. One important paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry devices. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide upper and lower bounds on the asymptotic rate of the underlying codebooks. Our code constructions combine properties of binary B_{h} and Dyck strings that can be extended to accommodate missing substrings in the pool. As auxiliary results, we present simple entropy upper bounds for binary B_{h} codes and an improved bound for h=4 , and also describe errors that arise during mass spectrometry.

Original languageEnglish (US)
Pages (from-to)3-12
Number of pages10
JournalIEEE Transactions on Communications
Volume71
Issue number1
DOIs
StatePublished - Jan 1 2023
Externally publishedYes

Keywords

  • Binary Bh codes
  • Dyck strings
  • polymer-based data storage
  • unique string reconstruction

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Reconstruction of Sets of Strings From Prefix/Suffix Compositions'. Together they form a unique fingerprint.

Cite this