Unique Reconstruction of Coded Sequences from Multiset Substring Spectra

Ryan Gabrys, Olgica Milenkovic

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.

Original languageEnglish (US)
Title of host publication2018 IEEE International Symposium on Information Theory, ISIT 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2540-2544
Number of pages5
Volume2018-June
ISBN (Print)9781538647806
DOIs
StatePublished - Aug 15 2018
Event2018 IEEE International Symposium on Information Theory, ISIT 2018 - Vail, United States
Duration: Jun 17 2018Jun 22 2018

Other

Other2018 IEEE International Symposium on Information Theory, ISIT 2018
CountryUnited States
CityVail
Period6/17/186/22/18

Fingerprint

Multiset
Redundancy
DNA
Strings
Throughput
Codebook
Data storage equipment
Replacement
Data Storage
Storage System
High Throughput
Coding
Binary

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Modeling and Simulation
  • Applied Mathematics

Cite this

Gabrys, R., & Milenkovic, O. (2018). Unique Reconstruction of Coded Sequences from Multiset Substring Spectra. In 2018 IEEE International Symposium on Information Theory, ISIT 2018 (Vol. 2018-June, pp. 2540-2544). [8437909] Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ISIT.2018.8437909

Unique Reconstruction of Coded Sequences from Multiset Substring Spectra. / Gabrys, Ryan; Milenkovic, Olgica.

2018 IEEE International Symposium on Information Theory, ISIT 2018. Vol. 2018-June Institute of Electrical and Electronics Engineers Inc., 2018. p. 2540-2544 8437909.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Gabrys, R & Milenkovic, O 2018, Unique Reconstruction of Coded Sequences from Multiset Substring Spectra. in 2018 IEEE International Symposium on Information Theory, ISIT 2018. vol. 2018-June, 8437909, Institute of Electrical and Electronics Engineers Inc., pp. 2540-2544, 2018 IEEE International Symposium on Information Theory, ISIT 2018, Vail, United States, 6/17/18. https://doi.org/10.1109/ISIT.2018.8437909
Gabrys R, Milenkovic O. Unique Reconstruction of Coded Sequences from Multiset Substring Spectra. In 2018 IEEE International Symposium on Information Theory, ISIT 2018. Vol. 2018-June. Institute of Electrical and Electronics Engineers Inc. 2018. p. 2540-2544. 8437909 https://doi.org/10.1109/ISIT.2018.8437909
Gabrys, Ryan ; Milenkovic, Olgica. / Unique Reconstruction of Coded Sequences from Multiset Substring Spectra. 2018 IEEE International Symposium on Information Theory, ISIT 2018. Vol. 2018-June Institute of Electrical and Electronics Engineers Inc., 2018. pp. 2540-2544
@inproceedings{f4fc57cd3e1244bc8dfd32ea69ea56a7,
title = "Unique Reconstruction of Coded Sequences from Multiset Substring Spectra",
abstract = "The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.",
author = "Ryan Gabrys and Olgica Milenkovic",
year = "2018",
month = "8",
day = "15",
doi = "10.1109/ISIT.2018.8437909",
language = "English (US)",
isbn = "9781538647806",
volume = "2018-June",
pages = "2540--2544",
booktitle = "2018 IEEE International Symposium on Information Theory, ISIT 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Unique Reconstruction of Coded Sequences from Multiset Substring Spectra

AU - Gabrys, Ryan

AU - Milenkovic, Olgica

PY - 2018/8/15

Y1 - 2018/8/15

N2 - The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.

AB - The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.

UR - http://www.scopus.com/inward/record.url?scp=85052439520&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052439520&partnerID=8YFLogxK

U2 - 10.1109/ISIT.2018.8437909

DO - 10.1109/ISIT.2018.8437909

M3 - Conference contribution

AN - SCOPUS:85052439520

SN - 9781538647806

VL - 2018-June

SP - 2540

EP - 2544

BT - 2018 IEEE International Symposium on Information Theory, ISIT 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -