TY - GEN
T1 - Reconstructing Mixtures of Coded Strings from Prefix and Suffix Compositions
AU - Gabrys, Ryan
AU - Pattabiraman, Srilakshmi
AU - Milenkovic, Olgica
N1 - Funding Information:
The work was funded by the DARPA Molecular Informatics program, the NSF/SRC SemiSynBio program, and NSF grant CIF 2008125.
Publisher Copyright:
©2021 IEEE
PY - 2021/4/11
Y1 - 2021/4/11
N2 - The problem of string reconstruction from substring information has found many applications due to its relevance in DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry readouts. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide matching upper and lower bounds on the asymptotic rate of the underlying codebooks. Under certain mild constraints on the problem parameters, one can show that the largest possible rate of a codebook that allows for all subcollections of at most h codestrings to be uniquely reconstructable from the prefix-suffix information equals 1/h.
AB - The problem of string reconstruction from substring information has found many applications due to its relevance in DNA- and polymer-based data storage. One practically important and challenging paradigm requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry readouts. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide matching upper and lower bounds on the asymptotic rate of the underlying codebooks. Under certain mild constraints on the problem parameters, one can show that the largest possible rate of a codebook that allows for all subcollections of at most h codestrings to be uniquely reconstructable from the prefix-suffix information equals 1/h.
KW - B sequences
KW - Polymer-based data storage
KW - Unique string reconstruction
UR - http://www.scopus.com/inward/record.url?scp=85113328542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113328542&partnerID=8YFLogxK
U2 - 10.1109/ITW46852.2021.9457660
DO - 10.1109/ITW46852.2021.9457660
M3 - Conference contribution
AN - SCOPUS:85113328542
T3 - 2020 IEEE Information Theory Workshop, ITW 2020
BT - 2020 IEEE Information Theory Workshop, ITW 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE Information Theory Workshop, ITW 2020
Y2 - 11 April 2021 through 15 April 2021
ER -