TY - JOUR
T1 - Coded Trace Reconstruction
AU - Cheraghchi, Mahdi
AU - Gabrys, Ryan
AU - Milenkovic, Olgica
AU - Ribeiro, Joao
N1 - This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) Molecular Informatics Program, in part by NSF and Semiconductor Research Corporation (SRC) SemiSynBio Award, in part by NSF under Grant 1618366, in part by the Center for Science of Information (CSoI), in part by NSF Science and Technology Center, under Grant CCF-0939370.
Manuscript received April 8, 2019; revised March 6, 2020; accepted May 11, 2020. Date of publication May 21, 2020; date of current version September 22, 2020. This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) Molecular Informatics Program, in part by NSF and Semiconductor Research Corporation (SRC) SemiSynBio Award, in part by NSF under Grant 1618366, in part by the Center for Science of Information (CSoI), in part by NSF Science and Technology Center, under Grant CCF-0939370. This article was presented in part at the 2019 IEEE Information Theory Workshop. (Corresponding author: Jo\u00E3o Ribeiro.) Mahdi Cheraghchi was with the Department of Computing, Imperial College London, London SW7 2AZ, U.K. He is now with the CSE Department, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: [email protected]).
PY - 2020/10
Y1 - 2020/10
N2 - Motivated by average-case trace reconstruction and coding for portable DNA-based storage systems, we initiate the study of coded trace reconstruction, the design and analysis of high-rate efficiently encodable codes that can be efficiently decoded with high probability from few reads (also called traces) corrupted by edit errors. Codes used in current portable DNA-based storage systems with nanopore sequencers are largely based on heuristics, and have no provable robustness or performance guarantees even for an error model with i.i.d. deletions and constant deletion probability. Our work is the first step towards the design of efficient codes with provable guarantees for such systems. We consider a constant rate of i.i.d. deletions, and perform an analysis of marker-based code-constructions. This gives rise to codes with redundancy $O(n/\log n)$ (resp. $O(n/\log \log n)$ ) that can be efficiently reconstructed from $\exp (O(\log ^{2/3}n))$ (resp. $\exp (O(\log \log n)^{2/3})$ ) traces, where $n$ is the message length. Then, we give a construction of a code with $O(\log n)$ bits of redundancy that can be efficiently reconstructed from $\text {poly}(n)$ traces if the deletion probability is small enough. Finally, we show how to combine both approaches, giving rise to an efficient code with $O(n/\log n)$ bits of redundancy which can be reconstructed from $\text {poly}(\log n)$ traces for a small constant deletion probability.
AB - Motivated by average-case trace reconstruction and coding for portable DNA-based storage systems, we initiate the study of coded trace reconstruction, the design and analysis of high-rate efficiently encodable codes that can be efficiently decoded with high probability from few reads (also called traces) corrupted by edit errors. Codes used in current portable DNA-based storage systems with nanopore sequencers are largely based on heuristics, and have no provable robustness or performance guarantees even for an error model with i.i.d. deletions and constant deletion probability. Our work is the first step towards the design of efficient codes with provable guarantees for such systems. We consider a constant rate of i.i.d. deletions, and perform an analysis of marker-based code-constructions. This gives rise to codes with redundancy $O(n/\log n)$ (resp. $O(n/\log \log n)$ ) that can be efficiently reconstructed from $\exp (O(\log ^{2/3}n))$ (resp. $\exp (O(\log \log n)^{2/3})$ ) traces, where $n$ is the message length. Then, we give a construction of a code with $O(\log n)$ bits of redundancy that can be efficiently reconstructed from $\text {poly}(n)$ traces if the deletion probability is small enough. Finally, we show how to combine both approaches, giving rise to an efficient code with $O(n/\log n)$ bits of redundancy which can be reconstructed from $\text {poly}(\log n)$ traces for a small constant deletion probability.
KW - Coding
KW - DNA storage
KW - deletion channel
KW - trace reconstruction
UR - https://www.scopus.com/pages/publications/85092086537
UR - https://www.scopus.com/pages/publications/85092086537#tab=citedBy
U2 - 10.1109/TIT.2020.2996377
DO - 10.1109/TIT.2020.2996377
M3 - Article
AN - SCOPUS:85092086537
SN - 0018-9448
VL - 66
SP - 6084
EP - 6103
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 10
M1 - 9097932
ER -