TY - JOUR
T1 - DNA Merge-Sort
T2 - A Family of Nested Varshamov-Tenengolts Reassembly Codes for Out-of-Order Media
AU - Nassirpour, Sajjad
AU - Shomorony, Ilan
AU - Vahid, Alireza
N1 - The work of Ilan Shomorony was supported in part by the National Science Foundation (NSF) under Grants CCF-2007597 and CCF-2046991.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Motivated by the DNA storage paradigm, we consider the torn-paper channel (TPC), which models data storage in long DNA molecules and breaks the input sequence into a random number of out-of-order variable-length non-overlapped fragments. We propose a computationally-efficient code construction for this model. More specifically, we introduce a family of nested Varshamov-Tenengolts (VT) codes to merge and sort the fragments in order to recover the stored data. We numerically show that our scheme (i) obtains rates that are higher than in prior results, (ii) has a decoding complexity that is cubic in the number of codeword fragments, which is significantly lower than the complexity of the brute-force approach, and (iii) offers decreasing and negligible error rates as the codeword length increases. We also propose a new construction for VT codes, quantify the number of required parity bits, and show that our approach requires fewer parity bits compared to known results.
AB - Motivated by the DNA storage paradigm, we consider the torn-paper channel (TPC), which models data storage in long DNA molecules and breaks the input sequence into a random number of out-of-order variable-length non-overlapped fragments. We propose a computationally-efficient code construction for this model. More specifically, we introduce a family of nested Varshamov-Tenengolts (VT) codes to merge and sort the fragments in order to recover the stored data. We numerically show that our scheme (i) obtains rates that are higher than in prior results, (ii) has a decoding complexity that is cubic in the number of codeword fragments, which is significantly lower than the complexity of the brute-force approach, and (iii) offers decreasing and negligible error rates as the codeword length increases. We also propose a new construction for VT codes, quantify the number of required parity bits, and show that our approach requires fewer parity bits compared to known results.
KW - DNA storage
KW - codeword fragmentation
KW - sequence assembly
KW - shuffling channel
KW - unordered communication
UR - http://www.scopus.com/inward/record.url?scp=85178076559&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85178076559&partnerID=8YFLogxK
U2 - 10.1109/TCOMM.2023.3335409
DO - 10.1109/TCOMM.2023.3335409
M3 - Article
AN - SCOPUS:85178076559
SN - 0090-6778
VL - 72
SP - 1303
EP - 1317
JO - IEEE Transactions on Communications
JF - IEEE Transactions on Communications
IS - 3
ER -