Due to its longevity and enormous information density, DNA is an attractive medium for archival storage. In this work, we study the fundamental limits and trade-offs of DNA-based storage systems by introducing a new channel model, which we call the noisy shuffling-sampling channel. Motivated by current technological constraints on DNA synthesis and sequencing, this model captures three key distinctive aspects of DNA storage systems: (1) the data is written onto many short DNA molecules; (2) the molecules are corrupted by noise during synthesis and sequencing; and (3) the data is read by randomly sampling from the DNA pool. We provide capacity results for this channel under specific noise and sampling assumptions and show that, in many scenarios, a simple index-based coding scheme is optimal.
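As an informal illustration of the three aspects listed above, the following is a minimal simulation sketch, not the model or coding scheme analyzed in this work. It rests on several simplifying assumptions that the abstract does not specify: molecules are binary strings rather than strings over {A, C, G, T}, noise is an independent bit flip with probability `flip_prob`, and reads are drawn uniformly with replacement. All function and parameter names are hypothetical.

```python
import random


def index_encode(payloads, index_bits):
    """Prefix each payload with its position, written as a fixed-width binary index."""
    return [format(i, f"0{index_bits}b") + p for i, p in enumerate(payloads)]


def noisy_shuffling_sampling_channel(molecules, num_reads, flip_prob, rng):
    """Return num_reads noisy reads drawn from the pool.

    Assumptions (illustrative only): each read picks a molecule uniformly
    at random with replacement, so the original order is lost, and each
    bit of the chosen molecule is flipped independently with flip_prob.
    """
    reads = []
    for _ in range(num_reads):
        molecule = rng.choice(molecules)
        noisy = "".join(
            ("1" if bit == "0" else "0") if rng.random() < flip_prob else bit
            for bit in molecule
        )
        reads.append(noisy)
    return reads


def index_decode(reads, index_bits, num_molecules):
    """Place each read by its index prefix, keeping the first read seen per index.

    Entries stay None for molecules that were never sampled.
    """
    recovered = [None] * num_molecules
    for read in reads:
        i = int(read[:index_bits], 2)
        if i < num_molecules and recovered[i] is None:
            recovered[i] = read[index_bits:]
    return recovered
```

With `flip_prob` set to 0, every entry returned by `index_decode` is either `None` (the molecule was never sampled) or the original payload, which isolates the loss caused by sampling alone. With nonzero noise, a corrupted index can place a payload in the wrong slot, which is one reason the index itself would need protection in practice.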
- DNA storage
- Data storage
- Channel capacity
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Library and Information Sciences