An Information Theory for Out-of-Order Media with Applications in DNA Data Storage

Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

Research output: Contribution to journalArticlepeer-review

Abstract

Recent advancements in DNA-based storage prototypes focus on encoding information across multiple DNA molecules. This approach utilizes high-throughput sequencing technologies, leading to outputs that are out-of-order. We study the shuffling channel, where input codewords are split into fixed-size fragments. We show that achieving channel capacity uses index-based coding, which assigns unique indices to each fragment. We also introduce two more complex channels, which aim to model popular sequencing strategies in DNA sequencing. In the torn-paper channel, the input codeword is torn up into fragments of random sizes, while in the shotgun sequencing channel, fixed-length random substrings of the input codeword are observed at the output. In both of these channels, the lack of ordering cannot be circumvented by simply adding unique indices to the fragments. We show how the capacity of both of these channels can be achieved using random codes. We introduce and analyze code constructions based on index sequences. While these codes are computationally efficient, they are not capacity-achieving, and we leave the questions of finding efficient capacity-achieving codes for these settings as open problems.

Original languageEnglish (US)
Pages (from-to)334-348
Number of pages15
JournalIEEE Transactions on Molecular, Biological, and Multi-Scale Communications
Volume10
Issue number2
DOIs
StatePublished - Jun 1 2024

Keywords

  • Biological information theory
  • Channel coding
  • DNA
  • Decoding

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Modeling and Simulation
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'An Information Theory for Out-of-Order Media with Applications in DNA Data Storage'. Together they form a unique fingerprint.

Cite this