Abstract

Genes in SARS-CoV-2 and, more generally, in viruses in the order Nidovirales are expressed by a process of discontinuous transcription mediated by the viral RNA-dependent RNA polymerase. This process is distinct from alternative splicing in eukaryotes, rendering current transcript assembly methods unsuitable for Nidovirales sequencing samples. Here, we introduce the Discontinuous Transcript Assembly problem of finding transcripts and their abundances given an alignment under a maximum likelihood model that accounts for varying transcript lengths. Underpinning our approach is the concept of a segment graph, a directed acyclic graph that, distinct from the splice graph used to characterize alternative splicing, has a unique Hamiltonian path. We provide a compact characterization of solutions as subsets of non-overlapping edges in this graph, enabling the formulation of an efficient mixed integer linear program. We show using simulations that our method, Jumper, drastically outperforms existing methods for classical transcript assembly. On short-read data of SARS-CoV-1 and SARS-CoV-2 samples, we find that Jumper not only identifies canonical transcripts that are part of the reference transcriptome, but also predicts expression of non-canonical transcripts that are well supported by direct evidence from long-read data, presence in multiple independent samples, or a conserved core sequence. Jumper enables detailed analyses of Nidovirales transcriptomes.
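
As a rough illustration of the segment graph idea, the following Python sketch builds a toy graph from a set of jump coordinates and derives the segments retained by a transcript that uses a non-overlapping subset of those jumps. It is a simplified sketch under stated assumptions, not Jumper's implementation: the function names, the half-open coordinate convention, the overlap test, and the example coordinates are all hypothetical, and the maximum likelihood model and mixed integer linear program are not shown.

```python
def build_segment_graph(genome_length, jumps):
    """Build a toy segment graph (illustrative assumption, not Jumper's code).

    jumps: list of (donor, acceptor) positions with
    0 < donor < acceptor < genome_length; a jump joins the sequence ending
    at `donor` to the sequence starting at `acceptor`.
    """
    # Every jump endpoint becomes a breakpoint that splits the genome.
    cuts = sorted({0, genome_length, *(p for jump in jumps for p in jump)})
    segments = list(zip(cuts[:-1], cuts[1:]))  # half-open [start, end)
    ends_at = {end: i for i, (_, end) in enumerate(segments)}
    starts_at = {start: i for i, (start, _) in enumerate(segments)}
    # Continuous edges link adjacent segments. Because every edge points
    # from a lower to a higher index, this chain is the only Hamiltonian path.
    continuous = [(i, i + 1) for i in range(len(segments) - 1)]
    # Discontinuous edges model the jumps.
    discontinuous = [(ends_at[d], starts_at[a]) for d, a in jumps]
    return segments, continuous, discontinuous


def non_overlapping(jumps):
    """True if no two jumps cover a common region (assumed overlap test)."""
    ordered = sorted(jumps)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


def transcript_segments(segments, chosen_jumps):
    """Segments kept by a transcript that uses exactly `chosen_jumps`."""
    if not non_overlapping(chosen_jumps):
        raise ValueError("chosen jumps overlap")
    return [
        (s, e)
        for s, e in segments
        if not any(d <= s and e <= a for d, a in chosen_jumps)
    ]


# Hypothetical coordinates, chosen only for illustration.
segments, continuous, discontinuous = build_segment_graph(
    100, [(10, 60), (10, 80), (70, 90)]
)
kept = transcript_segments(segments, [(10, 60), (70, 90)])
```

In this toy setting, a candidate transcript is fully described by the subset of jumps it uses, which is the kind of compact encoding the abstract refers to.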

Code availability: Software is available at https://github.com/elkebir-group/Jumper
Original language: English (US)
Publisher: Cold Spring Harbor Laboratory Press
Number of pages: 47
State: In preparation - Feb 15 2021

Publication series

Name: bioRxiv
Publisher: Cold Spring Harbor Laboratory Press

Keywords

  • Coronavirus
  • COVID-19
  • severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
  • Novel coronavirus
  • 2019-nCoV
  • Pandemic

Title: Jumper Enables Discontinuous Transcript Assembly in Coronaviruses