Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses

Chuanyi Zhang, Palash Sashittal, Michael Xiang, Yichi Zhang, Ayesha Kazi, Mohammed El-Kebir

Research output: Contribution to journal > Article > peer-review


Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as at the 5′ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. To solve this problem, we introduce CORSID, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.

Original language: English (US)
Article number: msac133
Journal: Molecular Biology and Evolution
Issue number: 7
Early online date: Jun 14 2022
State: Published - Jul 1 2022


  • Core sequences
  • Interval graph
  • Maximum weight independent set
  • Local alignment
  • Motif finding
  • Coronavirus
  • Gene identification

ASJC Scopus subject areas

  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology

