Small-Sample Estimation of the Mutational Support and Distribution of SARS-CoV-2

Vishal Rana, Eli Chien, Jianhao Peng, Olgica Milenkovic

Research output: Contribution to journal › Article › peer-review


We consider the problem of determining the mutational support and distribution of the SARS-CoV-2 viral genome in the small-sample regime. The mutational support refers to the unknown number of sites that may eventually mutate in the SARS-CoV-2 genome, while the mutational distribution refers to the distribution of point mutations in the viral genome across a population. The mutational support may be used to assess the virulence of the virus and guide primer selection for real-time RT-PCR testing. Estimating the distribution of mutations in the genome of different subpopulations while accounting for the unseen may also aid in discovering new variants. To estimate the mutational support in the small-sample regime, we use GISAID sequencing data and our state-of-the-art polynomial estimation techniques based on new weighted and regularized Chebyshev approximation methods. For distribution estimation, we adapt the well-known Good-Turing estimator. Our analysis reveals several findings. First, the mutational supports exhibit significant differences in the ORF6 and ORF7a regions (older versus younger patients), in the ORF1b and ORF10 regions (females versus males), and in almost all ORFs (Asia versus Europe versus North America). Second, even though the N region of SARS-CoV-2 has a predicted 10% mutational support, mutations fall outside of the primer regions recommended by the CDC.
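
As a rough illustration of the "accounting for the unseen" idea behind the Good-Turing estimator mentioned in the abstract, the sketch below implements the classic, unsmoothed textbook form. It is not the adapted estimator used in the paper. The function name `good_turing`, the site labels, and the counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def good_turing(observations):
    """Classic (unsmoothed) Good-Turing estimates for a list of observed symbols.

    Returns (missing_mass, adjusted), where missing_mass = N_1 / N estimates the
    total probability of symbols never observed, and adjusted maps each observed
    count r to the adjusted count r* = (r + 1) * N_{r+1} / N_r.
    """
    counts = Counter(observations)           # symbol -> number of times seen
    n = sum(counts.values())                 # total sample size N
    if n == 0:
        raise ValueError("need at least one observation")
    freq_of_freq = Counter(counts.values())  # r -> N_r (symbols seen exactly r times)
    missing_mass = freq_of_freq.get(1, 0) / n
    adjusted = {
        r: (r + 1) * freq_of_freq.get(r + 1, 0) / n_r
        for r, n_r in freq_of_freq.items()
    }
    return missing_mass, adjusted


# Hypothetical toy sample: each entry is a genome site at which a point
# mutation was observed in one sequenced sample (labels are made up).
sites = ["site_A", "site_A", "site_A", "site_B", "site_B",
         "site_C", "site_D", "site_E"]
missing_mass, adjusted = good_turing(sites)
print(missing_mass)  # estimated probability that the next mutation hits an unseen site
print(adjusted)      # adjusted counts r* keyed by raw count r
```

In this toy sample three of the eight observations are singletons, so the estimated mass of unseen sites is 3/8. The adjusted count for the largest observed r comes out as zero because no site was seen r + 1 times; this instability at high counts is a known weakness of the raw estimator and is why smoothed variants are normally used in practice.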

Original language: English (US)
Pages (from-to): 668-682
Number of pages: 15
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Issue number: 1
State: Published - Jan 1 2023


Keywords

  • Bioinformatics
  • Chebyshev and weighted Chebyshev approximations
  • Coronaviruses
  • Estimation
  • Genomics
  • Good-Turing estimators
  • Proteins
  • Small-sample distribution estimation
  • Small-sample support estimation
  • Sociology
  • Statistics
  • Virology

ASJC Scopus subject areas

  • Applied Mathematics
  • Genetics
  • Biotechnology

