Global SNP analysis of 11,183 SARS-CoV-2 strains reveals high genetic diversity

Fangfeng Yuan, Liping Wang, Ying Fang, Leyi Wang

Research output: Contribution to journal, Article, peer-review


Since it was first identified in December 2019, COVID-19 has spread worldwide within a few months, and cases are still surging rapidly in most countries. The causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), adapts and evolves rapidly in nature. From the 16,092 SARS-CoV-2 full genomes available in GISAID as of 13 May, we removed the poor-quality genomes and performed mutational profiling analysis on the remaining 11,183 viral genomes. Global analysis of all sequences identified single nucleotide polymorphisms (SNPs) across the whole genome, including critical SNPs with high mutation frequency that contribute to a five-clade classification of global strains. A total of 119 SNPs were found: 74 non-synonymous mutations, 43 synonymous mutations and two mutations in intergenic regions. Analysis of the geographic pattern of mutational profiles across the whole genome reveals differences between continents. The C-to-T transition is the most common mutation type across the genome, suggesting rapid evolution and adaptation of the virus in the host. Amino acid (AA) deletions and insertions found across the genome result in changes in viral protein length and potential alterations of function. Mutational profiling for each gene was analysed, and the results show that the nucleocapsid gene has the highest mutational frequency, followed by the Nsp2, Nsp3 and Spike genes. We further focused on the distribution of non-synonymous mutations in four key viral proteins: spike with 75 mutations, RNA-dependent RNA polymerase with 41 mutations, 3C-like protease with 22 mutations and papain-like protease with 10 mutations. The results show that non-synonymous mutations at critical sites of these four proteins pose a great challenge for the development of antiviral drugs and other countermeasures.
Overall, this study improves understanding of the genetic diversity and variability of SARS-CoV-2 and provides insights for the development of antiviral therapeutics.
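
The two SNP categories used in the abstract (transition versus transversion, and synonymous versus non-synonymous) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `substitution_type` and `coding_effect` are hypothetical, the codons in the usage lines are illustrative rather than taken from the study's data, and the sketch assumes sequences written in the DNA alphabet (T in place of U) and the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

PURINES = {"A", "G"}  # C and T are pyrimidines


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    # Transition: purine<->purine or pyrimidine<->pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def coding_effect(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by whether the encoded amino acid changes."""
    same_aa = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_aa else "non-synonymous"


# Illustrative inputs only (hypothetical, not drawn from the 11,183 genomes).
print(substitution_type("C", "T"))
print(coding_effect("TTC", "TTT"))
print(coding_effect("GAT", "GGT"))
```

A real analysis would also need a reference genome, gene coordinates and reading frames to map each SNP to its codon; those steps are outside this sketch.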

Original language: English (US)
Pages (from-to): 3288-3304
Number of pages: 17
Journal: Transboundary and Emerging Diseases
Issue number: 6
Early online date: Dec 8 2020
State: Published - Nov 2021


Keywords

  • 3CLpro
  • complete genome sequence
  • SNP analysis
  • SARS-CoV-2
  • S protein
  • RdRp
  • PLpro
  • COVID-19

ASJC Scopus subject areas

  • Immunology and Microbiology (all)
  • Veterinary (all)

