MAGUS+eHMMs: improved multiple sequence alignment accuracy for fragmentary sequences

Chengze Shen, Paul Zaharias, Tandy Warnow

Research output: Contribution to journal › Article › peer-review

Abstract

Multiple sequence alignment (MSA) is an initial step in many bioinformatics pipelines, including phylogeny estimation, protein structure prediction and taxonomic identification of reads produced in amplicon or metagenomic datasets. Yet alignment estimation is challenging on datasets that exhibit substantial sequence length heterogeneity, especially when the datasets contain fragmentary sequences because they include reads or contigs generated by next-generation sequencing technologies. Here, we examine techniques that have been developed to improve alignment estimation when datasets contain substantial numbers of fragmentary sequences. We find that MAGUS, a recently developed MSA method, is fairly robust to fragmentary sequences under many conditions. Alignment accuracy improves further with a two-stage approach: MAGUS first aligns selected 'backbone sequences', and the remaining sequences are then added into the alignment using ensembles of hidden Markov models (eHMMs). This combination, MAGUS+eHMMs, clearly improves on UPP, the previous leading method for aligning datasets with high levels of fragmentation.
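
The two-stage approach described in the abstract can be illustrated with a minimal Python sketch of its bookkeeping, not of the alignment itself. It assumes a UPP-style backbone rule: a sequence counts as "full-length" if its ungapped length is within 25% of the median length. The abstract does not state the exact rule MAGUS+eHMMs uses, so `tolerance=0.25` is an assumption. The second function picks, for each remaining (query) sequence, the highest-scoring HMM in the ensemble. The score table here is hypothetical and stands in for per-HMM scores such as HMMER bit scores. The functions `split_backbone` and `assign_to_hmms` are illustrative names, not part of any released tool.

```python
from statistics import median


def split_backbone(seqs, tolerance=0.25):
    """Partition sequences into a backbone set and a query (fragment) set.

    Assumption: a sequence is treated as full-length when its ungapped length
    lies within `tolerance` of the median length (a UPP-style heuristic).
    """
    lengths = {name: len(s.replace("-", "")) for name, s in seqs.items()}
    m = median(lengths.values())
    lo, hi = (1 - tolerance) * m, (1 + tolerance) * m
    backbone = {n: s for n, s in seqs.items() if lo <= lengths[n] <= hi}
    queries = {n: s for n, s in seqs.items() if n not in backbone}
    return backbone, queries


def assign_to_hmms(scores):
    """Map each query to the ensemble member (HMM) with the highest score.

    `scores` maps query name -> {hmm_id: score}; the values are hypothetical
    stand-ins for real HMM scores (e.g. bit scores from an HMM search).
    """
    return {q: max(per_hmm, key=per_hmm.get) for q, per_hmm in scores.items()}


# Toy input: three near-median-length sequences, one short fragment,
# and one much longer sequence.
seqs = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTTC",
    "c": "ACGTACGAAC",
    "d": "ACG",
    "e": "ACGTACGTACGTACGTAC",
}
backbone, queries = split_backbone(seqs)
print(sorted(backbone), sorted(queries))  # ['a', 'b', 'c'] ['d', 'e']

# Hypothetical scores of each query against a two-HMM ensemble.
scores = {"d": {"hmm0": 3.1, "hmm1": 5.4}, "e": {"hmm0": 12.0, "hmm1": 8.7}}
print(assign_to_hmms(scores))  # {'d': 'hmm1', 'e': 'hmm0'}
```

In the full method, the backbone would be aligned with MAGUS, the ensemble HMMs would be built from that backbone alignment, and each query would then be aligned to its selected HMM and merged into the final alignment. Those steps require external tools and are omitted here.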

Original language: English (US)
Pages (from-to): 918-924
Number of pages: 7
Journal: Bioinformatics
Volume: 38
Issue number: 4
State: Published - Feb 15 2022
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
