Large-Scale Multiple Sequence Alignment and the Maximum Weight Trace Alignment Merging Problem

Paul Zaharias, Vladimir Smirnov, Tandy Warnow

Research output: Contribution to journal › Article › peer-review

Abstract

MAGUS is a recent multiple sequence alignment method that provides excellent accuracy on large challenging datasets. MAGUS uses divide-and-conquer: it divides the sequences into disjoint sets, computes alignments on the disjoint sets, and then merges the alignments using a technique it calls the Graph Clustering Method (GCM). To understand why MAGUS is so accurate, we show that GCM is a good heuristic for the NP-hard MWT-AM problem (Maximum Weight Trace, adapted to the Alignment Merging problem). Our study, using both biological and simulated data, establishes that MWT-AM scores correlate very well with alignment accuracy and presents improvements to GCM that are even better heuristics for MWT-AM. This study suggests a new direction for large-scale MSA estimation based on improved divide-and-conquer strategies, with the merging step based on optimizing MWT-AM. MAGUS and its enhanced versions are available at https://github.com/vlasmirnov/MAGUS.
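
To make the merging objective concrete, the following is a minimal sketch of how a candidate merge could be scored under a simplified reading of MWT-AM. It assumes (these details are not stated in the abstract) that each column of each constraint alignment is a node, that a weighted graph connects columns from different constraint alignments, and that a merge is an ordered list of clusters of columns. The names `trace_weight` and `is_valid_merge` are hypothetical and are not part of MAGUS.

```python
from typing import Dict, List, Tuple

# Assumption: a node is (subalignment index, column index within that subalignment).
Node = Tuple[int, int]


def trace_weight(clusters: List[List[Node]],
                 weights: Dict[Tuple[Node, Node], float]) -> float:
    """Sum the weights of edges whose two endpoints share a merged column."""
    cluster_of = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    total = 0.0
    for (u, v), w in weights.items():
        # An edge counts only if both columns were placed in the same cluster.
        if u in cluster_of and cluster_of.get(u) == cluster_of.get(v):
            total += w
    return total


def is_valid_merge(clusters: List[List[Node]]) -> bool:
    """Check that the ordered clusters respect every subalignment's column order."""
    last_seen: Dict[int, int] = {}
    for cluster in clusters:
        subs = [s for s, _ in cluster]
        # A merged column may hold at most one column per subalignment.
        if len(subs) != len(set(subs)):
            return False
        for s, c in cluster:
            # Columns of one subalignment must appear in increasing order.
            if s in last_seen and c <= last_seen[s]:
                return False
            last_seen[s] = c
    return True


# Toy input: two subalignments with two columns each.
example_weights = {
    ((0, 0), (1, 0)): 2.0,
    ((0, 1), (1, 1)): 1.5,
}
example_merge = [[(0, 0), (1, 0)], [(0, 1)], [(1, 1)]]
example_score = trace_weight(example_merge, example_weights)
```

In this toy input only the first edge is realized by the merge, so a heuristic that also placed columns `(0, 1)` and `(1, 1)` together would score higher. A method such as GCM can be viewed as searching for valid merges with high scores of this kind; the sketch only evaluates a given merge and does not search.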

Original language: English (US)
Pages (from-to): 1700-1712
Number of pages: 13
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 20
Issue number: 3
State: Published - May 1 2023
Externally published: Yes

Keywords

  • Clustering algorithms
  • Corporate acquisitions
  • Estimation
  • Markov clustering
  • Markov processes
  • Merging
  • Optimization
  • Pipelines
  • maximum weight trace
  • multiple sequence alignment

ASJC Scopus subject areas

  • Applied Mathematics
  • Genetics
  • Biotechnology
