Abstract

MAGUS is a recent multiple sequence alignment method that provides excellent accuracy on large, challenging datasets. MAGUS uses divide-and-conquer: it divides the sequences into disjoint sets, computes alignments on the disjoint sets, and then merges the alignments using a technique it calls the Graph Clustering Method (GCM). To understand why MAGUS is so accurate, we show that GCM is a good heuristic for the NP-hard MWT-AM problem (Maximum Weight Trace, adapted to the Alignment Merging problem). Our study, using both biological and simulated data, establishes that MWT-AM scores correlate very well with alignment accuracy and presents improvements to GCM that are even better heuristics for MWT-AM. This study suggests a new direction for large-scale MSA estimation based on improved divide-and-conquer strategies, with the merging step based on optimizing MWT-AM. MAGUS and its enhanced versions are available at https://github.com/vlasmirnov/MAGUS.
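To make the MWT-AM objective concrete, the following is a minimal sketch of how a candidate merge could be scored. It is not taken from the MAGUS code base. It assumes that each node is a column of one subset alignment, written as a (subalignment, column index) pair, that weighted edges connect columns from different subset alignments, and that a merged alignment is an ordered list of clusters of such columns. The function names, the data layout, and the toy weights are all hypothetical; the abstract does not say how edge weights are obtained.

```python
from itertools import combinations

# Hypothetical representation: a node is (subalignment_id, column_index).
# A merged alignment is an ordered list of clusters (sets of nodes); each
# cluster becomes one column of the merged alignment.

def is_valid_cluster(cluster):
    """A merged column may hold at most one column from each subalignment."""
    subs = [sub for sub, _ in cluster]
    return len(subs) == len(set(subs))

def respects_column_order(clusters):
    """Columns of each subalignment must appear in increasing order."""
    last_seen = {}
    for cluster in clusters:
        for sub, col in cluster:
            if sub in last_seen and col <= last_seen[sub]:
                return False
            last_seen[sub] = col
    return True

def trace_weight(clusters, weights):
    """Sum the weights of all edges whose two endpoints share a cluster.

    `weights` maps a sorted pair of nodes (u, v) to a non-negative weight;
    missing pairs count as weight 0.
    """
    total = 0.0
    for cluster in clusters:
        for u, v in combinations(sorted(cluster), 2):
            total += weights.get((u, v), 0.0)
    return total

# Toy instance: subalignment "A" has 3 columns, "B" has 2 columns.
clusters = [
    {("A", 0), ("B", 0)},
    {("A", 1)},
    {("A", 2), ("B", 1)},
]
weights = {
    (("A", 0), ("B", 0)): 3.0,
    (("A", 1), ("B", 1)): 1.0,  # not realized by this merge
    (("A", 2), ("B", 1)): 2.0,
}

score = trace_weight(clusters, weights)
```

Under these assumptions, a merging heuristic looks for a clustering that passes both validity checks and makes `trace_weight` as large as possible. In the toy instance the edge of weight 1.0 is left out because its endpoints fall in different clusters.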

Original language: English (US)
Pages (from-to): 1-13
Number of pages: 13
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
State: Accepted/In press - 2022

Keywords

  • Clustering algorithms
  • Corporate acquisitions
  • Estimation
  • Markov clustering
  • Markov processes
  • maximum weight trace
  • Merging
  • multiple sequence alignment
  • Optimization
  • Pipelines

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics

Title: Large-Scale Multiple Sequence Alignment and the Maximum Weight Trace Alignment Merging Problem