Abstract
MAGUS is a recent multiple sequence alignment method that provides excellent accuracy on large, challenging datasets. MAGUS uses divide-and-conquer: it divides the sequences into disjoint sets, computes alignments on the disjoint sets, and then merges the alignments using a technique it calls the Graph Clustering Method (GCM). To understand why MAGUS is so accurate, we show that GCM is a good heuristic for the NP-hard MWT-AM problem (Maximum Weight Trace, adapted to the Alignment Merging problem). Our study, using both biological and simulated data, establishes that MWT-AM scores correlate very well with alignment accuracy and presents improvements to GCM that are even better heuristics for MWT-AM. This study suggests a new direction for large-scale MSA estimation based on improved divide-and-conquer strategies, with the merging step based on optimizing MWT-AM. MAGUS and its enhanced versions are available at https://github.com/vlasmirnov/MAGUS.
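The merging objective named in the abstract can be illustrated with a small sketch. It is not taken from the MAGUS code base. It assumes a simplified formulation in which each node is a column of one subalignment, each weighted edge links columns of two different subalignments, and a candidate merge is a list of clusters, each cluster becoming one column of the merged alignment. The names `is_valid_trace` and `trace_score`, the node encoding, and the toy weights are all hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Assumed encoding: a node is (subalignment_index, column_index).

def is_valid_trace(clusters):
    """Return True if the clusters could be the columns of one merged alignment."""
    owner = {}
    for cid, cluster in enumerate(clusters):
        subs = [sub for sub, _ in cluster]
        # A merged column may hold at most one column per subalignment.
        if len(subs) != len(set(subs)):
            return False
        for node in cluster:
            if node in owner:  # a column cannot sit in two clusters
                return False
            owner[node] = cid

    # The clusters must admit a left-to-right order that preserves the
    # column order inside every subalignment (no cyclic dependencies).
    by_sub = defaultdict(list)
    for (sub, col), cid in owner.items():
        by_sub[sub].append((col, cid))
    preds = {cid: set() for cid in range(len(clusters))}
    for cols in by_sub.values():
        cols.sort()
        for (_, earlier), (_, later) in zip(cols, cols[1:]):
            preds[later].add(earlier)
    try:
        list(TopologicalSorter(preds).static_order())
    except CycleError:
        return False
    return True

def trace_score(clusters, weights):
    """Sum the weights of edges whose two endpoints share a cluster."""
    owner = {node: cid for cid, cluster in enumerate(clusters) for node in cluster}
    return sum(
        w for (u, v), w in weights.items()
        if u in owner and v in owner and owner[u] == owner[v]
    )

# Toy instance: two subalignments with two columns each (made-up weights).
weights = {
    ((0, 0), (1, 0)): 3,
    ((0, 1), (1, 1)): 2,
    ((0, 0), (1, 1)): 4,
    ((0, 1), (1, 0)): 2,
}
aligned = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}]
crossing = [{(0, 0), (1, 1)}, {(0, 1), (1, 0)}]

print(trace_score(aligned, weights), is_valid_trace(aligned))    # 5 True
print(trace_score(crossing, weights), is_valid_trace(crossing))  # 6 False
```

In this toy instance the crossing clustering collects more edge weight but cannot be laid out as alignment columns, so only the first clustering is a feasible solution. Searching for the highest-scoring valid clustering is the NP-hard part, which is why heuristics such as GCM are used.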
Original language | English (US) |
---|---|
Pages (from-to) | 1700-1712 |
Number of pages | 13 |
Journal | IEEE/ACM Transactions on Computational Biology and Bioinformatics |
Volume | 20 |
Issue number | 3 |
State | Published - May 1 2023 |
Externally published | Yes |
Keywords
- Clustering algorithms
- Corporate acquisitions
- Estimation
- Markov clustering
- Markov processes
- Merging
- Optimization
- Pipelines
- Maximum weight trace
- Multiple sequence alignment
ASJC Scopus subject areas
- Applied Mathematics
- Genetics
- Biotechnology