TY - GEN
T1 - The Maximum Weight Trace Alignment Merging Problem
AU - Zaharias, Paul
AU - Smirnov, Vladimir
AU - Warnow, Tandy
N1 - Funding Information:
Supported by the University of Illinois.
Funding Information:
Acknowledgments. This work was supported in part by NSF ABI-1458652 to TW and by the Debra and Ira Cohen fellowship to VS.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - The Maximum Weight Trace (MWT) is an optimization problem for multiple sequence alignment that takes a set of sequences and weights on pairs of letters from different sequences and seeks a multiple sequence alignment that maximizes the sum of the weights for the pairs of letters that appear in the same column. MWT was introduced by Kececioglu in 1993, then proven to be NP-hard, and heuristics and exact solutions for MWT developed. Unfortunately none of the MWT methods are scalable to even moderate-sized datasets. Here we propose the MWT-AM problem (MWT for Alignment Merging), an extension of the MWT problem to be used in a divide-and-conquer setting, where we seek a merged alignment of a set of disjoint alignments that optimizes the MWT score. We present variations of GCM (the Graph Clustering Merger, originally developed for the MAGUS multiple sequence alignment method) that are specifically designed for MWT-AM. We show that the best of these variants, which we refer to as GCM-MWT, perform well for the MWT-AM criterion. We explore GCM-MWT in comparison to other methods for merging alignments, T-coffee and MAFFT–merge, and find that GCM-MWT produces more accurate merged alignments. GCM-MWT is available in open source form at https://github.com/vlasmirnov/MAGUS.
AB - The Maximum Weight Trace (MWT) is an optimization problem for multiple sequence alignment that takes a set of sequences and weights on pairs of letters from different sequences and seeks a multiple sequence alignment that maximizes the sum of the weights for the pairs of letters that appear in the same column. MWT was introduced by Kececioglu in 1993, then proven to be NP-hard, and heuristics and exact solutions for MWT developed. Unfortunately none of the MWT methods are scalable to even moderate-sized datasets. Here we propose the MWT-AM problem (MWT for Alignment Merging), an extension of the MWT problem to be used in a divide-and-conquer setting, where we seek a merged alignment of a set of disjoint alignments that optimizes the MWT score. We present variations of GCM (the Graph Clustering Merger, originally developed for the MAGUS multiple sequence alignment method) that are specifically designed for MWT-AM. We show that the best of these variants, which we refer to as GCM-MWT, perform well for the MWT-AM criterion. We explore GCM-MWT in comparison to other methods for merging alignments, T-coffee and MAFFT–merge, and find that GCM-MWT produces more accurate merged alignments. GCM-MWT is available in open source form at https://github.com/vlasmirnov/MAGUS.
KW - Clustering
KW - Maximum Weight Trace
KW - Multiple sequence alignment
UR - http://www.scopus.com/inward/record.url?scp=85111153753&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85111153753&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-74432-8_12
DO - 10.1007/978-3-030-74432-8_12
M3 - Conference contribution
AN - SCOPUS:85111153753
SN - 9783030744311
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 159
EP - 171
BT - Algorithms for Computational Biology - 8th International Conference, AlCoB 2021, Proceedings
A2 - Martín-Vide, Carlos
A2 - Vega-Rodríguez, Miguel A.
A2 - Wheeler, Travis
PB - Springer
T2 - 8th International Conference on Algorithms for Computational Biology, AlCoB 2021
Y2 - 7 June 2021 through 11 June 2021
ER -