TY - GEN
T1 - PASTA
T2 - 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
AU - Mirarab, Siavash
AU - Nguyen, Nam
AU - Warnow, Tandy
PY - 2014
Y1 - 2014
N2 - In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate - slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.
KW - Multiple sequence alignment
KW - SATé
KW - Ultra-large
UR - http://www.scopus.com/inward/record.url?scp=84958551141&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958551141&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05269-4_15
DO - 10.1007/978-3-319-05269-4_15
M3 - Conference contribution
AN - SCOPUS:84958551141
SN - 9783319052687
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 177
EP - 191
BT - Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
PB - Springer
Y2 - 2 April 2014 through 5 April 2014
ER -