PASTA: Ultra-large multiple sequence alignment

Siavash Mirarab, Nam Nguyen, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate - slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
Publisher: Springer-Verlag
Pages: 177-191
Number of pages: 15
ISBN (Print): 9783319052687
DOI: https://doi.org/10.1007/978-3-319-05269-4_15
State: Published - Jan 1 2014
Externally published: Yes
Event: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014 - Pittsburgh, PA, United States
Duration: Apr 2 2014 to Apr 5 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8394 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
Country: United States
City: Pittsburgh, PA
Period: 4/2/14 to 4/5/14

Keywords

  • Multiple sequence alignment
  • SATé
  • Ultra-large

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Mirarab, S., Nguyen, N., & Warnow, T. (2014). PASTA: Ultra-large multiple sequence alignment. In Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings (pp. 177-191). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8394 LNBI). Springer-Verlag. https://doi.org/10.1007/978-3-319-05269-4_15

@inproceedings{eb7ce679056c4f708029bfc15c52029a,
title = "PASTA: Ultra-large multiple sequence alignment",
abstract = "In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate - slightly better than SAT{\'e} trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.",
keywords = "Multiple sequence alignment, SAT{\'e}, Ultra-large",
author = "Siavash Mirarab and Nam Nguyen and Tandy Warnow",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-3-319-05269-4_15",
language = "English (US)",
isbn = "9783319052687",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "177--191",
booktitle = "Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings",

}

TY - GEN
T1 - PASTA
T2 - Ultra-large multiple sequence alignment
AU - Mirarab, Siavash
AU - Nguyen, Nam
AU - Warnow, Tandy
PY - 2014/1/1
Y1 - 2014/1/1
AB - In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate - slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.
KW - Multiple sequence alignment
KW - SATé
KW - Ultra-large
UR - http://www.scopus.com/inward/record.url?scp=84958551141&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958551141&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-05269-4_15
DO - 10.1007/978-3-319-05269-4_15
M3 - Conference contribution
AN - SCOPUS:84958551141
SN - 9783319052687
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 177
EP - 191
BT - Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
PB - Springer-Verlag
ER -