PASTA: Ultra-large multiple sequence alignment

Siavash Mirarab, Nam Nguyen, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. Given a guide tree, PASTA uses a new technique to produce an alignment, and this technique makes it both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improves on the accuracy of the leading alignment methods on large datasets, and can analyze much larger datasets than current methods. We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, and with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
Publisher: Springer
Pages: 177-191
Number of pages: 15
ISBN (Print): 9783319052687
State: Published - 2014
Externally published: Yes
Event: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014 - Pittsburgh, PA, United States
Duration: Apr 2 2014 → Apr 5 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8394 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
Country/Territory: United States
City: Pittsburgh, PA
Period: 4/2/14 → 4/5/14

Keywords

  • Multiple sequence alignment
  • SATé
  • Ultra-large

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
