Abstract

Summary: PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets.

Availability and implementation: PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta.

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English (US)
Pages (from-to): 3939-3941
Number of pages: 3
Journal: Bioinformatics
Volume: 34
Issue number: 22
DOI: 10.1093/bioinformatics/bty495
State: Published - Nov 15 2018

Fingerprint

Proteins
Protein
Bioinformatics
Large Data Sets
Availability
Alignment
Divide and conquer
Computational Biology
Repository
High Accuracy
Iteration
Datasets

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

PASTA for proteins. / Collins, Kodi; Warnow, Tandy.

In: Bioinformatics, Vol. 34, No. 22, 15.11.2018, p. 3939-3941.

Research output: Contribution to journal › Article

Collins, Kodi ; Warnow, Tandy. / PASTA for proteins. In: Bioinformatics. 2018 ; Vol. 34, No. 22. pp. 3939-3941.
@article{8bba1db6009447b68f07073b1c8037c6,
title = "PASTA for proteins",
abstract = "Summary: PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets. Availability and implementation: PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta. Supplementary information: Supplementary data are available at Bioinformatics online.",
author = "Kodi Collins and Tandy Warnow",
year = "2018",
month = "11",
day = "15",
doi = "10.1093/bioinformatics/bty495",
language = "English (US)",
volume = "34",
pages = "3939--3941",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "22",

}

TY - JOUR

T1 - PASTA for proteins

AU - Collins, Kodi

AU - Warnow, Tandy

PY - 2018/11/15

Y1 - 2018/11/15

N2 - Summary: PASTA is a multiple sequence alignment method that uses divide-and-conquer plus iteration to enable base alignment methods to scale with high accuracy to large sequence datasets. By default, PASTA included MAFFT L-INS-i; our new extension of PASTA enables the use of MAFFT G-INS-i, MAFFT Homologs, CONTRAlign and ProbCons. We analyzed the performance of each base method and PASTA using these base methods on 224 datasets from BAliBASE 4 with at least 50 sequences. We show that PASTA enables the most accurate base methods to scale to larger datasets at reduced computational effort, and generally improves alignment and tree accuracy on the largest BAliBASE datasets. Availability and implementation: PASTA is available at https://github.com/kodicollins/pasta and has also been integrated into the original PASTA repository at https://github.com/smirarab/pasta. Supplementary information: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85056373690&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056373690&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty495

DO - 10.1093/bioinformatics/bty495

M3 - Article

C2 - 29931282

AN - SCOPUS:85056373690

VL - 34

SP - 3939

EP - 3941

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 22

ER -