Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference

C. Randal Linder, Rahul Suri, Kevin Liu, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.

Original language: English (US)
Article number: ecurrents.RRN1195
Journal: PLoS Currents
Issue number: NOV
DOI: 10.1371/currents.RRN1195
State: Published - Dec 1 2010
Externally published: Yes

ASJC Scopus subject areas

  • Medicine (miscellaneous)

Cite this

Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference. / Linder, C. Randal; Suri, Rahul; Liu, Kevin; Warnow, Tandy.

In: PLoS Currents, No. NOV, ecurrents.RRN1195, 01.12.2010.

@article{5f2cf52dcda74a1fb8afc29ec6554ff1,
title = "Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference",
abstract = "We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.",
author = "Linder, {C. Randal} and Rahul Suri and Kevin Liu and Tandy Warnow",
year = "2010",
month = "12",
day = "1",
doi = "10.1371/currents.RRN1195",
language = "English (US)",
journal = "PLoS Currents",
issn = "2157-3999",
publisher = "Public Library of Science",
number = "NOV",

}

TY  - JOUR
T1  - Benchmark datasets and software for developing and testing methods for large-scale multiple sequence alignment and phylogenetic inference
AU  - Linder, C. Randal
AU  - Suri, Rahul
AU  - Liu, Kevin
AU  - Warnow, Tandy
PY  - 2010/12/1
Y1  - 2010/12/1
N2  - We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.
AB  - We have assembled a collection of web pages that contain benchmark datasets and software tools to enable the evaluation of the accuracy and scalability of computational methods for estimating evolutionary relationships. They provide a resource to the scientific community for development of new alignment and tree inference methods on very difficult datasets. The datasets are intended to help address three problems: multiple sequence alignment, phylogeny estimation given aligned sequences, and supertree estimation. Datasets from our work include empirical datasets with carefully curated alignments suitable for testing alignment and phylogenetic methods for large-scale systematics studies. Links to other empirical datasets, lacking curated alignments, are also provided. We also include simulated datasets with properties typical of large-scale systematics studies, including high rates of substitutions and indels, and we include the true alignment and tree for each simulated dataset. Finally, we provide links to software tools for generating simulated datasets, and for evaluating the accuracy of alignments and trees estimated on these datasets. We welcome contributions to the benchmark datasets from other researchers.
UR  - http://www.scopus.com/inward/record.url?scp=84873446299&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84873446299&partnerID=8YFLogxK
U2  - 10.1371/currents.RRN1195
DO  - 10.1371/currents.RRN1195
M3  - Article
C2  - 21113335
AN  - SCOPUS:84873446299
JO  - PLoS Currents
JF  - PLoS Currents
SN  - 2157-3999
IS  - NOV
M1  - ecurrents.RRN1195
ER  - 