SuperFine: Fast and accurate supertree estimation

M. Shel Swenson, Rahul Suri, C. Randal Linder, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, toward the eventual goal of estimating the Tree of Life, which may contain as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on data sets at the low end of this range. One approach to estimating a large species tree is to apply phylogenetic estimation methods (such as maximum likelihood) to a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these methods are extremely computationally intensive for data sets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where analyses based on maximum likelihood (ML) are infeasible. In this paper, we introduce SuperFine, a meta-method that uses a novel two-step procedure to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods and run quickly on very large data sets with thousands of sequences. Furthermore, SuperFine-boosted matrix representation with parsimony (MRP, the best-known supertree method) approaches the accuracy of ML methods on supermatrix data sets under realistic conditions.
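
The abstract names matrix representation with parsimony (MRP) as the best-known supertree method. As a rough illustration of its encoding step only (this is not SuperFine and not code from the paper), the sketch below turns each non-trivial bipartition of each source tree into one binary character, writing "?" for taxa that a source tree does not contain. The resulting matrix would then be analyzed with a maximum-parsimony search, which is not shown. The nested-tuple tree format and the function names (leaves, clades, bipartitions, mrp_matrix) are assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Assumption for illustration: a source tree is a nested tuple whose leaves are
# taxon names, e.g. (("A", "B"), "C", ("D", "E")); it is read as unrooted.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> Set[str]:
    """Collect every taxon name in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found: Set[str] = set()
    for child in tree:
        found |= leaves(child)
    return found


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set below every node; the last entry is the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result: List[FrozenSet[str]] = []
    below: Set[str] = set()
    for child in tree:
        child_clades = clades(child)
        result.extend(child_clades)
        below |= child_clades[-1]  # last entry is the child's own leaf set
    result.append(frozenset(below))
    return result


def bipartitions(tree: Tree) -> List[FrozenSet[str]]:
    """Non-trivial splits, each stored as the side without a reference taxon."""
    taxa = leaves(tree)
    ref = min(taxa)  # fixed taxon so each split has one canonical side
    splits: List[FrozenSet[str]] = []
    for clade in clades(tree):
        side = clade if ref not in clade else frozenset(taxa - clade)
        # Skip trivial splits (zero or one taxon on either side).
        if 2 <= len(side) <= len(taxa) - 2 and side not in splits:
            splits.append(side)
    return splits


def mrp_matrix(source_trees: List[Tree]) -> Dict[str, str]:
    """Build the MRP character matrix: one row per taxon across all trees."""
    all_taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        taxa = leaves(tree)
        # One binary character per split: 1 = on the stored side,
        # 0 = on the other side, ? = taxon absent from this source tree.
        for side in bipartitions(tree):
            for taxon in all_taxa:
                if taxon not in taxa:
                    rows[taxon].append("?")
                elif taxon in side:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    t1 = (("A", "B"), "C", ("D", "E"))  # splits AB|CDE and DE|ABC
    t2 = (("A", "C"), "D", ("E", "F"))  # splits AC|DEF and EF|ACD
    for taxon, row in mrp_matrix([t1, t2]).items():
        print(taxon, row)
```

For these two overlapping source trees the script prints the rows A 0000, B 00??, C 1000, D 1110, E 1111, F ??11. Because the 0/1 polarity of an unrooted split is arbitrary, the sketch fixes it by always coding the side that excludes the alphabetically first taxon as 1.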

Original language: English (US)
Pages (from-to): 214-227
Number of pages: 14
Journal: Systematic Biology
Volume: 61
Issue number: 2
DOI: 10.1093/sysbio/syr092
State: Published - Mar 1 2012
Externally published: Yes

Keywords

  • Algorithms
  • maximum likelihood
  • MRP
  • phylogenetics
  • simulation
  • supertrees

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

SuperFine: Fast and accurate supertree estimation. / Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy.

In: Systematic Biology, Vol. 61, No. 2, 01.03.2012, p. 214-227.

Swenson, M. Shel; Suri, Rahul; Linder, C. Randal; Warnow, Tandy. / SuperFine: Fast and accurate supertree estimation. In: Systematic Biology. 2012; Vol. 61, No. 2. pp. 214-227.

@article{17e8452297424de2b90ffcfc0981733e,
title = "SuperFine: Fast and accurate supertree estimation",
keywords = "Algorithms, maximum likelihood, MRP, phylogenetics, simulation, supertrees",
author = "Swenson, {M. Shel} and Rahul Suri and Linder, {C. Randal} and Tandy Warnow",
year = "2012",
month = "3",
day = "1",
doi = "10.1093/sysbio/syr092",
language = "English (US)",
volume = "61",
pages = "214--227",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - SuperFine: Fast and accurate supertree estimation

AU - Swenson, M. Shel

AU - Suri, Rahul

AU - Linder, C. Randal

AU - Warnow, Tandy

PY - 2012/3/1

Y1 - 2012/3/1

KW - Algorithms

KW - maximum likelihood

KW - MRP

KW - phylogenetics

KW - simulation

KW - supertrees

UR - http://www.scopus.com/inward/record.url?scp=84857250162&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857250162&partnerID=8YFLogxK

U2 - 10.1093/sysbio/syr092

DO - 10.1093/sysbio/syr092

M3 - Article

C2 - 21934137

AN - SCOPUS:84857250162

VL - 61

SP - 214

EP - 227

JO - Systematic Biology

JF - Systematic Biology

SN - 1063-5157

IS - 2

ER -