FastRFS: Fast and accurate Robinson-Foulds Supertrees using constrained exact optimization

Pranjal Vachaspati, Tandy Warnow

Research output: Contribution to journalArticle

Abstract

Motivation: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. Results: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species.

Original languageEnglish (US)
Pages (from-to)631-639
Number of pages9
JournalBioinformatics
Volume33
Issue number5
DOIs
StatePublished - Jan 1 2017

Fingerprint

Constrained optimization
Optimization
Trees (mathematics)
Dynamic programming
Large Data Sets
Maximum likelihood
Scalability
NP-complete problem
MCMC Methods
Statistical Estimation
Subset
Bayes Theorem
Phylogeny
Phylogenetic Tree
Bayesian Methods
Search Space
Dynamic Programming
Maximum Likelihood
Datasets
Exact Solution

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

FastRFS : Fast and accurate Robinson-Foulds Supertrees using constrained exact optimization. / Vachaspati, Pranjal; Warnow, Tandy.

In: Bioinformatics, Vol. 33, No. 5, 01.01.2017, p. 631-639.

Research output: Contribution to journalArticle

@article{2428349f5d284199bbf5937e4848a1e7,
title = "FastRFS: Fast and accurate Robinson-Foulds Supertrees using constrained exact optimization",
abstract = "Motivation: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. Results: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species.",
author = "Pranjal Vachaspati and Tandy Warnow",
year = "2017",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/btw600",
language = "English (US)",
volume = "33",
pages = "631--639",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - FastRFS

T2 - Fast and accurate Robinson-Foulds Supertrees using constrained exact optimization

AU - Vachaspati, Pranjal

AU - Warnow, Tandy

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Motivation: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. Results: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species.

AB - Motivation: The estimation of phylogenetic trees is a major part of many biological dataset analyses, but maximum likelihood approaches are NP-hard and Bayesian MCMC methods do not scale well to even moderate-sized datasets. Supertree methods, which are used to construct trees from trees computed on subsets, are critically important tools for enabling the statistical estimation of phylogenies for large and potentially heterogeneous datasets. Supertree estimation is itself NP-hard, and no current supertree method has sufficient accuracy and scalability to provide good accuracy on the large datasets that supertree methods were designed for, containing thousands of species and many subset trees. Results: We present FastRFS, a new method based on a dynamic programming method we have developed to find an exact solution to the Robinson-Foulds Supertree problem within a constrained search space. FastRFS has excellent accuracy in terms of criterion scores and topological accuracy of the resultant trees, substantially improving on competing methods on a large collection of biological and simulated data. In addition, FastRFS is extremely fast, finishing in minutes on even very large datasets, and in under an hour on a biological dataset with 2228 species.

UR - http://www.scopus.com/inward/record.url?scp=85020067505&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020067505&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw600

DO - 10.1093/bioinformatics/btw600

M3 - Article

C2 - 27663499

AN - SCOPUS:85020067505

VL - 33

SP - 631

EP - 639

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 5

ER -