Abstract

Motivation: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has O(n^5) running time for datasets with n species. Results: Here we present a new method called 'TreeMerge' that improves on NJMerge in two ways: it is guaranteed to return a tree and it has dramatically faster running time within the same divide-and-conquer framework (only O(n^2) time). We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets (that they would otherwise fail on), when given 64 GB of memory and 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All.
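To give a rough sense of the gap between the two running-time bounds quoted in the abstract, the following minimal Python sketch compares how a quintic and a quadratic bound grow with the number of species n. The function names are illustrative and are not part of TreeMerge or NJMerge. Asymptotic bounds hide constant factors, so the numbers show relative growth only and are not predicted running times.

```python
def growth_ratio(n: int, slow_exp: int = 5, fast_exp: int = 2) -> int:
    """How many times larger n**slow_exp is than n**fast_exp (constants ignored)."""
    return n ** (slow_exp - fast_exp)


def scaling_factor(n_small: int, n_large: int, exponent: int) -> float:
    """Factor by which an n**exponent bound grows when n goes from n_small to n_large."""
    return (n_large / n_small) ** exponent


# Illustrative dataset sizes (hypothetical, not taken from the paper's experiments).
for n in (100, 1000):
    print(f"n={n}: n^5 / n^2 = {growth_ratio(n)}")

# A tenfold increase in the number of species inflates each bound very differently.
print("quintic bound grows by:", scaling_factor(100, 1000, 5))
print("quadratic bound grows by:", scaling_factor(100, 1000, 2))
```

Under these assumptions, a tenfold increase in n multiplies the quintic bound by 100,000 but the quadratic bound by only 100.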

Original language: English (US)
Article number: btz344
Pages (from-to): i417-i426
Journal: Bioinformatics
Volume: 35
Issue number: 14
DOI: 10.1093/bioinformatics/btz344
State: Published - Jul 15 2019

Fingerprint

Scalability
Divide and conquer
Data storage equipment
Large Data Sets
Locus
Minor
Research Personnel
Simulation Study
Resources
Framework
Evaluate
Datasets

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

TreeMerge: A new method for improving the scalability of species tree estimation methods. / Molloy, Erin K.; Warnow, Tandy.

In: Bioinformatics, Vol. 35, No. 14, btz344, 15.07.2019, p. i417-i426.

Research output: Contribution to journal > Article

@article{17d66f527bdb492990f15afa8a3e5e02,
title = "TreeMerge: A new method for improving the scalability of species tree estimation methods",
abstract = "Motivation: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has $O(n^5)$ running time for datasets with n species. Results: Here we present a new method called 'TreeMerge' that improves on NJMerge in two ways: it is guaranteed to return a tree and it has dramatically faster running time within the same divide-and-conquer framework (only $O(n^2)$ time). We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets (that they would otherwise fail on), when given 64 GB of memory and 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All.",
author = "Molloy, Erin K. and Warnow, Tandy",
year = "2019",
month = "7",
day = "15",
doi = "10.1093/bioinformatics/btz344",
language = "English (US)",
volume = "35",
pages = "i417--i426",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14"
}

TY - JOUR

T1 - TreeMerge

T2 - A new method for improving the scalability of species tree estimation methods

AU - Molloy, Erin K.

AU - Warnow, Tandy

PY - 2019/7/15

Y1 - 2019/7/15

AB - Motivation: At RECOMB-CG 2018, we presented NJMerge and showed that it could be used within a divide-and-conquer framework to scale computationally intensive methods for species tree estimation to larger datasets. However, NJMerge has two significant limitations: it can fail to return a tree and, when used within the proposed divide-and-conquer framework, has O(n^5) running time for datasets with n species. Results: Here we present a new method called 'TreeMerge' that improves on NJMerge in two ways: it is guaranteed to return a tree and it has dramatically faster running time within the same divide-and-conquer framework (only O(n^2) time). We use a simulation study to evaluate TreeMerge in the context of multi-locus species tree estimation with two leading methods, ASTRAL-III and RAxML. We find that the divide-and-conquer framework using TreeMerge has a minor impact on species tree accuracy, dramatically reduces running time, and enables both ASTRAL-III and RAxML to complete on datasets (that they would otherwise fail on), when given 64 GB of memory and 48 h maximum running time. Thus, TreeMerge is a step toward a larger vision of enabling researchers with limited computational resources to perform large-scale species tree estimation, which we call Phylogenomics for All.

UR - http://www.scopus.com/inward/record.url?scp=85068899846&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068899846&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz344

DO - 10.1093/bioinformatics/btz344

M3 - Article

AN - SCOPUS:85068899846

VL - 35

SP - i417-i426

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

M1 - btz344

ER -