NJMerge: A Generic technique for scaling phylogeny estimation methods and its application to species trees

Erin K. Molloy, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge’s utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1%. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on Github: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.

Original languageEnglish (US)
Title of host publicationComparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings
EditorsAïda Ouangraoua, Mathieu Blanchette
PublisherSpringer-Verlag
Pages260-276
Number of pages17
ISBN (Print)9783030008338
DOIs
StatePublished - Jan 1 2018
Event16th International Conference on Comparative Genomics, RECOMB-CG 2018 - Magog-Orford, Canada
Duration: Oct 9 2018Oct 12 2018

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11183 LNBI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Other

Other16th International Conference on Comparative Genomics, RECOMB-CG 2018
CountryCanada
CityMagog-Orford
Period10/9/1810/12/18

Fingerprint

Phylogeny
Scaling
Scalability
Subset
Divide and conquer
Joining
Large Data Sets
Divides
Polynomials
Mergers
Distance Matrix
Concatenation
Failure Rate
Boosting
NP-hard Problems
Experiments
Overlapping
Polynomial time
Disjoint
Simulation Study

Keywords

  • ASTRAL
  • Divide-and-conquer
  • Incomplete lineage sorting
  • NJst
  • Neighbor Joining
  • Phylogenomics
  • SVDquartets
  • Species trees

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Molloy, E. K., & Warnow, T. (2018). NJMerge: A Generic technique for scaling phylogeny estimation methods and its application to species trees. In A. Ouangraoua, & M. Blanchette (Eds.), Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings (pp. 260-276). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11183 LNBI). Springer-Verlag. https://doi.org/10.1007/978-3-030-00834-5_15

NJMerge : A Generic technique for scaling phylogeny estimation methods and its application to species trees. / Molloy, Erin K.; Warnow, Tandy.

Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings. ed. / Aïda Ouangraoua; Mathieu Blanchette. Springer-Verlag, 2018. p. 260-276 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11183 LNBI).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Molloy, EK & Warnow, T 2018, NJMerge: A Generic technique for scaling phylogeny estimation methods and its application to species trees. in A Ouangraoua & M Blanchette (eds), Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11183 LNBI, Springer-Verlag, pp. 260-276, 16th International Conference on Comparative Genomics, RECOMB-CG 2018, Magog-Orford, Canada, 10/9/18. https://doi.org/10.1007/978-3-030-00834-5_15
Molloy EK, Warnow T. NJMerge: A Generic technique for scaling phylogeny estimation methods and its application to species trees. In Ouangraoua A, Blanchette M, editors, Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings. Springer-Verlag. 2018. p. 260-276. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-00834-5_15
Molloy, Erin K. ; Warnow, Tandy. / NJMerge : A Generic technique for scaling phylogeny estimation methods and its application to species trees. Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings. editor / Aïda Ouangraoua ; Mathieu Blanchette. Springer-Verlag, 2018. pp. 260-276 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{1b248ab37a3c4602ad78d1ec661e90bf,
title = "NJMerge: A Generic technique for scaling phylogeny estimation methods and its application to species trees",
abstract = "Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge’s utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1{\%}. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on Github: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.",
keywords = "ASTRAL, Divide-and-conquer, Incomplete lineage sorting, NJst, Neighbor Joining, Phylogenomics, SVDquartets, Species trees",
author = "Molloy, {Erin K.} and Tandy Warnow",
year = "2018",
month = "1",
day = "1",
doi = "10.1007/978-3-030-00834-5_15",
language = "English (US)",
isbn = "9783030008338",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "260--276",
editor = "A{\"i}da Ouangraoua and Mathieu Blanchette",
booktitle = "Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings",

}

TY - GEN

T1 - NJMerge

T2 - A Generic technique for scaling phylogeny estimation methods and its application to species trees

AU - Molloy, Erin K.

AU - Warnow, Tandy

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge’s utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1%. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on Github: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.

AB - Divide-and-conquer methods, which divide the species set into overlapping subsets, construct trees on the subsets, and then combine the trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of these approaches. In this paper, we present a new divide-and-conquer approach that does not require supertree estimation: we divide the species set into disjoint subsets, construct trees on the subsets, and then combine the trees using a distance matrix computed on the full species set. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of the Neighbor Joining algorithm. We report on the results of an extensive simulation study evaluating NJMerge’s utility in scaling three popular species tree estimation methods: ASTRAL, SVDquartets, and concatenation analysis using RAxML. We find that NJMerge provides substantial improvements in running time without sacrificing accuracy and sometimes even improves accuracy. Furthermore, although NJMerge can sometimes fail to return a tree, the failure rate in our experiments is less than 1%. Together, these results suggest that NJMerge is a valuable technique for scaling computationally intensive methods to larger datasets, especially when computational resources are limited. NJMerge is freely available on Github: https://github.com/ekmolloy/njmerge. All datasets, scripts, and supplementary materials are freely available through the Illinois Data Bank: https://doi.org/10.13012/B2IDB-1424746_V1.

KW - ASTRAL

KW - Divide-and-conquer

KW - Incomplete lineage sorting

KW - NJst

KW - Neighbor Joining

KW - Phylogenomics

KW - SVDquartets

KW - Species trees

UR - http://www.scopus.com/inward/record.url?scp=85055104293&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055104293&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-00834-5_15

DO - 10.1007/978-3-030-00834-5_15

M3 - Conference contribution

AN - SCOPUS:85055104293

SN - 9783030008338

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 260

EP - 276

BT - Comparative Genomics - 16th International Conference, RECOMB-CG 2018, Proceedings

A2 - Ouangraoua, Aïda

A2 - Blanchette, Mathieu

PB - Springer-Verlag

ER -