New absolute fast converging phylogeny estimation methods with improved scalability and accuracy

Qiuyi Zhang, Satish Rao, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCMNJ , published in SODA 2001. The main empirical advantage of DCMNJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCMNJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCMNJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).

Original languageEnglish (US)
Title of host publication18th International Workshop on Algorithms in Bioinformatics, WABI 2018
EditorsLaxmi Parida, Esko Ukkonen
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Print)9783959770828
DOIs
StatePublished - Aug 1 2018
Event18th International Workshop on Algorithms in Bioinformatics, WABI 2018 - Helsinki, Finland
Duration: Aug 20 2018Aug 22 2018

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume113
ISSN (Print)1868-8969

Other

Other18th International Workshop on Algorithms in Bioinformatics, WABI 2018
CountryFinland
CityHelsinki
Period8/20/188/22/18

Fingerprint

Scalability
Joining
Polynomials
Trees (mathematics)
Maximum likelihood
Phylogeny
Genes

Keywords

  • Absolute fast converging methods
  • Maximum likelihood
  • Neighbor joining
  • Phylogeny estimation
  • Sample complexity
  • Short quartets

ASJC Scopus subject areas

  • Software

Cite this

Zhang, Q., Rao, S., & Warnow, T. (2018). New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. In L. Parida, & E. Ukkonen (Eds.), 18th International Workshop on Algorithms in Bioinformatics, WABI 2018 [8] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 113). Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.WABI.2018.8

New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. / Zhang, Qiuyi; Rao, Satish; Warnow, Tandy.

18th International Workshop on Algorithms in Bioinformatics, WABI 2018. ed. / Laxmi Parida; Esko Ukkonen. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2018. 8 (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 113).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Zhang, Q, Rao, S & Warnow, T 2018, New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. in L Parida & E Ukkonen (eds), 18th International Workshop on Algorithms in Bioinformatics, WABI 2018., 8, Leibniz International Proceedings in Informatics, LIPIcs, vol. 113, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 18th International Workshop on Algorithms in Bioinformatics, WABI 2018, Helsinki, Finland, 8/20/18. https://doi.org/10.4230/LIPIcs.WABI.2018.8
Zhang Q, Rao S, Warnow T. New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. In Parida L, Ukkonen E, editors, 18th International Workshop on Algorithms in Bioinformatics, WABI 2018. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2018. 8. (Leibniz International Proceedings in Informatics, LIPIcs). https://doi.org/10.4230/LIPIcs.WABI.2018.8
Zhang, Qiuyi ; Rao, Satish ; Warnow, Tandy. / New absolute fast converging phylogeny estimation methods with improved scalability and accuracy. 18th International Workshop on Algorithms in Bioinformatics, WABI 2018. editor / Laxmi Parida ; Esko Ukkonen. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2018. (Leibniz International Proceedings in Informatics, LIPIcs).
@inproceedings{7463ac53ef1544ae9904cef564faf3c8,
title = "New absolute fast converging phylogeny estimation methods with improved scalability and accuracy",
abstract = "Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCMNJ , published in SODA 2001. The main empirical advantage of DCMNJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCMNJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCMNJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).",
keywords = "Absolute fast converging methods, Maximum likelihood, Neighbor joining, Phylogeny estimation, Sample complexity, Short quartets",
author = "Qiuyi Zhang and Satish Rao and Tandy Warnow",
year = "2018",
month = "8",
day = "1",
doi = "10.4230/LIPIcs.WABI.2018.8",
language = "English (US)",
isbn = "9783959770828",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Laxmi Parida and Esko Ukkonen",
booktitle = "18th International Workshop on Algorithms in Bioinformatics, WABI 2018",

}

TY - GEN

T1 - New absolute fast converging phylogeny estimation methods with improved scalability and accuracy

AU - Zhang, Qiuyi

AU - Rao, Satish

AU - Warnow, Tandy

PY - 2018/8/1

Y1 - 2018/8/1

N2 - Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCMNJ , published in SODA 2001. The main empirical advantage of DCMNJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCMNJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCMNJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).

AB - Absolute fast converging (AFC) phylogeny estimation methods are ones that have been proven to recover the true tree with high probability given sequences whose lengths are polynomial in the number of number of leaves in the tree (once the shortest and longest branch lengths are fixed). While there has been a large literature on AFC methods, the best in terms of empirical performance was DCMNJ , published in SODA 2001. The main empirical advantage of DCMNJ over other AFC methods is its use of neighbor joining (NJ) to construct trees on smaller taxon subsets, which are then combined into a tree on the full set of species using a supertree method; in contrast, the other AFC methods in essence depend on quartet trees that are computed independently of each other, which reduces accuracy compared to neighbor joining. However, DCMNJ is unlikely to scale to large datasets due to its reliance on supertree methods, as no current supertree methods are able to scale to large datasets with high accuracy. In this study we present a new approach to large-scale phylogeny estimation that shares some of the features of DCMNJ but bypasses the use of supertree methods. We prove that this new approach is AFC and uses polynomial time. Furthermore, we describe variations on this basic approach that can be used with leaf-disjoint constraint trees (computed using methods such as maximum likelihood) to produce other AFC methods that are likely to provide even better accuracy. Thus, we present a new generalizable technique for large-scale tree estimation that is designed to improve scalability for phylogeny estimation methods to ultra-large datasets, and that can be used in a variety of settings (including tree estimation from unaligned sequences, and species tree estimation from gene trees).

KW - Absolute fast converging methods

KW - Maximum likelihood

KW - Neighbor joining

KW - Phylogeny estimation

KW - Sample complexity

KW - Short quartets

UR - http://www.scopus.com/inward/record.url?scp=85051352719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051352719&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.WABI.2018.8

DO - 10.4230/LIPIcs.WABI.2018.8

M3 - Conference contribution

AN - SCOPUS:85051352719

SN - 9783959770828

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 18th International Workshop on Algorithms in Bioinformatics, WABI 2018

A2 - Parida, Laxmi

A2 - Ukkonen, Esko

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -