TY - JOUR
T1 - Re-evaluating Deep Neural Networks for Phylogeny Estimation
T2 - The Issue of Taxon Sampling
AU - Zaharias, Paul
AU - Grosshauser, Martin
AU - Warnow, Tandy
N1 - Funding Information:
This work was supported in part by NSF grants 1513629 and 1458652 to T.W. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993), the State of Illinois, and as of December 2019, the National Geospatial-Intelligence Agency. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications.
Publisher Copyright:
© Copyright 2022, Mary Ann Liebert, Inc., publishers.
PY - 2022/1
Y1 - 2022/1
AB - Deep neural networks (DNNs) have been recently proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.
KW - deep neural networks
KW - phylogeny estimation
KW - heterotachy
UR - http://www.scopus.com/inward/record.url?scp=85123272614&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123272614&partnerID=8YFLogxK
U2 - 10.1089/cmb.2021.0383
DO - 10.1089/cmb.2021.0383
M3 - Article
C2 - 34986031
AN - SCOPUS:85123272614
SN - 1066-5277
VL - 29
SP - 74
EP - 89
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 1
ER -