Re-evaluating Deep Neural Networks for Phylogeny Estimation: The Issue of Taxon Sampling

Paul Zaharias, Martin Grosshauser, Tandy Warnow

Research output: Contribution to journal › Article › peer-review

Abstract

Deep neural networks (DNNs) have recently been proposed for quartet tree phylogeny estimation. Here, we present a study evaluating recently trained DNNs in comparison to a collection of standard phylogeny estimation methods on a heterogeneous collection of datasets simulated under the same models that were used to train the DNNs, and also under similar conditions but with higher rates of evolution. Our study shows that using DNNs with quartet amalgamation is less accurate than several of the standard phylogeny estimation methods we explore (e.g., maximum likelihood and maximum parsimony). We further find that simple standard phylogeny estimation methods match or improve on DNNs for quartet accuracy, especially, but not exclusively, when used in a global manner (i.e., the tree on the full dataset is computed and then the induced quartet trees are extracted from the full tree). Thus, our study provides evidence that a major challenge impacting the utility of current DNNs for phylogeny estimation is their restriction to estimating quartet trees that must subsequently be combined into a tree on the full dataset. In contrast, global methods (i.e., those that estimate trees from the full set of sequences) are able to benefit from taxon sampling, and hence have higher accuracy on large datasets.
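The "global" use of a method described in the abstract (estimate a tree on the full dataset, then read off the induced quartet trees) can be illustrated with a minimal sketch. The nested-tuple tree encoding and the function names `leaf_sets`, `induced_quartet`, and `quartet_accuracy` are assumptions made for this illustration and are not taken from the study's software.

```python
from itertools import combinations


def leaf_sets(tree):
    """Return the leaf set below every node of a nested-tuple tree.

    Assumed encoding: a leaf is a string, an internal node is a tuple of children.
    """
    clades = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.append(leaves)
        return leaves

    visit(tree)
    return clades


def induced_quartet(clades, quartet):
    """Return the split {pair, pair} that the tree induces on four taxa.

    A clade holding exactly two of the four taxa separates them from the
    other two. Returns None if no clade does so (unresolved quartet).
    """
    q = frozenset(quartet)
    for clade in clades:
        inside = clade & q
        if len(inside) == 2:
            return frozenset([inside, q - inside])
    return None


def quartet_accuracy(true_tree, est_tree, taxa):
    """Fraction of four-taxon subsets on which the two trees induce the same quartet."""
    true_clades = leaf_sets(true_tree)
    est_clades = leaf_sets(est_tree)
    quartets = list(combinations(sorted(taxa), 4))
    agree = sum(
        induced_quartet(true_clades, q) == induced_quartet(est_clades, q)
        for q in quartets
    )
    return agree / len(quartets)


# Hypothetical six-taxon example: the estimated tree misplaces B and C.
true_tree = ((("A", "B"), "C"), ("D", ("E", "F")))
est_tree = ((("A", "C"), "B"), ("D", ("E", "F")))
accuracy = quartet_accuracy(true_tree, est_tree, "ABCDEF")
```

In this toy case the two trees disagree only on the three quartets that contain A, B, and C together, so 12 of the 15 quartets match. A quartet-based pipeline would instead estimate each four-taxon tree separately and then amalgamate them, which is the restriction the abstract identifies as the main challenge.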

Original language: English (US)
Pages (from-to): 74-89
Number of pages: 16
Journal: Journal of Computational Biology
Volume: 29
Issue number: 1
State: Published - Jan 2022
Externally published: Yes

Keywords

  • deep neural networks
  • phylogeny estimation and heterotachy

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics
