Abstract

Estimating phylogenetic trees for individual genes or multi-locus datasets is a basic part of much biological research. Disjoint Tree Mergers (DTMs) were developed to make large trees computable: they divide the input sequence dataset into disjoint subsets, construct a tree on each subset, and then combine the subset trees (using auxiliary information) into a tree on the full dataset. DTMs have been used to advantage in multi-locus species tree estimation, yielding highly accurate species trees at lower computational cost than leading species tree estimation methods. Here, we evaluate the feasibility of using DTMs to improve the scalability of maximum likelihood (ML) gene tree estimation to large numbers of input sequences. Our study shows distinct differences among the three selected ML codes (RAxML-NG, IQ-TREE 2, and FastTree 2) and shows that good DTM pipeline design can provide advantages over these ML codes on large datasets.
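
The divide, build, and merge steps described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: every function name here is hypothetical, the subset "estimator" is a placeholder where a real pipeline would run an ML code, and the merger simply joins subset trees at a root, ignoring the auxiliary information (such as a guide tree or distance matrix) that an actual DTM would use to decide how the subset trees fit together.

```python
from typing import Callable, List, Sequence, Tuple, Union

# A tree is either a leaf name or a binary node holding two subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def split_disjoint(taxa: Sequence[str], max_size: int) -> List[List[str]]:
    """Divide taxa into disjoint subsets of at most max_size (hypothetical chunking rule)."""
    if max_size < 1:
        raise ValueError("max_size must be positive")
    return [list(taxa[i:i + max_size]) for i in range(0, len(taxa), max_size)]


def caterpillar(taxa: Sequence[str]) -> Tree:
    """Placeholder subset-tree estimator; a real pipeline would call an ML code here."""
    tree: Tree = taxa[0]
    for name in taxa[1:]:
        tree = (tree, name)
    return tree


def join_at_root(trees: Sequence[Tree]) -> Tree:
    """Placeholder merger: chains subset trees together, keeping each intact as a subtree."""
    merged: Tree = trees[0]
    for subtree in trees[1:]:
        merged = (merged, subtree)
    return merged


def leaves(tree: Tree) -> List[str]:
    """Return the leaf names of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dtm_pipeline(
    taxa: Sequence[str],
    max_size: int,
    build: Callable[[Sequence[str]], Tree] = caterpillar,
    merge: Callable[[Sequence[Tree]], Tree] = join_at_root,
) -> Tree:
    """Divide the taxa, build one tree per subset, then merge the subset trees."""
    if not taxa:
        raise ValueError("need at least one taxon")
    subsets = split_disjoint(taxa, max_size)
    subset_trees = [build(subset) for subset in subsets]
    return merge(subset_trees)


if __name__ == "__main__":
    full_tree = dtm_pipeline(["a", "b", "c", "d", "e"], max_size=2)
    print(full_tree)
    print(leaves(full_tree))
```

The one property this toy shares with a real DTM is that each subset tree survives unchanged inside the merged tree; the `build` and `merge` parameters mark where an ML code and a genuine merger would be plugged in.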

Original language: English (US)
Article number: 148
Journal: Algorithms
Volume: 14
Issue number: 5
State: Published - May 2021

Keywords

  • Cox1
  • Disjoint tree mergers
  • FastTree
  • Heterotachy
  • IQ-TREE
  • Maximum likelihood
  • Phylogeny estimation
  • RAxML
  • Tree of life

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Numerical Analysis
  • Computational Theory and Mathematics
  • Computational Mathematics
