QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent

Yasamin Tabatabaee, Sebastien Roch, Tandy Warnow

Research output: Contribution to journalArticlepeer-review

Abstract

We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on ‘‘short quintets’’ has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.

Original languageEnglish (US)
Pages (from-to)1146-1181
Number of pages36
JournalJournal of Computational Biology
Volume30
Issue number11
DOIs
StatePublished - Nov 1 2023
Externally publishedYes

Keywords

  • multispecies coalescent
  • rooting
  • species tree estimation
  • statistical consistency

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Fingerprint

Dive into the research topics of 'QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent'. Together they form a unique fingerprint.

Cite this