TY - JOUR
T1 - QR-STAR
T2 - A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent
AU - Tabatabaee, Yasamin
AU - Roch, Sebastien
AU - Warnow, Tandy
N1 - S.R. was supported by NSF grants DMS-1902892, DMS1916378, and DMS-2023239 (TRIPODS Phase II), as well as a Vilas Associates Award. T.W. was supported by the Grainger Foundation. Y.T. was supported in part by UIUC C.L. and Jane W-S. Liu Award. *
PY - 2023/11/1
Y1 - 2023/11/1
N2 - We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on ‘‘short quintets’’ has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.
AB - We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on ‘‘short quintets’’ has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on github.
KW - multispecies coalescent
KW - rooting
KW - species tree estimation
KW - statistical consistency
UR - http://www.scopus.com/inward/record.url?scp=85176249224&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85176249224&partnerID=8YFLogxK
U2 - 10.1089/cmb.2023.0185
DO - 10.1089/cmb.2023.0185
M3 - Article
C2 - 37902986
AN - SCOPUS:85176249224
SN - 1066-5277
VL - 30
SP - 1146
EP - 1181
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 11
ER -