TY - GEN
T1 - Fast and Accurate Species Trees from Weighted Internode Distances
AU - Liu, Baqiao
AU - Warnow, Tandy
N1 - Funding Information:
Funding Baqiao Liu: This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research under the Secure Biosystems Design Initiative and by the Laboratory Directed Research and Development (LDRD) program of Sandia National Laboratories, which is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Tandy Warnow: NSF grant 2006069
Publisher Copyright:
© Baqiao Liu and Tandy Warnow.
PY - 2022/9/1
Y1 - 2022/9/1
N2 - Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing “gene trees”) and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time) and have had high accuracy under many conditions that makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work of weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
AB - Species tree estimation is a basic step in many biological research projects, but is complicated by the fact that gene trees can differ from the species tree due to processes such as incomplete lineage sorting (ILS), gene duplication and loss (GDL), and horizontal gene transfer (HGT), which can cause different regions within the genome to have different evolutionary histories (i.e., “gene tree heterogeneity”). One approach to estimating species trees in the presence of gene tree heterogeneity resulting from ILS operates by computing trees on each genomic region (i.e., computing “gene trees”) and then using these gene trees to define a matrix of average internode distances, where the internode distance in a tree T between two species x and y is the number of nodes in T between the leaves corresponding to x and y. Given such a matrix, a tree can then be computed using methods such as neighbor joining. Methods such as ASTRID and NJst (which use this basic approach) are provably statistically consistent, very fast (low degree polynomial time) and have had high accuracy under many conditions that makes them competitive with other popular species tree estimation methods. In this study, inspired by the very recent work of weighted ASTRAL, we present weighted ASTRID, a variant of ASTRID that takes the branch uncertainty on the gene trees into account in the internode distance. Our experimental study evaluating weighted ASTRID shows improvements in accuracy compared to the original (unweighted) ASTRID while remaining fast. Moreover, weighted ASTRID shows competitive accuracy against weighted ASTRAL, the state of the art. Thus, this study provides a new and very fast method for species tree estimation that improves upon ASTRID and has comparable accuracy with the state of the art while remaining much faster. Weighted ASTRID is available at https://github.com/RuneBlaze/internode.
KW - ASTRAL
KW - ASTRID
KW - Species tree estimation
KW - incomplete lineage sorting
KW - multi-species coalescent
UR - http://www.scopus.com/inward/record.url?scp=85137824083&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137824083&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.WABI.2022.8
DO - 10.4230/LIPIcs.WABI.2022.8
M3 - Conference contribution
AN - SCOPUS:85137824083
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022
A2 - Boucher, Christina
A2 - Rahmann, Sven
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 22nd International Workshop on Algorithms in Bioinformatics, WABI 2022
Y2 - 5 September 2022 through 7 September 2022
ER -