TY - JOUR
T1 - Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting
AU - Mirarab, Siavash
AU - Bayzid, Md Shamsuzzoha
AU - Warnow, Tandy
N1 - Publisher Copyright: © The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.
PY - 2016/5
Y1 - 2016/5
N2 - Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST (one of the most popular species tree estimation methods designed to address ILS) as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there are an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.
AB - Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST (one of the most popular species tree estimation methods designed to address ILS) as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there are an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.
KW - Concatenation
KW - Consensus Methods
KW - Gene Tree Discordance
KW - Incomplete Lineage Sorting
KW - MP-EST
KW - MRL
KW - MRP
KW - Multilocus Bootstrapping
KW - Species Tree Estimation
KW - Supertree Methods
UR - http://www.scopus.com/inward/record.url?scp=84978808477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84978808477&partnerID=8YFLogxK
U2 - 10.1093/sysbio/syu063
DO - 10.1093/sysbio/syu063
M3 - Article
C2 - 25164915
AN - SCOPUS:84978808477
SN - 1063-5157
VL - 65
SP - 366
EP - 380
JO - Systematic Biology
JF - Systematic Biology
IS - 3
ER -