TY - JOUR
T1 - Fast and accurate methods for phylogenomic analyses
AU - Yang, Jimmy
AU - Warnow, Tandy
N1 - Funding Information:
This research was supported by the National Science Foundation (DEB-0733029 and DBI-1062335) and by a fellowship from the John Simon Guggenheim Foundation to TW. We thank Luay Nakhleh and Yun Yu for making the 17-taxon datasets available to us for this study. We thank Cecile Ané, Luay Nakhleh, Noah Rosenberg, and the anonymous referees for very helpful suggestions, and Steve Evans for discussions about statistical tests. This article has been published as part of BMC Bioinformatics Volume 12 Supplement 9, 2011: Proceedings of the Ninth Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S9.
PY - 2011/10/5
Y1 - 2011/10/5
N2 - Background: Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another) due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing according to the specific algorithmic technique used and the biological model used to explain differences between species and gene trees. Relatively little is known about the relative performance of these methods. Results: We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions), substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical methods and are computationally quite intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct. Conclusions: Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.
AB - Background: Species phylogenies are not estimated directly, but rather through phylogenetic analyses of different gene datasets. However, true gene trees can differ from the true species tree (and hence from one another) due to biological processes such as horizontal gene transfer, incomplete lineage sorting, and gene duplication and loss, so that no single gene tree is a reliable estimate of the species tree. Several methods have been developed to estimate species trees from estimated gene trees, differing according to the specific algorithmic technique used and the biological model used to explain differences between species and gene trees. Relatively little is known about the relative performance of these methods. Results: We report on a study evaluating several different methods for estimating species trees from sequence datasets, simulating sequence evolution under a complex model including indels (insertions and deletions), substitutions, and incomplete lineage sorting. The most important finding of our study is that some fast and simple methods are nearly as accurate as the most accurate methods, which employ sophisticated statistical methods and are computationally quite intensive. We also observe that methods that explicitly consider errors in the estimated gene trees produce more accurate trees than methods that assume the estimated gene trees are correct. Conclusions: Our study shows that highly accurate estimations of species trees are achievable, even when gene trees differ from each other and from the species tree, and that these estimations can be obtained using fairly simple and computationally tractable methods.
UR - http://www.scopus.com/inward/record.url?scp=80053547401&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053547401&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S9-S4
DO - 10.1186/1471-2105-12-S9-S4
M3 - Article
C2 - 22152123
AN - SCOPUS:80053547401
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - SUPPL. 9
M1 - S4
ER -