TY - JOUR
T1 - Recent progress on methods for estimating and updating large phylogenies
AU - Zaharias, Paul
AU - Warnow, Tandy
N1 - Funding Information:
This research was supported in part by the US National Science Foundation grant no. 2006069 to T.W. and by The Grainger Foundation support to T.W.
Publisher Copyright:
© 2022 The Authors.
PY - 2022/10/10
Y1 - 2022/10/10
AB - With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
KW - maximum likelihood
KW - multiple sequence alignment
KW - phylogenetic placement
KW - phylogenomics
KW - phylogeny estimation
KW - taxon identification
UR - http://www.scopus.com/inward/record.url?scp=85136908705&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136908705&partnerID=8YFLogxK
U2 - 10.1098/rstb.2021.0244
DO - 10.1098/rstb.2021.0244
M3 - Review article
C2 - 35989607
AN - SCOPUS:85136908705
SN - 0962-8436
VL - 377
JO - Philosophical Transactions of the Royal Society B: Biological Sciences
JF - Philosophical Transactions of the Royal Society B: Biological Sciences
IS - 1861
M1 - 20210244
ER -