TY - GEN
T1 - New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation
AU - Warnow, Tandy
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.
AB - Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.
KW - Absolute fast converging methods
KW - Divide-and-conquer
KW - Gene trees
KW - Incomplete lineage sorting
KW - Inferring the evolutionary phylogeny of species
KW - Species trees
KW - Statistical consistency
UR - http://www.scopus.com/inward/record.url?scp=85066151325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066151325&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-18174-1_1
DO - 10.1007/978-3-030-18174-1_1
M3 - Conference contribution
AN - SCOPUS:85066151325
SN - 9783030181734
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 21
BT - Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings
A2 - Vega-Rodríguez, Miguel A.
A2 - Holmes, Ian
A2 - Martín-Vide, Carlos
PB - Springer
T2 - 6th International Conference on Algorithms for Computational Biology, AlCoB 2019
Y2 - 28 May 2019 through 30 May 2019
ER -