TY - GEN
T1 - Advancing divide-and-conquer phylogeny estimation using robinson-foulds supertrees
AU - Yu, Xilin
AU - Le, Thien
AU - Christensen, Sarah
AU - Molloy, Erin K.
AU - Warnow, Tandy
N1 - Publisher Copyright:
© Xilin Yu, Thien Le, Sarah Christensen, Erin K. Molloy, and Tandy Warnow; licensed under Creative Commons License CC-BY
PY - 2020/8/1
Y1 - 2020/8/1
N2 - One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS.
AB - One of the Grand Challenges in Science is the construction of the Tree of Life, an evolutionary tree containing several million species, spanning all life on earth. However, the construction of the Tree of Life is enormously computationally challenging, as all the current most accurate methods are either heuristics for NP-hard optimization problems or Bayesian MCMC methods that sample from tree space. One of the most promising approaches for improving scalability and accuracy for phylogeny estimation uses divide-and-conquer: a set of species is divided into overlapping subsets, trees are constructed on the subsets, and then merged together using a “supertree method”. Here, we present Exact-RFS-2, the first polynomial-time algorithm to find an optimal supertree of two trees, using the Robinson-Foulds Supertree (RFS) criterion (a major approach in supertree estimation that is related to maximum likelihood supertrees), and we prove that finding the RFS of three input trees is NP-hard. We also present GreedyRFS (a greedy heuristic that operates by repeatedly using Exact-RFS-2 on pairs of trees, until all the trees are merged into a single supertree). We evaluate Exact-RFS-2 and GreedyRFS, and show that they have better accuracy than the current leading heuristic for RFS.
KW - Divide-and-conquer
KW - Phylogeny estimation
KW - Supertrees
UR - http://www.scopus.com/inward/record.url?scp=85092787323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092787323&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.WABI.2020.15
DO - 10.4230/LIPIcs.WABI.2020.15
M3 - Conference contribution
AN - SCOPUS:85092787323
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 20th International Workshop on Algorithms in Bioinformatics, WABI 2020
A2 - Kingsford, Carl
A2 - Pisanti, Nadia
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 20th International Workshop on Algorithms in Bioinformatics, WABI 2020
Y2 - 7 September 2020 through 9 September 2020
ER -