Abstract
Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R⊆S, we wish to add all the leaves from S \ R to t so as to produce a new tree t0 on leaf set S that has the minimum distance to T. We show that when the distance is defined by the RobinsonFoulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in O(S^{2}) time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multilocus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.
Original language  English (US) 

Title of host publication  17th International Workshop on Algorithms in Bioinformatics, WABI 2017 
Editors  Knut Reinert, Russell Schwartz 
Publisher  Schloss Dagstuhl LeibnizZentrum fur Informatik GmbH, Dagstuhl Publishing 
ISBN (Electronic)  9783959770507 
DOIs  
State  Published  Aug 1 2017 
Event  17th International Workshop on Algorithms in Bioinformatics, WABI 2017  Boston, United States Duration: Aug 21 2017 → Aug 23 2017 
Publication series
Name  Leibniz International Proceedings in Informatics, LIPIcs 

Volume  88 
ISSN (Print)  18688969 
Other
Other  17th International Workshop on Algorithms in Bioinformatics, WABI 2017 

Country  United States 
City  Boston 
Period  8/21/17 → 8/23/17 
Keywords
 Coalescentbased species tree estimation
 Gene trees
 Missing data
 Phylogenomics
ASJC Scopus subject areas
 Software
Fingerprint Dive into the research topics of 'Optimal completion of incomplete gene trees in polynomial time using OCTAL'. Together they form a unique fingerprint.
Datasets

Datasets from the study: Optimal completion of incomplete gene trees in polynomial time using OCTAL
Christensen, S. (Creator), Molloy, E. K. (Creator), Vachaspati, P. (Creator) & Warnow, T. (Creator), University of Illinois at UrbanaChampaign, Jun 15 2017
DOI: 10.13012/B2IDB8402610_V1
Dataset