Optimal completion of incomplete gene trees in polynomial time using OCTAL

Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R⊆S, we wish to add all the leaves from S \ R to t so as to produce a new tree t0 on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in O(|S|2) time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.

Original languageEnglish (US)
Title of host publication17th International Workshop on Algorithms in Bioinformatics, WABI 2017
EditorsKnut Reinert, Russell Schwartz
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959770507
DOIs
StatePublished - Aug 1 2017
Event17th International Workshop on Algorithms in Bioinformatics, WABI 2017 - Boston, United States
Duration: Aug 21 2017Aug 23 2017

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume88
ISSN (Print)1868-8969

Other

Other17th International Workshop on Algorithms in Bioinformatics, WABI 2017
CountryUnited States
CityBoston
Period8/21/178/23/17

Fingerprint

Genes
Polynomials
Binary trees

Keywords

  • Coalescent-based species tree estimation
  • Gene trees
  • Missing data
  • Phylogenomics

ASJC Scopus subject areas

  • Software

Cite this

Christensen, S., Molloy, E. K., Vachaspati, P., & Warnow, T. (2017). Optimal completion of incomplete gene trees in polynomial time using OCTAL. In K. Reinert, & R. Schwartz (Eds.), 17th International Workshop on Algorithms in Bioinformatics, WABI 2017 [27] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 88). Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.WABI.2017.27

Optimal completion of incomplete gene trees in polynomial time using OCTAL. / Christensen, Sarah; Molloy, Erin K.; Vachaspati, Pranjal; Warnow, Tandy.

17th International Workshop on Algorithms in Bioinformatics, WABI 2017. ed. / Knut Reinert; Russell Schwartz. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017. 27 (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 88).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Christensen, S, Molloy, EK, Vachaspati, P & Warnow, T 2017, Optimal completion of incomplete gene trees in polynomial time using OCTAL. in K Reinert & R Schwartz (eds), 17th International Workshop on Algorithms in Bioinformatics, WABI 2017., 27, Leibniz International Proceedings in Informatics, LIPIcs, vol. 88, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 17th International Workshop on Algorithms in Bioinformatics, WABI 2017, Boston, United States, 8/21/17. https://doi.org/10.4230/LIPIcs.WABI.2017.27
Christensen S, Molloy EK, Vachaspati P, Warnow T. Optimal completion of incomplete gene trees in polynomial time using OCTAL. In Reinert K, Schwartz R, editors, 17th International Workshop on Algorithms in Bioinformatics, WABI 2017. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2017. 27. (Leibniz International Proceedings in Informatics, LIPIcs). https://doi.org/10.4230/LIPIcs.WABI.2017.27
Christensen, Sarah ; Molloy, Erin K. ; Vachaspati, Pranjal ; Warnow, Tandy. / Optimal completion of incomplete gene trees in polynomial time using OCTAL. 17th International Workshop on Algorithms in Bioinformatics, WABI 2017. editor / Knut Reinert ; Russell Schwartz. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017. (Leibniz International Proceedings in Informatics, LIPIcs).
@inproceedings{7994173c43554c209fadb304d87a137f,
title = "Optimal completion of incomplete gene trees in polynomial time using OCTAL",
abstract = "Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R⊆S, we wish to add all the leaves from S \ R to t so as to produce a new tree t0 on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in O(|S|2) time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.",
keywords = "Coalescent-based species tree estimation, Gene trees, Missing data, Phylogenomics",
author = "Sarah Christensen and Molloy, {Erin K.} and Pranjal Vachaspati and Tandy Warnow",
year = "2017",
month = "8",
day = "1",
doi = "10.4230/LIPIcs.WABI.2017.27",
language = "English (US)",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Knut Reinert and Russell Schwartz",
booktitle = "17th International Workshop on Algorithms in Bioinformatics, WABI 2017",

}

TY - GEN

T1 - Optimal completion of incomplete gene trees in polynomial time using OCTAL

AU - Christensen, Sarah

AU - Molloy, Erin K.

AU - Vachaspati, Pranjal

AU - Warnow, Tandy

PY - 2017/8/1

Y1 - 2017/8/1

N2 - Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R⊆S, we wish to add all the leaves from S \ R to t so as to produce a new tree t0 on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in O(|S|2) time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.

AB - Here we introduce the Optimal Tree Completion Problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. More formally, given a pair of unrooted binary trees (T, t) where T has leaf set S and t has leaf set R⊆S, we wish to add all the leaves from S \ R to t so as to produce a new tree t0 on leaf set S that has the minimum distance to T. We show that when the distance is defined by the Robinson-Foulds (RF) distance, an optimal solution can be found in polynomial time. We also present OCTAL, an algorithm that solves this RF Optimal Tree Completion Problem exactly in O(|S|2) time. We report on a simulation study where we complete estimated gene trees using a reference tree that is based on a species tree estimated from a multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach, but the accuracy of the completed gene trees computed by OCTAL depends on how topologically similar the estimated species tree is to the true gene tree. Hence, under conditions with relatively low gene tree heterogeneity, OCTAL can be used to provide highly accurate completions of estimated gene trees. We close with a discussion of future research.

KW - Coalescent-based species tree estimation

KW - Gene trees

KW - Missing data

KW - Phylogenomics

UR - http://www.scopus.com/inward/record.url?scp=85028747086&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028747086&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.WABI.2017.27

DO - 10.4230/LIPIcs.WABI.2017.27

M3 - Conference contribution

AN - SCOPUS:85028747086

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 17th International Workshop on Algorithms in Bioinformatics, WABI 2017

A2 - Reinert, Knut

A2 - Schwartz, Russell

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -