Traction: Fast non-parametric improvement of estimated gene trees

Sarah Christensen, Erin K. Molloy, Pranjal Vachaspati, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1.
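The objective named in the abstract, the Robinson-Foulds (RF) distance, can be illustrated with a minimal sketch. This is not the TRACTION implementation. The nested-tuple tree encoding, the function names, and the example trees are assumptions made only for this illustration. The sketch counts the non-trivial bipartitions (splits of the leaf set induced by internal edges) that appear in exactly one of two trees on the same leaf set; some authors halve or normalize this count.

```python
def leaf_set(tree):
    """Return the set of leaf labels in a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial bipartitions induced by the edges of the tree.

    The tree is treated as unrooted: each split is stored as an unordered
    pair of leaf sets, so the choice of root in the encoding does not matter.
    """
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits that separate fewer than two leaves from the rest.
        if len(below) >= 2 and len(other) >= 2:
            splits.add(frozenset([below, other]))
        return below

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("RF distance here requires identical leaf sets")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical example: a binary reference tree T and an unresolved input tree t.
reference = ((("A", "B"), "C"), ("D", ("E", "F")))
unresolved = (("A", "B"), "C", "D", "E", "F")
conflicting = ((("A", "C"), "B"), ("D", ("E", "F")))

print(rf_distance(unresolved, reference))   # 2: two reference splits are missing from t
print(rf_distance(conflicting, reference))  # 2: AC|BDEF versus AB|CDEF
```

In this toy case, refining the unresolved tree by adding the two missing reference splits yields the reference tree itself, which brings the RF distance to 0. The problem stated in the abstract also covers completion, that is, adding leaves that are missing from t, which this sketch does not attempt.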

Original language: English (US)
Title of host publication: 19th International Workshop on Algorithms in Bioinformatics, WABI 2019
Editors: Katharina T. Huber, Dan Gusfield
Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic): 9783959771238
DOIs: https://doi.org/10.4230/LIPIcs.WABI.2019.4
State: Published - Sep 2019
Event: 19th International Workshop on Algorithms in Bioinformatics, WABI 2019 - Niagara Falls, United States
Duration: Sep 8 2019 to Sep 10 2019

Publication series

Name: Leibniz International Proceedings in Informatics, LIPIcs
Volume: 143
ISSN (Print): 1868-8969

Conference

Conference: 19th International Workshop on Algorithms in Bioinformatics, WABI 2019
Country: United States
City: Niagara Falls
Period: 9/8/19 to 9/10/19

Keywords

  • Gene tree correction
  • Horizontal gene transfer
  • Incomplete lineage sorting

ASJC Scopus subject areas

  • Software

Cite this

Christensen, S., Molloy, E. K., Vachaspati, P., & Warnow, T. (2019). Traction: Fast non-parametric improvement of estimated gene trees. In K. T. Huber, & D. Gusfield (Eds.), 19th International Workshop on Algorithms in Bioinformatics, WABI 2019 [4] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 143). Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.WABI.2019.4

@inproceedings{390b0494ffa648f19af90222609a4342,
title = "Traction: Fast non-parametric improvement of estimated gene trees",
keywords = "Gene tree correction, Horizontal gene transfer, Incomplete lineage sorting",
author = "Sarah Christensen and Molloy, {Erin K.} and Pranjal Vachaspati and Tandy Warnow",
year = "2019",
month = "9",
doi = "10.4230/LIPIcs.WABI.2019.4",
language = "English (US)",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Huber, {Katharina T.} and Dan Gusfield",
booktitle = "19th International Workshop on Algorithms in Bioinformatics, WABI 2019",

}

TY - GEN

T1 - Traction

T2 - Fast non-parametric improvement of estimated gene trees

AU - Christensen, Sarah

AU - Molloy, Erin K.

AU - Vachaspati, Pranjal

AU - Warnow, Tandy

PY - 2019/9

Y1 - 2019/9

KW - Gene tree correction

KW - Horizontal gene transfer

KW - Incomplete lineage sorting

UR - http://www.scopus.com/inward/record.url?scp=85072638069&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072638069&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.WABI.2019.4

DO - 10.4230/LIPIcs.WABI.2019.4

M3 - Conference contribution

AN - SCOPUS:85072638069

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 19th International Workshop on Algorithms in Bioinformatics, WABI 2019

A2 - Huber, Katharina T.

A2 - Gusfield, Dan

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -