Gene tree parsimony for incomplete gene trees

Md Shamsuzzoha Bayzid, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.

Original languageEnglish (US)
Title of host publication17th International Workshop on Algorithms in Bioinformatics, WABI 2017
EditorsKnut Reinert, Russell Schwartz
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Electronic)9783959770507
DOIs
StatePublished - Aug 1 2017
Event17th International Workshop on Algorithms in Bioinformatics, WABI 2017 - Boston, United States
Duration: Aug 21 2017Aug 23 2017

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume88
ISSN (Print)1868-8969

Other

Other17th International Workshop on Algorithms in Bioinformatics, WABI 2017
CountryUnited States
CityBoston
Period8/21/178/23/17

Fingerprint

Genes
Sampling
Dynamic programming

Keywords

  • Deep coalescence
  • Gene duplication and loss
  • Gene tree parsimony

ASJC Scopus subject areas

  • Software

Cite this

Bayzid, M. S., & Warnow, T. (2017). Gene tree parsimony for incomplete gene trees. In K. Reinert, & R. Schwartz (Eds.), 17th International Workshop on Algorithms in Bioinformatics, WABI 2017 [2] (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 88). Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. https://doi.org/10.4230/LIPIcs.WABI.2017.2

Gene tree parsimony for incomplete gene trees. / Bayzid, Md Shamsuzzoha; Warnow, Tandy.

17th International Workshop on Algorithms in Bioinformatics, WABI 2017. ed. / Knut Reinert; Russell Schwartz. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017. 2 (Leibniz International Proceedings in Informatics, LIPIcs; Vol. 88).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Bayzid, MS & Warnow, T 2017, Gene tree parsimony for incomplete gene trees. in K Reinert & R Schwartz (eds), 17th International Workshop on Algorithms in Bioinformatics, WABI 2017., 2, Leibniz International Proceedings in Informatics, LIPIcs, vol. 88, Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 17th International Workshop on Algorithms in Bioinformatics, WABI 2017, Boston, United States, 8/21/17. https://doi.org/10.4230/LIPIcs.WABI.2017.2
Bayzid MS, Warnow T. Gene tree parsimony for incomplete gene trees. In Reinert K, Schwartz R, editors, 17th International Workshop on Algorithms in Bioinformatics, WABI 2017. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing. 2017. 2. (Leibniz International Proceedings in Informatics, LIPIcs). https://doi.org/10.4230/LIPIcs.WABI.2017.2
Bayzid, Md Shamsuzzoha ; Warnow, Tandy. / Gene tree parsimony for incomplete gene trees. 17th International Workshop on Algorithms in Bioinformatics, WABI 2017. editor / Knut Reinert ; Russell Schwartz. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2017. (Leibniz International Proceedings in Informatics, LIPIcs).
@inproceedings{dc95aa724e34445daff1f4d520ba4e4f,
title = "Gene tree parsimony for incomplete gene trees",
abstract = "Species tree estimation from gene trees can be complicated by gene duplication and loss, and {"}gene tree parsimony{"} (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the {"}standard{"} calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.",
keywords = "Deep coalescence, Gene duplication and loss, Gene tree parsimony",
author = "Bayzid, {Md Shamsuzzoha} and Tandy Warnow",
year = "2017",
month = "8",
day = "1",
doi = "10.4230/LIPIcs.WABI.2017.2",
language = "English (US)",
series = "Leibniz International Proceedings in Informatics, LIPIcs",
publisher = "Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing",
editor = "Knut Reinert and Russell Schwartz",
booktitle = "17th International Workshop on Algorithms in Bioinformatics, WABI 2017",

}

TY - GEN

T1 - Gene tree parsimony for incomplete gene trees

AU - Bayzid, Md Shamsuzzoha

AU - Warnow, Tandy

PY - 2017/8/1

Y1 - 2017/8/1

N2 - Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.

AB - Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/shamsbayzid/DynaDup.

KW - Deep coalescence

KW - Gene duplication and loss

KW - Gene tree parsimony

UR - http://www.scopus.com/inward/record.url?scp=85028768974&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028768974&partnerID=8YFLogxK

U2 - 10.4230/LIPIcs.WABI.2017.2

DO - 10.4230/LIPIcs.WABI.2017.2

M3 - Conference contribution

AN - SCOPUS:85028768974

T3 - Leibniz International Proceedings in Informatics, LIPIcs

BT - 17th International Workshop on Algorithms in Bioinformatics, WABI 2017

A2 - Reinert, Knut

A2 - Schwartz, Russell

PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing

ER -