Abstract

Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.

Original languageEnglish (US)
Article number1
JournalAlgorithms for Molecular Biology
Volume13
Issue number1
DOIs
StatePublished - Jan 19 2018

Fingerprint

Parsimony
Genes
Gene
Incompleteness
Duplication
Gene Duplication
Sampling
Exponential time
Dynamic programming
Open Source
Dynamic Programming
Software

Keywords

  • Deep coalescence
  • Dynamic programming
  • Gene duplication and loss
  • Gene tree parsimony

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Gene tree parsimony for incomplete gene trees : Addressing true biological loss. / Bayzid, Md Shamsuzzoha; Warnow, Tandy.

In: Algorithms for Molecular Biology, Vol. 13, No. 1, 1, 19.01.2018.

Research output: Contribution to journalArticle

@article{affd230c8f2143a9869b492841d95fe9,
title = "Gene tree parsimony for incomplete gene trees: Addressing true biological loss",
abstract = "Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and {"}gene tree parsimony{"} (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the {"}standard{"} calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.",
keywords = "Deep coalescence, Dynamic programming, Gene duplication and loss, Gene tree parsimony",
author = "Bayzid, {Md Shamsuzzoha} and Tandy Warnow",
year = "2018",
month = "1",
day = "19",
doi = "10.1186/s13015-017-0120-1",
language = "English (US)",
volume = "13",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Gene tree parsimony for incomplete gene trees

T2 - Addressing true biological loss

AU - Bayzid, Md Shamsuzzoha

AU - Warnow, Tandy

PY - 2018/1/19

Y1 - 2018/1/19

N2 - Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.

AB - Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.

KW - Deep coalescence

KW - Dynamic programming

KW - Gene duplication and loss

KW - Gene tree parsimony

UR - http://www.scopus.com/inward/record.url?scp=85040796600&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85040796600&partnerID=8YFLogxK

U2 - 10.1186/s13015-017-0120-1

DO - 10.1186/s13015-017-0120-1

M3 - Article

AN - SCOPUS:85040796600

VL - 13

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

SN - 1748-7188

IS - 1

M1 - 1

ER -