TY - JOUR
T1 - Gene tree parsimony for incomplete gene trees
T2 - Addressing true biological loss
AU - Bayzid, Md Shamsuzzoha
AU - Warnow, Tandy
N1 - Funding Information:
The authors thank Siavash Mirarab for his helpful suggestions and help in implementing the software. The authors wish to thank the anonymous reviewers for their helpful comments, which led to improvements to the manuscript. This work was supported by US National Science Foundation (1062335 and CCF-1535977) to TW and Fulbright Fellowship to MSB.
Publisher Copyright:
© 2018 The Author(s).
PY - 2018/1/19
Y1 - 2018/1/19
N2 - Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.
AB - Motivation: Species tree estimation from gene trees can be complicated by gene duplication and loss, and "gene tree parsimony" (GTP) is one approach for estimating species trees from multiple gene trees. In its standard formulation, the objective is to find a species tree that minimizes the total number of gene duplications and losses with respect to the input set of gene trees. Although much is known about GTP, little is known about how to treat inputs containing some incomplete gene trees (i.e., gene trees lacking one or more of the species). Results: We present new theory for GTP considering whether the incompleteness is due to gene birth and death (i.e., true biological loss) or taxon sampling, and present dynamic programming algorithms that can be used for an exact but exponential time solution for small numbers of taxa, or as a heuristic for larger numbers of taxa. We also prove that the "standard" calculations for duplications and losses exactly solve GTP when incompleteness results from taxon sampling, although they can be incorrect when incompleteness results from true biological loss. The software for the DP algorithm is freely available as open source code at https://github.com/smirarab/DynaDup.
KW - Deep coalescence
KW - Dynamic programming
KW - Gene duplication and loss
KW - Gene tree parsimony
UR - http://www.scopus.com/inward/record.url?scp=85040796600&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040796600&partnerID=8YFLogxK
U2 - 10.1186/s13015-017-0120-1
DO - 10.1186/s13015-017-0120-1
M3 - Article
C2 - 29387142
AN - SCOPUS:85040796600
SN - 1748-7188
VL - 13
JO - Algorithms for Molecular Biology
JF - Algorithms for Molecular Biology
IS - 1
M1 - 1
ER -