Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting

Siavash Mirarab, Md Shamsuzzoha Bayzid, Tandy Warnow

Research output: Contribution to journal, Article

Abstract

Species tree estimation is complicated by processes, such as gene duplication and loss and incomplete lineage sorting (ILS), that cause discordance between gene trees and the species tree. Furthermore, while concatenation, a traditional approach to tree estimation, has excellent performance under many conditions, the expectation is that the best accuracy will be obtained through the use of species tree estimation methods that are specifically designed to address gene tree discordance. In this article, we report on a study to evaluate MP-EST, one of the most popular species tree estimation methods designed to address ILS, as well as concatenation under maximum likelihood, the greedy consensus, and two supertree methods (Matrix Representation with Parsimony and Matrix Representation with Likelihood). Our study shows that several factors impact the absolute and relative accuracy of methods, including the number of gene trees, the accuracy of the estimated gene trees, and the amount of ILS. Concatenation can be more accurate than the best summary methods in some cases (mostly when the gene trees have poor phylogenetic signal or when the level of ILS is low), but summary methods are generally more accurate than concatenation when there is an adequate number of sufficiently accurate gene trees. Our study suggests that coalescent-based species tree methods may be key to estimating highly accurate species trees from multiple loci.

Original language: English (US)
Pages (from-to): 366-380
Number of pages: 15
Journal: Systematic Biology
Volume: 65
Issue number: 3
DOI: 10.1093/sysbio/syu063
State: Published - Jan 1 2016

Keywords

  • Concatenation
  • Consensus Methods
  • Gene Tree Discordance
  • Incomplete Lineage Sorting
  • MP-EST
  • MRL
  • MRP
  • Multilocus Bootstrapping
  • Species Tree Estimation
  • Supertree Methods

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. / Mirarab, Siavash; Bayzid, Md Shamsuzzoha; Warnow, Tandy.

In: Systematic Biology, Vol. 65, No. 3, 01.01.2016, p. 366-380.

@article{74302f57cfba4da59d348306d78ef1f4,
title = "Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting",
keywords = "Concatenation, Consensus Methods, Gene Tree Discordance, Incomplete Lineage Sorting, MP-EST, MRL, MRP, Multilocus Bootstrapping, Species Tree Estimation, Supertree Methods",
author = "Siavash Mirarab and Bayzid, {Md Shamsuzzoha} and Tandy Warnow",
year = "2016",
month = "1",
day = "1",
doi = "10.1093/sysbio/syu063",
language = "English (US)",
volume = "65",
pages = "366--380",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "3",

}

TY - JOUR

T1 - Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting

AU - Mirarab, Siavash

AU - Bayzid, Md Shamsuzzoha

AU - Warnow, Tandy

PY - 2016/1/1

Y1 - 2016/1/1

KW - Concatenation

KW - Consensus Methods

KW - Gene Tree Discordance

KW - Incomplete Lineage Sorting

KW - MP-EST

KW - MRL

KW - MRP

KW - Multilocus Bootstrapping

KW - Species Tree Estimation

KW - Supertree Methods

UR - http://www.scopus.com/inward/record.url?scp=84978808477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84978808477&partnerID=8YFLogxK

U2 - 10.1093/sysbio/syu063

DO - 10.1093/sysbio/syu063

M3 - Article

C2 - 25164915

AN - SCOPUS:84978808477

VL - 65

SP - 366

EP - 380

JO - Systematic Biology

JF - Systematic Biology

SN - 1063-5157

IS - 3

ER -