On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods

Sebastien Roch, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.
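A minimal three-taxon sketch, not taken from the paper, can make the abstract's terms concrete. It assumes a rooted species tree ((A,B),C) whose internal branch has length t in coalescent units. Under the multi-species coalescent, A and B fail to coalesce along that branch with probability e^(-t), after which the three rooted gene tree topologies are equally likely, so a gene tree matches the species tree with probability 1 - (2/3)e^(-t). The function names are hypothetical, and the majority-vote summary stands in for a generic method that combines gene trees. Because the matching topology is always the most probable one, the majority vote over true gene trees converges to the species tree as the number of loci grows, which is the sense of statistical consistency used above. The sketch deliberately feeds it true gene trees and does not model gene tree estimation error.

```python
import math
import random
from collections import Counter

# Rooted triplet topologies for taxa A, B, C; "XY|Z" means X and Y are sisters.
TOPOLOGIES = ("AB|C", "AC|B", "BC|A")


def triplet_gene_tree_probs(t: float) -> dict[str, float]:
    """Gene tree topology probabilities under the multi-species coalescent
    for the (assumed) species tree ((A,B),C) with internal branch length t
    in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that A and B do NOT coalesce on the internal branch.
    miss = math.exp(-t)
    # If they miss, each of the three rooted topologies has probability 1/3.
    return {
        "AB|C": 1.0 - (2.0 / 3.0) * miss,
        "AC|B": miss / 3.0,
        "BC|A": miss / 3.0,
    }


def sample_gene_trees(t: float, n: int, rng: random.Random) -> list[str]:
    """Draw n independent gene tree topologies (one per locus), treated
    here as error-free true gene trees."""
    probs = triplet_gene_tree_probs(t)
    return rng.choices(list(probs), weights=list(probs.values()), k=n)


def majority_triplet(gene_trees: list[str]) -> str:
    """Hypothetical summary method: return the most frequent topology."""
    return Counter(gene_trees).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(triplet_gene_tree_probs(0.5))
    # With 1000 loci, the species tree topology "AB|C" is very likely to win.
    print(majority_triplet(sample_gene_trees(0.5, 1000, rng)))
```

As t shrinks toward 0, the matching probability approaches 1/3, which is where ILS produces the most discordance and where many loci are needed before the majority becomes reliable.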

Original language: English (US)
Pages (from-to): 663-676
Number of pages: 14
Journal: Systematic Biology
Volume: 64
Issue number: 4
DOIs: 10.1093/sysbio/syv016
State: Published - Jul 1 2015

Keywords

  • coalescent-based methods
  • gene tree estimation error
  • incomplete lineage sorting
  • multi-species coalescent
  • species tree reconstruction
  • statistical consistency

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods. / Roch, Sebastien; Warnow, Tandy.

In: Systematic Biology, Vol. 64, No. 4, 01.07.2015, p. 663-676.

@article{a5cab83a34da40fa9e2cd401a6cee648,
title = "On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods",
abstract = "The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.",
keywords = "coalescent-based methods, gene tree estimation error, incomplete lineage sorting, multi-species coalescent, species tree reconstruction, statistical consistency",
author = "Sebastien Roch and Tandy Warnow",
year = "2015",
month = "7",
day = "1",
doi = "10.1093/sysbio/syv016",
language = "English (US)",
volume = "64",
pages = "663--676",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "4",

}

TY  - JOUR
T1  - On the Robustness to Gene Tree Estimation Error (or lack thereof) of Coalescent-Based Species Tree Methods
AU  - Roch, Sebastien
AU  - Warnow, Tandy
PY  - 2015/7/1
Y1  - 2015/7/1
AB  - The estimation of species trees using multiple loci has become increasingly common. Because different loci can have different phylogenetic histories (reflected in different gene tree topologies) for multiple biological causes, new approaches to species tree estimation have been developed that take gene tree heterogeneity into account. Among these multiple causes, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is potentially the most common cause of gene tree heterogeneity, and much of the focus of the recent literature has been on how to estimate species trees in the presence of ILS. Despite progress in developing statistically consistent techniques for estimating species trees when gene trees can differ due to ILS, there is substantial controversy in the systematics community as to whether to use the new coalescent-based methods or the traditional concatenation methods. One of the key issues that has been raised is understanding the impact of gene tree estimation error on coalescent-based methods that operate by combining gene trees. Here we explore the mathematical guarantees of coalescent-based methods when analyzing estimated rather than true gene trees. Our results provide some insight into the differences between the promise of coalescent-based methods in theory and their performance in practice.
KW  - coalescent-based methods
KW  - gene tree estimation error
KW  - incomplete lineage sorting
KW  - multi-species coalescent
KW  - species tree reconstruction
KW  - statistical consistency
UR  - http://www.scopus.com/inward/record.url?scp=84931054390&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84931054390&partnerID=8YFLogxK
U2  - 10.1093/sysbio/syv016
DO  - 10.1093/sysbio/syv016
M3  - Article
C2  - 25813358
AN  - SCOPUS:84931054390
VL  - 64
SP  - 663
EP  - 676
JO  - Systematic Biology
JF  - Systematic Biology
SN  - 1063-5157
IS  - 4
ER  - 