Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer

Ruth Davidson, Pranjal Vachaspati, Siavash Mirarab, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present.

Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates.

Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.

Original language: English (US)
Pages (from-to): 1-12
Number of pages: 12
Journal: BMC Genomics
Volume: 16
DOI: 10.1186/1471-2164-16-S10-S1
State: Published - Jan 1 2015

Fingerprint

Horizontal Gene Transfer
Biological Phenomena
Genes

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer. / Davidson, Ruth; Vachaspati, Pranjal; Mirarab, Siavash; Warnow, Tandy.

In: BMC Genomics, Vol. 16, 01.01.2015, p. 1-12.

@article{c318c094fc214bd6956bcd93f2dd2599,
title = "Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer",
abstract = "Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.",
author = "Ruth Davidson and Pranjal Vachaspati and Siavash Mirarab and Tandy Warnow",
year = "2015",
month = "1",
day = "1",
doi = "10.1186/1471-2164-16-S10-S1",
language = "English (US)",
volume = "16",
pages = "1--12",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
}

TY - JOUR

T1 - Phylogenomic species tree estimation in the presence of incomplete lineage sorting and horizontal gene transfer

AU - Davidson, Ruth

AU - Vachaspati, Pranjal

AU - Mirarab, Siavash

AU - Warnow, Tandy

PY - 2015/1/1

Y1 - 2015/1/1

AB - Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS.

UR - http://www.scopus.com/inward/record.url?scp=84944717648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944717648&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-16-S10-S1

DO - 10.1186/1471-2164-16-S10-S1

M3 - Article

C2 - 26450506

AN - SCOPUS:84944717648

VL - 16

SP - 1

EP - 12

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

ER -