Abstract

Background: Species tree estimation can be challenging in the presence of gene tree conflict due to incomplete lineage sorting (ILS), which can occur when the time between speciation events is short relative to the population size. Of the many methods that have been developed to estimate species trees in the presence of ILS, *BEAST, a Bayesian method that co-estimates the species tree and gene trees given sequence alignments on multiple loci, has generally been shown to have the best accuracy. However, *BEAST is extremely computationally intensive so that it cannot be used with large numbers of loci; hence, *BEAST is not suitable for genome-scale analyses.

Results: We present BBCA (boosted binned coalescent-based analysis), a method that can be used with *BEAST (and other such co-estimation methods) to improve scalability. BBCA partitions the loci randomly into subsets, uses *BEAST on each subset to co-estimate the gene trees and species tree for the subset, and then combines the newly estimated gene trees together using MP-EST, a popular coalescent-based summary method. We compare time-restricted versions of BBCA and *BEAST on simulated datasets, and show that BBCA is at least as accurate as *BEAST, and achieves better convergence rates for large numbers of loci.

Conclusions: Phylogenomic analysis using *BEAST is currently limited to datasets with a small number of loci, and analyses with even just 100 loci can be computationally challenging. BBCA uses a very simple divide-and-conquer approach that makes it possible to use *BEAST on datasets containing hundreds of loci. This study shows that BBCA provides excellent accuracy and is highly scalable.
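The divide-and-conquer pipeline described in the abstract (random partition of loci, co-estimation per subset, then a summary step over all gene trees) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the function names `random_bins` and `bbca_sketch`, the `coestimate` and `summarize` callables (stand-ins for a *BEAST run and an MP-EST run), and the bin size used in the usage note are all assumptions introduced here.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Sequence


def random_bins(loci: Sequence[Hashable], bin_size: int,
                seed: Optional[int] = None) -> List[List[Hashable]]:
    """Randomly partition loci into disjoint subsets of at most bin_size."""
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    rng = random.Random(seed)
    shuffled = list(loci)
    rng.shuffle(shuffled)
    # Consecutive slices of the shuffled list form a random partition.
    return [shuffled[i:i + bin_size] for i in range(0, len(shuffled), bin_size)]


def bbca_sketch(loci: Sequence[Hashable], bin_size: int,
                coestimate: Callable[[List[Hashable]], Dict[Hashable, str]],
                summarize: Callable[[List[str]], str],
                seed: Optional[int] = None) -> str:
    """Hypothetical outline of the three steps named in the abstract.

    coestimate: placeholder for a co-estimation run on one subset; assumed
                to return a mapping from locus to its estimated gene tree.
    summarize:  placeholder for a summary method that takes all gene trees
                and returns a species tree.
    """
    gene_trees: Dict[Hashable, str] = {}
    for subset in random_bins(loci, bin_size, seed):
        # Each subset is analysed independently of the others.
        gene_trees.update(coestimate(subset))
    # Pass the gene trees to the summary step in the original locus order.
    return summarize([gene_trees[locus] for locus in loci])
```

For example, `random_bins(range(100), 25, seed=0)` yields four disjoint subsets of 25 loci each; the value 25 is arbitrary here and is not taken from the abstract. Because the subsets are independent, the per-subset runs could in principle be executed in parallel.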

Original language: English (US)
Article number: S11
Journal: BMC Genomics
Volume: 15
Issue number: 6
DOI: 10.1186/1471-2164-15-S6-S11
State: Published - Oct 17 2014

Keywords

  • Binning
  • Incomplete lineage sorting
  • Multi-species coalescent
  • Phylogenomics

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

BBCA: Improving the scalability of *BEAST using random binning. / Zimmermann, Théo; Mirarab, Siavash; Warnow, Tandy.

In: BMC Genomics, Vol. 15, No. 6, S11, 17.10.2014.

Research output: Contribution to journal - Article

@article{5a22aa2311a5452cb1224f552bcb7d2c,
title = "BBCA: Improving the scalability of *BEAST using random binning",
keywords = "Binning, Incomplete lineage sorting, Multi-species coalescent, Phylogenomics",
author = "Th{\'e}o Zimmermann and Siavash Mirarab and Tandy Warnow",
year = "2014",
month = "10",
day = "17",
doi = "10.1186/1471-2164-15-S6-S11",
language = "English (US)",
volume = "15",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "6",

}

TY - JOUR

T1 - BBCA

T2 - Improving the scalability of *BEAST using random binning

AU - Zimmermann, Théo

AU - Mirarab, Siavash

AU - Warnow, Tandy

PY - 2014/10/17

Y1 - 2014/10/17

KW - Binning

KW - Incomplete lineage sorting

KW - Multi-species coalescent

KW - Phylogenomics

UR - http://www.scopus.com/inward/record.url?scp=84971249137&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84971249137&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-15-S6-S11

DO - 10.1186/1471-2164-15-S6-S11

M3 - Article

C2 - 25572469

AN - SCOPUS:84971249137

VL - 15

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 6

M1 - S11

ER -