ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

Siavash Mirarab, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed 'bipartitions'. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent.

Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL's running time is O(n²k|X|²), and ASTRAL-II's running time is O(nk|X|²), where n is the number of species, k is the number of loci and X is the set of allowed bipartitions for the search space.
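The abstract states that ASTRAL stays polynomial by building species trees only from an allowed set X of bipartitions. The Python sketch below illustrates that kind of constrained search. It assumes the search can be organized as a dynamic program over allowed clusters, where each allowed bipartition contributes the cluster on one side of a rooted split. It is not ASTRAL's implementation. `best_constrained_tree` and `split_score` are hypothetical names, gene trees are simplified to sets of rooted clusters, and `split_score` is a placeholder that counts cluster occurrences instead of applying ASTRAL's actual optimization criterion.

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional, Set, Tuple

Cluster = FrozenSet[str]


def best_constrained_tree(
    taxa: Set[str],
    allowed: Set[Cluster],
    gene_trees: List[Set[Cluster]],
) -> Tuple[int, str]:
    """Best rooted binary tree on `taxa` whose clusters all lie in `allowed`.

    Singletons and the full taxon set are always treated as allowed.
    Returns (score, Newick string). Illustrative sketch only.
    """
    full = frozenset(taxa)
    clusters = {c for c in allowed if c and c <= full}
    clusters |= {frozenset([t]) for t in taxa} | {full}
    # Fixed iteration order so that ties are broken reproducibly.
    ordered = sorted(clusters, key=lambda c: (len(c), sorted(c)))

    def split_score(a1: Cluster, a2: Cluster) -> int:
        # HYPOTHETICAL stand-in for ASTRAL's real criterion: count how often
        # each side of the split appears as a cluster in the gene trees.
        return sum((a1 in g) + (a2 in g) for g in gene_trees)

    @lru_cache(maxsize=None)
    def solve(a: Cluster) -> Tuple[int, Optional[str]]:
        if len(a) == 1:
            return 0, next(iter(a))
        best: Tuple[int, Optional[str]] = (-1, None)
        # Try every allowed cluster as one side of a split of `a`; the other
        # side must also be allowed, so the search never leaves X.
        for a1 in ordered:
            a2 = a - a1
            if not a1 < a or a2 not in clusters or min(a1) > min(a2):
                continue  # not a proper split, not allowed, or already seen
            s1, t1 = solve(a1)
            s2, t2 = solve(a2)
            if t1 is None or t2 is None:
                continue  # a side cannot be resolved with allowed clusters
            total = s1 + s2 + split_score(a1, a2)
            if total > best[0]:
                best = (total, f"({t1},{t2})")
        return best

    score, tree = solve(full)
    if tree is None:
        raise ValueError("allowed clusters cannot resolve a binary tree")
    return score, tree + ";"


# Usage: three toy gene trees on four taxa, given as sets of rooted clusters.
genes = [
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("BD")},
]
taxa = set("ABCD")
allowed_all = {frozenset("AB"), frozenset("CD"), frozenset("AC"), frozenset("BD")}
print(best_constrained_tree(taxa, allowed_all, genes))  # (4, '((A,B),(C,D));')
print(best_constrained_tree(taxa, {frozenset("AC"), frozenset("BD")}, genes))  # (2, '((A,C),(B,D));')
```

With all four clusters allowed, the sketch returns the AB|CD tree. Shrinking X to {AC, BD} makes it return ((A,C),(B,D)), even though AB|CD has more gene-tree support. This shows how the choice of X bounds what the search can find. Every allowed cluster is tried as one side of a split of every allowed cluster, so the split enumeration in this sketch grows with |X|². That is the factor shared by both running-time bounds in the abstract.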

Original language: English (US)
Pages (from-to): i44-i52
Journal: Bioinformatics
Volume: 31
Issue number: 12
DOI: 10.1093/bioinformatics/btv234
State: Published, Jun 15 2015

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. / Mirarab, Siavash; Warnow, Tandy.

In: Bioinformatics, Vol. 31, No. 12, 15.06.2015, p. i44-i52.

@article{1b608d16b8f1441b9f72bf734fe823a8,
title = "{ASTRAL-II}: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes",
abstract = "Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed 'bipartitions'. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent. Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL's running time is $O(n^2 k|X|^2)$, and ASTRAL-II's running time is $O(nk|X|^2)$, where n is the number of species, k is the number of loci and X is the set of allowed bipartitions for the search space.",
author = "Siavash Mirarab and Tandy Warnow",
year = "2015",
month = jun,
day = "15",
doi = "10.1093/bioinformatics/btv234",
language = "English (US)",
volume = "31",
pages = "i44--i52",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY - JOUR

T1 - ASTRAL-II: Coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

AU - Mirarab, Siavash

AU - Warnow, Tandy

PY - 2015/6/15

Y1 - 2015/6/15

AB - Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting, modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time, by constraining the search space using a set of allowed 'bipartitions'. Despite the limitation to allowed bipartitions, ASTRAL is statistically consistent. Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL's running time is O(n²k|X|²), and ASTRAL-II's running time is O(nk|X|²), where n is the number of species, k is the number of loci and X is the set of allowed bipartitions for the search space.

UR - http://www.scopus.com/inward/record.url?scp=84931034856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84931034856&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btv234

DO - 10.1093/bioinformatics/btv234

M3 - Article

C2 - 26072508

AN - SCOPUS:84931034856

VL - 31

SP - i44

EP - i52

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 12

ER -