Abstract

Background: Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and lead to inaccurate species tree estimations unless appropriate methods are used to estimate the species tree. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet, NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets.

Results: We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours.

Conclusions: ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on GitHub.
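The abstract does not spell out how internode distances are computed, so the following is only a minimal sketch of the general idea behind NJst-style methods, not the ASTRID implementation. It assumes (hypothetically) that gene trees are given as nested Python tuples with string leaves, that the distance between two taxa is the number of edges on the path between them in the tree as written, and that each pair is averaged only over the gene trees containing both taxa. The function names are illustrative.

```python
from collections import defaultdict


def pairwise_edge_distances(tree):
    """Return {frozenset({a, b}): edges on the path between leaves a and b}.

    Assumption: `tree` is a nested tuple whose leaves are taxon names (str).
    """
    dist = {}

    def visit(node):
        # Returns {leaf: number of edges from `node` down to that leaf}.
        if isinstance(node, str):
            return {node: 0}
        seen = {}
        for child in node:
            below = {leaf: d + 1 for leaf, d in visit(child).items()}
            # Leaves in different child subtrees meet at `node`.
            for a, da in seen.items():
                for b, db in below.items():
                    dist[frozenset((a, b))] = da + db
            seen.update(below)
        return seen

    visit(tree)
    return dist


def average_internode_distances(gene_trees):
    """Average each pairwise distance over the gene trees containing both taxa."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tree in gene_trees:
        for pair, d in pairwise_edge_distances(tree).items():
            totals[pair] += d
            counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


if __name__ == "__main__":
    # Two toy gene trees with conflicting topologies (hypothetical data).
    gene_trees = [("A", "B", ("C", "D")), ("A", "C", ("B", "D"))]
    for pair, d in sorted(average_internode_distances(gene_trees).items(),
                          key=lambda item: sorted(item[0])):
        print(sorted(pair), d)
```

In a full pipeline, the averaged matrix would then be handed to a distance-based tree estimation method, which is the step the abstract describes as interchangeable; that step is not shown here.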

Original language: English (US)
Pages (from-to): 1-13
Number of pages: 13
Journal: BMC Genomics
Volume: 16
DOI: 10.1186/1471-2164-16-S10-S3
State: Published - Jan 1 2015

Fingerprint

Datasets
Genes

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

ASTRID: Accurate species TRees from internode distances. / Vachaspati, Pranjal; Warnow, Tandy.

In: BMC Genomics, Vol. 16, 01.01.2015, p. 1-13.

Research output: Contribution to journal › Article

Vachaspati, Pranjal ; Warnow, Tandy. / ASTRID: Accurate species TRees from internode distances. In: BMC Genomics. 2015 ; Vol. 16. pp. 1-13.
@article{72969ebbfaeb4315834b70d8e913be8c,
title = "{ASTRID}: Accurate species {TRees} from internode distances",
abstract = "Background: Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and lead to inaccurate species tree estimations unless appropriate methods are used to estimate the species tree. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet, NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets. Results: We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours. Conclusions: ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on github.",
author = "Pranjal Vachaspati and Tandy Warnow",
year = "2015",
month = "1",
day = "1",
doi = "10.1186/1471-2164-16-S10-S3",
language = "English (US)",
volume = "16",
pages = "1--13",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - ASTRID

T2 - Accurate species TRees from internode distances

AU - Vachaspati, Pranjal

AU - Warnow, Tandy

PY - 2015/1/1

Y1 - 2015/1/1

AB - Background: Incomplete lineage sorting (ILS), modelled by the multi-species coalescent (MSC), is known to create discordance between gene trees and species trees, and lead to inaccurate species tree estimations unless appropriate methods are used to estimate the species tree. While many statistically consistent methods have been developed to estimate the species tree in the presence of ILS, only ASTRAL-2 and NJst have been shown to have good accuracy on large datasets. Yet, NJst is generally slower and less accurate than ASTRAL-2, and cannot run on some datasets. Results: We have redesigned NJst to enable it to run on all datasets, and we have expanded its design space so that it can be used with different distance-based tree estimation methods. The resultant method, ASTRID, is statistically consistent under the MSC model, and has accuracy that is competitive with ASTRAL-2. Furthermore, ASTRID is much faster than ASTRAL-2, completing in minutes on some datasets for which ASTRAL-2 used hours. Conclusions: ASTRID is a new coalescent-based method for species tree estimation that is competitive with the best current method in terms of accuracy, while being much faster. ASTRID is available in open source form on github.

UR - http://www.scopus.com/inward/record.url?scp=84944723546&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944723546&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-16-S10-S3

DO - 10.1186/1471-2164-16-S10-S3

M3 - Article

C2 - 26449326

AN - SCOPUS:84944723546

VL - 16

SP - 1

EP - 13

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

ER -