SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space

Pranjal Vachaspati, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS) that result in different loci having different trees. Summary methods, which estimate species trees by combining gene trees, are popular but their accuracy is impaired by gene tree estimation error. Other approaches have been developed that only use the site patterns to estimate the species tree, and so are not impacted by gene tree estimation issues. In particular, PAUP∗ provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and then a heuristic search is used to combine the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. The PAUP∗ method based on SVDquartets (henceforth referred to as SVDquartets + PAUP∗) is increasingly used in phylogenomic studies due to its ability to reconstruct species trees without needing to estimate accurate gene trees. We present SVDquest∗, a new method for constructing species trees using site patterns that is guaranteed to produce species trees that satisfy at least as many quartet trees as SVDquartets + PAUP∗. We show that SVDquest∗ is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than ASTRAL and ASTRID under conditions with relatively high gene tree estimation error. SVDquest∗ is available in open source form at https://github.com/pranjalv123/SVDquest.
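
The optimization criterion named in the abstract, the number of quartet trees in Q that agree with a candidate species tree T, can be illustrated with a short sketch. This is not the SVDquest or PAUP implementation and does not compute quartets from site patterns. The nested-tuple tree encoding, the quartet encoding, and the function names (clades, displays, quartet_score) are assumptions made for this example only.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical encoding: a quartet tree ab|cd is the pair ({a, b}, {c, d}).
Quartet = Tuple[FrozenSet[str], FrozenSet[str]]


def clades(tree) -> List[FrozenSet[str]]:
    """Return the leaf set below every internal node of a nested-tuple tree."""
    found: List[FrozenSet[str]] = []

    def visit(node) -> FrozenSet[str]:
        if isinstance(node, str):  # a leaf is a plain taxon name
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.append(leaves)
        return leaves

    visit(tree)
    return found


def displays(tree_clades: List[FrozenSet[str]], quartet: Quartet) -> bool:
    """True if some edge of the tree separates one pair from the other.

    Assumes all four taxa of the quartet are leaves of the tree.
    """
    left, right = quartet
    for side in tree_clades:
        if left <= side and not (right & side):
            return True
        if right <= side and not (left & side):
            return True
    return False


def quartet_score(tree, quartets: Iterable[Quartet]) -> int:
    """Count how many quartet trees in `quartets` agree with `tree`."""
    tree_clades = clades(tree)
    return sum(displays(tree_clades, q) for q in quartets)


def q(a: str, b: str, c: str, d: str) -> Quartet:
    """Build the quartet tree ab|cd."""
    return (frozenset([a, b]), frozenset([c, d]))


# Toy input: three quartet trees and two candidate species trees.
Q = [q("A", "B", "C", "D"), q("A", "C", "D", "E"), q("A", "E", "B", "C")]
T1 = (("A", "B"), ("C", ("D", "E")))
T2 = (("A", "C"), ("B", ("D", "E")))

print(quartet_score(T1, Q), quartet_score(T2, Q))
```

On this toy input T1 agrees with more quartet trees than T2, so a search guided by this criterion would prefer T1. How the candidate trees are generated (heuristically, or exactly within a constrained search space) is a separate question that the sketch does not address.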

Original language: English (US)
Pages (from-to): 122-136
Number of pages: 15
Journal: Molecular Phylogenetics and Evolution
Volume: 124
DOIs: 10.1016/j.ympev.2018.03.006
State: Published - Jul 2018

Fingerprint

gene
Genes
genes
methodology
loci
Aptitude
heuristics
sorting
method
leaves
Heuristics
Datasets

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space. / Vachaspati, Pranjal; Warnow, Tandy.

In: Molecular Phylogenetics and Evolution, Vol. 124, 07.2018, p. 122-136.

@article{e80f03ee0bc84652a40e143041bb8d37,
title = "{SVDquest}: Improving {SVDquartets} species tree estimation using exact optimization within a constrained search space",
abstract = "Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS) that result in different loci having different trees. Summary methods, which estimate species trees by combining gene trees, are popular but their accuracy is impaired by gene tree estimation error. Other approaches have been developed that only use the site patterns to estimate the species tree, and so are not impacted by gene tree estimation issues. In particular, PAUP∗ provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and then a heuristic search is used to combine the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. The PAUP∗ method based on SVDquartets (henceforth referred to as SVDquartets + PAUP∗) is increasingly used in phylogenomic studies due to its ability to reconstruct species trees without needing to estimate accurate gene trees. We present SVDquest∗, a new method for constructing species trees using site patterns that is guaranteed to produce species trees that satisfy at least as many quartet trees as SVDquartets + PAUP∗. We show that SVDquest∗ is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than ASTRAL and ASTRID under conditions with relatively high gene tree estimation error. SVDquest∗ is available in open source form at https://github.com/pranjalv123/SVDquest.",
author = "Pranjal Vachaspati and Tandy Warnow",
year = "2018",
month = jul,
doi = "10.1016/j.ympev.2018.03.006",
language = "English (US)",
volume = "124",
pages = "122--136",
journal = "Molecular Phylogenetics and Evolution",
issn = "1055-7903",
publisher = "Academic Press Inc.",
}

TY - JOUR

T1 - SVDquest: Improving SVDquartets species tree estimation using exact optimization within a constrained search space

AU - Vachaspati, Pranjal

AU - Warnow, Tandy

PY - 2018/7

Y1 - 2018/7

AB - Species tree estimation from multi-locus datasets is complicated by processes such as incomplete lineage sorting (ILS) that result in different loci having different trees. Summary methods, which estimate species trees by combining gene trees, are popular but their accuracy is impaired by gene tree estimation error. Other approaches have been developed that only use the site patterns to estimate the species tree, and so are not impacted by gene tree estimation issues. In particular, PAUP∗ provides a method in which SVDquartets is used to compute a set Q of quartet trees (i.e., trees on four leaves), and then a heuristic search is used to combine the quartet trees into a species tree T, seeking to maximize the number of quartet trees in Q that agree with T. The PAUP∗ method based on SVDquartets (henceforth referred to as SVDquartets + PAUP∗) is increasingly used in phylogenomic studies due to its ability to reconstruct species trees without needing to estimate accurate gene trees. We present SVDquest∗, a new method for constructing species trees using site patterns that is guaranteed to produce species trees that satisfy at least as many quartet trees as SVDquartets + PAUP∗. We show that SVDquest∗ is competitive with ASTRAL and ASTRID (two leading summary methods) in terms of topological accuracy, and tends to be more accurate than ASTRAL and ASTRID under conditions with relatively high gene tree estimation error. SVDquest∗ is available in open source form at https://github.com/pranjalv123/SVDquest.

UR - http://www.scopus.com/inward/record.url?scp=85044125131&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044125131&partnerID=8YFLogxK

U2 - 10.1016/j.ympev.2018.03.006

DO - 10.1016/j.ympev.2018.03.006

M3 - Article

C2 - 29530498

AN - SCOPUS:85044125131

VL - 124

SP - 122

EP - 136

JO - Molecular Phylogenetics and Evolution

JF - Molecular Phylogenetics and Evolution

SN - 1055-7903

ER -