New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.

Original languageEnglish (US)
Title of host publicationAlgorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings
EditorsMiguel A. Vega-Rodríguez, Ian Holmes, Carlos Martín-Vide
PublisherSpringer-Verlag
Pages3-21
Number of pages19
ISBN (Print)9783030181734
DOIs
StatePublished - Jan 1 2019
Event6th International Conference on Algorithms for Computational Biology, AlCoB 2019 - Berkeley, United States
Duration: May 28 2019May 30 2019

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11488 LNBI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference6th International Conference on Algorithms for Computational Biology, AlCoB 2019
CountryUnited States
CityBerkeley
Period5/28/195/30/19

Fingerprint

Divide and conquer
Phylogenetics
Scalability
Subset
Large Data Sets
Availability
Divides
Auxiliary Information
Distance Matrix
Phylogeny
Phylogenetic Tree
NP-hard Problems
Genomics
Overlapping
Open Problems
Disjoint
Heuristics
Optimization Problem

Keywords

  • Absolute fast converging methods
  • Divide-and-conquer
  • Gene trees
  • Incomplete lineage sorting
  • Inferring the evolutionary phylogeny of species
  • Species trees
  • Statistical consistency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Warnow, T. (2019). New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation. In M. A. Vega-Rodríguez, I. Holmes, & C. Martín-Vide (Eds.), Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings (pp. 3-21). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11488 LNBI). Springer-Verlag. https://doi.org/10.1007/978-3-030-18174-1_1

New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation. / Warnow, Tandy.

Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings. ed. / Miguel A. Vega-Rodríguez; Ian Holmes; Carlos Martín-Vide. Springer-Verlag, 2019. p. 3-21 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11488 LNBI).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Warnow, T 2019, New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation. in MA Vega-Rodríguez, I Holmes & C Martín-Vide (eds), Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11488 LNBI, Springer-Verlag, pp. 3-21, 6th International Conference on Algorithms for Computational Biology, AlCoB 2019, Berkeley, United States, 5/28/19. https://doi.org/10.1007/978-3-030-18174-1_1
Warnow T. New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation. In Vega-Rodríguez MA, Holmes I, Martín-Vide C, editors, Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings. Springer-Verlag. 2019. p. 3-21. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-030-18174-1_1
Warnow, Tandy. / New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation. Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings. editor / Miguel A. Vega-Rodríguez ; Ian Holmes ; Carlos Martín-Vide. Springer-Verlag, 2019. pp. 3-21 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{990d9c83622743bca5b96162e79d71a7,
title = "New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation",
abstract = "Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.",
keywords = "Absolute fast converging methods, Divide-and-conquer, Gene trees, Incomplete lineage sorting, Inferring the evolutionary phylogeny of species, Species trees, Statistical consistency",
author = "Tandy Warnow",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-18174-1_1",
language = "English (US)",
isbn = "9783030181734",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "3--21",
editor = "Vega-Rodr{\'i}guez, {Miguel A.} and Ian Holmes and Carlos Mart{\'i}n-Vide",
booktitle = "Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings",

}

TY - GEN

T1 - New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation

AU - Warnow, Tandy

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.

AB - Over the last years, the availability of genomic sequence data from thousands of different species has led to hopes that a phylogenetic tree of all life might be achievable. Yet, the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, many of which are too computationally intensive to use on large datasets. Divide-and-conquer approaches have been proposed to address scalability to large datasets that divide the species into subsets, construct trees on subsets, and then merge the trees together. Prior approaches have divided species sets into overlapping subsets and used supertree methods to merge the subset trees, but limitations in supertree methods suggest this kind of divide-and-conquer approach is unlikely to provide scalability to ultra-large datasets. Recently, a new approach has been developed that divides the species dataset into disjoint subsets, computes trees on subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.

KW - Absolute fast converging methods

KW - Divide-and-conquer

KW - Gene trees

KW - Incomplete lineage sorting

KW - Inferring the evolutionary phylogeny of species

KW - Species trees

KW - Statistical consistency

UR - http://www.scopus.com/inward/record.url?scp=85066151325&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066151325&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-18174-1_1

DO - 10.1007/978-3-030-18174-1_1

M3 - Conference contribution

SN - 9783030181734

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 3

EP - 21

BT - Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings

A2 - Vega-Rodríguez, Miguel A.

A2 - Holmes, Ian

A2 - Martín-Vide, Carlos

PB - Springer-Verlag

ER -