Abstract

Over the last few years, the availability of genomic sequence data from thousands of different species has raised hopes that a phylogenetic tree of all life might be achievable. Yet the most accurate methods for estimating phylogenies are heuristics for NP-hard optimization problems, and many of them are too computationally intensive to use on large datasets. To address scalability, divide-and-conquer approaches have been proposed that divide the species into subsets, construct trees on the subsets, and then merge the subset trees together. Prior approaches divided the species set into overlapping subsets and used supertree methods to merge the subset trees, but limitations of supertree methods suggest that this kind of divide-and-conquer approach is unlikely to scale to ultra-large datasets. Recently, a new approach has been developed that divides the species set into disjoint subsets, computes trees on the subsets, and then combines the subset trees using auxiliary information (e.g., a distance matrix). Here, we describe these strategies and their theoretical properties, present open problems, and discuss opportunities for impact in large-scale phylogenetic estimation using these and similar approaches.
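The three phases described above (divide the species into disjoint subsets, build a tree on each subset, then merge the subset trees using a distance matrix) can be illustrated with the toy Python sketch below. It is not the method presented in the paper: the greedy nearest-neighbour `divide`, the average-linkage (UPGMA-style) tree builder, and all function names are hypothetical stand-ins chosen for brevity. In particular, this naive merge keeps every subset tree intact as a clade, which does not reflect how the disjoint-merging approaches discussed here combine subset trees.

```python
from itertools import combinations

# Hypothetical toy types: a tree is a leaf name (str) or a pair (left, right);
# dist is a symmetric dict of dicts of non-negative distances with dist[x][x] == 0.


def avg_dist(a, b, dist):
    """Average pairwise distance between two sets of species."""
    return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))


def average_linkage(clusters, dist):
    """Repeatedly join the two (tree, leaf_set) clusters with the smallest average distance."""
    clusters = list(clusters)
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_dist(clusters[p[0]][1], clusters[p[1]][1], dist),
        )
        (ti, li), (tj, lj) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((ti, tj), li | lj)]
    return clusters[0][0]


def divide(species, dist, max_size):
    """Split species into disjoint subsets of at most max_size (toy greedy rule)."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    unassigned = sorted(species)
    subsets = []
    while unassigned:
        seed = unassigned[0]
        # The seed is at distance 0 from itself and has the smallest name, so it sorts first.
        group = sorted(unassigned, key=lambda s: (dist[seed][s], s))[:max_size]
        subsets.append(frozenset(group))
        unassigned = [s for s in unassigned if s not in group]
    return subsets


def divide_and_conquer(species, dist, max_size):
    # 1. Divide: disjoint subsets of bounded size.
    subsets = divide(species, dist, max_size)
    # 2. Conquer: one tree per subset (average linkage stands in for an expensive method).
    subset_trees = [
        (average_linkage([(s, frozenset([s])) for s in sorted(sub)], dist), sub)
        for sub in subsets
    ]
    # 3. Merge: join the subset trees using the same distance matrix.
    return average_linkage(subset_trees, dist)
```

On a four-species matrix in which A and B are close to each other, C and D are close to each other, and all other pairs are far apart, `divide_and_conquer(["A", "B", "C", "D"], dist, 2)` returns `(('A', 'B'), ('C', 'D'))`.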

Original language: English (US)
Title of host publication: Algorithms for Computational Biology - 6th International Conference, AlCoB 2019, Proceedings
Editors: Miguel A. Vega-Rodríguez, Ian Holmes, Carlos Martín-Vide
Publisher: Springer
Pages: 3-21
Number of pages: 19
ISBN (Print): 9783030181734
State: Published - 2019
Event: 6th International Conference on Algorithms for Computational Biology, AlCoB 2019 - Berkeley, United States
Duration: May 28, 2019 to May 30, 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11488 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Conference on Algorithms for Computational Biology, AlCoB 2019
Country/Territory: United States
City: Berkeley
Period: 5/28/19 to 5/30/19

Keywords

  • Absolute fast converging methods
  • Divide-and-conquer
  • Gene trees
  • Incomplete lineage sorting
  • Inferring the evolutionary phylogeny of species
  • Species trees
  • Statistical consistency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Research topics of 'New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation'.
