Abstract

Phylogenomics—the estimation of species trees from multi-locus datasets—is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this paper, we address the challenge of estimating the species tree under GDL. We show that species trees are identifiable under a standard stochastic model for GDL, and that the polynomial-time algorithm ASTRAL-multi, a recent development in the ASTRAL suite of methods, is statistically consistent under this GDL model. We also provide a simulation study evaluating ASTRAL-multi for species tree estimation under GDL. All scripts and datasets used in this study are available on the Illinois Data Bank: https://doi.org/10.13012/B2IDB-2626814_V1.

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 24th Annual International Conference, RECOMB 2020, Proceedings
Editors: Russell Schwartz
Publisher: Springer
Pages: 120-135
Number of pages: 16
ISBN (Print): 9783030452568
State: Published - 2020
Event: 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020 - Padua, Italy
Duration: May 10 2020 to May 13 2020

Publication series

Name: Lecture Notes in Computer Science
Volume: 12074 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020
Country/Territory: Italy
City: Padua
Period: 5/10/20 to 5/13/20

Keywords

  • ASTRAL
  • Estimation
  • Gene duplication and loss
  • Identifiability
  • Species trees
  • Statistical consistency

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Title: Polynomial-time statistical estimation of species trees under gene duplication and loss
