Abstract
Phylogenomics—the estimation of species trees from multi-locus datasets—is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this paper, we address the challenge of estimating the species tree under GDL. We show that species trees are identifiable under a standard stochastic model for GDL, and that the polynomial-time algorithm ASTRAL-multi, a recent development in the ASTRAL suite of methods, is statistically consistent under this GDL model. We also provide a simulation study evaluating ASTRAL-multi for species tree estimation under GDL. All scripts and datasets used in this study are available on the Illinois Data Bank: https://doi.org/10.13012/B2IDB-2626814_V1.
Original language | English (US) |
---|---|
Title of host publication | Research in Computational Molecular Biology - 24th Annual International Conference, RECOMB 2020, Proceedings |
Editors | Russell Schwartz |
Publisher | Springer |
Pages | 120-135 |
Number of pages | 16 |
ISBN (Print) | 9783030452568 |
State | Published - 2020 |
Event | 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020 - Padua, Italy; Duration: May 10 2020 → May 13 2020 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 12074 LNBI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 24th Annual Conference on Research in Computational Molecular Biology, RECOMB 2020 |
---|---|
Country/Territory | Italy |
City | Padua |
Period | 5/10/20 → 5/13/20 |
Keywords
- ASTRAL
- Estimation
- Gene duplication and loss
- Identifiability
- Species trees
- Statistical consistency
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science
Datasets
- Data from: Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss
Legried, B. (Creator), Molloy, E. K. (Creator), Warnow, T. (Creator) & Roch, S. (Creator), University of Illinois Urbana-Champaign, Jul 15 2020
DOI: 10.13012/B2IDB-2626814_V3
Dataset