Abstract

Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses.

Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence.

Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.

Original language: English (US)
Article number: 4
Journal: GigaScience
Volume: 4
Issue number: 1
DOI: 10.1186/s13742-014-0038-1
State: Published - Feb 12 2015

Fingerprint

Genes
Genome
DNA Transposable Elements
Sequence Alignment
Expressed Sequence Tags
Birds
Exons
Palaeognathae
Nucleotides
Synteny
Phylogeny
Introns
Vertebrates
Datasets
Coalescence
Amino Acids

Keywords

  • Avian genomes
  • Gene trees
  • Indels
  • Phylogenomics
  • Sequence alignments
  • Species tree
  • Transposable elements

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications

Cite this

Phylogenomic analyses data of the avian phylogenomics project. / The Avian Phylogenomics Consortium.

In: GigaScience, Vol. 4, No. 1, 4, 12.02.2015.

Research output: Contribution to journal, Article

@article{d1bfa8b409c143d8be94737e859fea57,
title = "Phylogenomic analyses data of the avian phylogenomics project",
abstract = "Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.",
keywords = "Avian genomes, Gene trees, Indels, Phylogenomics, Sequence alignments, Species tree, Transposable elements",
author = "{The Avian Phylogenomics Consortium} and Jarvis, {Erich D.} and Siavash Mirarab and Aberer, {Andre J.} and Bo Li and Peter Houde and Cai Li and Ho, {Simon Y.W.} and Faircloth, {Brant C.} and Benoit Nabholz and Howard, {Jason T.} and Alexander Suh and Weber, {Claudia C.} and {da Fonseca}, {Rute R.} and Alonzo Alfaro-N{\'u}{\~n}ez and Nitish Narula and Liang Liu and Dave Burt and Hans Ellegren and Edwards, {Scott V.} and Alexandros Stamatakis and Mindell, {David P.} and Joel Cracraft and Braun, {Edward L.} and Tandy Warnow and Wang Jun and Gilbert, {M. Thomas Pius} and Guojie Zhang",
year = "2015",
month = "2",
day = "12",
doi = "10.1186/s13742-014-0038-1",
language = "English (US)",
volume = "4",
journal = "GigaScience",
issn = "2047-217X",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Phylogenomic analyses data of the avian phylogenomics project

AU - The Avian Phylogenomics Consortium

AU - Jarvis, Erich D.

AU - Mirarab, Siavash

AU - Aberer, Andre J.

AU - Li, Bo

AU - Houde, Peter

AU - Li, Cai

AU - Ho, Simon Y.W.

AU - Faircloth, Brant C.

AU - Nabholz, Benoit

AU - Howard, Jason T.

AU - Suh, Alexander

AU - Weber, Claudia C.

AU - da Fonseca, Rute R.

AU - Alfaro-Núñez, Alonzo

AU - Narula, Nitish

AU - Liu, Liang

AU - Burt, Dave

AU - Ellegren, Hans

AU - Edwards, Scott V.

AU - Stamatakis, Alexandros

AU - Mindell, David P.

AU - Cracraft, Joel

AU - Braun, Edward L.

AU - Warnow, Tandy

AU - Jun, Wang

AU - Gilbert, M. Thomas Pius

AU - Zhang, Guojie

PY - 2015/2/12

Y1 - 2015/2/12

AB - Background: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Findings: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. Conclusions: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.

KW - Avian genomes

KW - Gene trees

KW - Indels

KW - Phylogenomics

KW - Sequence alignments

KW - Species tree

KW - Transposable elements

UR - http://www.scopus.com/inward/record.url?scp=84929606489&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84929606489&partnerID=8YFLogxK

U2 - 10.1186/s13742-014-0038-1

DO - 10.1186/s13742-014-0038-1

M3 - Article

C2 - 25741440

AN - SCOPUS:84929606489

VL - 4

JO - GigaScience

JF - GigaScience

SN - 2047-217X

IS - 1

M1 - 4

ER -