Nearly complete rRNA genes from 371 Animalia: Updated structure-based alignment and detailed phylogenetic analysis

Jon Mallatt, Catherine Waggoner Craig, Matthew Jon Yoder

Research output: Contribution to journal (Article)

Abstract

This study presents a manually constructed alignment of nearly complete rRNA genes from most animal clades (371 taxa from ~33 of the ~36 metazoan phyla), expanded from the 197 sequences in a previous study. This thorough, taxon-rich alignment, available at http://www.wsu.edu/~jmallatt/research/rRNAalignment.html and in the Dryad Repository (doi: http://dx.doi.org/10.5061/dryad.1v62kr3q), is based rigidly on the secondary structure of the SSU and LSU rRNA molecules, and is annotated in detail, including labeling of the erroneous sequences (contaminants). The alignment can be used for future studies of the molecular evolution of rRNA. Here, we use it to explore whether the larger number of sequences produces an improved phylogenetic tree of animal relationships. Disappointingly, the resolution did not improve, either with the standard maximum-likelihood (ML) method or with more sophisticated methods that partitioned the rRNA into paired and unpaired sites (stem, loop, bulge, junction) or accounted for the evolution of the paired sites. For example, no doublet model of paired-site substitutions (16-state, 16A and 16B, 7A-F, or 6A-C models) corrected the placement of any rogue taxa or increased resolution. The following findings are from the simplest, standard ML analysis. The 371-taxon tree only imperfectly supported the bilaterian clades of Lophotrochozoa and Ecdysozoa, and this problem remained after 17 taxa with unstably positioned sequences were omitted from the analysis. The problem seems to stem from base-compositional heterogeneity across taxa and from an overrepresentation of highly divergent sequences among the newly added taxa (e.g., sequences from Cephalopoda, Rotifera, Acoela, and Myxozoa). The rogue taxa continue to concentrate in two locations in the rRNA tree: near the base of Arthropoda and of Bilateria.
The approximately unbiased (AU) test refuted the monophyly of Mollusca and of Chordata, probably due to long-branch attraction of the highly divergent cephalopod and urochordate sequences out of those clades. Unlikely to be correct, these refutations show for the first time that rRNA phylogeny can support some 'wrong' clades. Along with its weaknesses, the rRNA tree has strengths: It recovers many clades that are supported by independent evidence (e.g., Metazoa, Bilateria, Hexapoda, Nonoculata, Ambulacraria, Syndermata, and Thecostraca with Malacostraca) and shows good resolution within certain groups (e.g., in Platyhelminthes, Insecta, Cnidaria). As another strength, the newly added rRNA sequences yielded the first rRNA-based support for Carnivora and Cetartiodactyla (dolphin + llama) in Mammalia, for basic subdivisions of Bryozoa ('Gymnolaemata + Stenolaemata' versus Phylactolaemata), and for Oligostraca (ostracods + branchiurans + pentastomids + mystacocarids). Future improvement could come from better sequence-evolution models that account for base-compositional heterogeneity, and from combining rRNA with protein-coding genes in phylogenetic reconstruction.
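
The partitioning of rRNA sites into paired and unpaired classes mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline or the PHASE program's input format: it assumes the secondary structure is given as a dot-bracket mask over alignment columns, and the function names (`pair_map`, `partition_sites`, `doublets`) are hypothetical names chosen for this illustration.

```python
def pair_map(structure: str) -> dict[int, int]:
    """Map each paired column to its partner, given a dot-bracket mask (assumed format)."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return pairs


def partition_sites(structure: str) -> tuple[list[tuple[int, int]], list[int]]:
    """Split columns into stem pairs (5' column, 3' column) and unpaired columns."""
    pairs = pair_map(structure)
    stems = sorted((i, j) for i, j in pairs.items() if i < j)
    unpaired = [i for i in range(len(structure)) if i not in pairs]
    return stems, unpaired


def doublets(seq: str, stems: list[tuple[int, int]]) -> list[str]:
    """Read one aligned sequence as two-base states, one per stem pair."""
    return [seq[i] + seq[j] for i, j in stems]


# Toy example: a 7-column alignment with a two-pair stem and a two-base loop.
structure = "((..))."
stems, unpaired = partition_sites(structure)
print(stems, unpaired)
print(doublets("GCAAGCU", stems))
```

In a doublet model, each stem pair is treated as a single character whose state is the two-base combination, so the four bases give up to 16 possible states, while the unpaired columns would be analyzed with an ordinary single-nucleotide model.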

Original language: English (US)
Pages (from-to): 603-617
Number of pages: 15
Journal: Molecular Phylogenetics and Evolution
Volume: 64
Issue number: 3
DOI: 10.1016/j.ympev.2012.05.016
State: Published - Sep 1 2012
Externally published: Yes

Fingerprint

Animalia
rRNA Genes
Cephalopoda
ribosomal RNA
phylogenetics
gene
Bryozoa
Rotifera
phylogeny
Myxozoa
Carnivora
Cnidaria
Chordata
New World Camelids
Dolphins
Crustacea
Urochordata
genes
Molecular Evolution
secondary structure

Keywords

  • Alignment
  • Animalia
  • LSU rRNA
  • PHASE program
  • SSU (18S) rRNA
  • Secondary structure substitution models

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

Nearly complete rRNA genes from 371 Animalia: Updated structure-based alignment and detailed phylogenetic analysis. / Mallatt, Jon; Craig, Catherine Waggoner; Yoder, Matthew Jon.

In: Molecular Phylogenetics and Evolution, Vol. 64, No. 3, 01.09.2012, p. 603-617.

@article{dc78a54adc53487387a49d6ced309fae,
title = "Nearly complete rRNA genes from 371 Animalia: Updated structure-based alignment and detailed phylogenetic analysis",
keywords = "Alignment, Animalia, LSU rRNA, PHASE program, SSU (18S) rRNA, Secondary structure substitution models",
author = "Jon Mallatt and Craig, {Catherine Waggoner} and Yoder, {Matthew Jon}",
year = "2012",
month = "9",
day = "1",
doi = "10.1016/j.ympev.2012.05.016",
language = "English (US)",
volume = "64",
pages = "603--617",
journal = "Molecular Phylogenetics and Evolution",
issn = "1055-7903",
publisher = "Academic Press Inc.",
number = "3",

}

TY - JOUR

T1 - Nearly complete rRNA genes from 371 Animalia

T2 - Updated structure-based alignment and detailed phylogenetic analysis

AU - Mallatt, Jon

AU - Craig, Catherine Waggoner

AU - Yoder, Matthew Jon

PY - 2012/9/1

Y1 - 2012/9/1

KW - Alignment

KW - Animalia

KW - LSU rRNA

KW - PHASE program

KW - SSU (18S) rRNA

KW - Secondary structure substitution models

UR - http://www.scopus.com/inward/record.url?scp=84863108240&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863108240&partnerID=8YFLogxK

U2 - 10.1016/j.ympev.2012.05.016

DO - 10.1016/j.ympev.2012.05.016

M3 - Article

C2 - 22641172

AN - SCOPUS:84863108240

VL - 64

SP - 603

EP - 617

JO - Molecular Phylogenetics and Evolution

JF - Molecular Phylogenetics and Evolution

SN - 1055-7903

IS - 3

ER -