Rapid Targeted Assembly of the Proteome Reveals Evolutionary Variation of GC Content in Avian Lice

Avery R. Grant, Kevin P. Johnson, Edward L. Stanley, James Baldwin-Brown, Stanislav Kolenčík, Julie M. Allen

Research output: Contribution to journal › Article › peer-review

Abstract

Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (the proportion of guanine and cytosine in a DNA sequence) varies widely within and among species. Many studies measure GC content in only a small number of genes, which may not be representative of genome-wide GC variation. One challenge in assembling extensive genomic data sets for such studies is the substantial cost, both monetary and computational, of data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we varied the resources provided to the targeted gene assembly program automated target restricted assembly method (aTRAM) to determine the most resource-efficient way to run it. Using this optimized assembly approach, we assembled all of the protein-coding genes of a diverse group of parasitic feather lice and measured their GC content. Across the 499 426 genes assembled from 57 species, feather lice were GC-poor (mean GC = 42.96%), with significant variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between mean GC content and its standard deviation per taxon, for both overall GC and GC3 (GC content at third codon positions), which could indicate selection for G and C nucleotides in some species. Phylogenetic signal was detected in both overall GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice, laying the foundation for understanding the basis of variation in base composition across species.
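The abstract reports per-gene GC and GC3 and a per-taxon correlation between mean GC and its standard deviation. The sketch below illustrates those quantities in plain Python. It is not the authors' pipeline: the function names (`gc_content`, `gc3_content`, `taxon_summary`, `gc_sd_correlation`) are hypothetical, ambiguous bases (e.g. N) are assumed to be excluded from the denominator, coding sequences are assumed to be in frame, and a plain Pearson correlation is used without any significance test or phylogenetic correction, since the abstract does not state which statistic was applied.

```python
from statistics import mean, stdev


def gc_content(seq: str) -> float:
    """Percentage of G+C among unambiguous A/C/G/T bases (assumption: N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc3_content(cds: str) -> float:
    """GC percentage at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    third_positions = cds[2:usable:3]  # indices 2, 5, 8, ...
    return gc_content(third_positions)


def taxon_summary(genes_by_taxon: dict[str, list[str]]) -> dict[str, tuple[float, float]]:
    """Mean and sample standard deviation of per-gene GC for each taxon (needs >= 2 genes per taxon)."""
    summary = {}
    for taxon, seqs in genes_by_taxon.items():
        values = [gc_content(s) for s in seqs]
        summary[taxon] = (mean(values), stdev(values))
    return summary


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def gc_sd_correlation(summary: dict[str, tuple[float, float]]) -> float:
    """Correlate per-taxon mean GC with per-taxon GC standard deviation."""
    means = [m for m, _ in summary.values()]
    sds = [s for _, s in summary.values()]
    return pearson(means, sds)
```

A usage pattern would be to build `genes_by_taxon` from assembled coding sequences (one list per species), call `taxon_summary`, and pass the result to `gc_sd_correlation`; the same steps apply to GC3 by substituting `gc3_content` for `gc_content`. The toy sequences one might test this with are illustrative only and do not come from the study.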

Original language: English (US)
Journal: Bioinformatics and Biology Insights
Volume: 18
State: Published - Jan 1 2024
Externally published: Yes

Keywords

  • AT (adenine/thymine) rich
  • base composition
  • Bioinformatics
  • computational resource efficiency
  • feather lice
  • phylogenetic signal
  • protein-coding genes

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Mathematics
  • Applied Mathematics
