TY - JOUR
T1 - Rapid Targeted Assembly of the Proteome Reveals Evolutionary Variation of GC Content in Avian Lice
AU - Grant, Avery R.
AU - Johnson, Kevin P.
AU - Stanley, Edward L.
AU - Baldwin-Brown, James
AU - Kolenčík, Stanislav
AU - Allen, Julie M.
N1 - The authors thank Sebastian Smith and John Anderson for their help with the computational components of this publication. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by NSF grants 1925312 to JMA, NSF DEB 1925487 and DEB 1926919 to KPJ. The computational work in this publication was made possible by a grant from the National Institute of General Medical Sciences (GM103440) from the National Institutes of Health. The authors would like to acknowledge the support of Research & Innovation and the Cyberinfrastructure Team in the Office of Information Technology at the University of Nevada, Reno for facilitation and access to the Pronghorn High-Performance Computing Cluster.
PY - 2024/1/1
Y1 - 2024/1/1
N2 - Nucleotide base composition plays an influential role in the molecular mechanisms involved in gene function, phenotype, and amino acid composition. GC content (proportion of guanine and cytosine in DNA sequences) shows a high level of variation within and among species. Many studies measure GC content in a small number of genes, which may not be representative of genome-wide GC variation. One challenge when assembling extensive genomic data sets for these studies is the significant amount of resources (monetary and computational) associated with data processing, and many bioinformatic tools have not been optimized for resource efficiency. Using a high-performance computing (HPC) cluster, we manipulated resources provided to the targeted gene assembly program, automated target restricted assembly method (aTRAM), to determine an optimum way to run the program to maximize resource use. Using our optimum assembly approach, we assembled and measured GC content of all of the protein-coding genes of a diverse group of parasitic feather lice. Of the 499 426 genes assembled across 57 species, feather lice were GC-poor (mean GC = 42.96%) with a significant amount of variation within and between species (GC range = 19.57%-73.33%). We found a significant correlation between GC content and standard deviation per taxon for overall GC and GC3, which could indicate selection for G and C nucleotides in some species. Phylogenetic signal of GC content was detected in both GC and GC3. This research provides a large-scale investigation of GC content in parasitic lice laying the foundation for understanding the basis of variation in base composition across species.
KW - AT (adenine/thymine) rich
KW - base composition
KW - Bioinformatics
KW - computational resource efficiency
KW - feather lice
KW - phylogenetic signal
KW - protein-coding genes
UR - https://www.scopus.com/pages/publications/85195483693
U2 - 10.1177/11779322241257991
DO - 10.1177/11779322241257991
M3 - Article
C2 - 38860163
AN - SCOPUS:85195483693
SN - 1177-9322
VL - 18
JO - Bioinformatics and Biology Insights
JF - Bioinformatics and Biology Insights
ER -