TY - JOUR
T1 - Nitrogen Cycling Microbial Diversity and Operational Taxonomic Unit Clustering
T2 - When to Prioritize Accuracy Over Speed
AU - Egenriether, Sada
AU - Sanford, Robert
AU - Yang, Wendy H.
AU - Kent, Angela D.
N1 - Publisher Copyright:
Copyright © 2022 Egenriether, Sanford, Yang and Kent.
PY - 2022/5/26
Y1 - 2022/5/26
N2 - Background: Assessments of the soil microbiome provide valuable insight to ecosystem function due to the integral role microorganisms play in biogeochemical cycling of carbon and nutrients. For example, treatment effects on nitrogen cycling functional groups are often presented alongside one another to demonstrate how agricultural management practices affect various nitrogen cycling processes. However, the functional groups commonly evaluated in nitrogen cycling microbiome studies range from phylogenetically narrow (e.g., N-fixation, nitrification) to broad [e.g., denitrification, dissimilatory nitrate reduction to ammonium (DNRA)]. The bioinformatics methods used in such studies were developed for 16S rRNA gene sequence data, and how these tools perform across functional genes of different phylogenetic diversity has not been established. For example, an OTU clustering method that can accurately characterize sequences harboring comparatively little diversity may not accurately resolve the diversity within a gene comprised of a large number of clades. This study uses two nitrogen cycling genes, nifH, a gene which segregates into only three distinct clades, and nrfA, a gene which is comprised of at least eighteen clades, to investigate differences which may arise when using heuristic OTU clustering (abundance-based greedy clustering, AGC) vs. true hierarchical OTU clustering (Matthews Correlation Coefficient optimizing algorithm, Opti-MCC). Detection of treatment differences for each gene were evaluated to demonstrate how conclusions drawn from a given dataset may differ depending on clustering method used. Results: The heuristic and hierarchical methods performed comparably for the more conserved gene, nifH. The hierarchical method outperformed the heuristic method for the more diverse gene, nrfA; this included both the ability to detect treatment differences using PERMANOVA, as well as higher resolution in taxonomic classification. The difference in performance between the two methods may be traced to the AGC method’s preferential assignment of sequences to the most abundant OTUs: when analysis was limited to only the largest 100 OTUs, results from the AGC-assembled OTU table more closely resembled those of the Opti-MCC OTU table. Additionally, both AGC and Opti-MCC OTU tables detected comparable treatment differences using the rank-based ANOSIM test. This demonstrates that treatment differences were preserved using both clustering methods but were structured differently within the OTU tables produced using each method. Conclusion: For questions which can be answered using tests agnostic to clustering method (e.g., ANOSIM), or for genes of relatively low phylogenetic diversity (e.g., nifH), most upstream processing methods should lead to similar conclusions from downstream analyses. For studies involving more diverse genes, however, care should be exercised to choose methods that ensure accurate clustering for all genes. This will mitigate the risk of introducing Type II errors by allowing for detection of comparable treatment differences for all genes assessed, rather than disproportionately detecting treatment differences in only low-diversity genes.
AB - Background: Assessments of the soil microbiome provide valuable insight to ecosystem function due to the integral role microorganisms play in biogeochemical cycling of carbon and nutrients. For example, treatment effects on nitrogen cycling functional groups are often presented alongside one another to demonstrate how agricultural management practices affect various nitrogen cycling processes. However, the functional groups commonly evaluated in nitrogen cycling microbiome studies range from phylogenetically narrow (e.g., N-fixation, nitrification) to broad [e.g., denitrification, dissimilatory nitrate reduction to ammonium (DNRA)]. The bioinformatics methods used in such studies were developed for 16S rRNA gene sequence data, and how these tools perform across functional genes of different phylogenetic diversity has not been established. For example, an OTU clustering method that can accurately characterize sequences harboring comparatively little diversity may not accurately resolve the diversity within a gene comprised of a large number of clades. This study uses two nitrogen cycling genes, nifH, a gene which segregates into only three distinct clades, and nrfA, a gene which is comprised of at least eighteen clades, to investigate differences which may arise when using heuristic OTU clustering (abundance-based greedy clustering, AGC) vs. true hierarchical OTU clustering (Matthews Correlation Coefficient optimizing algorithm, Opti-MCC). Detection of treatment differences for each gene were evaluated to demonstrate how conclusions drawn from a given dataset may differ depending on clustering method used. Results: The heuristic and hierarchical methods performed comparably for the more conserved gene, nifH. The hierarchical method outperformed the heuristic method for the more diverse gene, nrfA; this included both the ability to detect treatment differences using PERMANOVA, as well as higher resolution in taxonomic classification. The difference in performance between the two methods may be traced to the AGC method’s preferential assignment of sequences to the most abundant OTUs: when analysis was limited to only the largest 100 OTUs, results from the AGC-assembled OTU table more closely resembled those of the Opti-MCC OTU table. Additionally, both AGC and Opti-MCC OTU tables detected comparable treatment differences using the rank-based ANOSIM test. This demonstrates that treatment differences were preserved using both clustering methods but were structured differently within the OTU tables produced using each method. Conclusion: For questions which can be answered using tests agnostic to clustering method (e.g., ANOSIM), or for genes of relatively low phylogenetic diversity (e.g., nifH), most upstream processing methods should lead to similar conclusions from downstream analyses. For studies involving more diverse genes, however, care should be exercised to choose methods that ensure accurate clustering for all genes. This will mitigate the risk of introducing Type II errors by allowing for detection of comparable treatment differences for all genes assessed, rather than disproportionately detecting treatment differences in only low-diversity genes.
KW - OTU clustering
KW - bioinformatics
KW - dissimilatory nitrate reduction to ammonium
KW - microbial ecology
KW - microbiome
KW - mother
KW - nitrogen cycling
KW - nitrogen fixation
UR - http://www.scopus.com/inward/record.url?scp=85132794932&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132794932&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2022.730340
DO - 10.3389/fmicb.2022.730340
M3 - Article
C2 - 35722279
AN - SCOPUS:85132794932
SN - 1664-302X
VL - 13
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
M1 - 730340
ER -