Fecal Metagenomics to Identify Biomarkers of Food Intake in Healthy Adults: Findings from Randomized, Controlled, Nutrition Trials

Leila M. Shinn, Aditya Mansharamani, David J. Baer, Janet A. Novotny, Craig S. Charron, Naiman A. Khan, Ruoqing Zhu, Hannah D. Holscher

Research output: Contribution to journal › Article › peer-review


Background: Undigested components of the human diet affect the composition and function of the microorganisms present in the gastrointestinal tract. Metagenomic analyses allow researchers to study the functional capacity of these microorganisms, which suggests that metagenomic data could be used to develop objective biomarkers of food intake.
Objectives: As a continuation of our previous work using 16S and metabolomic datasets, we aimed to use a computationally intensive, multivariate, machine-learning approach to identify fecal Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) categories as biomarkers that accurately classify food intake.
Methods: Data were aggregated from 5 controlled feeding studies that examined the individual impact of almonds, avocados, broccoli, walnuts, barley, and oats on the adult gastrointestinal microbiota. Deoxyribonucleic acid from preintervention and postintervention fecal samples underwent shotgun genomic sequencing. After preprocessing, sequences were aligned with Double Index AlignMent Of Next-generation sequencing Data (DIAMOND) v2.0.11.149 and functionally annotated with MEtaGenome ANalyzer (MEGAN) v6.12.2. After count normalization, differential abundance analysis was conducted on the log of the fold change ratio for each KO between pre- and postintervention, comparing each treatment group against its corresponding control. Differentially abundant KOs were used to train machine-learning models examining potential biomarkers in both single-food and multi-food models.
Results: We identified differentially abundant KOs in the almond (n = 54), broccoli (n = 2474), and walnut (n = 732) groups (q < 0.20). Random forest models trained on these KOs classified samples into each food's treatment and control arms with accuracies of 80% (almond), 87% (broccoli), and 86% (walnut). The mixed-food random forest achieved 81% accuracy.
Conclusions: Our findings reveal promise in utilizing fecal metagenomics to objectively complement self-reported measures of food intake. Future research on various foods and dietary patterns will expand these exploratory analyses for eventual use in feeding study compliance and clinical settings.
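
The fold-change comparison described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the total-sum normalization, the base-2 logarithm, the pseudocount, the function names, and the counts are all assumptions made for illustration, and the KO-style identifiers serve only as labels. The statistical testing and random forest steps are not shown.

```python
import math

def relative_abundance(counts):
    """Normalize raw KO counts to proportions of the sample total (assumed normalization)."""
    total = sum(counts.values())
    return {ko: n / total for ko, n in counts.items()}

def log2_fold_change(pre, post, pseudocount=1e-6):
    """Per-KO log2(post / pre) on normalized abundances; the pseudocount avoids log(0)."""
    kos = set(pre) | set(post)
    return {
        ko: math.log2((post.get(ko, 0.0) + pseudocount) / (pre.get(ko, 0.0) + pseudocount))
        for ko in kos
    }

def treatment_vs_control(lfc_treatment, lfc_control):
    """Subtracting log fold changes gives the log of the ratio of fold changes."""
    return {ko: lfc_treatment[ko] - lfc_control.get(ko, 0.0) for ko in lfc_treatment}

# Made-up counts for two placeholder KOs in one treatment and one control sample pair
pre_t = {"K00001": 100, "K00002": 300}
post_t = {"K00001": 200, "K00002": 200}
pre_c = {"K00001": 100, "K00002": 300}
post_c = {"K00001": 100, "K00002": 300}

lfc_t = log2_fold_change(relative_abundance(pre_t), relative_abundance(post_t))
lfc_c = log2_fold_change(relative_abundance(pre_c), relative_abundance(post_c))
effect = treatment_vs_control(lfc_t, lfc_c)

for ko in sorted(effect):
    print(ko, round(effect[ko], 3))
```

In this toy input the control arm does not change, so the treatment-versus-control value for each KO equals the treatment arm's own log fold change. In a real analysis, values like these would be computed per participant and passed to a differential abundance test before any classifier is trained.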

Original language: English (US)
Pages (from-to): 271-283
Number of pages: 13
Journal: Journal of Nutrition
Issue number: 1
State: Published - Jan 2024


Keywords

  • KEGG
  • dietary intake biomarkers
  • gastrointestinal microbiome
  • genomic sequencing
  • machine learning

ASJC Scopus subject areas

  • Nutrition and Dietetics
  • Medicine (miscellaneous)

