Machine learning-aided analyses of thousands of draft genomes reveal specific features of activated sludge processes

Lin Ye, Ran Mei, Wen Tso Liu, Hongqiang Ren, Xu Xiang Zhang

Research output: Contribution to journal › Article › peer-review


Background: Microorganisms in activated sludge (AS) play key roles in wastewater treatment processes. However, their ecological behaviors and their differences from microorganisms in other environments have mainly been studied using the 16S rRNA gene, which may not truly represent in situ functions.

Results: Here, we present 2045 archaeal and bacterial metagenome-assembled genomes (MAGs) recovered from 1.35 Tb of metagenomic data generated from 114 AS samples of 23 full-scale wastewater treatment plants (WWTPs). We found that the AS MAGs have obvious plant-specific features and that few proteins are shared by different WWTPs, especially WWTPs located in geographically distant areas. Further, we developed a novel machine learning approach that distinguishes AS MAGs from MAGs from other environments based on the clusters of orthologous groups (COGs) of proteins, with an accuracy of 96%. With the aid of machine learning, we also identified functional features (e.g., functions related to aerobic metabolism, nutrient sensing/acquisition, and biofilm formation) that are likely vital for AS bacteria to adapt to wastewater treatment bioreactors.

Conclusions: Our work reveals that, although the bacterial species in different municipal WWTPs may differ, they may share deterministic functional features that allow them to adapt to AS systems. We also provide valuable genome resources and a novel approach for future investigation and better understanding of the microbiomes of AS and other ecosystems.
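To illustrate the general idea of classifying MAGs by their COG content, the following is a minimal sketch, not the authors' method. The abstract does not name the model used, so a Bernoulli naive Bayes classifier on COG presence/absence is swapped in here as an assumption. The COG identifiers and genomes are hypothetical placeholders, and the sketch does not reproduce the reported 96% accuracy.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: each MAG is the set of COG IDs it encodes.
# Label 1 = activated sludge (AS) MAG, label 0 = MAG from another environment.
TRAIN_GENOMES: List[Set[str]] = [
    {"COG_A", "COG_B", "COG_C"},
    {"COG_A", "COG_B"},
    {"COG_A", "COG_C"},
    {"COG_D", "COG_E"},
    {"COG_D", "COG_E", "COG_F"},
    {"COG_E", "COG_F"},
]
TRAIN_LABELS: List[int] = [1, 1, 1, 0, 0, 0]

Model = Dict[int, Tuple[float, Dict[str, float]]]


def train_bernoulli_nb(genomes: List[Set[str]], labels: List[int],
                       alpha: float = 1.0) -> Tuple[List[str], Model]:
    """Fit a Bernoulli naive Bayes model on COG presence/absence vectors."""
    vocab = sorted(set().union(*genomes))
    model: Model = {}
    for cls in (0, 1):
        members = [g for g, y in zip(genomes, labels) if y == cls]
        n = len(members)
        log_prior = math.log(n / len(genomes))
        # Laplace-smoothed probability that each COG is present in a genome of this class.
        p_present = {cog: (sum(cog in g for g in members) + alpha) / (n + 2 * alpha)
                     for cog in vocab}
        model[cls] = (log_prior, p_present)
    return vocab, model


def predict(vocab: List[str], model: Model, genome: Set[str]) -> int:
    """Return the class with the highest log-posterior; COGs outside vocab are ignored."""
    scores = {}
    for cls, (log_prior, p_present) in model.items():
        score = log_prior
        for cog in vocab:
            p = p_present[cog]
            score += math.log(p if cog in genome else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)


def accuracy(vocab: List[str], model: Model,
             genomes: List[Set[str]], labels: List[int]) -> float:
    """Fraction of genomes whose predicted label matches the true label."""
    correct = sum(predict(vocab, model, g) == y for g, y in zip(genomes, labels))
    return correct / len(labels)


VOCAB, MODEL = train_bernoulli_nb(TRAIN_GENOMES, TRAIN_LABELS)

if __name__ == "__main__":
    print(predict(VOCAB, MODEL, {"COG_A", "COG_B"}))  # 1 (AS-like)
    print(predict(VOCAB, MODEL, {"COG_D", "COG_F"}))  # 0 (other environment)
    print(accuracy(VOCAB, MODEL, TRAIN_GENOMES, TRAIN_LABELS))  # 1.0
```

Naive Bayes is chosen here only because it is short and dependency-free. A real analysis would evaluate on held-out MAGs rather than on training data, and the per-class COG probabilities would be one way to rank candidate functional features.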

Original language: English (US)
Article number: 16
Issue number: 1
State: Published - Feb 11 2020


  • Activated sludge
  • Machine learning
  • Metagenomics

ASJC Scopus subject areas

  • Microbiology
  • Microbiology (medical)

