Machine Learning Based Prediction of Enzymatic Degradation of Plastics Using Encoded Protein Sequence and Effective Feature Representation

Renjing Jiang, Lanyu Shang, Ruohan Wang, Dong Wang, Na Wei

Research output: Contribution to journal › Article › peer-review

Abstract

Enzyme biocatalysis for plastic treatment and recycling is an emerging field of growing interest. However, it is challenging and time-consuming to identify plastic-degrading enzymes with desirable functionality, given the large number of putative enzyme sequences. There is a critical need to develop an effective approach to accurately predict the enzyme activity in degrading different types of plastics. In this study, we developed a machine-learning-based plastic enzymatic degradation (PED) framework to predict the ability of an enzyme to degrade plastics of interest by exploring and recognizing hidden patterns in protein sequences. A data set integrating information from a wide range of experimentally verified enzymes and various common plastic substrates was created. A new context-aware enzyme sequence representation (CESR) mechanism was developed to learn the abundant contextual information in enzyme sequences, and feature extraction was performed for enzymes at both the amino acid level and global sequence level. Thirteen machine learning classification algorithms were compared, and XGBoost was identified as the best-performing algorithm. PED achieved an overall accuracy of 90.2% and outperformed sequence-based protein classification models from the existing literature. Furthermore, important enzyme features in plastic degradation were identified and comprehensively interpreted. This study demonstrated a new tool for the prediction and discovery of plastic-degrading enzymes.
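The abstract describes extracting features at both the amino acid level and the global sequence level before classification. The sketch below illustrates that general idea only; it is not the CESR mechanism or the feature set used in the study. The function names, the hydrophobic residue grouping, and the choice of global features (sequence length and hydrophobic fraction) are assumptions made for illustration.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Residues counted as hydrophobic here. This grouping is an illustrative
# assumption, not a definition taken from the paper.
HYDROPHOBIC = set("AVILMFWC")


def amino_acid_composition(sequence: str) -> List[float]:
    """Amino-acid-level features: fraction of each standard residue."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def global_features(sequence: str) -> List[float]:
    """Global sequence-level features: length and hydrophobic fraction."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    length = len(residues)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in residues) / length
    return [float(length), hydrophobic_fraction]


def encode_enzyme(sequence: str) -> List[float]:
    """Concatenate both feature groups into one fixed-length vector."""
    return amino_acid_composition(sequence) + global_features(sequence)
```

Each sequence maps to a 22-element vector (20 composition values plus 2 global values). Vectors of this kind could then be passed to a classifier such as XGBoost, which the abstract reports as the best-performing of the thirteen algorithms compared.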

Original language: English (US)
Pages (from-to): 557-564
Number of pages: 8
Journal: Environmental Science and Technology Letters
Volume: 10
Issue number: 7
State: Published - Jul 11 2023

Keywords

  • Machine learning
  • enzymatic degradation
  • enzyme function
  • plastic waste
  • sequence representation

ASJC Scopus subject areas

  • Environmental Chemistry
  • Ecology
  • Water Science and Technology
  • Waste Management and Disposal
  • Pollution
  • Health, Toxicology and Mutagenesis
