Abstract
Enzyme biocatalysis for plastic treatment and recycling is an emerging field of growing interest. However, it is challenging and time-consuming to identify plastic-degrading enzymes with desirable functionality, given the large number of putative enzyme sequences. There is a critical need to develop an effective approach to accurately predict the enzyme activity in degrading different types of plastics. In this study, we developed a machine-learning-based plastic enzymatic degradation (PED) framework to predict the ability of an enzyme to degrade plastics of interest by exploring and recognizing hidden patterns in protein sequences. A data set integrating information from a wide range of experimentally verified enzymes and various common plastic substrates was created. A new context-aware enzyme sequence representation (CESR) mechanism was developed to learn the abundant contextual information in enzyme sequences, and feature extraction was performed for enzymes at both the amino acid level and global sequence level. Thirteen machine learning classification algorithms were compared, and XGBoost was identified as the best-performing algorithm. PED achieved an overall accuracy of 90.2% and outperformed sequence-based protein classification models from the existing literature. Furthermore, important enzyme features in plastic degradation were identified and comprehensively interpreted. This study demonstrated a new tool for the prediction and discovery of plastic-degrading enzymes.
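The abstract says features were extracted at both the amino acid level and the global sequence level, but it does not describe how the CESR mechanism works. As a rough illustration of the general idea only, the sketch below builds two generic sequence descriptors: amino acid composition (a global feature) and overlapping k-mer frequencies (a simple stand-in for local context). It is not the authors' CESR method. The function names, the example sequence, and the k-mer vocabulary are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Global feature: fraction of each standard amino acid in the sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def kmer_counts(seq, k=2):
    """Local-context feature: counts of overlapping k-mers of standard residues."""
    seq = seq.upper()
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(km for km in kmers if all(ch in AMINO_ACIDS for ch in km))


def featurize(seq, vocab, k=2):
    """Concatenate composition with normalized frequencies of the k-mers in vocab.

    vocab is a hypothetical, caller-chosen list of k-mers; a real pipeline
    would derive it from the training sequences.
    """
    km = kmer_counts(seq, k)
    n = max(sum(km.values()), 1)  # avoid division by zero for short sequences
    return composition(seq) + [km[v] / n for v in vocab]


# Toy sequence for illustration only; not taken from the paper's data set.
features = featurize("MKKLA", vocab=["KK", "LA"])
```

Vectors of this kind could then be passed to any standard classifier, such as the gradient-boosted trees named in the abstract. The paper's actual representation is learned from sequence context and is likely richer than these hand-built counts.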
Original language | English (US) |
---|---|
Pages (from-to) | 557-564 |
Number of pages | 8 |
Journal | Environmental Science and Technology Letters |
Volume | 10 |
Issue number | 7 |
State | Published - Jul 11 2023 |
Keywords
- machine learning
- enzymatic degradation
- enzyme function
- plastic waste
- sequence representation
ASJC Scopus subject areas
- Environmental Chemistry
- Ecology
- Water Science and Technology
- Waste Management and Disposal
- Pollution
- Health, Toxicology and Mutagenesis