TY - JOUR
T1 - ECNet is an evolutionary context-integrated deep learning framework for protein engineering
AU - Luo, Yunan
AU - Jiang, Guangde
AU - Yu, Tianhao
AU - Liu, Yang
AU - Vo, Lam
AU - Ding, Hantian
AU - Su, Yufeng
AU - Qian, Wesley Wei
AU - Zhao, Huimin
AU - Peng, Jian
N1 - Funding Information:
This work was supported by the U.S. National Science Foundation under grant no. 2019897 (H.Z. and J.P.) and the U.S. Department of Energy award DE-SC0018420 (H.Z. and J.P.). J.P. acknowledges support from the Sloan Research Fellowship and the NSF CAREER Award. Y. Luo acknowledges support from the CompGen Fellowship. BioRender.com was used to generate part of Fig. 5a.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12/1
Y1 - 2021/12/1
N2 - Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts captured by existing machine learning algorithms are not specific to the protein being engineered, their accuracy is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates local evolutionary context from homologous sequences, which explicitly models residue-residue epistasis for the protein of interest, with global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and generalizes from low-order mutants to higher-order mutants. Using ~50 deep mutational scanning and random mutagenesis datasets, we show that ECNet predicts the sequence-function relationship more accurately than existing machine learning algorithms. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates.
UR - http://www.scopus.com/inward/record.url?scp=85116327653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85116327653&partnerID=8YFLogxK
U2 - 10.1038/s41467-021-25976-8
DO - 10.1038/s41467-021-25976-8
M3 - Article
C2 - 34593817
AN - SCOPUS:85116327653
SN - 2041-1723
VL - 12
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 5743
ER -