TY - JOUR
T1 - Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering
AU - Horne, Jesse
AU - Shukla, Diwakar
N1 - Publisher Copyright: © 2022 American Chemical Society
PY - 2022/5/18
Y1 - 2022/5/18
N2 - Proteins are Nature’s molecular machinery: although they are built from chemically similar building blocks, they fulfill diverse roles. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others; the ultimate aim is to create a protein with desired structural and functional properties. Assisting such protein engineering efforts often requires modeling the relationship between a protein’s sequence, folded structure, and biological function. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and epistatic interactions between mutations make the mapping between genetic modifications and protein function inherently complex. Estimating the quantitative effects of mutations on protein function(s), a task often known as variant effect prediction (VEP), therefore remains a grand challenge of biology, bioinformatics, and many related fields, and success would rapidly accelerate protein engineering. Nevertheless, machine learning (ML) methods have recently made demonstrable progress in modeling the relationship between mutations and protein function. In this Review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques that incorporate gains from the broader ML community and on the challenges of estimating biomolecular functional differences. The primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
UR - http://www.scopus.com/inward/record.url?scp=85128746307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128746307&partnerID=8YFLogxK
U2 - 10.1021/acs.iecr.1c04943
DO - 10.1021/acs.iecr.1c04943
M3 - Review article
C2 - 36051311
AN - SCOPUS:85128746307
SN - 0888-5885
VL - 61
SP - 6235
EP - 6245
JO - Industrial and Engineering Chemistry Research
JF - Industrial and Engineering Chemistry Research
IS - 19
ER -