TY - JOUR
T1 - TLmutation
T2 - Predicting the Effects of Mutations Using Transfer Learning
AU - Shamsi, Zahra
AU - Chan, Matthew
AU - Shukla, Diwakar
N1 - Publisher Copyright:
Copyright © 2020 American Chemical Society.
PY - 2020/5/14
Y1 - 2020/5/14
N2 - A reccurring challenge in bioinformatics is predicting the phenotypic consequence of amino acid variation in proteins. With the recent advancements in sequencing techniques, sufficient genomic data has become available to train models that predict the evolutionary statistical energies, but there is still inadequate experimental data to directly predict functional effects. One approach to overcome this data scarcity is to apply transfer learning and train more models with available data sets. In this study, we propose a set of transfer learning algorithms we call TLmutation, which implements a supervised transfer learning algorithm that transfers knowledge from survival data of a protein to a particular function of that protein. This is followed by an unsupervised transfer learning algorithm that extends the knowledge to a homologous protein. We explore the application of our algorithms in three cases. First, we test the supervised transfer on 17 previously published deep mutagenesis data sets to complete and refine missing data points. We further investigate these data sets to identify which mutations build better predictors of variant functions. In the second case, we apply the algorithm to predict higher-order mutations solely from single point mutagenesis data. Finally, we perform the unsupervised transfer learning algorithm to predict mutational effects of homologous proteins from experimental data sets. These algorithms are generalized to transfer knowledge between Markov random field models. We show the benefit of our transfer learning algorithms to utilize informative deep mutational data and provide new insights into protein variant functions. As these algorithms are generalized to transfer knowledge between Markov random field models, we expect these algorithms to be applicable to other disciplines.
AB - A reccurring challenge in bioinformatics is predicting the phenotypic consequence of amino acid variation in proteins. With the recent advancements in sequencing techniques, sufficient genomic data has become available to train models that predict the evolutionary statistical energies, but there is still inadequate experimental data to directly predict functional effects. One approach to overcome this data scarcity is to apply transfer learning and train more models with available data sets. In this study, we propose a set of transfer learning algorithms we call TLmutation, which implements a supervised transfer learning algorithm that transfers knowledge from survival data of a protein to a particular function of that protein. This is followed by an unsupervised transfer learning algorithm that extends the knowledge to a homologous protein. We explore the application of our algorithms in three cases. First, we test the supervised transfer on 17 previously published deep mutagenesis data sets to complete and refine missing data points. We further investigate these data sets to identify which mutations build better predictors of variant functions. In the second case, we apply the algorithm to predict higher-order mutations solely from single point mutagenesis data. Finally, we perform the unsupervised transfer learning algorithm to predict mutational effects of homologous proteins from experimental data sets. These algorithms are generalized to transfer knowledge between Markov random field models. We show the benefit of our transfer learning algorithms to utilize informative deep mutational data and provide new insights into protein variant functions. As these algorithms are generalized to transfer knowledge between Markov random field models, we expect these algorithms to be applicable to other disciplines.
UR - http://www.scopus.com/inward/record.url?scp=85084696111&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084696111&partnerID=8YFLogxK
U2 - 10.1021/acs.jpcb.0c00197
DO - 10.1021/acs.jpcb.0c00197
M3 - Article
C2 - 32308006
AN - SCOPUS:85084696111
SN - 1520-6106
VL - 124
SP - 3845
EP - 3854
JO - Journal of Physical Chemistry B
JF - Journal of Physical Chemistry B
IS - 19
ER -