TY - GEN
T1 - Transforming the Language of Life
T2 - 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
AU - Nambiar, Ananthan
AU - Heflin, Maeve
AU - Liu, Simon
AU - Maslov, Sergei
AU - Hopkins, Mark
AU - Ritz, Anna
N1 - Funding Information:
This work has been supported by the National Science Foundation (awards #1750981 and #1725729). This work has also been partially supported by the Google Cloud Platform research credits program (to AR, MH, and AN). AN would like to thank Mark Bedau, Norman Packard and the Reed College Artificial Life Lab for insightful discussions and Desiree Odgers for inspiring the idea of taking a linguistic approach to a biological problem.
PY - 2020/9/21
Y1 - 2020/9/21
N2 - The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Further, our method outperforms all other approaches for protein interaction prediction. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.
AB - The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Further, our method outperforms all other approaches for protein interaction prediction. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.
KW - Neural networks
KW - protein family classification
KW - protein-protein interaction prediction
KW - COVID-19
UR - http://www.scopus.com/inward/record.url?scp=85096966907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096966907&partnerID=8YFLogxK
U2 - 10.1101/2020.06.15.153643
DO - 10.1101/2020.06.15.153643
M3 - Conference contribution
AN - SCOPUS:85096966907
T3 - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
BT - Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2020
PB - Association for Computing Machinery, Inc
Y2 - 21 September 2020 through 24 September 2020
ER -