Abstract
Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.
Original language | English (US) |
---|---|
Pages (from-to) | 1260-1268.e3 |
Journal | Structure |
Volume | 32 |
Issue number | 8 |
State | Published - Aug 8 2024 |
Keywords
- IDP
- IDR
- deep learning
- disorder
- machine learning
- protein language model
- protein structure prediction
ASJC Scopus subject areas
- Structural Biology
- Molecular Biology