DR-BERT: A protein language model to annotate disordered regions

Ananthan Nambiar, John Malcolm Forsyth, Simon Liu, Sergei Maslov

Research output: Contribution to journal, Article, peer-review

Abstract

Despite their lack of a rigid structure, intrinsically disordered regions (IDRs) in proteins play important roles in cellular functions, including mediating protein-protein interactions. Therefore, it is important to computationally annotate IDRs with high accuracy. In this study, we present Disordered Region prediction using Bidirectional Encoder Representations from Transformers (DR-BERT), a compact protein language model. Unlike most popular tools, DR-BERT is pretrained on unannotated proteins and trained to predict IDRs without relying on explicit evolutionary or biophysical data. Despite this, DR-BERT demonstrates significant improvement over existing methods on the Critical Assessment of protein Intrinsic Disorder (CAID) evaluation dataset and outperforms competitors on two out of four test cases in the CAID 2 dataset, while maintaining competitiveness in the others. This performance is due to the information learned during pretraining and DR-BERT's ability to use contextual information.

Original language: English (US)
Journal: Structure
State: Accepted/In press - 2024

Keywords

  • deep learning
  • disorder
  • IDP
  • IDR
  • machine learning
  • protein language model
  • protein structure prediction

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
