TY - JOUR
T1 - OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization
T2 - Nature Methods
AU - Ahdritz, Gustaf
AU - Bouatta, Nazim
AU - Floristean, Christina
AU - Kadyan, Sachin
AU - Xia, Qinghui
AU - Gerecke, William
AU - O’Donnell, Timothy J.
AU - Berenberg, Daniel
AU - Fisk, Ian
AU - Zanichelli, Niccolò
AU - Zhang, Bo
AU - Nowaczynski, Arkadiusz
AU - Wang, Bei
AU - Stepniewska-Dziubinska, Marta M.
AU - Zhang, Shang
AU - Ojewole, Adegoke
AU - Guney, Murat Efe
AU - Biderman, Stella
AU - Watkins, Andrew M.
AU - Ra, Stephen
AU - Lorenzo, Pablo Ribalta
AU - Nivon, Lucas
AU - Weitzner, Brian
AU - Ban, Yih En Andrew
AU - Chen, Shiyang
AU - Zhang, Minjia
AU - Li, Conglong
AU - Song, Shuaiwen Leon
AU - He, Yuxiong
AU - Sorger, Peter K.
AU - Mostaque, Emad
AU - Zhang, Zhao
AU - Bonneau, Richard
AU - AlQuraishi, Mohammed
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature America, Inc. 2024.
PY - 2024/8
Y1 - 2024/8
N2 - AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein–ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model’s capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set is deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.
UR - http://www.scopus.com/inward/record.url?scp=85193007550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85193007550&partnerID=8YFLogxK
DO - 10.1038/s41592-024-02272-z
M3 - Article
C2 - 38744917
AN - SCOPUS:85193007550
SN - 1548-7091
VL - 21
SP - 1514
EP - 1524
JO - Nature Methods
JF - Nature Methods
IS - 8
ER -