TY - JOUR
T1 - Confronting pitfalls of AI-augmented molecular dynamics using statistical physics
AU - Pant, Shashank
AU - Smith, Zachary
AU - Wang, Yihang
AU - Tajkhorshid, Emad
AU - Tiwary, Pratyush
PY - 2020/12/21
Y1 - 2020/12/21
N2 - Artificial intelligence (AI)-based approaches have had indubitable impact across the sciences through the ability to extract relevant information from raw data. Recently, AI has also found use in enhancing the efficiency of molecular simulations, wherein AI derived slow modes are used to accelerate the simulation in targeted ways. However, while typical fields where AI is used are characterized by a plethora of data, molecular simulations, per construction, suffer from limited sampling and thus limited data. As such, the use of AI in molecular simulations can suffer from a dangerous situation where the AI-optimization could get stuck in spurious regimes, leading to incorrect characterization of the reaction coordinate (RC) for the problem at hand. When such an incorrect RC is then used to perform additional simulations, one could start to deviate progressively from the ground truth. To deal with this problem of spurious AI-solutions, here, we report a novel and automated algorithm using ideas from statistical mechanics. It is based on the notion that a more reliable AI-solution will be one that maximizes the timescale separation between slow and fast processes. To learn this timescale separation even from limited data, we use a maximum caliber-based framework. We show the applicability of this automatic protocol for three classic benchmark problems, namely, the conformational dynamics of a model peptide, ligand-unbinding from a protein, and folding/unfolding energy landscape of the C-terminal domain of protein G. We believe that our work will lead to increased and robust use of trustworthy AI in molecular simulations of complex systems.
AB - Artificial intelligence (AI)-based approaches have had indubitable impact across the sciences through the ability to extract relevant information from raw data. Recently, AI has also found use in enhancing the efficiency of molecular simulations, wherein AI derived slow modes are used to accelerate the simulation in targeted ways. However, while typical fields where AI is used are characterized by a plethora of data, molecular simulations, per construction, suffer from limited sampling and thus limited data. As such, the use of AI in molecular simulations can suffer from a dangerous situation where the AI-optimization could get stuck in spurious regimes, leading to incorrect characterization of the reaction coordinate (RC) for the problem at hand. When such an incorrect RC is then used to perform additional simulations, one could start to deviate progressively from the ground truth. To deal with this problem of spurious AI-solutions, here, we report a novel and automated algorithm using ideas from statistical mechanics. It is based on the notion that a more reliable AI-solution will be one that maximizes the timescale separation between slow and fast processes. To learn this timescale separation even from limited data, we use a maximum caliber-based framework. We show the applicability of this automatic protocol for three classic benchmark problems, namely, the conformational dynamics of a model peptide, ligand-unbinding from a protein, and folding/unfolding energy landscape of the C-terminal domain of protein G. We believe that our work will lead to increased and robust use of trustworthy AI in molecular simulations of complex systems.
UR - http://www.scopus.com/inward/record.url?scp=85099075451&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099075451&partnerID=8YFLogxK
U2 - 10.1063/5.0030931
DO - 10.1063/5.0030931
M3 - Article
C2 - 33353347
AN - SCOPUS:85099075451
SN - 0021-9606
VL - 153
SP - 234118
JO - The Journal of Chemical Physics
JF - The Journal of Chemical Physics
IS - 23
ER -