TY - CONF
T1 - PHYSICS-REGULATED DEEP REINFORCEMENT LEARNING
T2 - 12th International Conference on Learning Representations, ICLR 2024
AU - Cao, Hongpeng
AU - Mao, Yanbing
AU - Sha, Lui
AU - Caccamo, Marco
N1 - We would like to first thank the anonymous reviewers for their helpful feedback, thoughtful reviews, and insightful comments. We appreciate Mirco Theile's helpful suggestions regarding the technical details, which inspired our future research directions. We also thank Yihao Cai for his help in deploying Phy-DRL on a physical quadruped robot. This work was partly supported by the National Science Foundation under Grant CPS-2311084 and Grant CPS-2311085 and by the Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research.
PY - 2024
Y1 - 2024
N2 - This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) a residual action policy (i.e., integrating a data-driven DRL action policy and a physics-model-based action policy), ii) an automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of the critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL offers guaranteed safety compared to purely data-driven DRL and a solely model-based design, while requiring remarkably fewer learning parameters and training quickly toward a safety guarantee.
UR - http://www.scopus.com/inward/record.url?scp=85200599294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200599294&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85200599294
Y2 - 7 May 2024 through 11 May 2024
ER -