TY - JOUR
T1 - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption
AU - Ye, Chenlu
AU - He, Jiafan
AU - Gu, Quanquan
AU - Zhang, Tong
N1 - The authors would like to thank the anonymous reviewers for many insightful comments and suggestions. JH and QG are supported by the National Science Foundation CAREER Award 1906169 and a research fund from the UCLA-Amazon Science Hub. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing any funding agencies.
PY - 2024
Y1 - 2024
N2 - This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-squares regression is often employed for value function estimation. However, such uncertainty weighting techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn the transition model. Our work encompasses both online and offline settings. In the online setting, we introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE. We prove that CR-OMLE achieves a regret of Õ(√T + C), where C denotes the cumulative corruption level after T episodes. We also prove a lower bound to show that the additive dependence on C is optimal. We extend our weighting technique to the offline setting and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE). Under a uniform coverage condition, CR-PMLE exhibits suboptimality worsened by O(C/n), nearly matching the lower bound. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees.
AB - This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL), where the transition dynamics can be corrupted by an adversary. Existing studies on corruption-robust RL mostly focus on the setting of model-free RL, where robust least-squares regression is often employed for value function estimation. However, such uncertainty weighting techniques cannot be directly applied to model-based RL. In this paper, we focus on model-based RL and take the maximum likelihood estimation (MLE) approach to learn the transition model. Our work encompasses both online and offline settings. In the online setting, we introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE. We prove that CR-OMLE achieves a regret of Õ(√T + C), where C denotes the cumulative corruption level after T episodes. We also prove a lower bound to show that the additive dependence on C is optimal. We extend our weighting technique to the offline setting and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE). Under a uniform coverage condition, CR-PMLE exhibits suboptimality worsened by O(C/n), nearly matching the lower bound. To the best of our knowledge, this is the first work on corruption-robust model-based RL algorithms with provable guarantees.
UR - http://www.scopus.com/inward/record.url?scp=85203836897&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203836897&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85203836897
SN - 2640-3498
VL - 235
SP - 56982
EP - 57017
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 41st International Conference on Machine Learning, ICML 2024
Y2 - 21 July 2024 through 27 July 2024
ER -