Abstract
Explicit uncertainty estimation is an effective way to address the overestimation problem caused by distribution shift in offline reinforcement learning (RL). However, the commonly used bootstrapped ensemble network method fails to produce reliable uncertainty estimates, which degrades the performance of offline RL. Compared with model-free offline RL, model-based offline RL generalizes better, although it is limited by the model-bias problem. The adverse effects of model bias are aggravated by the state mismatch phenomenon, which ultimately disrupts policy learning. In this article, we propose the model-based offline RL with uncertainty estimation and policy constraint (MOUP) algorithm to obtain reliable uncertainty estimation and bounded state mismatch. First, we introduce Monte Carlo (MC) dropout into ensemble networks and propose ensemble dropout networks for uncertainty estimation. Second, we present a novel policy constraint method that incorporates a maximum mean discrepancy (MMD) constraint into policy optimization, and we prove that this method yields bounded state mismatch. Finally, we evaluate the MOUP algorithm on the MuJoCo control toolkit. Experimental results show that MOUP is competitive with existing offline RL algorithms.
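To make the maximum mean discrepancy mentioned in the abstract concrete, the following is a minimal sketch of a standard biased estimate of the squared discrepancy between two sample sets. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the function names, and the toy Gaussian samples are all assumptions made for illustration, and the abstract does not state which quantities MOUP compares or which kernel it uses.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared maximum mean discrepancy."""
    def mean_kernel(a, b):
        # Average kernel value over all pairs drawn from the two sample sets.
        total = sum(gaussian_kernel(p, q, sigma) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) - 2.0 * mean_kernel(xs, ys) + mean_kernel(ys, ys)


# Toy data (hypothetical): a reference set, a set drawn from the same
# distribution, and a set drawn from a shifted distribution.
rng = random.Random(0)
reference_samples = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(50)]
close_samples = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(50)]
far_samples = [[rng.gauss(3.0, 1.0) for _ in range(2)] for _ in range(50)]

mmd_close = mmd_squared(reference_samples, close_samples)
mmd_far = mmd_squared(reference_samples, far_samples)
print(f"same distribution:    {mmd_close:.4f}")
print(f"shifted distribution: {mmd_far:.4f}")
```

In a policy-constraint setting, a quantity of this kind would typically be kept below a threshold during optimization, so that samples generated under the learned policy stay close to those in the offline dataset. How the constraint enters the MOUP objective is described in the article itself, not here.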
| Original language | English (US) |
|---|---|
| Pages (from-to) | 6066-6079 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Artificial Intelligence |
| Volume | 5 |
| Issue number | 12 |
| State | Published - 2024 |
Keywords
- MC dropout
- model-based offline reinforcement learning (RL)
- policy constraint
- uncertainty estimation
ASJC Scopus subject areas
- Computer Science Applications
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Model-Based Offline Reinforcement Learning With Uncertainty Estimation and Policy Constraint'. Together they form a unique fingerprint.