Abstract
Explicit uncertainty estimation is an effective way to address the overestimation problem caused by distribution shift in offline RL. However, the common bootstrapped-ensemble approach often fails to produce reliable uncertainty estimates, which degrades the performance of offline RL. Compared with model-free offline RL, model-based offline RL generalizes better but is limited by the model-bias problem. The adverse effects of model bias are further aggravated by the state-mismatch phenomenon, which ultimately disrupts policy learning. In this paper, we propose the Model-based Offline RL with Uncertainty estimation and Policy constraint (MOUP) algorithm to obtain reliable uncertainty estimates and bounded state mismatch. First, we introduce MC dropout into ensemble networks and propose ensemble dropout networks for uncertainty estimation. Second, we present a novel policy constraint method that incorporates a maximum mean discrepancy (MMD) constraint into policy optimization, and we prove that this method yields bounded state mismatch. Finally, we evaluate MOUP on the MuJoCo control toolkit. Experimental results show that MOUP is competitive with existing offline RL algorithms.
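The abstract names two ingredients: ensemble dropout networks (MC dropout applied to a bootstrapped ensemble) for uncertainty estimation, and an MMD constraint in policy optimization. The sketch below is a minimal illustration of those two ideas in PyTorch, not the authors' implementation: the class and function names, network sizes, dropout rate, number of MC passes, and the choice of a Gaussian kernel and its bandwidth are all illustrative assumptions.

```python
# Illustrative sketch (assumed formulations, not the paper's code):
# (1) uncertainty = spread of predictions over ensemble members x MC-dropout passes,
# (2) squared MMD with a Gaussian kernel between policy actions and dataset actions.
import torch
import torch.nn as nn


class DropoutMember(nn.Module):
    """One ensemble member: an MLP whose dropout stays active at inference."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 200, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def ensemble_dropout_uncertainty(members, state_action: torch.Tensor, n_mc: int = 10):
    """Per-sample uncertainty: std. of predictions across members and MC-dropout passes."""
    preds = []
    for m in members:
        m.train()  # keep dropout stochastic (MC dropout) even at evaluation time
        with torch.no_grad():
            for _ in range(n_mc):
                preds.append(m(state_action))
    preds = torch.stack(preds)            # (E * n_mc, batch, out_dim)
    return preds.std(dim=0).mean(dim=-1)  # (batch,) uncertainty score


def gaussian_mmd(policy_actions: torch.Tensor, data_actions: torch.Tensor, sigma: float = 10.0):
    """Biased estimator of squared MMD with a Gaussian kernel between two action batches."""
    def kernel(a, b):
        sq_dist = ((a.unsqueeze(1) - b.unsqueeze(0)) ** 2).sum(-1)
        return torch.exp(-sq_dist / (2.0 * sigma ** 2))

    k_pp = kernel(policy_actions, policy_actions).mean()
    k_dd = kernel(data_actions, data_actions).mean()
    k_pd = kernel(policy_actions, data_actions).mean()
    return k_pp + k_dd - 2.0 * k_pd
```

In a pipeline of this kind, the uncertainty score would typically penalize model-generated transitions (e.g., subtracted from the reward), while the MMD term would be added to the actor loss to keep policy actions close to the dataset; how MOUP combines them exactly is specified in the paper, not here.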
Original language | English (US) |
---|---|
Pages (from-to) | 1-13 |
Number of pages | 13 |
Journal | IEEE Transactions on Artificial Intelligence |
DOIs | |
State | Accepted/In press - 2024 |
Externally published | Yes |
Keywords
- Artificial intelligence
- Data models
- Estimation
- Heuristic algorithms
- MC dropout
- Model-based offline reinforcement learning
- Reliability
- Trajectory
- Uncertainty
- policy constraint
- uncertainty estimation
ASJC Scopus subject areas
- Computer Science Applications
- Artificial Intelligence