TY - JOUR
T1 - Marginalized Importance Sampling for Off-Environment Policy Evaluation
AU - Katdare, Pulkit
AU - Jiang, Nan
AU - Driggs-Campbell, Katherine
N1 - The authors thank Neeloy Chakroborty and Shuijing Liu for their valuable suggestions on this paper draft. This work was supported in part by ZJU-UIUC Joint Research Center Project No. DREMES 202003, funded by Zhejiang University. Nan Jiang also acknowledges funding support from NSF IIS-2112471 and NSF CAREER IIS-2141781.
PY - 2023
Y1 - 2023
N2 - Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots. Even a robust policy trained in simulation requires real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach combines a simulator with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio must be inferred indirectly, exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision, and the second term has a small magnitude, making it computationally efficient. We analyze the sample complexity as well as the error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies, and offline data-collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task: validating the performance of a 7-DoF robotic arm using offline data along with the Gazebo simulator.
AB - Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots. Even a robust policy trained in simulation requires real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach combines a simulator with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio must be inferred indirectly, exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision, and the second term has a small magnitude, making it computationally efficient. We analyze the sample complexity as well as the error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies, and offline data-collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task: validating the performance of a 7-DoF robotic arm using offline data along with the Gazebo simulator.
KW - Policy Evaluation
KW - Robot Validation
KW - Sim2Real
UR - http://www.scopus.com/inward/record.url?scp=85184346070&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184346070&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85184346070
SN - 2640-3498
VL - 229
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 7th Conference on Robot Learning, CoRL 2023
Y2 - 6 November 2023 through 9 November 2023
ER -