TY - GEN
T1 - Maximum entropy inverse reinforcement learning in continuous state spaces with path integrals
AU - Aghasadeghi, Navid
AU - Bretl, Timothy
PY - 2011
Y1 - 2011
AB - In this paper, we consider the problem of inverse reinforcement learning for a particular class of continuous-time stochastic systems with continuous state and action spaces, under the assumption that both the cost function and the optimal control policy are parametric with known basis functions. Our goal is to produce a cost function for which a given policy, observed in experiment, is optimal. We proceed by enforcing a constraint on the relationship between input noise and input cost that produces a maximum entropy distribution over the space of all sample paths. We apply maximum likelihood estimation to approximate the parameters of this distribution (hence, of the cost function) given a finite set of sample paths. We iteratively improve our approximation by adding to this set the sample path that would be optimal given our current estimate of the cost function. Preliminary results in simulation provide empirical evidence that our algorithm converges.
UR - http://www.scopus.com/inward/record.url?scp=84455160661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84455160661&partnerID=8YFLogxK
DO - 10.1109/IROS.2011.6048804
M3 - Conference contribution
AN - SCOPUS:84455160661
SN - 9781612844541
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 1561
EP - 1566
BT - IROS'11 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems
T2 - 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems: Celebrating 50 Years of Robotics, IROS'11
Y2 - 25 September 2011 through 30 September 2011
ER -