TY - GEN

T1 - Qualitative reinforcement learning

AU - Epshteyn, Arkady

AU - DeJong, Gerald

PY - 2006

Y1 - 2006

N2 - When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework which allows the expert to specify imprecise knowledge of transition probabilities in terms of stochastic dominance constraints. Our algorithm can be used to find optimal policies for qualitatively specified problems, or, when no such solution is available, to decrease the required amount of exploration. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.

UR - http://www.scopus.com/inward/record.url?scp=33749235726&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749235726&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33749235726

SN - 1595933832

SN - 9781595933836

T3 - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning

SP - 305

EP - 312

BT - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning

T2 - ICML 2006: 23rd International Conference on Machine Learning

Y2 - 25 June 2006 through 29 June 2006

ER -