TY - GEN
T1 - Qualitative reinforcement learning
AU - Epshteyn, Arkady
AU - DeJong, Gerald
PY - 2006/10/6
Y1 - 2006/10/6
AB - When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework which allows the expert to specify imprecise knowledge of transition probabilities in terms of stochastic dominance constraints. Our algorithm can be used to find optimal policies for qualitatively specified problems, or, when no such solution is available, to decrease the required amount of exploration. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.
UR - http://www.scopus.com/inward/record.url?scp=33749235726&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749235726&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33749235726
SN - 1595933832
SN - 9781595933836
T3 - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
SP - 305
EP - 312
BT - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
T2 - ICML 2006: 23rd International Conference on Machine Learning
Y2 - 25 June 2006 through 29 June 2006
ER -