TY - GEN
T1 - Qualitative reinforcement learning
AU - Epshteyn, Arkady
AU - DeJong, Gerald
PY - 2006
Y1 - 2006
N2 - When the transition probabilities and rewards of a Markov Decision Process are specified exactly, the problem can be solved without any interaction with the environment. When no such specification is available, the agent's only recourse is a long and potentially dangerous exploration. We present a framework which allows the expert to specify imprecise knowledge of transition probabilities in terms of stochastic dominance constraints. Our algorithm can be used to find optimal policies for qualitatively specified problems, or, when no such solution is available, to decrease the required amount of exploration. The algorithm's behavior is demonstrated on simulations of two classic problems: mountain car ascent and cart pole balancing.
UR - https://www.scopus.com/pages/publications/33749235726
M3 - Conference contribution
AN - SCOPUS:33749235726
SN - 1595933832
SN - 9781595933836
T3 - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
SP - 305
EP - 312
BT - ICML 2006 - Proceedings of the 23rd International Conference on Machine Learning
T2 - ICML 2006: 23rd International Conference on Machine Learning
Y2 - 25 June 2006 through 29 June 2006
ER -