Active reinforcement learning

Arkady Epshteyn, Adam Vogel, Gerald Dejong

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

When the transition probabilities and rewards of a Markov Decision Process (MDP) are known, an agent can obtain the optimal policy without any interaction with the environment. However, exact transition probabilities are difficult for experts to specify. One option left to an agent is a long and potentially costly exploration of the environment. In this paper, we propose another alternative: given an initial (possibly inaccurate) specification of the MDP, the agent determines the sensitivity of the optimal policy to changes in transitions and rewards. It then focuses its exploration on the regions of space to which the optimal policy is most sensitive. We show that the proposed exploration strategy performs well on several control and planning problems.
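The idea in the abstract can be illustrated with a minimal sketch (this is not the authors' algorithm from the paper): for a nominal finite MDP, perturb each (state, action) transition row, re-solve, and rank the pairs by how much the optimal value function moves; exploration would then be concentrated on the highest-ranked pairs. All function names, parameters, and the finite-difference perturbation scheme below are illustrative assumptions.

```python
# Illustrative sketch only: rank (state, action) pairs of a small finite MDP by how much
# perturbing their nominal transition row changes the optimal values, as a crude
# finite-difference proxy for the sensitivity of the optimal policy.
import numpy as np


def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """P: (S, A, S) transition tensor, R: (S, A) rewards. Returns optimal V and greedy policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)          # (S, A) action values under current V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    Q = R + gamma * (P @ V)              # recompute to extract the greedy policy at convergence
    return V, Q.argmax(axis=1)


def sensitivity_scores(P, R, gamma=0.95, eps=0.05):
    """Finite-difference sensitivity of the optimal values to each (s, a) transition row."""
    S, A, _ = P.shape
    V_base, _ = value_iteration(P, R, gamma)
    scores = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            P_pert = P.copy()
            # Nudge the nominal row toward uniform and renormalize (a crude stand-in for model error).
            P_pert[s, a] = (1 - eps) * P[s, a] + eps / S
            V_pert, _ = value_iteration(P_pert, R, gamma)
            scores[s, a] = np.max(np.abs(V_pert - V_base))
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 2
    P = rng.dirichlet(np.ones(S), size=(S, A))   # nominal (possibly inaccurate) transition model
    R = rng.normal(size=(S, A))                  # nominal rewards
    scores = sensitivity_scores(P, R)
    # Exploration would be focused on the highest-scoring (state, action) pairs.
    top = np.argsort(scores, axis=None)[::-1][:3]
    print("most sensitive (state, action) pairs:", [divmod(int(i), A) for i in top])
```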

Original language: English (US)
Title of host publication: Proceedings of the 25th International Conference on Machine Learning
Pages: 296-303
Number of pages: 8
ISBN (Print): 9781605582054
State: Published - Nov 26 2008
Event: 25th International Conference on Machine Learning - Helsinki, Finland
Duration: Jul 5 2008 - Jul 9 2008

Publication series

Name: Proceedings of the 25th International Conference on Machine Learning

Other

Other: 25th International Conference on Machine Learning
Country: Finland
City: Helsinki
Period: 7/5/08 - 7/9/08

Fingerprint

  • Reinforcement learning
  • Specifications
  • Planning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Software

Cite this

Epshteyn, A., Vogel, A., & Dejong, G. (2008). Active reinforcement learning. In Proceedings of the 25th International Conference on Machine Learning (pp. 296-303). (Proceedings of the 25th International Conference on Machine Learning).

