Policy Search in Infinite-Horizon Discounted Reinforcement Learning: Advances through Connections to Non-Convex Optimization: Invited Presentation

Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Başar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In reinforcement learning (RL), an agent moving through a state space selects actions that cause transitions to new states according to an unknown Markov transition density that depends on the previous state and action. After each transition, a reward is revealed that quantifies the quality of being in the resulting state. The goal is to select the action sequence that maximizes the long-term accumulation of rewards, or value. We focus on the case where the policy that determines how actions are chosen is a fixed stationary distribution parameterized by a vector, the problem horizon is infinite, and the states and actions belong to continuous subsets of Euclidean space.
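
The objective implicit in this setup is the infinite-horizon discounted value of the policy: writing the policy as \pi_\theta with parameter vector \theta, discount factor \gamma \in (0, 1), and reward r(s_t, a_t), the quantity being maximized is

    V(\theta) = \mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)} \left[ \sum_{t=0}^{\infty} \gamma^t r(s_t, a_t) \right],

which is in general non-convex in \theta; this is the connection to non-convex optimization named in the title. As a concrete illustration only, the sketch below shows one standard policy-search step, a Monte Carlo score-function (REINFORCE-style) gradient estimate for a linear-Gaussian policy over continuous states and actions. The environment interface env_step, the rollout truncation, and all constants are assumptions made for the sketch, not details taken from the paper.

    import numpy as np

    GAMMA = 0.99   # discount factor for the infinite-horizon objective
    SIGMA = 0.5    # fixed exploration noise of the Gaussian policy
    ALPHA = 1e-3   # step size for the stochastic gradient ascent update

    def rollout(theta, env_step, s0, horizon=200):
        """Sample one truncated trajectory; env_step stands in for the
        unknown Markov transition density and reward (an assumption)."""
        s, traj = s0, []
        for _ in range(horizon):
            a = theta @ s + SIGMA * np.random.randn()  # a ~ N(theta^T s, SIGMA^2)
            s_next, r = env_step(s, a)
            traj.append((s, a, r))
            s = s_next
        return traj

    def reinforce_grad(theta, traj):
        """Monte Carlo estimate of grad_theta V(theta) via the score function:
        for this policy, grad log pi_theta(a|s) = (a - theta^T s) / SIGMA^2 * s."""
        grad = np.zeros_like(theta)
        G, t = 0.0, len(traj)
        for s, a, r in reversed(traj):
            t -= 1
            G = r + GAMMA * G                       # discounted return-to-go G_t
            score = (a - theta @ s) / SIGMA**2 * s  # grad log pi_theta(a_t|s_t)
            grad += GAMMA**t * G * score            # policy gradient theorem term
        return grad

    # One policy-search iteration (env_step and s0 supplied by the user):
    # theta = theta + ALPHA * reinforce_grad(theta, rollout(theta, env_step, s0))

Truncating the rollout at a finite horizon is the usual way the infinite sum above is approximated in practice; the discount \gamma makes the truncation error geometrically small.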

Original language: English (US)
Title of host publication: 2019 53rd Annual Conference on Information Sciences and Systems, CISS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728111513
State: Published - Apr 16, 2019
Event: 53rd Annual Conference on Information Sciences and Systems, CISS 2019 - Baltimore, United States
Duration: Mar 20, 2019 - Mar 22, 2019

Publication series

Name: 2019 53rd Annual Conference on Information Sciences and Systems, CISS 2019

Conference

Conference: 53rd Annual Conference on Information Sciences and Systems, CISS 2019
Country/Territory: United States
City: Baltimore
Period: 3/20/19 - 3/22/19

ASJC Scopus subject areas

  • Information Systems
