TY - GEN
T1 - SBEED: Convergent reinforcement learning with nonlinear function approximation
T2 - 35th International Conference on Machine Learning, ICML 2018
AU - Dai, Bo
AU - Shaw, Albert
AU - Li, Lihong
AU - Xiao, Lin
AU - He, Niao
AU - Liu, Zhen
AU - Chen, Jianshu
AU - Song, Le
N1 - Part of this work was done during BD's internship at Microsoft Research, Redmond. Part of the work was done when LL and JC were with Microsoft Research, Redmond. We thank Mohammad Ghavamzadeh, Nan Jiang, Csaba Szepesvari, and Greg Tucker for their insightful comments and discussions. NH is supported by NSF CCF-1755829. LS is supported in part by NSF IIS-1218749, NIH BIGDATA 1R01GM108341, NSF CAREER IIS-1350983, NSF IIS-1639792 EAGER, NSF CNS-1704701, ONR N00014-15-1-2340, Intel ISTC, NVIDIA and Amazon AWS.
PY - 2018
Y1 - 2018
N2 - When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm's sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.
AB - When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov's smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm's sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.
UR - https://www.scopus.com/pages/publications/85055819903
UR - https://www.scopus.com/pages/publications/85055819903#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:85055819903
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 1809
EP - 1818
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Krause, Andreas
A2 - Dy, Jennifer
PB - International Machine Learning Society (IMLS)
Y2 - 10 July 2018 through 15 July 2018
ER -