Neuro-dynamic programming encompasses techniques from both reinforcement learning and approximate dynamic programming. Feature selection refers to the choice of basis that defines the function class required to apply these techniques. This chapter reviews two popular approaches to neuro-dynamic programming: TD-learning and Q-learning. The main goal of the chapter is to demonstrate how insight from idealized models can guide feature selection for these algorithms. Several approaches are surveyed, including fluid and diffusion models, as well as idealized models arising from mean-field game approximations. The theory is illustrated with several examples.
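To make concrete the idea that a basis defines the function class, here is a minimal sketch of TD(0) learning with a linear parameterization, V(x) ≈ θᵀψ(x). Everything specific in it is an assumption for illustration and is not taken from the chapter: the single-server queue with a finite buffer, the cost c(x) = x, the discount factor, the step size, and the quadratic basis ψ(x) = (1, x/N, (x/N)²), chosen here as the kind of basis an idealized (for example, fluid) model might suggest for a queue with linear cost.

```python
import numpy as np


def features(x, scale):
    """Hypothetical quadratic basis psi(x) = (1, z, z^2) with z = x / scale."""
    z = x / scale
    return np.array([1.0, z, z * z])


def td0_queue(num_steps=100_000, buffer=10, arrival_prob=0.4,
              discount=0.9, step_size=0.01, seed=0):
    """TD(0) with linear function approximation on a toy queue (illustrative only).

    The queue length moves up by one with probability arrival_prob (capped at
    the buffer size) and otherwise moves down by one (floored at zero).
    The one-step cost is the current queue length.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    x = 0
    for _ in range(num_steps):
        if rng.random() < arrival_prob:
            x_next = min(x + 1, buffer)
        else:
            x_next = max(x - 1, 0)
        cost = float(x)
        psi = features(x, buffer)
        # Temporal difference: cost + discounted next-state estimate - current estimate.
        td_error = cost + discount * (theta @ features(x_next, buffer)) - theta @ psi
        # Stochastic-approximation update along the feature vector of the current state.
        theta = theta + step_size * td_error * psi
        x = x_next
    return theta


def value_estimate(theta, x, buffer=10):
    """Approximate discounted cost from state x under the learned parameters."""
    return float(theta @ features(x, buffer))


theta = td0_queue()
```

Replacing `features` with a different basis changes the function class over which TD(0) searches while leaving the update rule untouched, which is the sense in which feature selection is a separate design step.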
- Feature selection for neuro DP
- Neuro DP, RL and DP
- Neuro-dynamic, TD-/Q-Learning for MDPs
- Optimal for stochastic/deterministic, SARSA
- Parameterized RL, via LP approaches