Abstract
In reinforcement learning (RL), problems with long planning horizons are perceived as very challenging. Recent advances in PAC RL, however, show that the sample complexity of RL does not depend on the planning horizon except at a superficial level. How can we explain such a difference? Noting that the technical assumptions in these upper bounds might have hidden away the challenges of long horizons, we ask: can we prove a lower bound with a horizon dependence when such assumptions are removed? We also provide a few observations on the desired characteristics of the lower bound construction.
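To make the question in the abstract concrete, the sketch below gives one common formalization of the open problem. It is our own illustrative phrasing: the normalization of the total return to $[0,1]$ and the symbols $S$, $A$, $H$, $\epsilon$, $\delta$ are assumptions introduced here, not quoted from the paper.

```latex
% Illustrative formalization (a sketch; the return normalization and the symbols
% S, A, H, epsilon, delta are assumptions, not quoted from the paper).
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

Consider an episodic MDP with $S$ states, $A$ actions, and planning horizon $H$,
with rewards normalized so that the total return of every episode lies in $[0,1]$.
An algorithm is $(\epsilon,\delta)$-PAC if, after collecting $n(\epsilon,\delta)$
episodes, it outputs a policy $\hat{\pi}$ satisfying
\[
  \Pr\!\left[\, V^{\star} - V^{\hat{\pi}} \le \epsilon \,\right] \;\ge\; 1 - \delta .
\]
The open problem asks whether there exists a family of MDPs on which \emph{every}
$(\epsilon,\delta)$-PAC algorithm must use
\[
  n(\epsilon,\delta) \;=\; \Omega\!\bigl(\operatorname{poly}(H)\bigr)
\]
episodes, i.e., whether the planning horizon necessarily enters the sample
complexity beyond the superficial dependence exhibited by the upper bounds
alluded to in the abstract, once their technical assumptions are removed.

\end{document}
```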
| Field | Value |
| --- | --- |
| Original language | English (US) |
| Pages (from-to) | 3395-3398 |
| Number of pages | 4 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 75 |
| State | Published - 2018 |
| Externally published | Yes |
| Event | 31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden |
| Event duration | Jul 6 2018 → Jul 9 2018 |
Keywords
- planning horizon
- reinforcement learning
- sample complexity
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability