TY - GEN
T1 - The Epoch-Greedy algorithm for contextual multi-armed bandits
AU - Langford, John
AU - Zhang, Tong
PY - 2008
Y1 - 2008
N2 - We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales as O(T2/3S 1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.
AB - We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales as O(T2/3S 1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.
UR - http://www.scopus.com/inward/record.url?scp=85162018594&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85162018594&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85162018594
SN - 160560352X
SN - 9781605603520
T3 - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
BT - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
PB - Neural Information Processing Systems
T2 - 21st Annual Conference on Neural Information Processing Systems, NIPS 2007
Y2 - 3 December 2007 through 6 December 2007
ER -