The Epoch-Greedy algorithm for contextual multi-armed bandits

John Langford, Tong Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We present Epoch-Greedy, an algorithm for contextual multi-armed bandits (also known as bandits with side information). Epoch-Greedy has the following properties: 1. No knowledge of a time horizon T is necessary. 2. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. 3. The regret scales as O(T2/3S 1/3) or better (sometimes, much better). Here S is the complexity term in a sample complexity bound for standard supervised learning.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
PublisherNeural Information Processing Systems
ISBN (Print)160560352X, 9781605603520
StatePublished - 2008
Externally publishedYes
Event21st Annual Conference on Neural Information Processing Systems, NIPS 2007 - Vancouver, BC, Canada
Duration: Dec 3 2007Dec 6 2007

Publication series

NameAdvances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

Conference

Conference21st Annual Conference on Neural Information Processing Systems, NIPS 2007
Country/TerritoryCanada
CityVancouver, BC
Period12/3/0712/6/07

ASJC Scopus subject areas

  • Information Systems

Fingerprint

Dive into the research topics of 'The Epoch-Greedy algorithm for contextual multi-armed bandits'. Together they form a unique fingerprint.

Cite this