TY - GEN
T1 - Efficient optimal learning for contextual bandits
AU - Dudik, Miroslav
AU - Hsu, Daniel
AU - Kale, Satyen
AU - Karampatziakis, Nikos
AU - Langford, John
AU - Reyzin, Lev
AU - Zhang, Tong
PY - 2011
Y1 - 2011
N2 - We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.
AB - We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive rather than multiplicative in feedback delay as in all previous work.
UR - http://www.scopus.com/inward/record.url?scp=80053154335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053154335&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053154335
T3 - Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011
SP - 169
EP - 178
BT - Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence, UAI 2011
PB - AUAI Press
ER -