Bandits with budgets: Regret lower bounds and optimal algorithms

Richard Combes, Chong Jiang, Rayadurgam Srikant

Research output: Contribution to journalConference article

Abstract

We investigate multi-armed bandits with budgets, a natural model for ad-display optimization encountered in search engines. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged for each time her ad is shown (i.e., for each impression) and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm. Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB also has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.

Original languageEnglish (US)
Pages (from-to)245-257
Number of pages13
JournalPerformance Evaluation Review
Volume43
Issue number1
DOIs
StatePublished - Jun 24 2015
EventACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2015 - Portland, United States
Duration: Jun 15 2015Jun 19 2015

Fingerprint

Costs
Search engines
Display devices
Experiments

Keywords

  • Ad-display optimization
  • Budgets
  • KL-UCB
  • Learning
  • Multi-armed bandits
  • Search engines
  • UCB

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Bandits with budgets : Regret lower bounds and optimal algorithms. / Combes, Richard; Jiang, Chong; Srikant, Rayadurgam.

In: Performance Evaluation Review, Vol. 43, No. 1, 24.06.2015, p. 245-257.

Research output: Contribution to journalConference article

@article{fbfde03ff3aa4923af1b4ff8491db524,
title = "Bandits with budgets: Regret lower bounds and optimal algorithms",
abstract = "We investigate multi-armed bandits with budgets, a natural model for ad-display optimization encountered in search engines. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged for each time her ad is shown (i.e., for each impression) and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm. Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB also has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.",
keywords = "Ad-display optimization, Budgets, KL-UCB, Learning, Multi-armed bandits, Search engines, UCB",
author = "Richard Combes and Chong Jiang and Rayadurgam Srikant",
year = "2015",
month = "6",
day = "24",
doi = "10.1145/2796314.2745847",
language = "English (US)",
volume = "43",
pages = "245--257",
journal = "Performance Evaluation Review",
issn = "0163-5999",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY - JOUR

T1 - Bandits with budgets

T2 - Regret lower bounds and optimal algorithms

AU - Combes, Richard

AU - Jiang, Chong

AU - Srikant, Rayadurgam

PY - 2015/6/24

Y1 - 2015/6/24

N2 - We investigate multi-armed bandits with budgets, a natural model for ad-display optimization encountered in search engines. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged for each time her ad is shown (i.e., for each impression) and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm. Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB also has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.

AB - We investigate multi-armed bandits with budgets, a natural model for ad-display optimization encountered in search engines. We provide asymptotic regret lower bounds satisfied by any algorithm, and propose algorithms which match those lower bounds. We consider different types of budgets: scenarios where the advertiser has a fixed budget over a time horizon, and scenarios where the amount of money that is available to spend is incremented in each time slot. Further, we consider two different pricing models, one in which an advertiser is charged for each time her ad is shown (i.e., for each impression) and one in which the advertiser is charged only if a user clicks on the ad. For all of these cases, we show that it is possible to achieve O(log(T)) regret. For both the cost-per-impression and cost-per-click models, with a fixed budget, we provide regret lower bounds that apply to any uniformly good algorithm. Further, we show that B-KL-UCB, a natural variant of KL-UCB, is asymptotically optimal for these cases. Numerical experiments (based on a real-world data set) further suggest that B-KL-UCB also has the same or better finite-time performance when compared to various previously proposed (UCB-like) algorithms, which is important when applying such algorithms to a real-world problem.

KW - Ad-display optimization

KW - Budgets

KW - KL-UCB

KW - Learning

KW - Multi-armed bandits

KW - Search engines

KW - UCB

UR - http://www.scopus.com/inward/record.url?scp=84955578179&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84955578179&partnerID=8YFLogxK

U2 - 10.1145/2796314.2745847

DO - 10.1145/2796314.2745847

M3 - Conference article

AN - SCOPUS:84955578179

VL - 43

SP - 245

EP - 257

JO - Performance Evaluation Review

JF - Performance Evaluation Review

SN - 0163-5999

IS - 1

ER -