TY - JOUR

T1 - Budget-Constrained Bandits over General Cost and Reward Distributions

AU - Cayci, Semih

AU - Eryilmaz, Atilla

AU - Srikant, R.

N1 - Funding Information:
This research was supported in part by NSF grants CNS-NeTS-1514260, CNS-NeTS-1717045, CMMI-SMOR-1562065, CNS-ICN-WEN-1719371, CCF-1934986, NeTS-1718203, and CNS-SpecEES-1824337; ONR grants N00014-19-1-2621 and N00014-19-1-2566; DTRA grant HDTRA1-18-1-0050; and ARO grant W911NF-19-1-0379.
Publisher Copyright:
Copyright © 2020 by the author(s)

PY - 2020

Y1 - 2020

N2 - We consider a budget-constrained bandit problem where each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values, as required by many applications. We show that if moments of order (2+γ) for some γ > 0 exist for all cost-reward pairs, O(log B) regret is achievable for a budget B > 0. To achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

AB - We consider a budget-constrained bandit problem where each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values, as required by many applications. We show that if moments of order (2+γ) for some γ > 0 exist for all cost-reward pairs, O(log B) regret is achievable for a budget B > 0. To achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

UR - http://www.scopus.com/inward/record.url?scp=85161871522&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85161871522&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85161871522

SN - 2640-3498

VL - 108

SP - 4388

EP - 4398

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020

Y2 - 26 August 2020 through 28 August 2020

ER -