TY - JOUR

T1 - Budget-Constrained Bandits over General Cost and Reward Distributions

AU - Cayci, Semih

AU - Eryilmaz, Atilla

AU - Srikant, R.

N1 - Funding Information:
This research was supported in part by NSF grants CNS-NeTS-1514260, CNS-NeTS-1717045, CMMI-SMOR-1562065, CNS-ICN-WEN-1719371, CCF-1934986, NeTS-1718203, and CNS-SpecEES-1824337; ONR grants N00014-19-1-2621 and N00014-19-1-2566; DTRA grant HDTRA1-18-1-0050; and ARO grant W911NF-19-1-0379.
Publisher Copyright:
Copyright © 2020 by the author(s)

PY - 2020

Y1 - 2020

N2 - We consider a budget-constrained bandit problem where each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values, as required by many applications. We show that if moments of order (2+γ) for some γ > 0 exist for all cost-reward pairs, O(log B) regret is achievable for a budget B > 0. To achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

AB - We consider a budget-constrained bandit problem where each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values, as required by many applications. We show that if moments of order (2+γ) for some γ > 0 exist for all cost-reward pairs, O(log B) regret is achievable for a budget B > 0. To achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.

UR - http://www.scopus.com/inward/record.url?scp=85161871522&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85161871522&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85161871522

SN - 2640-3498

VL - 108

SP - 4388

EP - 4398

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020

Y2 - 26 August 2020 through 28 August 2020

ER -