Smoothed Adversarial Linear Contextual Bandits with Knapsacks

Vidyashankar Sivakumar, Shiliang Zuo, Arindam Banerjee

Research output: Contribution to journalConference articlepeer-review

Abstract

Many bandit problems are characterized by the learner making decisions under constraints. The learner in Linear Contextual Bandits with Knapsacks (LinCBwK) receives a resource consumption vector in addition to a scalar reward in each time step which are both linear functions of the context corresponding to the chosen arm. For a fixed time horizon T, the goal of the learner is to maximize rewards while ensuring resource consumptions do not exceed a pre-specified budget. We present algorithms and characterize regret for LinCBwK in the smoothed setting where base context vectors are assumed to be perturbed by Gaussian noise. We consider both the stochastic and adversarial settings for the base contexts, and our analysis of stochastic LinCBwK can be viewed as a warm-up to the more challenging adversarial LinCBwK. For the stochastic setting, we obtain Op?Tq additive regret bounds compared to the best context dependent fixed policy. The analysis combines ideas for greedy parameter estimation in (Kannan et al., 2018; Sivakumar et al., 2020) and the primal-dual paradigm first explored in (Agrawal & Devanur, 2016; 2014a). Our main contribution is an algorithm with Oplog Tq competitive ratio relative to the best context dependent fixed policy for the adversarial setting. The algorithm for the adversarial setting employs ideas from the primal-dual framework (Agrawal & Devanur, 2016; 2014a) and a novel adaptation of the doubling trick (Immorlica et al., 2019).

Original languageEnglish (US)
Pages (from-to)20253-20277
Number of pages25
JournalProceedings of Machine Learning Research
Volume162
StatePublished - 2022
Event39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: Jul 17 2022Jul 23 2022

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Smoothed Adversarial Linear Contextual Bandits with Knapsacks'. Together they form a unique fingerprint.

Cite this