Structured linear contextual bandits: A sharp and geometric smoothed analysis

Vidyashankar Sivakumar, Zhiwei Steven Wu, Arindam Banerjee

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Bandit learning algorithms typically involve the balance of exploration and exploitation. However, in many practical applications, worst-case scenarios needing systematic exploration are seldom encountered. In this work, we consider a smoothed setting for structured linear contextual bandits where the adversarial contexts are perturbed by Gaussian noise and the unknown parameter has structure, e.g., sparsity, group sparsity, low rank, etc. We propose simple greedy algorithms for both the single- A nd multi-parameter (i.e., different parameter for each context) settings and provide a unified regret analysis for ? with any assumed structure. The regret bounds are expressed in terms of geometric quantities such as Gaussian widths associated with the structure of ?. We also obtain sharper regret bounds compared to earlier work for the unstructured setting as a consequence of our improved analysis. We show there is implicit exploration in the smoothed setting where a simple greedy algorithm works.

Original languageEnglish (US)
Title of host publication37th International Conference on Machine Learning, ICML 2020
EditorsHal Daume, Aarti Singh
PublisherInternational Machine Learning Society (IMLS)
Pages8973-8982
Number of pages10
ISBN (Electronic)9781713821120
StatePublished - 2020
Event37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: Jul 13 2020Jul 18 2020

Publication series

Name37th International Conference on Machine Learning, ICML 2020
VolumePartF168147-12

Conference

Conference37th International Conference on Machine Learning, ICML 2020
CityVirtual, Online
Period7/13/207/18/20

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint Dive into the research topics of 'Structured linear contextual bandits: A sharp and geometric smoothed analysis'. Together they form a unique fingerprint.

Cite this