Linear bandits with limited adaptivity and learning distributional optimal design

Yufei Ruan, Jiaqi Yang, Yuan Zhou

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints to linear contextual bandits, a central problem in online learning and decision making. We consider two popular limited adaptivity models in literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in d-dimensional linear contextual bandits, the learner needs O(d logd logT) policy switches to achieve the minimax-optimal regret, and this is optimal up to poly(logd, loglogT) factors; for stochastic context vectors, even in the more restricted batch learning model, only O(loglogT) batches are needed to achieve the optimal regret. Together with the known results in literature, our results present a complete picture about the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a both statistically and computationally efficient learning algorithm for the problem, which may be of independent interest.

Original languageEnglish (US)
Title of host publicationSTOC 2021 - Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing
EditorsSamir Khuller, Virginia Vassilevska Williams
PublisherAssociation for Computing Machinery
Pages74-87
Number of pages14
ISBN (Electronic)9781450380539
DOIs
StatePublished - Jun 15 2021
Event53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021 - Virtual, Online, Italy
Duration: Jun 21 2021Jun 25 2021

Publication series

NameProceedings of the Annual ACM Symposium on Theory of Computing
ISSN (Print)0737-8017

Conference

Conference53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021
Country/TerritoryItaly
CityVirtual, Online
Period6/21/216/25/21

Keywords

  • Adaptivity constraints
  • Batch learning
  • Linear bandits
  • Optimal design

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'Linear bandits with limited adaptivity and learning distributional optimal design'. Together they form a unique fingerprint.

Cite this