Thresholding bandit with optimal aggregate regret

Chao Tao, Saúl A. Blanco, Jian Peng, Yuan Zhou

Research output: Contribution to journal › Conference article › peer-review

Abstract

We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold θ, with a fixed budget of T trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.
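To make the problem setting concrete, the following is a minimal sketch of the thresholding bandit setup and of aggregate regret as the expected number of misclassified arms. It is not the paper's LSA algorithm: it uses a plain round-robin baseline, and the Bernoulli reward model, the function names, and the Monte Carlo estimate are all assumptions introduced here for illustration.

```python
import random

def uniform_thresholding(means, theta, budget, seed=0):
    """Round-robin baseline (NOT the paper's LSA): pull arms in turn,
    then label each arm by comparing its empirical mean with theta."""
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k
        # Assumed reward model: Bernoulli with the arm's true mean.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        sums[arm] += reward
        counts[arm] += 1
    # An arm that was never pulled is labeled "below threshold".
    return [counts[i] > 0 and sums[i] / counts[i] >= theta for i in range(k)]

def misclassified(means, theta, labels):
    """Number of arms whose predicted label disagrees with the truth."""
    return sum((m >= theta) != lab for m, lab in zip(means, labels))

def estimate_aggregate_regret(means, theta, budget, runs=200):
    """Monte Carlo estimate of the expected number of misclassified arms."""
    total = sum(
        misclassified(means, theta,
                      uniform_thresholding(means, theta, budget, seed=s))
        for s in range(runs)
    )
    return total / runs
```

For example, `estimate_aggregate_regret([0.3, 0.45, 0.55, 0.8], theta=0.5, budget=400)` gives a rough baseline figure against which a budget-adaptive strategy could be compared. Arms whose means lie close to θ are the ones most often misclassified under uniform sampling.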

Original language: English (US)
Journal: Advances in Neural Information Processing Systems
Volume: 32
State: Published - 2019
Event: 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: Dec 8 2019 – Dec 14 2019

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
