Abstract
We consider the thresholding bandit problem, whose goal is to find arms of mean rewards above a given threshold ?, with a fixed budget of T trials. We introduce LSA, a new, simple and anytime algorithm that aims to minimize the aggregate regret (or the expected number of mis-classified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results to demonstrate the algorithm's superior performance over existing algorithms under a variety of different scenarios.
Original language | English (US) |
---|---|
Journal | Advances in Neural Information Processing Systems |
Volume | 32 |
State | Published - 2019 |
Event | 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada Duration: Dec 8 2019 → Dec 14 2019 |
ASJC Scopus subject areas
- Computer Networks and Communications
- Information Systems
- Signal Processing