Stochastic Top-K Subset Bandits with Linear Space and Non-Linear Feedback

Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek K. Umrawal

Research output: Contribution to journalConference articlepeer-review

Abstract

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best K out of N options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses K out of N arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen K arms. The direct use of multi-armed bandit requires choosing among N-choose-K options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in N. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a regret bound of Õ(K 1 2 N 1 3 T 2 3) for a time horizon T, which is sub-linear in all parameters T, N, and K.

Original languageEnglish (US)
Pages (from-to)306-339
Number of pages34
JournalProceedings of Machine Learning Research
Volume132
StatePublished - 2021
Externally publishedYes
Event32nd International Conference on Algorithmic Learning Theory, ALT 2021 - Virtual, Online
Duration: Mar 16 2021Mar 19 2021

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Stochastic Top-K Subset Bandits with Linear Space and Non-Linear Feedback'. Together they form a unique fingerprint.

Cite this