DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of Õ(K√KNT) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √log 2NT. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.

Original languageEnglish (US)
Title of host publication35th AAAI Conference on Artificial Intelligence, AAAI 2021
PublisherAssociation for the Advancement of Artificial Intelligence
Pages6557-6565
Number of pages9
ISBN (Electronic)9781713835974
StatePublished - 2021
Externally publishedYes
Event35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: Feb 2 2021Feb 9 2021

Publication series

Name35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume8A

Conference

Conference35th AAAI Conference on Artificial Intelligence, AAAI 2021
CityVirtual, Online
Period2/2/212/9/21

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits'. Together they form a unique fingerprint.

Cite this