Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions

Akshayaa Magesh, Venugopal V. Veeravalli

Research output: Contribution to journal › Article › peer-review


We consider a fully decentralized multi-player stochastic multi-armed bandit setting where the players cannot communicate with each other and can observe only their own actions and rewards. The environment may appear differently to different players, i.e., the reward distributions for a given arm are heterogeneous across players. In the case of a collision (when more than one player plays the same arm), we allow for the colliding players to receive non-zero rewards. The time-horizon $T$ for which the arms are played is not known to the players. Within this setup, where the number of players is allowed to be greater than the number of arms, we present a policy that achieves near order-optimal expected regret of order $O(\log^{1+\delta} T)$ for $\delta > 0$ (however small) over a time-horizon of duration $T$.

Original language: English (US)
Pages (from-to): 2622-2634
Number of pages: 13
Journal: IEEE Transactions on Information Theory
Issue number: 4
State: Published - Apr 1 2022
Externally published: Yes


  • Cognitive radio
  • Decentralized Bandits
  • Decision making
  • Internet of Things
  • Licenses
  • Multi-player
  • Music
  • Non-homogeneous rewards
  • Sensors
  • Spectrum Access
  • Stochastic processes

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences
  • Computer Science Applications
