Bandit Learning for Dynamic Colonel Blotto Game with a Budget Constraint

Vincent Leon, S. Rasoul Etesami

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider a dynamic Colonel Blotto game (CBG) in which one of the players is the learner and has a limited number of troops (budget) to allocate over a finite time horizon. At each stage, the learner strategically determines the budget distribution among the battlefields based on past observations. The other player is the adversary, who chooses its budget allocation strategies randomly from some fixed unknown distribution. The learner's objective is to minimize its regret, defined as the difference between the payoff of the best mixed strategy and the realized payoff obtained by following a learning algorithm. The dynamic CBG is analyzed under the frameworks of combinatorial bandits and bandits with knapsacks. We first convert the dynamic CBG with a budget constraint into a path planning problem on a graph. We then devise an efficient dynamic policy for the learner that runs the combinatorial bandit algorithm Edge on the path planning graph as a subroutine within the LagrangeBwK algorithm. It is shown that under the proposed policy, the learner's regret is bounded with high probability by a term that is sublinear in the time horizon T and polynomial in the other problem parameters.
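The reduction from budget-constrained Blotto allocations to paths on a graph is only described at a high level in the abstract. The Python sketch below is not taken from the paper; all names are chosen here for illustration. It shows one standard way such a layered graph can be built, where each source-to-sink path corresponds to a feasible split of the per-stage budget across the battlefields.

```python
def build_allocation_graph(num_battlefields: int, budget: int):
    """Layered DAG whose source-to-sink paths encode budget allocations.

    Node (i, b): battlefields 1..i have been assigned and b units of the
    per-stage budget have been used.  An edge (i, b) -> (i + 1, b + x)
    labeled x means "allocate x units to battlefield i + 1".
    If the full budget need not be spent, all terminal nodes
    (num_battlefields, b) can be merged into a single sink.
    """
    edges = []
    for i in range(num_battlefields):
        for b in range(budget + 1):
            for x in range(budget - b + 1):
                edges.append(((i, b), (i + 1, b + x), x))
    return edges


def path_to_allocation(path_edges):
    """Read the per-battlefield allocation off a chosen path."""
    return [x for (_, _, x) in path_edges]


if __name__ == "__main__":
    # Tiny example: 3 battlefields, per-stage budget of 2.
    edges = build_allocation_graph(num_battlefields=3, budget=2)
    # One feasible path: allocate 1, 0, 1 (uses the full 2 units).
    path = [((0, 0), (1, 1), 1), ((1, 1), (2, 1), 0), ((2, 1), (3, 2), 1)]
    print(path_to_allocation(path))  # -> [1, 0, 1]
```

Because the Blotto payoff decomposes across battlefields, and hence across the edges of the chosen path, a combinatorial bandit algorithm for online path selection (such as the Edge subroutine mentioned in the abstract) can operate directly on this edge set.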

Original language: English (US)
Title of host publication: 60th IEEE Conference on Decision and Control, CDC 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3818-3823
Number of pages: 6
ISBN (Electronic): 9781665436595
State: Published - 2021
Event: 60th IEEE Conference on Decision and Control, CDC 2021 - Austin, United States
Duration: Dec 13 2021 - Dec 17 2021

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
Volume: 2021-December
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 60th IEEE Conference on Decision and Control, CDC 2021
Country/Territory: United States
City: Austin
Period: 12/13/21 - 12/17/21

Keywords

  • Colonel Blotto game
  • dynamic game
  • multi-armed bandit
  • online learning
  • regret minimization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
