Higher-Order Gradient Play Leading to Nash Equilibrium in the Bandit Setting

Sarah A. Toonsi, Jeff S. Shamma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We investigate learning in games in the bandit setting, where players only have access to their own realized payoffs. Players do not observe the actions of others and do not know the functional form of their own utility functions. Of particular interest is learning mixed-strategy Nash Equilibria (NE). Prior work has shown that learning mixed-strategy NE can be impossible for broad classes of learning dynamics. Follow-up work showed that higher-order learning can overcome such limitations. In particular, for any isolated completely mixed-strategy NE in a polymatrix game, there exist continuous-time uncoupled higher-order gradient play dynamics that converge locally to that NE. Using the ODE method of stochastic approximation, we leverage these results to address the bandit setting. As an interim step, we first address a stochastic discrete-time setting where players observe the actions of others. We then modify the same setup to cover the bandit case. Our primary focus is on isolated mixed-strategy NE that can be stabilized by higher-order learning dynamics that are internally stable, or what we refer to as strongly stabilizable mixed-strategy NE. For both the action-observation and the bandit case, we show that if x* is an isolated completely mixed-strategy NE in a polymatrix game, and if x* is strongly stabilizable, then there exist higher-order uncoupled learning algorithms that guarantee a positive probability of convergence to that NE for the original game and to perturbed NE in nearby games. We then treat the unnatural case where internally unstable dynamics are required for stabilization, and show that the same results hold under minor modifications of the stochastic algorithms. These results do not imply universal convergence by specific dynamics for all games. Rather, the implication is that isolated completely mixed-strategy NE are learnable.
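The idea behind higher-order gradient play can be illustrated on a toy example. The sketch below is our own illustration under assumed dynamics, not the paper's algorithm: in matching pennies, whose unique NE is the completely mixed profile (1/2, 1/2), first-order gradient play merely cycles around the interior equilibrium, while augmenting each player's update with an auxiliary filter state (a lead-compensation term, one simple form of higher-order dynamics) makes that NE locally attracting.

```python
# Illustrative sketch only (assumed dynamics, not the paper's algorithm):
# higher-order gradient play on matching pennies. Plain gradient play
# orbits the completely mixed NE (1/2, 1/2); the auxiliary state makes
# the interior equilibrium locally attracting.

def payoff_gradients(p, q):
    """Gradients of expected payoffs in zero-sum matching pennies.

    p, q: probabilities that players 1 and 2 play their first action.
    Player 1's payoff matrix is [[1, -1], [-1, 1]]; player 2 gets its
    negative, so the gradients are 4q - 2 and 2 - 4p respectively.
    """
    return 4 * q - 2, 2 - 4 * p

def higher_order_gradient_play(steps=20000, eta=0.01, kappa=1.0, gamma=1.0):
    p, q = 0.6, 0.4      # initial mixed strategies
    zp, zq = 0.0, 0.0    # auxiliary states: low-pass filtered gradients
    for _ in range(steps):
        gp, gq = payoff_gradients(p, q)
        # Higher-order term kappa * (g - z) approximates the gradient's
        # time derivative (a lead compensator); kappa = 0 recovers
        # first-order gradient play, which only cycles around the NE.
        p += eta * (gp + kappa * (gp - zp))
        q += eta * (gq + kappa * (gq - zq))
        zp += eta * gamma * (gp - zp)
        zq += eta * gamma * (gq - zq)
        p = min(max(p, 0.0), 1.0)   # keep strategies in [0, 1]
        q = min(max(q, 0.0), 1.0)
    return p, q

print(higher_order_gradient_play())  # converges near (0.5, 0.5)
```

Linearizing around (1/2, 1/2) shows why this works: with kappa = gamma = 1 the closed-loop eigenvalues move into the open left half-plane (real part -1/2), whereas without the auxiliary state they are purely imaginary, which is the cycling behavior the higher-order dynamics are designed to break.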

Original language: English (US)
Title of host publication: 2024 IEEE 63rd Conference on Decision and Control, CDC 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5984-5989
Number of pages: 6
ISBN (Electronic): 9798350316339
State: Published - 2024
Event: 63rd IEEE Conference on Decision and Control, CDC 2024 - Milan, Italy
Duration: Dec 16 2024 - Dec 19 2024

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 63rd IEEE Conference on Decision and Control, CDC 2024
Country/Territory: Italy
City: Milan
Period: 12/16/24 - 12/19/24

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization
