TY - CONF
T1 - Higher-Order Gradient Play Leading to Nash Equilibrium in the Bandit Setting
AU - Toonsi, Sarah A.
AU - Shamma, Jeff S.
N1 - This work was supported by the University of Illinois Urbana-Champaign.
PY - 2024
Y1 - 2024
AB - We investigate learning in games in the bandit setting, where players only have access to their own realized payoffs. Players do not observe the actions of others and do not know the functional form of their own utility functions. Of particular interest is learning mixed-strategy Nash Equilibria (NE). Prior work has shown that learning mixed-strategy NE can be impossible for broad classes of learning dynamics. Follow-up work showed that higher-order learning can overcome such limitations. In particular, for any isolated completely mixed-strategy NE in a polymatrix game, there exist continuous-time uncoupled higher-order gradient play dynamics that converge locally to that NE. Using the ODE method of stochastic approximation, we leverage these results to address the bandit setting. As an interim step, we first address a stochastic discrete-time setting where players observe the actions of others. We then modify the same setup to cover the bandit case. Our primary focus is on isolated mixed-strategy NE that can be stabilized by higher-order learning dynamics that are internally stable, or what we refer to as strongly stabilizable mixed-strategy NE. For both the action-observation and the bandit cases, we show that if x* is an isolated completely mixed-strategy NE in a polymatrix game, and if x* is strongly stabilizable, then there exist higher-order uncoupled learning algorithms that guarantee a positive probability of convergence to that NE for the original game and to perturbed NE in nearby games. We then treat the unnatural case where internally unstable dynamics are required for stabilization. We show that the same results hold under minor modifications of the stochastic algorithms. These results do not imply universal convergence by specific dynamics for all games. Rather, the implication is to establish that isolated completely mixed-strategy NE are learnable.
UR - http://www.scopus.com/inward/record.url?scp=86000547192&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000547192&partnerID=8YFLogxK
DO - 10.1109/CDC56724.2024.10886752
M3 - Conference contribution
AN - SCOPUS:86000547192
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 5984
EP - 5989
BT - 2024 IEEE 63rd Conference on Decision and Control, CDC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 63rd IEEE Conference on Decision and Control, CDC 2024
Y2 - 16 December 2024 through 19 December 2024
ER -