Policy improvement for repeated zero-sum games with asymmetric information

Malachi Jones, Jeff S. Shamma

Research output: Contribution to journalConference articlepeer-review

Abstract

In a repeated zero-sum game, two players repeatedly play the same zero-sum game over several stages. We assume that while both players can observe the actions of the other, only one player knows the actual game, which was randomly selected from a set of possible games according to a known distribution. The dilemma faced by the informed player is how to trade off the short-term reward versus long-term consequence of exploiting information, since exploitation also risks revelation. Classic work by Aumann and Maschler derives the recursive value equation, which quantifies this tradeoff and derives a formula for optimal policies by the informed player. However, using this model for explicit computations can be computationally prohibitive as the number of game stages increases. In this paper, we derive a suboptimal policy based on the concept of policy improvement. The baseline policy is a non-revealing policy, i.e., one that completely ignores superior information. The improved policy, which is implemented in a receding horizon manner, strategizes for the current stage while assuming a non-revealing policy for future stages. We show that the improved policy can be computed by solving a linear program, and the computational complexity of this linear program is constant with respect to the length of the game. We derive bounds on the guaranteed performance of the improved policy and establish that the bounds are tight.

Original languageEnglish (US)
Article number6426062
Pages (from-to)7752-7757
Number of pages6
JournalProceedings of the IEEE Conference on Decision and Control
DOIs
StatePublished - 2012
Externally publishedYes
Event51st IEEE Conference on Decision and Control, CDC 2012 - Maui, HI, United States
Duration: Dec 10 2012Dec 13 2012

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Modeling and Simulation
  • Control and Optimization

Fingerprint Dive into the research topics of 'Policy improvement for repeated zero-sum games with asymmetric information'. Together they form a unique fingerprint.

Cite this