TY - GEN

T1 - Computing security strategies in finite horizon repeated Bayesian games

AU - Li, Lichun

AU - Langbort, Cedric

AU - Shamma, Jeff

N1 - Funding Information:
The authors acknowledge the financial support of ARO project #W911NF-09-1-0553 and the AFOSR/MURI project #FA9550-10-1-0573.

PY - 2017/6/29

Y1 - 2017/6/29

N2 - This paper studies security strategies in two-player zero-sum repeated Bayesian games with finite horizon. In such games, each player has a private type which is independently chosen according to a publicly known a priori probability. Players' types are fixed all through the game. The game is played for finite stages. At every stage, players simultaneously choose their actions which are observed by the public. The one-stage payoff of player 1 (or penalty to player 2) depends on both players types and actions, and is not directly observed by any player. While player 1 aims to maximize the total payoff over the game, player 2 wants to minimize it. This paper provides each player two ways to compute the security strategy, i.e. The optimal strategy in the worst case. First, a security strategy that directly depends on both players' history actions is derived by refining the sequence form. Noticing that history action space grows exponentially with respect to the time horizon, this paper further presents a security strategy that depends on player's fixed sized sufficient statistics. The sufficient statistics is shown to consist of the belief on one's own type, the regret on the other player's type, and the stage, and is independent of the other player's strategy.

AB - This paper studies security strategies in two-player zero-sum repeated Bayesian games with finite horizon. In such games, each player has a private type which is independently chosen according to a publicly known a priori probability. Players' types are fixed all through the game. The game is played for finite stages. At every stage, players simultaneously choose their actions which are observed by the public. The one-stage payoff of player 1 (or penalty to player 2) depends on both players types and actions, and is not directly observed by any player. While player 1 aims to maximize the total payoff over the game, player 2 wants to minimize it. This paper provides each player two ways to compute the security strategy, i.e. The optimal strategy in the worst case. First, a security strategy that directly depends on both players' history actions is derived by refining the sequence form. Noticing that history action space grows exponentially with respect to the time horizon, this paper further presents a security strategy that depends on player's fixed sized sufficient statistics. The sufficient statistics is shown to consist of the belief on one's own type, the regret on the other player's type, and the stage, and is independent of the other player's strategy.

UR - http://www.scopus.com/inward/record.url?scp=85027066908&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027066908&partnerID=8YFLogxK

U2 - 10.23919/ACC.2017.7963514

DO - 10.23919/ACC.2017.7963514

M3 - Conference contribution

AN - SCOPUS:85027066908

T3 - Proceedings of the American Control Conference

SP - 3664

EP - 3669

BT - 2017 American Control Conference, ACC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2017 American Control Conference, ACC 2017

Y2 - 24 May 2017 through 26 May 2017

ER -