Exp3.P-Based Autonomous Decision Algorithm Against Nonstationary Opponents With Partially Known Policies

  • Jin Zhu
  • Chunhui Du
  • Jiacheng Chen
  • Lei Huang
  • Geir E. Dullerud
Research output: Contribution to journal › Article › peer-review

Abstract

This article considers multiagent games in which the opponents can change their policies and those policy sets are only partially known. Our goal is to generate an effective policy that earns our agent a higher reward while guaranteeing bounded regret. For such games against nonstationary opponents with partially known policies, an Exp3.P-based autonomous decision (EAD) algorithm is proposed, which consists of three steps. First, we learn an embedding of the opponent’s policy via a conditional encoder–decoder and employ conditional RL to generate the targeted policy. Second, we estimate the opponent’s policy through online Bayesian belief updates. Finally, we select between the adversarial and targeted policies via a multiarmed bandit algorithm. Theoretical analysis of the EAD algorithm gives a lower bound on the expected reward when using the targeted policy and proves that EAD has bounded regret. Experimental results on Kuhn poker and Grid-world Predator–Prey demonstrate the effectiveness of the proposed EAD algorithm.
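The bandit step of the abstract rests on the classic Exp3.P update of Auer et al. (exponential weights, uniform exploration mixing, and an importance-weighted reward estimate with a confidence bonus). Below is a minimal, self-contained sketch of that generic update, not the authors' full EAD pipeline; the function name `exp3p` and its parameters are illustrative, and rewards are assumed to lie in [0, 1].

```python
import math
import random

def exp3p(reward_fn, K, T, gamma=0.1, alpha=2.0, seed=0):
    """Sketch of the Exp3.P bandit algorithm for K arms over T rounds.
    reward_fn(i, t) must return a reward in [0, 1] for pulling arm i at round t."""
    rng = random.Random(seed)
    # Exp3.P initializes all weights to a common positive constant.
    w = [math.exp((alpha * gamma / 3.0) * math.sqrt(T / K))] * K
    pulls = [0] * K
    for t in range(T):
        total = sum(w)
        # Mix exponential weights with uniform exploration of rate gamma.
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]
        i = rng.choices(range(K), weights=p)[0]
        x = reward_fn(i, t)
        pulls[i] += 1
        for j in range(K):
            # Importance-weighted reward estimate plus an exploration bonus.
            xhat = x / p[j] if j == i else 0.0
            w[j] *= math.exp((gamma / (3.0 * K)) *
                             (xhat + alpha / (p[j] * math.sqrt(K * T))))
        # Rescale weights to avoid overflow; the probabilities are unaffected.
        m = max(w)
        w = [wi / m for wi in w]
    return pulls
```

In EAD's setting, the "arms" would correspond to candidate policies (e.g., adversarial vs. targeted), with the per-round reward coming from playing the selected policy against the opponent; for example, `exp3p(lambda i, t: 0.9 if i == 0 else 0.2, K=2, T=2000)` concentrates its pulls on arm 0.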

Original language: English (US)
Pages (from-to): 975-988
Number of pages: 14
Journal: IEEE Transactions on Games
Volume: 17
Issue number: 4
State: Published - 2025

Keywords

  • Exp3.P-based autonomous decision (EAD)
  • multiarmed bandits
  • nonstationary opponents with partially known policies
  • opponent modeling

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

