Abstract
This paper considers a general two-player zero-sum Markov game in which our agent faces opponents of multiple types, each with multiple policies. To improve our agent’s return against the opponent’s diverse policies, a novel Decision-making Framework based on Opponent Distinguishing and Policy Judgment (DF-ODPJ) is proposed. Building on pre-trained Nash equilibrium strategies, DF-ODPJ distinguishes the opponent’s type by sampling from the interaction trajectory. A fast criterion is then proposed to judge the opponent’s policy; this criterion is proven to minimize the misjudgment probability, and its optimal threshold is calculated. Based on the identification results, appropriate policies are generated to enhance the return. DF-ODPJ is flexible because it is orthogonal to existing Nash equilibrium algorithms and single-agent reinforcement learning algorithms. Experimental results on grid world, video game, and UAV aerial combat environments illustrate the effectiveness of DF-ODPJ. The code is available at https://github.com/ChenXJ295/DF-ODPJ.
| Original language | English (US) |
|---|---|
| Article number | 449 |
| Journal | Applied Intelligence |
| Volume | 55 |
| Issue number | 6 |
| Early online date | Feb 14 2025 |
| DOIs | |
| State | Published - Apr 2025 |
Keywords
- Opponent distinguishing and policy judgment
- Opponents with diverse policies
- Reward enhancement
- Two-player zero-sum Markov games
ASJC Scopus subject areas
- Artificial Intelligence