Evaluation and learning in two-player symmetric games via best and better responses

  • Rui Yan
  • Weixian Zhang
  • Ruiliang Deng
  • Xiaoming Duan
  • Zongying Shi*
  • Yisheng Zhong

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper addresses the gap between strategy evaluation and strategy learning in two-player symmetric games: a learning algorithm may converge to strategies that an evaluation metric does not prefer. When a player chooses a strategy, it must first evaluate the candidate strategies without knowing the opponent's decisions, and then select a preferred strategy based on that evaluation. In contrast, many multi-agent reinforcement learning algorithms are constructed on the assumption that the strategies of the other players are known in each training episode. In this paper, we first introduce two graph-based metrics grounded in sink equilibrium to characterize the players' preferred strategies in strategy evaluation. These metrics can be regarded as generalized solution concepts in games. Then, we propose two variants of the classical self-play algorithm, named strictly best-response and weakly better-response self-play, to learn the players' strategies. By modeling the learning processes as walks over joint-strategy response digraphs, we prove that, under some conditions, the strategies learned by the two variants are the preferred strategies under the two metrics, respectively. This fills the evaluation–learning gap and ensures that the preferred strategies are learned. We also investigate the relationship between the two metrics.
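As a rough illustration of the graph-based view mentioned in the abstract, the sketch below builds a best-response digraph over the joint strategies of a small two-player symmetric game and extracts its sink strongly connected components, which is the textbook notion of a sink equilibrium. This is not the paper's algorithm or its exact metrics: the function names (`best_response_digraph`, `reachable`, `sink_components`), the payoff-matrix encoding `A[i][j]` (payoff to a player using strategy `i` against `j`), and the rule that a player moves only to a strictly improving best response are all assumptions made for this example.

```python
def best_response_digraph(A):
    """Build a digraph on joint strategies (i, j) of a symmetric game.

    Assumption: A[i][j] is the payoff to a player using strategy i
    against an opponent using j, so player 2's payoff at (i, j) is A[j][i].
    An edge is added only when one player switches to a best response
    that strictly improves its own payoff.
    """
    n = len(A)
    graph = {(i, j): set() for i in range(n) for j in range(n)}
    for (i, j) in graph:
        # Player 1 deviates while player 2 keeps strategy j.
        best = max(A[k][j] for k in range(n))
        if best > A[i][j]:
            for k in range(n):
                if A[k][j] == best:
                    graph[(i, j)].add((k, j))
        # Player 2 deviates while player 1 keeps strategy i.
        best = max(A[k][i] for k in range(n))
        if best > A[j][i]:
            for k in range(n):
                if A[k][i] == best:
                    graph[(i, j)].add((i, k))
    return graph


def reachable(graph, start):
    """Return the set of nodes reachable from start (including start)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def sink_components(graph):
    """Return the sink strongly connected components as frozensets.

    A node lies in a sink component exactly when every node it can
    reach can also reach it back; the component is then its reach set.
    This brute-force check is only meant for small illustrative games.
    """
    reach = {u: reachable(graph, u) for u in graph}
    sinks = set()
    for u in graph:
        if all(u in reach[v] for v in reach[u]):
            sinks.add(frozenset(reach[u]))
    return sinks


# Illustrative payoffs (assumed): prisoner's dilemma, 0 = cooperate, 1 = defect.
prisoners_dilemma = [[3, 0],
                     [5, 1]]
print(sink_components(best_response_digraph(prisoners_dilemma)))
```

In this toy prisoner's dilemma the only sink component is the single joint strategy `(1, 1)`, a pure Nash equilibrium. In a cyclic game such as rock-paper-scissors the sink component is instead a cycle of several joint strategies, which is why sink-based notions can serve as solution concepts even when no pure equilibrium exists.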

Original language: English
Article number: 119459
Journal: Information Sciences
Volume: 647
State: Published - Nov 2023
Externally published: Yes

Keywords

  • Best and better responses
  • Game theory
  • Multi-agent reinforcement learning
  • Self-play
  • Strategy evaluation
