TY - GEN
T1 - Understanding Extortion and Fairness in Iterated Prisoner's Dilemma Through Actor-Critic Learning Dynamics
AU - Geng, Yuxin
AU - Chen, Xingru
N1 - Publisher Copyright: © 2025 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2025
Y1 - 2025
N2 - The Iterated Prisoner's Dilemma (IPD) is widely used to investigate the emergence and stability of cooperative behavior in both social and biological systems. Through repeated interactions, cooperation can be cultivated, enhancing long-term mutual benefit, a principle fundamental to direct reciprocity. However, this mechanism also permits extortionate Zero-Determinant (ZD) strategies to exploit Always Cooperate (ALLC) and other cooperative strategies. In this study, we examine the learning dynamics of actor-critic agents when confronted with extortionate ZD strategies in IPD. We analyze the learning process of actor-critic agents in both stochastic and deterministic settings, exploring the condition under which cooperation can emerge and persist in the presence of extortionate ZD strategies. Furthermore, we scrutinize the balance between fairness and exploitation, examining how extortionate ZD strategies can maximize their rewards without destabilizing cooperative equilibria. Our results offer valuable insights into the reinforcement learning dynamics in the context of IPD and illuminate the interplay between evolutionary game theory and reinforcement learning in understanding the emergence of cooperation, paving the way for developing resilient, cooperative multi-agent systems.
AB - The Iterated Prisoner's Dilemma (IPD) is widely used to investigate the emergence and stability of cooperative behavior in both social and biological systems. Through repeated interactions, cooperation can be cultivated, enhancing long-term mutual benefit, a principle fundamental to direct reciprocity. However, this mechanism also permits extortionate Zero-Determinant (ZD) strategies to exploit Always Cooperate (ALLC) and other cooperative strategies. In this study, we examine the learning dynamics of actor-critic agents when confronted with extortionate ZD strategies in IPD. We analyze the learning process of actor-critic agents in both stochastic and deterministic settings, exploring the condition under which cooperation can emerge and persist in the presence of extortionate ZD strategies. Furthermore, we scrutinize the balance between fairness and exploitation, examining how extortionate ZD strategies can maximize their rewards without destabilizing cooperative equilibria. Our results offer valuable insights into the reinforcement learning dynamics in the context of IPD and illuminate the interplay between evolutionary game theory and reinforcement learning in understanding the emergence of cooperation, paving the way for developing resilient, cooperative multi-agent systems.
KW - cooperation
KW - extortion
KW - iterated Prisoner's Dilemma
KW - reinforcement learning
UR - https://www.scopus.com/pages/publications/105020288966
U2 - 10.23919/CCC64809.2025.11179687
DO - 10.23919/CCC64809.2025.11179687
M3 - Conference contribution
AN - SCOPUS:105020288966
T3 - Chinese Control Conference, CCC
SP - 8410
EP - 8415
BT - Proceedings of the 44th Chinese Control Conference, CCC 2025
A2 - Sun, Jian
A2 - Yin, Hongpeng
PB - IEEE Computer Society
T2 - 44th Chinese Control Conference, CCC 2025
Y2 - 28 July 2025 through 30 July 2025
ER -