TY - GEN
T1 - An Efficient MADDPG with Episode-Parallel Interaction and Dual Priority Experience Replay
AU - Zhou, Ping
AU - Lu, Hui
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - Multi-agent Deep Deterministic Policy Gradient (MADDPG) is a common multi-agent deep reinforcement learning algorithm applied in both cooperative and competitive scenarios. However, its frequent interactions with the environment and indiscriminate sampling for model training lead to poor training efficiency and low convergence performance. To overcome these limitations, this paper proposes an efficient MADDPG with episode-parallel interaction and dual priority experience replay (EIDPER-MADDPG), which achieves better convergence performance in a shorter training time. First, we devise a parallel interaction architecture that uses multiple processes to collect experiences and learns from them repeatedly in one sampling. Second, considering the contributions of samples from the two perspectives of model training and task scenarios, we redesign a dual priority experience replay to evaluate samples’ importance, which provides more valuable samples for training and enhances convergence performance. Finally, we conduct simulations to demonstrate the effectiveness of the proposed algorithm in terms of training efficiency and convergence performance.
AB - Multi-agent Deep Deterministic Policy Gradient (MADDPG) is a common multi-agent deep reinforcement learning algorithm applied in both cooperative and competitive scenarios. However, its frequent interactions with the environment and indiscriminate sampling for model training lead to poor training efficiency and low convergence performance. To overcome these limitations, this paper proposes an efficient MADDPG with episode-parallel interaction and dual priority experience replay (EIDPER-MADDPG), which achieves better convergence performance in a shorter training time. First, we devise a parallel interaction architecture that uses multiple processes to collect experiences and learns from them repeatedly in one sampling. Second, considering the contributions of samples from the two perspectives of model training and task scenarios, we redesign a dual priority experience replay to evaluate samples’ importance, which provides more valuable samples for training and enhances convergence performance. Finally, we conduct simulations to demonstrate the effectiveness of the proposed algorithm in terms of training efficiency and convergence performance.
KW - Dual Priority Experience Replay
KW - MADDPG
KW - Parallel Architecture
UR - https://www.scopus.com/pages/publications/85199362314
U2 - 10.1007/978-981-97-3336-1_45
DO - 10.1007/978-981-97-3336-1_45
M3 - Conference contribution
AN - SCOPUS:85199362314
SN - 9789819733354
T3 - Lecture Notes in Electrical Engineering
SP - 527
EP - 538
BT - Proceedings of 2023 7th Chinese Conference on Swarm Intelligence and Cooperative Control - Swarm Decision and Planning Technologies
A2 - Li, Xiaoduo
A2 - Song, Xun
A2 - Zhou, Yingjiang
PB - Springer Science and Business Media Deutschland GmbH
T2 - 7th Chinese Conference on Swarm Intelligence and Cooperative Control, CCSICC 2023
Y2 - 24 November 2023 through 27 November 2023
ER -