TY - GEN
T1 - Distributed UAV Swarm Confrontation Decision-Making Based on Reinforcement Learning
AU - Liang, Hongya
AU - Fan, Yao
AU - Zhou, Jianwei
AU - Zheng, Shuai
AU - Li, Xiaoduo
AU - Han, Liang
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - With the rapid development of unmanned aerial vehicle (UAV) technology, decision-making in UAV swarm confrontation has become a critical research focus both domestically and internationally. To address the computational challenges posed by centralized decision-making methods, this paper introduces a decentralized framework for strategic decision-making in UAV swarm conflicts. First, the paper constructs a confrontation scenario in which the opposing sides field equal numbers of homogeneous UAVs, and assigns a specific strike target to each UAV through a target allocation algorithm, decomposing the multi-UAV engagement into single-UAV combat tasks. Then, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed to train the decision-making model for the single-UAV engagement. To markedly improve the model's convergence rate, a strategy integrating reward shaping and curriculum learning is implemented. Moreover, the Artificial Potential Field (APF) method is employed for collision avoidance in multi-UAV operations. Finally, numerical simulations validate the effectiveness and scalability of the proposed approach.
AB - With the rapid development of unmanned aerial vehicle (UAV) technology, decision-making in UAV swarm confrontation has become a critical research focus both domestically and internationally. To address the computational challenges posed by centralized decision-making methods, this paper introduces a decentralized framework for strategic decision-making in UAV swarm conflicts. First, the paper constructs a confrontation scenario in which the opposing sides field equal numbers of homogeneous UAVs, and assigns a specific strike target to each UAV through a target allocation algorithm, decomposing the multi-UAV engagement into single-UAV combat tasks. Then, the Deep Deterministic Policy Gradient (DDPG) algorithm is employed to train the decision-making model for the single-UAV engagement. To markedly improve the model's convergence rate, a strategy integrating reward shaping and curriculum learning is implemented. Moreover, the Artificial Potential Field (APF) method is employed for collision avoidance in multi-UAV operations. Finally, numerical simulations validate the effectiveness and scalability of the proposed approach.
KW - UAV swarm
KW - curriculum learning
KW - DDPG
KW - reward reshaping
UR - https://www.scopus.com/pages/publications/105013970766
U2 - 10.1109/CCDC65474.2025.11090578
DO - 10.1109/CCDC65474.2025.11090578
M3 - Conference contribution
AN - SCOPUS:105013970766
T3 - Proceedings of the 37th Chinese Control and Decision Conference, CCDC 2025
SP - 5359
EP - 5364
BT - Proceedings of the 37th Chinese Control and Decision Conference, CCDC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th Chinese Control and Decision Conference, CCDC 2025
Y2 - 16 May 2025 through 19 May 2025
ER -