TY - GEN
T1 - An Improved Deep Reinforcement Learning-Based Method for Optimal Kill Chain Combination Selection
AU - Shi, Yuemeng
AU - Huang, Ning
AU - Sun, Lina
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper addresses key challenges in kill chain optimization, including complex state structures, diverse resource constraints, and inefficient strategy generation. To solve these issues, an improved Advantage Actor-Critic algorithm based on deep reinforcement learning is proposed for selecting optimal kill chain combinations. A simulation environment that incorporates resource and time constraints is developed to model distributed combat scenarios. A feature extraction network integrating a gated recurrent unit with a multi-head attention mechanism is designed to enhance the model's ability to capture state-time dependencies and identify key features. The model is trained using the Advantage Actor-Critic algorithm to improve policy exploration efficiency and convergence. Experimental results from a combat scenario show that the proposed method outperforms baseline algorithms, including the basic Advantage Actor-Critic algorithm, Proximal Policy Optimization, and a genetic algorithm representative of heuristic methods. The results confirm the method's adaptability and practicality, providing a feasible solution for selecting kill chains in complex combat systems.
AB - This paper addresses key challenges in kill chain optimization, including complex state structures, diverse resource constraints, and inefficient strategy generation. To solve these issues, an improved Advantage Actor-Critic algorithm based on deep reinforcement learning is proposed for selecting optimal kill chain combinations. A simulation environment that incorporates resource and time constraints is developed to model distributed combat scenarios. A feature extraction network integrating a gated recurrent unit with a multi-head attention mechanism is designed to enhance the model's ability to capture state-time dependencies and identify key features. The model is trained using the Advantage Actor-Critic algorithm to improve policy exploration efficiency and convergence. Experimental results from a combat scenario show that the proposed method outperforms baseline algorithms, including the basic Advantage Actor-Critic algorithm, Proximal Policy Optimization, and a genetic algorithm representative of heuristic methods. The results confirm the method's adaptability and practicality, providing a feasible solution for selecting kill chains in complex combat systems.
KW - Combinatorial Optimization
KW - Deep Reinforcement Learning
KW - Improved A2C Algorithm
KW - Optimal Kill Chain Combination Selection
KW - Resource Constraints
UR - https://www.scopus.com/pages/publications/105030037595
U2 - 10.1109/ICRMS65480.2025.00061
DO - 10.1109/ICRMS65480.2025.00061
M3 - Conference contribution
AN - SCOPUS:105030037595
T3 - Proceedings - 2025 16th International Conference on Reliability, Maintainability and Safety, ICRMS 2025
SP - 316
EP - 321
BT - Proceedings - 2025 16th International Conference on Reliability, Maintainability and Safety, ICRMS 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th International Conference on Reliability, Maintainability and Safety, ICRMS 2025
Y2 - 27 July 2025 through 30 July 2025
ER -