TY - GEN
T1 - Robot Path Planning Method Based on an Improved TD3 Algorithm
AU - Lu, Xiao
AU - Yu, Guizhen
AU - Zhou, Bin
AU - Zhang, Junjie
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Traditional path planning methods often rely heavily on environmental maps, while deep reinforcement learning (DRL) algorithms face challenges in achieving stable policy convergence in complex environments. To overcome these challenges, this study introduces an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for autonomous path planning in unknown environments. First, the algorithm integrates temporally correlated Ornstein-Uhlenbeck (OU) noise to improve action exploration. Second, an Uncertainty-Weighted TD3 (UW-TD3) mechanism is introduced, which fuses Q-value estimates based on confidence weights to enhance stability and accelerate convergence. Finally, Hindsight Experience Replay (HER) is integrated to improve sample efficiency, enabling the agent to learn effectively from unsuccessful experiences. Simulation results in both static and dynamic obstacle scenarios demonstrate that the proposed method significantly improves planning success rates and training efficiency, outperforming the conventional TD3 and DDPG algorithms.
AB - Traditional path planning methods often rely heavily on environmental maps, while deep reinforcement learning (DRL) algorithms face challenges in achieving stable policy convergence in complex environments. To overcome these challenges, this study introduces an enhanced Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for autonomous path planning in unknown environments. First, the algorithm integrates temporally correlated Ornstein-Uhlenbeck (OU) noise to improve action exploration. Second, an Uncertainty-Weighted TD3 (UW-TD3) mechanism is introduced, which fuses Q-value estimates based on confidence weights to enhance stability and accelerate convergence. Finally, Hindsight Experience Replay (HER) is integrated to improve sample efficiency, enabling the agent to learn effectively from unsuccessful experiences. Simulation results in both static and dynamic obstacle scenarios demonstrate that the proposed method significantly improves planning success rates and training efficiency, outperforming the conventional TD3 and DDPG algorithms.
KW - deep reinforcement learning
KW - hindsight experience replay
KW - OU noise
KW - path planning
KW - TD3
KW - uncertainty-based Q-value fusion
UR - https://www.scopus.com/pages/publications/105031080344
U2 - 10.1109/CIR65373.2025.11257242
DO - 10.1109/CIR65373.2025.11257242
M3 - Conference contribution
AN - SCOPUS:105031080344
T3 - 2025 International Conference on Computational Intelligence and Robotics, CIR 2025
SP - 21
EP - 26
BT - 2025 International Conference on Computational Intelligence and Robotics, CIR 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Conference on Computational Intelligence and Robotics, CIR 2025
Y2 - 12 September 2025 through 14 September 2025
ER -