TY - JOUR
T1 - Hydraulic-Supports Alignment by TD3 with Segmented Experience Pool
AU - Yang, Yi
AU - Dai, Yapeng
AU - Wang, Tian
AU - Qian, Wei
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/4
Y1 - 2025/4
N2 - Hydraulic-supports alignment aims to keep the coal mining face in line and is heavily influenced by various geological states. The experiences produced during the moving process are unbalanced, so the agent fails to learn important knowledge from rare samples. This paper is the first to introduce reinforcement learning to hydraulic-supports alignment, establishing a Markov optimal decision model with the TD3 algorithm. To address the imbalance in the experiences, it proposes a segmented experience pool and three sampling replay mechanisms based on the characteristics of the moving process under various geological states. Experimental results show that the improved TD3, using the segmented experience pool with the three replay mechanisms, can effectively identify the optimal moving policy and achieve significant convergence in cases involving both normal and insufficient movement of the hydraulic supports. In contrast, the unmodified TD3 performs inadequately and struggles to find the optimal policy.
AB - Hydraulic-supports alignment aims to keep the coal mining face in line and is heavily influenced by various geological states. The experiences produced during the moving process are unbalanced, so the agent fails to learn important knowledge from rare samples. This paper is the first to introduce reinforcement learning to hydraulic-supports alignment, establishing a Markov optimal decision model with the TD3 algorithm. To address the imbalance in the experiences, it proposes a segmented experience pool and three sampling replay mechanisms based on the characteristics of the moving process under various geological states. Experimental results show that the improved TD3, using the segmented experience pool with the three replay mechanisms, can effectively identify the optimal moving policy and achieve significant convergence in cases involving both normal and insufficient movement of the hydraulic supports. In contrast, the unmodified TD3 performs inadequately and struggles to find the optimal policy.
KW - Hydraulic-supports alignment
KW - Markov optimal decision
KW - Segmented experience pool
KW - TD3 algorithm
KW - Various geological states
UR - https://www.scopus.com/pages/publications/105000882182
U2 - 10.1007/s11063-025-11744-y
DO - 10.1007/s11063-025-11744-y
M3 - Article
AN - SCOPUS:105000882182
SN - 1370-4621
VL - 57
JO - Neural Processing Letters
JF - Neural Processing Letters
IS - 2
M1 - 35
ER -