TY - JOUR
T1 - Data-Driven Tracking Control for Nonaffine Yaw Channel of Helicopter via Off-Policy Reinforcement Learning
AU - Zhang, Kun
AU - Luo, Shijie
AU - Wu, Huai Ning
AU - Su, Rong
N1 - Publisher Copyright:
© 1965-2011 IEEE.
PY - 2025
Y1 - 2025
N2 - This article presents an off-policy tracking control scheme for the continuous-time nonaffine yaw channel of an uncrewed aerial vehicle helicopter. First, the article constructs an affine augmented system (AAS) within a parallel control structure to convert the original nonaffine tracking error dynamics into affine dynamics. Second, the article derives a stability criterion linking the nonaffine system and the AAS, demonstrating that the zero-sum policy obtained from the AAS can achieve the H∞ performance of the nonaffine system. Third, a data-driven off-policy tracking algorithm is designed to approximate the zero-sum solution of the Hamilton–Jacobi–Isaacs equations with unknown dynamics. Moreover, a recursive least squares process with a variable forgetting factor is employed to update the actor-critic neural network weights, and the algorithm's convergence is proven. Then, the uniform ultimate boundedness of the tracking errors is guaranteed. Finally, two application examples are offered in simulation to validate the effectiveness of the presented method.
AB - This article presents an off-policy tracking control scheme for the continuous-time nonaffine yaw channel of an uncrewed aerial vehicle helicopter. First, the article constructs an affine augmented system (AAS) within a parallel control structure to convert the original nonaffine tracking error dynamics into affine dynamics. Second, the article derives a stability criterion linking the nonaffine system and the AAS, demonstrating that the zero-sum policy obtained from the AAS can achieve the H∞ performance of the nonaffine system. Third, a data-driven off-policy tracking algorithm is designed to approximate the zero-sum solution of the Hamilton–Jacobi–Isaacs equations with unknown dynamics. Moreover, a recursive least squares process with a variable forgetting factor is employed to update the actor-critic neural network weights, and the algorithm's convergence is proven. Then, the uniform ultimate boundedness of the tracking errors is guaranteed. Finally, two application examples are offered in simulation to validate the effectiveness of the presented method.
KW - Adaptive dynamic programming (ADP)
KW - nonaffine systems
KW - reinforcement learning (RL)
KW - tracking control
KW - uncrewed aerial vehicle (UAV) helicopter
UR - https://www.scopus.com/pages/publications/85217561040
U2 - 10.1109/TAES.2025.3539264
DO - 10.1109/TAES.2025.3539264
M3 - Article
AN - SCOPUS:85217561040
SN - 0018-9251
VL - 61
SP - 7725
EP - 7737
JO - IEEE Transactions on Aerospace and Electronic Systems
JF - IEEE Transactions on Aerospace and Electronic Systems
IS - 3
ER -