Abstract
Autonomous aerial vehicle (AAV) target tracking is an essential enabler of diverse low-altitude activities. Because AAVs are constrained in energy and computing resources, current approaches struggle to balance prolonged flight duration against precise tracking while avoiding high computational complexity. This paper therefore proposes an energy-aware formation control algorithm that lets multiple AAVs cooperatively track a target while retaining a desired formation pattern. First, to balance tracking performance against control effort, an actor-critic learning-based predictive rule is used to develop a near-optimal control protocol that stabilizes the error dynamics and minimizes the value functions for discrete-time AAV systems. By decomposing the infinite-horizon target tracking problem into a sequence of finite-horizon sub-problems, the reinforcement learning (RL)-based predictive control algorithm converges quickly when approximating the solution of the Hamilton-Jacobi-Bellman (HJB) equation. Furthermore, an asynchronous policy iteration mechanism with adjustable learning intervals lightens the otherwise cumbersome learning process, attaining high learning efficiency together with a reduced computational burden. The involved errors are proven to be convergent, and simulation results validate the optimality of the method.
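The idea of policy iteration with an adjustable learning interval can be illustrated on a toy problem. The sketch below is not the authors' algorithm: it assumes linear double-integrator error dynamics, a quadratic cost trading tracking error against control effort, and exact matrix updates in place of actor-critic networks. The names `truncated_policy_iteration` and `eval_sweeps` are hypothetical, with `eval_sweeps` standing in loosely for the number of evaluation steps performed between policy updates.

```python
import numpy as np


def lyapunov_cost(A_cl, W):
    """Solve P = W + A_cl^T P A_cl, the cost matrix of a stable closed loop."""
    n = A_cl.shape[0]
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    return np.linalg.solve(lhs, W.reshape(-1)).reshape(n, n)


def greedy_gain(A, B, R, P):
    """Policy improvement: gain K of u = -K e that minimizes the one-step cost plus e'^T P e'."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def truncated_policy_iteration(A, B, Q, R, K0, eval_sweeps=5, tol=1e-10, max_iters=10_000):
    """Alternate policy improvement with only `eval_sweeps` evaluation sweeps.

    K0 must be stabilizing. eval_sweeps=1 reduces to value iteration; larger
    values approach classical policy iteration with full evaluation.
    """
    # Start from the exact cost of the initial stabilizing gain.
    P = lyapunov_cost(A - B @ K0, Q + K0.T @ R @ K0)
    for _ in range(max_iters):
        K = greedy_gain(A, B, R, P)          # policy improvement
        A_cl = A - B @ K
        W = Q + K.T @ R @ K                  # stage cost under the current gain
        P_new = P
        for _ in range(eval_sweeps):         # partial policy evaluation
            P_new = W + A_cl.T @ P_new @ A_cl
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    return greedy_gain(A, B, R, P), P


# Assumed toy model: double-integrator tracking error sampled at dt = 0.1 s.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)                  # weight on tracking error
R = np.array([[1.0]])          # weight on control effort (a stand-in for energy use)
K0 = np.array([[1.0, 2.0]])    # hand-picked stabilizing initial gain

K, P = truncated_policy_iteration(A, B, Q, R, K0, eval_sweeps=5)
print("feedback gain:", K)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A - B @ K))))
```

Under these assumptions, changing `eval_sweeps` alters how much evaluation work is done per policy update while the iteration still settles on the same gain, which is the kind of trade-off between learning effort and computational burden that the abstract describes.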
| Original language | English |
|---|---|
| Journal | IEEE Transactions on Mobile Computing |
| DOIs | |
| State | Accepted/In press - 2025 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Autonomous aerial vehicles (AAV)
- asynchronous policy iteration
- energy consumption
- reinforcement learning (RL)
- target tracking
Fingerprint
Dive into the research topics of 'Energy-Aware Collaborative AAV Target Tracking via Reinforcement Learning-Based Predictive Control with Asynchronous Policy Iteration'. Together they form a unique fingerprint.