A dynamic multi-task selective execution policy considering stochastic dependence between degradation and random shocks by deep reinforcement learning

  • Lujie Liu
  • Jun Yang*
  • Huiling Zheng
  • Lei Li
  • Ning Wang
  • *Corresponding author for this work
  • Beihang University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

To improve the efficiency of UAVs, it is common for a UAV to perform multiple tasks during each departure. However, existing mission abort policies primarily focus on scenarios where the system executes a single task and are not suitable for more complex multi-task missions. Moreover, in practice, degradation and random shocks often occur simultaneously, while existing studies typically consider only their separate effects on mission abort policies. To solve these problems, a multi-task selective execution policy that considers the stochastic dependence between degradation and shocks is proposed to determine either the next task for the system or the timing of mission abort. First, a multi-task selective execution policy is proposed that takes into account the health state of the system, its location, and the completion state of the tasks. Next, to maximize the cumulative reward of the system, the corresponding sequential decision problem is formulated as a Markov decision process. Then, to address the curse of dimensionality arising from the continuous state space, a solution method based on deep reinforcement learning is tailored to the problem, incorporating an action masking technique that prevents already executed tasks from being selected again. Finally, the effectiveness of the proposed method is verified through a numerical study of a UAV performing multiple reconnaissance tasks.
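
The action masking idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the network architecture, the state encoding, or how the abort decision is represented, so everything here is an assumption made for illustration: the function name `masked_greedy_action`, the convention that the last action index means "abort mission", and the use of plain NumPy arrays in place of a trained value network.

```python
import numpy as np

def masked_greedy_action(q_values, completed):
    """Pick the highest-valued action among those still permitted.

    q_values  : length n_tasks + 1; the last entry is a hypothetical
                "abort mission" action (an assumption, not from the paper).
    completed : length n_tasks booleans; True means the task was executed.
    """
    q = np.asarray(q_values, dtype=float)
    done = np.asarray(completed, dtype=bool)
    if q.shape[0] != done.shape[0] + 1:
        raise ValueError("expected one Q-value per task plus one for abort")
    # Valid actions: every unexecuted task, plus the always-available abort.
    mask = np.append(~done, True)
    # Invalid actions are set to -inf so argmax can never select them.
    masked_q = np.where(mask, q, -np.inf)
    return int(np.argmax(masked_q))

# Task 1 has the largest raw value but is already completed, so it is skipped.
choice = masked_greedy_action([2.0, 5.0, 1.0, 0.5], [False, True, False])
# With every task completed, only the abort action (index 3) remains.
fallback = masked_greedy_action([2.0, 5.0, 1.0, 0.5], [True, True, True])
```

Masking at selection time, rather than penalizing repeated tasks through the reward, guarantees that an executed task is never chosen again regardless of how well the value estimates have converged.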

Original language: English
Article number: 110844
Journal: Reliability Engineering and System Safety
Volume: 257
State: Published - May 2025

Keywords

  • Deep reinforcement learning
  • Degradation
  • Mission abort
  • Multi-task selective execution policy
  • Random shocks
  • Stochastic dependence
