An identifier-actor-optimizer policy learning architecture for optimal control of continuous-time nonlinear systems

  • Lin Cheng
  • Zhen Bo Wang
  • Fang Hua Jiang*
  • Jun Feng Li

  *Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

An intelligent solution method is proposed to achieve real-time optimal control of continuous-time nonlinear systems using a novel identifier-actor-optimizer (IAO) policy learning architecture. In this IAO-based approach, a dynamical identifier is developed to approximate the unknown part of the system dynamics using deep neural networks (DNNs). Then, an indirect-method-based optimizer is proposed to generate high-quality optimal actions for system control, taking into account both the constraints and the performance index. Furthermore, a DNN-based actor is developed to approximate the obtained optimal actions and to return good initial guesses to the optimizer. In this way, traditional optimal control methods and state-of-the-art DNN techniques are combined in the IAO-based optimal policy learning method. Compared with reinforcement learning algorithms with actor-critic architectures, which suffer from difficult reward design and low computational efficiency, the IAO-based optimal policy learning algorithm has fewer user-defined parameters, higher learning speeds, and steadier convergence properties in solving complex continuous-time optimal control problems (OCPs). Simulation results for three space flight control missions are given to substantiate the effectiveness of this IAO-based policy learning strategy and to illustrate the performance of the developed DNN-based optimal control method for continuous-time OCPs.
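
The interplay of the three components described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and every name and value in it is an assumption made for illustration: the scalar system x' = a*x + u with unknown coefficient a, the cost 0.5*∫(x² + u²)dt over a fixed horizon, the absence of constraints, and linear least-squares fits standing in for the DNN identifier and the DNN actor. The optimizer is a plain secant shooting method on the initial costate, one simple instance of an indirect method; the actor here predicts that initial costate rather than the action itself.

```python
import numpy as np

A_TRUE = -0.5        # hypothetical "unknown" drift coefficient in x' = a*x + u
T, N = 2.0, 400      # horizon and number of Euler steps (illustrative values)
DT = T / N

def identify(xs, us, xdots):
    """Identifier: least-squares estimate of a from samples of x' - u = a*x."""
    return float(np.dot(xs, xdots - us) / np.dot(xs, xs))

def terminal_costate(lam0, x0, a):
    """Integrate state and costate with u = -lam (Euler) and return lam(T)."""
    x, lam = x0, lam0
    for _ in range(N):
        u = -lam                   # stationarity of H: dH/du = u + lam = 0
        dx = a * x + u             # state equation
        dlam = -x - a * lam        # costate equation: lam' = -dH/dx
        x, lam = x + DT * dx, lam + DT * dlam
    return lam

def optimize(x0, a, guess, tol=1e-9, max_iter=50):
    """Optimizer: secant shooting on lam(0) so that lam(T) = 0 (free final state)."""
    l0, l1 = guess + 1.0, guess    # l1 is the supplied guess, l0 a perturbation
    f0, f1 = terminal_costate(l0, x0, a), terminal_costate(l1, x0, a)
    iters = 0
    while abs(f1) > tol and iters < max_iter:
        l0, l1, f0 = l1, l1 - f1 * (l1 - l0) / (f1 - f0), f1
        f1 = terminal_costate(l1, x0, a)
        iters += 1
    return l1, iters

rng = np.random.default_rng(0)

# 1) Identifier: recover the unknown dynamics from noise-free samples.
xs, us = rng.uniform(-1, 1, 50), rng.uniform(-1, 1, 50)
a_hat = identify(xs, us, A_TRUE * xs + us)

# 2) Optimizer: generate optimal initial costates from cold starts.
x0_train = rng.uniform(0.5, 2.0, 20)
lam_train = np.array([optimize(x0, a_hat, guess=0.0)[0] for x0 in x0_train])

# 3) Actor: fit lam(0) = k * x0 to the optimizer's solutions.
k_actor = float(np.dot(x0_train, lam_train) / np.dot(x0_train, x0_train))

# 4) The actor's prediction warm-starts the optimizer on a new initial state.
x0_new = 1.3
_, iters_cold = optimize(x0_new, a_hat, guess=0.0)
lam_new, iters_warm = optimize(x0_new, a_hat, guess=k_actor * x0_new)
u0 = -lam_new                      # optimal control at t = 0
print(a_hat, k_actor, iters_cold, iters_warm, u0)
```

Because this toy problem is linear-quadratic, a linear actor can reproduce the optimizer's solutions almost exactly, so the warm-started solve should need no more secant iterations than the cold start. For the nonlinear, constrained problems the abstract targets, both fits would be replaced by trained networks and the shooting problem would be harder to initialize.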

Original language: English
Article number: 264511
Journal: Science China: Physics, Mechanics and Astronomy
Volume: 63
Issue number: 6
State: Published - 1 Jun 2020
Externally published: Yes

Keywords

  • continuous-time nonlinear systems
  • deep neural networks
  • identifier-actor-optimizer architecture
  • intelligent optimal control
