Abstract
Fuzzy inference systems combined with reinforcement learning are currently used in differential games to train agents with no prior experience. However, reinforcement learning algorithms based on the actor-critic structure have the drawback that the policy depends on a probability distribution. In this paper, a novel fuzzy deterministic policy gradient algorithm is introduced and applied to classical 1-vs-1 constant-velocity pursuit-evasion differential games. The key goal is to self-learn the optimal strategy in the continuous action domain and to obtain fuzzy rules with a specific physical meaning. The proposed algorithm is based on the deterministic policy gradient theorem, and the agent learns a near-optimal strategy under the actor-critic structure. Fuzzy inference systems are used as the approximators, so that a specific physical meaning can be obtained from the linguistic fuzzy rules. Furthermore, the proposed algorithm is applied to solve the decision-making problem of pursuit-evasion differential games. The results are compared with those of other existing algorithms and show that the proposed algorithm outperforms them in precision and convergence efficiency.
| Original language | English |
|---|---|
| Pages (from-to) | 106-117 |
| Number of pages | 12 |
| Journal | Neurocomputing |
| Volume | 362 |
| DOI | |
| Publication status | Published - 14 Oct 2019 |
Fingerprint
Explore the research topics of 'A fuzzy deterministic policy gradient algorithm for pursuit-evasion differential games'. Together they form a unique fingerprint.