
Curiosity-tuned experience replay for wargaming decision modeling without reward-engineering

  • Beihang University
  • Zhongguancun Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

Reinforcement learning (RL) has become a promising technique for the difficult decision-modeling problem in wargaming. However, deploying current RL algorithms requires reward engineering scenario by scenario, which is laborious given the large number of wargaming scenarios. To tackle this issue, this paper proposes an improved RL method, curiosity-tuned experience replay (CTER), which allows an RL-driven decision model to learn a relatively effective policy under sparse rewards. CTER uses a curiosity mechanism to regulate three critical procedures of learning with experience replay: the exploration, storage, and revisitation of experiences. Based on prediction-based curiosity, CTER generates an intrinsic reward to fill the sparse reward space and further provides an adaptive exploration strategy to collect more informative experiences. Moreover, CTER develops a novel prioritized replay and memory-updating mechanism to reuse experiences more efficiently. Systematic evaluation and comparison on typical game tasks and wargaming tasks show CTER's effectiveness and generalization across scenarios without reward engineering. In particular, the policy performance of CTER-based RL with sparse rewards is almost equivalent to that of ordinary RL with dense engineered rewards. Our work may offer a relatively universal approach to wargaming decision modeling, freeing RL-based decision modelers from laborious reward engineering.
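The abstract names three curiosity-driven components (an intrinsic reward from prediction error, adaptive exploration, and curiosity-prioritized replay with memory updating) without giving their exact form. The following is a minimal sketch of one common way such components can fit together; it is not the authors' CTER implementation. Assumptions made for illustration: a linear forward model stands in for the prediction network, the intrinsic reward is its scaled squared error, replay priority equals that curiosity score, and `exploration_rate`, `CuriosityReplay`, and the evict-least-surprising update rule are hypothetical names and choices.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: list[float]
    action: int
    reward: float          # sparse extrinsic reward (often 0)
    next_state: list[float]
    priority: float = 0.0  # curiosity score used as replay priority (assumption)


class ForwardModel:
    """Linear forward model predicting next_state from (state, one-hot action).

    Stand-in for a learned dynamics network: its prediction error is the
    "surprise" that prediction-based curiosity rewards.
    """

    def __init__(self, state_dim: int, n_actions: int, lr: float = 0.05):
        self.n_actions = n_actions
        self.lr = lr
        in_dim = state_dim + n_actions
        self.weights = [[0.0] * in_dim for _ in range(state_dim)]

    def _features(self, state, action):
        one_hot = [0.0] * self.n_actions
        one_hot[action] = 1.0
        return list(state) + one_hot

    def predict(self, state, action):
        x = self._features(state, action)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def error(self, state, action, next_state):
        pred = self.predict(state, action)
        return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, next_state))

    def update(self, state, action, next_state):
        """One SGD step on the squared prediction error; returns the pre-update error."""
        x = self._features(state, action)
        pred = self.predict(state, action)
        err = 0.5 * sum((p - t) ** 2 for p, t in zip(pred, next_state))
        for j, row in enumerate(self.weights):
            grad_j = pred[j] - next_state[j]
            for i in range(len(row)):
                row[i] -= self.lr * grad_j * x[i]
        return err


def intrinsic_reward(model, state, action, next_state, eta=1.0):
    """Curiosity bonus: scaled forward-model error, computed before training on it."""
    return eta * model.error(state, action, next_state)


def exploration_rate(mean_curiosity, base=0.05, scale=2.0, cap=1.0):
    """Hypothetical adaptive epsilon: explore more while the agent is still surprised."""
    return min(cap, base + scale * mean_curiosity)


class CuriosityReplay:
    """Hypothetical buffer sampled in proportion to (priority + eps) ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.items: list[Transition] = []
        self.rng = random.Random(seed)

    def add(self, t: Transition):
        if len(self.items) < self.capacity:
            self.items.append(t)
            return
        # Assumed memory update: evict the least surprising transition
        # (rather than the oldest) if the newcomer is more surprising.
        idx = min(range(len(self.items)), key=lambda i: self.items[i].priority)
        if t.priority > self.items[idx].priority:
            self.items[idx] = t

    def sample(self, k):
        weights = [(t.priority + self.eps) ** self.alpha for t in self.items]
        return self.rng.choices(self.items, weights=weights, k=k)

    def refresh(self, model, eta=1.0):
        # Curiosity fades as the forward model improves, so re-score stored items.
        for t in self.items:
            t.priority = intrinsic_reward(model, t.state, t.action, t.next_state, eta)
```

In a training loop one would compute `intrinsic_reward` before calling `model.update` on the new transition, train the agent on the extrinsic reward plus that bonus, store the transition with the bonus as its priority, and periodically call `refresh` so that stale surprise does not keep old experiences over-sampled.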

Original language: English
Article number: 102842
Journal: Simulation Modelling Practice and Theory
Volume: 129
State: Published - Dec 2023

Keywords

  • Curiosity
  • Experience replay
  • Reinforcement learning
  • Reward
  • Wargaming

