TY - GEN
T1 - HF-DQN
T2 - 2024 China Automation Congress, CAC 2024
AU - Wang, Shaofan
AU - Li, Ke
AU - Zhang, Tao
AU - Zhang, Zhao
AU - Hu, Zhenning
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Reinforcement learning agents can learn to solve sequential decision tasks by interacting with their environment. However, as the complexity of the state space and the task increases, the exploration effort these agents require grows exponentially. To address this limitation, human knowledge can be integrated as valuable supplementary information for the agent. One effective method is imitation learning, in which the agent learns by mimicking human-demonstrated decisions. Human guidance, however, need not be limited to demonstrations: in some applications expert demonstration data are unavailable, and other forms of guidance that demand less human effort may be more appropriate. The first contribution of this work is a concise human-feedback-based reinforcement learning algorithm, HF-DQN, which incorporates human feedback to aid the RL process and provides guidance without requiring full demonstrations. The second is a set of simulated environments for autonomous navigation tasks, including ego-vehicle obstacle avoidance, visual obstacle avoidance, and UAV landing, used to evaluate several methods based on the TAMER (Training an Agent Manually via Evaluative Reinforcement) framework; a standard dueling DQN was also implemented for comparison. Our findings show that HF-DQN agents achieve stable performance and outperform their baselines across these tasks in the simulated environments.
KW - Autonomous navigation
KW - deep reinforcement learning
KW - human feedback
KW - human in the loop
UR - https://www.scopus.com/pages/publications/86000777115
U2 - 10.1109/CAC63892.2024.10864643
DO - 10.1109/CAC63892.2024.10864643
M3 - Conference contribution
AN - SCOPUS:86000777115
T3 - Proceedings - 2024 China Automation Congress, CAC 2024
SP - 552
EP - 557
BT - Proceedings - 2024 China Automation Congress, CAC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 November 2024 through 3 November 2024
ER -