跳到主要导航 跳到搜索 跳到主要内容

Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning

  • Beihang University

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Imitation learning (IL) has recently shown impressive performance in training a reinforcement learning agent with human demonstrations, eliminating the difficulty of designing elaborate reward functions in complex environments. However, most IL methods work under the assumption of the optimality of the demonstrations and thus cannot learn policies to surpass the demonstrators. Some methods have been investigated to obtain better-than-demonstration (BD) performance with inner human feedback or preference labels. In this paper, we propose a method to learn rewards from suboptimal demonstrations via a weighted preference learning technique (LERP). Specifically, we first formulate the suboptimality of demonstrations as the inaccurate estimation of rewards. The inaccuracy is modeled with a reward noise random variable following the Gumbel distribution. Moreover, we derive an upper bound of the expected return with different noise coefficients and propose a theorem to surpass the demonstrations. Unlike existing literature, our analysis does not depend on the linear reward constraint. Consequently, we develop a BD model with a weighted preference learning technique. Experimental results on continuous control and high-dimensional discrete control tasks show the superiority of our LERP method over other state-of-the-art BD methods.

源语言英语
主期刊名AAAI-23 Technical Tracks 7
编辑Brian Williams, Yiling Chen, Jennifer Neville
出版商AAAI press
7953-7961
页数9
ISBN(电子版)9781577358800
DOI
出版状态已出版 - 27 6月 2023
活动37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, 美国
期限: 7 2月 202314 2月 2023

出版系列

姓名Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
37

会议

会议37th AAAI Conference on Artificial Intelligence, AAAI 2023
国家/地区美国
Washington
时期7/02/2314/02/23

指纹

探究 'Learning Noise-Induced Reward Functions for Surpassing Demonstrations in Imitation Learning' 的科研主题。它们共同构成独一无二的指纹。

引用此