
Towards Long-delayed Sparsity: Learning a Better Transformer through Reward Redistribution

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, the Decision Transformer (DT) pioneered recasting offline RL as a contextual, conditional sequence-modeling paradigm, which leverages self-attended autoregression to learn from global target rewards, states, and actions. However, many applications suffer a severe delay in these signals; for example, the agent may obtain a reward signal only at the end of each trajectory. This delay causes an unwanted bias to accumulate when autoregressively learning from global signals. In this paper, we focus on a typical instance of this problem: episodic reinforcement learning with trajectory feedback. We propose a new reward redistribution algorithm that learns a parameterized reward function to decompose the long-delayed reward onto each timestep. To improve the redistribution's adaptability, we formulate this decomposition as a bi-level optimization problem solved toward a global optimum. We extensively evaluate the proposed method on various benchmarks and demonstrate substantial performance improvements under long-delayed settings.
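The core idea described in the abstract, learning a parameterized per-step reward whose per-timestep values sum back to the delayed trajectory return, can be sketched minimally. This is an illustrative assumption, not the paper's actual algorithm: the linear reward model, the features, and the least-squares objective here are all hypothetical stand-ins.

```python
import numpy as np

# Illustrative sketch (NOT the paper's method): redistribute a single
# trajectory-level reward R over T timesteps by fitting a linear
# per-step proxy reward r_hat_t = phi_t @ w whose sum matches R.

rng = np.random.default_rng(0)

T, d = 10, 4                    # trajectory length, feature dimension
phi = rng.normal(size=(T, d))   # stand-in (state, action) features
R = 5.0                         # delayed reward, seen only at episode end

g = phi.sum(axis=0)             # gradient direction of the sum constraint
w = np.zeros(d)
lr = 0.5 / float(g @ g)         # step size chosen so the loss contracts

# Minimize 0.5 * (sum_t r_hat_t - R)^2 by gradient descent.
for _ in range(200):
    err = float((phi @ w).sum()) - R
    w -= lr * err * g

r_hat = phi @ w                 # redistributed per-step rewards
print(round(float(r_hat.sum()), 3))  # sums back to R (≈ 5.0)
```

The key property being demonstrated is that the learned per-step rewards reconstruct the delayed return, so a downstream learner can treat them as dense feedback; the paper's bi-level formulation goes further by adapting this decomposition jointly with the policy objective.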

Original language: English
Title of host publication: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors: Edith Elkind
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4693-4701
Number of pages: 9
ISBN (electronic): 9781956792034
DOI
Publication status: Published - 2023
Event: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China
Duration: 19 Aug 2023 → 25 Aug 2023

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2023-August
ISSN (print): 1045-0823

Conference

Conference: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/Territory: China
City: Macao
Period: 19/08/23 → 25/08/23
