
Task-aware world model learning with meta weighting via bi-level optimization

  • Huining Yuan
  • Hongkun Dou
  • Xingyu Jiang
  • Yue Deng*
  • *Corresponding author for this work
  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Aligning the world model with the environment for the agent's specific task is crucial in model-based reinforcement learning. While value-equivalent models may achieve better task awareness than maximum-likelihood models, they sacrifice a large amount of semantic information and face implementation issues. To combine the benefits of both types of models, we propose Task-aware Environment Modeling Pipeline with bi-level Optimization (TEMPO), a bi-level model learning framework that introduces an additional level of optimization on top of a maximum-likelihood model by incorporating a meta weighter network that weights each training sample. The meta weighter in the upper level learns to generate novel sample weights by minimizing a proposed task-aware model loss. The model in the lower level focuses on important samples while maintaining rich semantic information in state representations. We evaluate TEMPO on a variety of continuous and discrete control tasks from the DeepMind Control Suite and Atari video games. Our results demonstrate that TEMPO achieves state-of-the-art performance regarding asymptotic performance, training stability, and convergence speed.
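The abstract describes the two optimization levels only in outline. As a rough, hypothetical illustration of the generic bi-level sample-weighting idea (not TEMPO's implementation; its world model, weighter architecture, and task-aware model loss are not reproduced here), the sketch below makes several simplifying assumptions: a scalar linear model `y_hat = theta * x` stands in for the world model, a single logistic unit over a per-sample feature stands in for the meta weighter, plain MSE on a held-out "task" set stands in for the task-aware loss, the lower level is unrolled for one step, and the upper-level hypergradient is taken by finite differences. All function names are illustrative only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weights(phi, feats):
    # Hypothetical meta weighter: one logistic unit mapping a per-sample
    # feature to a weight in (0, 1).
    return [sigmoid(phi[0] * f + phi[1]) for f in feats]


def inner_step(theta, phi, data, lr):
    # Lower level: one gradient step on the weighted MSE of y_hat = theta * x.
    xs, ys, feats = data
    w = weights(phi, feats)
    grad = sum(wi * 2.0 * (theta * x - y) * x
               for wi, x, y in zip(w, xs, ys)) / len(xs)
    return theta - lr * grad


def task_loss(theta, task_data):
    # Stand-in for a task-aware loss: plain MSE on a held-out task set.
    xs, ys = task_data
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def meta_grad(theta, phi, data, task_data, lr, eps=1e-5):
    # Upper level: gradient of the task loss measured AFTER the lower-level
    # step, w.r.t. the weighter parameters, via central finite differences.
    grad = []
    for k in range(len(phi)):
        up, down = list(phi), list(phi)
        up[k] += eps
        down[k] -= eps
        f_up = task_loss(inner_step(theta, up, data, lr), task_data)
        f_down = task_loss(inner_step(theta, down, data, lr), task_data)
        grad.append((f_up - f_down) / (2.0 * eps))
    return grad


def train(data, task_data, steps=200, lr=0.1, meta_lr=1.0):
    theta, phi = 0.0, [0.0, 0.0]
    for _ in range(steps):
        g = meta_grad(theta, phi, data, task_data, lr)
        phi = [p - meta_lr * gk for p, gk in zip(phi, g)]  # update weighter
        theta = inner_step(theta, phi, data, lr)           # update model
    return theta, phi


# Toy data: feature 1 marks samples from y = 2x (relevant to the task),
# feature 0 marks samples from a conflicting relation y = -x.
train_data = ([1.0, 2.0, 1.0, 2.0], [2.0, 4.0, -1.0, -2.0], [1.0, 1.0, 0.0, 0.0])
task_data = ([1.0, 3.0], [2.0, 6.0])

theta, phi = train(train_data, task_data)
w_relevant, w_conflicting = weights(phi, [1.0, 0.0])
print(f"theta={theta:.3f}  w_relevant={w_relevant:.3f}  w_conflicting={w_conflicting:.3f}")
```

In this toy setup the weighter learns to give the task-relevant samples larger weights than the conflicting ones, so `theta` ends closer to the task slope of 2 than the slope of 0.5 that equal weights would produce (the unweighted least-squares fit). Finite differences keep the sketch dependency-free; practical bi-level methods typically differentiate through the unrolled lower-level step with automatic differentiation instead.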

Original language: English
Host publication title: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
ISBN (Electronic): 9781713899921
Publication status: Published - 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - New Orleans, United States
Duration: 10 Dec 2023 → 16 Dec 2023

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 → 16/12/23
