
Reinforced Structured State-Evolution for Vision-Language Navigation

  • Jinyu Chen
  • Chen Gao
  • Erli Meng
  • Qiong Zhang
  • Si Liu*
  • *Corresponding author for this work
  • Beihang University
  • Xiaomi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

The Vision-and-Language Navigation (VLN) task requires an embodied agent to navigate to a remote location by following a natural language instruction. Previous methods usually adopt a sequence model (e.g., Transformer or LSTM) as the navigator. In this paradigm, the sequence model predicts an action at each step from a maintained navigation state, which is generally represented as a one-dimensional vector. However, crucial navigation clues for the embodied navigation task (i.e., the object-level environment layout) are discarded, since the maintained vector is essentially unstructured. In this paper, we propose a novel Structured state-Evolution (SEvol) model to effectively maintain the environment layout clues for VLN. Specifically, we use a graph-based feature, instead of a vector-based state, to represent the navigation state. Accordingly, we devise a Reinforced Layout clues Miner (RLM) to mine and detect the most crucial layout graph for long-term navigation via a customised reinforcement learning strategy. Moreover, a Structured Evolving Module (SEM) is proposed to maintain the structured graph-based state during navigation, where the state gradually evolves to learn object-level spatio-temporal relationships. Experiments on the R2R and R4R datasets show that the proposed SEvol model improves VLN models' performance by large margins, e.g., +3% absolute SPL for NvEM and +8% for EnvDrop on the R2R test set.
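
The abstract reports its gains in SPL (Success weighted by Path Length). As a reading aid, the following is a minimal sketch of that metric under its commonly used definition: the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length, and p_i the length of the path the agent actually took. The function and argument names are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    All names here are hypothetical; this is a sketch of the commonly
    used definition, not the authors' evaluation code.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        # Failed episodes contribute 0; degenerate zero-length episodes are skipped.
        if success and denom > 0:
            total += shortest / denom
    return total / len(successes)


if __name__ == "__main__":
    # Three episodes: an optimal success, a success with a 2x detour, and a failure.
    print(spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0]))
```

Under this definition a successful episode scores 1 only when the agent follows a shortest path, so the metric penalises detours as well as failures.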

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher: IEEE Computer Society
Pages: 15429-15438
Number of pages: 10
ISBN (Electronic): 9781665469463
DOI
Publication status: Published - 2022
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: 19 Jun 2022 to 24 Jun 2022

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory: United States
City: New Orleans
Period: 19/06/22 to 24/06/22
