
STRUCTURED INSTRUCTION PARSING AND SCENE ALIGNMENT FOR UAV VISION-LANGUAGE NAVIGATION

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advances in aerial Vision-and-Language Navigation (VLN) have introduced a more meaningful and practical paradigm of VLN by considering significantly longer paths and more complex spatial reasoning compared to ground-based VLN. However, the larger scale and increased complexity of outdoor environments in aerial VLN present substantial challenges in establishing accurate correspondence between textual instructions and visual scenes. In this work, we propose to incorporate Large Language Models (LLMs) to extract key components from navigation instructions and construct the corresponding subtasks. This structured instruction parsing module ensures the appropriate granularity of navigation instructions, enabling more precise alignment between language and visual cues. To further enhance the integration of multi-modal information and cross-modal understanding, we introduce a scene-based subtask alignment policy that effectively associates each parsed subtask with corresponding visual observations along the navigation path. Combined, the proposed approach significantly outperforms current state-of-the-art methods on the AerialVLN dataset.

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 2600-2605
Number of pages: 6
ISBN (Electronic): 9798331523794
DOI
Publication status: Published - 2025
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sep 2025 → 17 Sep 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 32nd IEEE International Conference on Image Processing, ICIP 2025
Country/Territory: United States
City: Anchorage
Period: 14/09/25 → 17/09/25
