
A Hierarchical Vision-Language and Reinforcement Learning Framework for Robotic Task and Motion Planning in Collaborative Manipulation

  • Junnan Zhang
  • , Chaoxu Mu*
  • , Xin Xu
  • , Lei Ren
  • *Corresponding author of this work
  • Tianjin University
  • National University of Defense Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Vision-language-action models (VLAs) use an end-to-end learning architecture that integrates visual perception, semantic understanding, and motion control. However, when tackling dynamic or long-horizon tasks, VLAs show poor robustness and limited real-time adjustment against changes in target objects, instructions, and environments. To address these limitations, we propose VL-RL, a hierarchical framework consisting of a vision-language (VL) planner, which provides strong VL information understanding and high-level task planning, and a reinforcement learning (RL)-based low-level motion planner with enhanced flexibility and broader applicability. If the environmental state changes during task execution, the RL planner in VL-RL directly makes dynamic adjustments at the subtask level based on visual feedback to achieve the task goal, without the time-consuming information processing of the VL planner. Experiments demonstrate that VL-RL completes dual-robot collaborative manipulation tasks more efficiently and stably. Finally, our work is verified on dynamic grasping tasks and long-horizon complex tasks.
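The hierarchical division of labor described in the abstract — a VL planner invoked once to decompose the instruction into subtasks, and an RL motion planner that reacts to environment changes at the subtask level without re-querying the VL planner — can be sketched as a control loop. All class and function names below are hypothetical stand-ins for illustration, not the authors' implementation.

```python
class VLPlanner:
    """High-level planner: parses an instruction into subtasks (stub)."""

    def plan(self, instruction):
        # In VL-RL this step would query a vision-language model; here a
        # fixed subtask sequence stands in for the planner's output.
        return ["locate_object", "grasp_object", "handover", "place_object"]


class RLMotionPlanner:
    """Low-level planner: executes one subtask from visual feedback (stub)."""

    def execute(self, subtask, observation):
        # A trained RL policy would emit motor commands; this stub only
        # models the adjustment path described in the abstract.
        if observation.get("state_changed"):
            # Dynamic adjustment at the subtask level: replan locally from
            # the updated observation instead of re-invoking the VL planner.
            observation = dict(observation, state_changed=False)
        return {"subtask": subtask, "done": True}


def run_task(instruction, observe):
    """Execute a task: one VL planning call, then per-subtask RL control."""
    vl, rl = VLPlanner(), RLMotionPlanner()
    completed = []
    for subtask in vl.plan(instruction):           # VL planner called once
        result = rl.execute(subtask, observe())    # RL planner per subtask
        if result["done"]:
            completed.append(result["subtask"])
    return completed
```

The key design point mirrored here is that environment changes are absorbed inside `RLMotionPlanner.execute`, so the expensive VL planning step is not on the reactive path.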

Original language: English
Pages (from-to): 65-72
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 11
Issue number: 1
DOI
Publication status: Published - 2026
