
VLM-MSGraph: Vision Language Model-enabled Multi-hierarchical Scene Graph for robotic assembly

  • Shufei Li
  • Zhijie Yan
  • Zuoxu Wang*
  • Yiping Gao
  • *Corresponding author for this work
  • Beihang University
  • City University of Hong Kong
  • Huazhong University of Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Intelligent robotic assembly is becoming a pivotal component of the manufacturing sector, driven by growing demands for flexibility, sustainability, and resilience. Robots in manufacturing environments need perception, decision-making, and manipulation skills to support the flexible production of diverse products. However, traditional robotic assembly systems typically rely on time-consuming training processes specific to fixed settings, lacking generalization and zero-shot learning capabilities. To address these challenges, this paper introduces a Vision Language Model-enabled Multi-hierarchical Scene Graph (VLM-MSGraph) approach for robotic assembly, featuring generalized assembly sequence learning and 3D manipulation in open scenarios. The MSGraph incorporates high-level task planning structured as triplets, organized by multiple VLM agents. At a low level, the MSGraph retains 3D spatial relationships between industrial parts, enabling the robot to perform assembly tasks while accounting for object geometry for effective manipulation. Assembly drawings, physics simulations, and assembly tasks in a laboratory setting are used to evaluate the proposed system, advancing flexible automation in robotics.
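The abstract describes a two-level structure: high-level assembly steps stored as (subject, relation, object) triplets, and a low-level layer that keeps 3D spatial information about industrial parts. A minimal sketch of such a multi-hierarchical scene graph might look like the following; all class and method names here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a two-level scene graph: task-level triplets plus
# part-level 3D poses. Names and structure are assumptions for illustration.
from dataclasses import dataclass, field


@dataclass
class PartNode:
    """Low-level node: an industrial part with a 3D position (x, y, z)."""
    name: str
    position: tuple


@dataclass
class MSGraph:
    """Minimal multi-hierarchical scene graph (illustrative only)."""
    parts: dict = field(default_factory=dict)     # name -> PartNode
    triplets: list = field(default_factory=list)  # (subject, relation, object)

    def add_part(self, name, position):
        self.parts[name] = PartNode(name, position)

    def add_triplet(self, subj, relation, obj):
        # High-level assembly step, e.g. ("bolt", "insert_into", "bracket").
        self.triplets.append((subj, relation, obj))

    def plan(self):
        # Return the assembly steps in insertion order as readable strings.
        return [f"{s} -{r}-> {o}" for s, r, o in self.triplets]


graph = MSGraph()
graph.add_part("bolt", (0.10, 0.02, 0.05))
graph.add_part("bracket", (0.25, 0.00, 0.05))
graph.add_triplet("bolt", "insert_into", "bracket")
print(graph.plan())  # → ['bolt -insert_into-> bracket']
```

In the paper's framing, the triplet layer would be populated by VLM agents during task planning, while the part layer supplies the geometry needed for manipulation; this sketch only shows how the two layers could be held in one structure.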

Original language: English
Article number: 102978
Journal: Robotics and Computer-Integrated Manufacturing
Volume: 94
DOI
Publication status: Published - Aug 2025
