
Soft-label guided multi-granularity prompts learning for human-object interaction detection

  • Xiaoqian Han
  • Xiaowei Zhang*
  • Guanglin Niu
  • Mingliang Zhou
  • Zhenkuan Pan
  • *Corresponding author for this work
  • Qingdao University
  • Chongqing University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Vision-language models (VLMs) have driven substantial progress in human-object interaction (HOI) detection. However, existing VLM-based HOI detectors typically rely on coarse multimodal prompts for knowledge transfer, which makes it difficult to comprehensively capture interaction-relevant contextual cues and consequently weakens their generalization in HOI detection. Meanwhile, hard-label supervised learning ignores semantic correlations among interaction categories; this misaligns with the continuous semantic similarity structure that VLM representations encode in the embedding space and tends to suppress knowledge transfer. To address these challenges, we propose SMPL, a Soft-label guided Multi-granularity Prompt Learning model for HOI detection, which facilitates prompt learning by jointly capturing multi-level interaction cues and providing semantically calibrated supervision aligned with VLM embeddings. Specifically, we design multi-granularity visual and textual prompts to capture interaction cues at different levels of detail, thereby improving generalization across interaction categories. Moreover, we introduce soft-label learning to jointly optimize interaction classification under both hard-label and soft-label supervision; the soft labels naturally reflect interaction-level semantic similarity, enabling the model to learn implicit interaction relations without additional annotations. Extensive experiments demonstrate that SMPL achieves 38.97 mAP on the HICO-DET dataset and improves performance by 2.64 mAP over the current state of the art on the challenging Rare split. SMPL also performs strongly under multiple zero-shot HOI settings, demonstrating excellent generalization to unseen interactions. The code and models are available at https://github.com/hxqstree/SMPL.
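
The abstract does not spell out how the soft labels are built or how the two supervision signals are combined, so the following is only a minimal sketch of the general idea, not the SMPL implementation. It assumes a simplified single-label setting, assumes soft targets come from a temperature-scaled softmax over cosine similarities between class text embeddings, and assumes a convex combination weighted by `alpha`. The names `soft_labels`, `joint_loss`, `alpha`, and `temperature`, and the toy two-dimensional embeddings, are hypothetical.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_labels(class_embeddings, target, temperature=0.1):
    """Assumed soft-target construction: softmax over the similarity of
    every class embedding to the ground-truth class embedding."""
    sims = [cosine(class_embeddings[target], e) / temperature
            for e in class_embeddings]
    return [math.exp(lp) for lp in log_softmax(sims)]


def joint_loss(logits, target, class_embeddings, alpha=0.5, temperature=0.1):
    """Assumed combination: (1 - alpha) * hard-label cross-entropy
    plus alpha * cross-entropy against the soft target distribution."""
    log_probs = log_softmax(logits)
    hard_term = -log_probs[target]
    soft = soft_labels(class_embeddings, target, temperature)
    soft_term = -sum(q * lp for q, lp in zip(soft, log_probs))
    return (1 - alpha) * hard_term + alpha * soft_term


# Toy stand-ins for text embeddings of three interaction classes, e.g.
# "ride bicycle", "sit on bicycle", "eat pizza" (illustrative values only).
toy_embeddings = [[1.0, 0.0], [0.9, 0.3], [0.0, 1.0]]
toy_logits = [2.0, 1.0, -1.0]
print(soft_labels(toy_embeddings, target=0))
print(joint_loss(toy_logits, target=0, class_embeddings=toy_embeddings))
```

In this sketch the soft target keeps most of its mass on the ground-truth class while assigning more mass to a semantically close class than to an unrelated one, which is the kind of interaction-level similarity the abstract refers to; setting `alpha=0` recovers plain hard-label cross-entropy.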

Original language: English
Article number: 114765
Journal: Applied Soft Computing
Volume: 192
DOI
Publication status: Published - Apr 2026
