
Sharing Task-Relevant Information in Visual Prompt Tuning by Cross-Layer Dynamic Connection

  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

Recent progress has shown great potential of visual prompt tuning (VPT) when adapting pre-trained vision transformers to various downstream tasks. However, most existing solutions independently optimize prompts at each layer, thereby neglecting the usage of task-relevant information encoded in prompt tokens across layers. Additionally, existing prompt structures are prone to interference from task-irrelevant noise in input images, which can adversely affect the sharing of task-relevant information. In this paper, we propose a novel VPT approach, SVPT. It innovatively incorporates a cross-layer dynamic connection (CDC) for input prompt tokens from adjacent layers, enabling effective sharing of task-relevant information. Furthermore, we design a dynamic aggregation (DA) module that facilitates selective sharing of information between layers. The combination of CDC and DA enhances the flexibility of the attention process within the VPT framework. Building upon these foundations, SVPT introduces an attentive enhancement (AE) mechanism that automatically identifies salient image tokens and refines them with prompt tokens in an additive manner. Extensive experiments on 24 image classification and semantic segmentation benchmarks clearly demonstrate the advantages of the proposed SVPT, compared to the state-of-the-art counterparts.
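The abstract names the components (CDC, DA, AE) but gives no formulas, so the following is only a toy sketch of the general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the per-token sigmoid gate, the `gate_weights` vector, the dot-product saliency score against a `query` vector, the `top_k` cutoff, and the use of the mean prompt token for the additive refinement. Tokens are plain Python lists so the example runs without any dependencies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def aggregate_prompts(prev_prompts, layer_prompts, gate_weights):
    """Hypothetical gated mix of the previous layer's prompts with this layer's own.

    Each prompt token gets a scalar gate in (0, 1) computed from the
    previous-layer token. A gate near 1 passes the previous layer's
    information through; a gate near 0 keeps the current layer's prompt.
    """
    mixed = []
    for prev_tok, own_tok in zip(prev_prompts, layer_prompts):
        g = sigmoid(dot(prev_tok, gate_weights))  # assumed gate form
        mixed.append([g * p + (1.0 - g) * o for p, o in zip(prev_tok, own_tok)])
    return mixed


def enhance_salient_tokens(image_tokens, prompts, query, top_k):
    """Hypothetical additive refinement of the most salient image tokens.

    Saliency is assumed to be the dot product with `query`; the top_k
    highest-scoring image tokens have the mean prompt token added to them.
    Returns the refined tokens and the sorted indices that were refined.
    """
    dim = len(query)
    mean_prompt = [sum(tok[i] for tok in prompts) / len(prompts) for i in range(dim)]
    scores = [dot(tok, query) for tok in image_tokens]
    ranked = sorted(range(len(image_tokens)), key=lambda i: scores[i], reverse=True)
    salient = ranked[:top_k]
    refined = [list(tok) for tok in image_tokens]  # copy, leave inputs untouched
    for i in salient:
        refined[i] = [x + m for x, m in zip(refined[i], mean_prompt)]
    return refined, sorted(salient)


if __name__ == "__main__":
    prev_prompts = [[2.0, 0.0]]    # prompts coming out of the previous layer
    layer_prompts = [[0.0, 2.0]]   # this layer's own learnable prompts
    prompts = aggregate_prompts(prev_prompts, layer_prompts, gate_weights=[0.0, 0.0])
    image_tokens = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
    refined, idx = enhance_salient_tokens(image_tokens, prompts, query=[1.0, 0.0], top_k=1)
    print(prompts, refined, idx)
```

In a real vision transformer these steps would sit around each layer's self-attention and the gate parameters would be learned; the sketch only shows how a gate can make cross-layer sharing selective and how an additive update can be restricted to a subset of image tokens.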

Original language: English
Pages (from-to): 4527-4540
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 34
DOI
Publication status: Published - 2025