Sharing Task-Relevant Information in Visual Prompt Tuning by Cross-Layer Dynamic Connection

Research output: Contribution to journal › Article › peer-review

Abstract

Recent progress has shown the great potential of visual prompt tuning (VPT) for adapting pre-trained vision transformers to various downstream tasks. However, most existing solutions optimize prompts at each layer independently, thereby neglecting the task-relevant information encoded in prompt tokens across layers. Additionally, existing prompt structures are prone to interference from task-irrelevant noise in input images, which can adversely affect the sharing of task-relevant information. In this paper, we propose a novel VPT approach, SVPT. It incorporates a cross-layer dynamic connection (CDC) between input prompt tokens from adjacent layers, enabling effective sharing of task-relevant information. Furthermore, we design a dynamic aggregation (DA) module that facilitates selective sharing of information between layers. The combination of CDC and DA enhances the flexibility of the attention process within the VPT framework. Building on these foundations, SVPT introduces an attentive enhancement (AE) mechanism that automatically identifies salient image tokens and refines them with prompt tokens in an additive manner. Extensive experiments on 24 image classification and semantic segmentation benchmarks clearly demonstrate the advantages of the proposed SVPT over state-of-the-art counterparts.
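To make the cross-layer idea concrete, here is a minimal sketch (hypothetical names and a scalar gate chosen for illustration; the paper's actual CDC and DA modules are not specified in this abstract): the prompt tokens entering layer l are blended with the prompt tokens produced by layer l-1 through a learned gate, so task-relevant information accumulated in earlier layers is shared downstream instead of each layer's prompts being optimized in isolation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def connect_prompts(prev_prompts, cur_prompts, gate_logit):
    """Blend the previous layer's output prompt tokens into the current
    layer's input prompt tokens via a learned scalar gate -- a deliberately
    simplified stand-in for a cross-layer dynamic connection."""
    g = sigmoid(gate_logit)  # gate value in (0, 1)
    return g * prev_prompts + (1.0 - g) * cur_prompts

# Toy shapes: 5 prompt tokens with embedding dimension 8.
rng = np.random.default_rng(0)
prev = rng.standard_normal((5, 8))  # prompts emitted by layer l-1
cur = rng.standard_normal((5, 8))   # freshly learned prompts for layer l
mixed = connect_prompts(prev, cur, gate_logit=0.0)  # logit 0 -> gate 0.5
```

With a gate logit of 0 the two sources contribute equally; during tuning, the gate logit would be a trainable parameter per layer, letting the model learn how much earlier-layer information to pass forward.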

Original language: English
Pages (from-to): 4527-4540
Number of pages: 14
Journal: IEEE Transactions on Image Processing
Volume: 34
DOIs
State: Published - 2025

Keywords

  • Transfer learning
  • parameter efficient fine-tuning
  • visual prompt tuning

