
Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

  • Lincan Cai
  • Shuang Li*
  • Wenxuan Ma
  • Jingxuan Kang
  • Binhui Xie
  • Zixun Sun
  • Chengwei Zhu
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • University of Illinois at Urbana-Champaign
  • Tencent

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

Large-scale pretrained models have proven immensely valuable in handling data-intensive modalities like text and image. However, fine-tuning these models for certain specialized modalities, such as protein sequences and cosmic rays, poses challenges due to the significant modality discrepancy and the scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal fine-tuning, aiming to transfer a large-scale pretrained model to various target modalities. PaRe employs a gating mechanism to select key patches from both source and target data. Through a modality-agnostic Patch Replacement scheme, these patches are preserved and combined to construct data-rich intermediate modalities ranging from easy to hard. By generating intermediate modalities gradually, we can not only effectively bridge the modality gap to enhance the stability and transferability of cross-modal fine-tuning, but also address the challenge of limited data in the target modality by leveraging enriched intermediate modality data. Compared with hand-designed, general-purpose, task-specific, and state-of-the-art cross-modal fine-tuning approaches, PaRe demonstrates superior performance across three challenging benchmarks, encompassing more than ten modalities.
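The patch replacement idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `replace_patches` and `target_patch_schedule`, the use of precomputed per-patch scores in place of a learned gating mechanism, and the linear schedule are all assumptions made for illustration. The sketch also assumes that "easy" means mostly source patches and "hard" means mostly target patches, which the abstract does not state.

```python
import numpy as np


def replace_patches(src, tgt, src_scores, tgt_scores, n_keep_tgt):
    """Build one mixed sample from two patch sequences (hypothetical helper).

    src, tgt: arrays of shape (n_patches, dim).
    src_scores, tgt_scores: per-patch importance scores, shape (n_patches,).
        In the paper these come from a gating mechanism; here they are
        simply passed in as an assumption.
    n_keep_tgt: number of highest-scoring target patches to preserve.
    """
    n_patches = tgt.shape[0]
    n_from_src = n_patches - n_keep_tgt
    mixed = tgt.copy()
    if n_from_src == 0:
        return mixed
    # Positions of the lowest-scoring target patches, which get overwritten.
    drop_pos = np.argsort(tgt_scores, kind="stable")[:n_from_src]
    # Indices of the highest-scoring source patches, which get inserted.
    top_src = np.argsort(-src_scores, kind="stable")[:n_from_src]
    mixed[drop_pos] = src[top_src]
    return mixed


def target_patch_schedule(step, total_steps, n_patches):
    """Assumed linear schedule: kept target patches grow from 0 to n_patches."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return int(frac * n_patches)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(8, 4))   # stand-in for source-modality patches
    tgt = rng.normal(size=(8, 4))   # stand-in for target-modality patches
    src_scores = rng.random(8)      # placeholder gating scores
    tgt_scores = rng.random(8)
    for step in (0, 5, 10):
        k = target_patch_schedule(step, total_steps=10, n_patches=8)
        mixed = replace_patches(src, tgt, src_scores, tgt_scores, k)
        print(step, k, mixed.shape)
```

Under these assumptions, early training steps yield samples dominated by source patches, and the final step returns the unmodified target sample.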

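Original language: English
Pages (from-to): 5236-5257
Number of pages: 22
Journal: Proceedings of Machine Learning Research
Volume: 235
Publication status: Published - 2024
Externally published
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: 21 Jul 2024 to 27 Jul 2024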
