CLIPFusion: Infrared and visible image fusion network based on image–text large model and adaptive learning

  • Dongdong Sun
  • Chuanyun Wang*
  • Tian Wang
  • Qian Gao
  • Qiong Liu
  • Linlin Wang
  • *Corresponding author for this work
  • Shenyang Aerospace University
  • Beijing Information Science & Technology University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The goal of infrared and visible image fusion is to integrate complementary multimodal images into fused images that are highly informative and visually effective, with a wide range of applications in automated driving, fault diagnosis and night vision. Because the infrared and visible image fusion task usually has no ground-truth labels to serve as a reference, the design of the loss function is strongly influenced by human subjectivity, which limits the performance of the model. To address the lack of ground-truth labels, this paper designs a prompt generation network based on an image–text large model. The network learns text prompts for the different types of images by constraining the distances between the unimodal image prompts, the fused image prompts and their corresponding images in the latent space of the image–text large model. The learned prompt texts are then used as labels for fused image generation by constraining the distance between the fused image and the different prompt texts in the same latent space. To further improve the quality of the fused images, this paper uses the fused images generated at different iterations to adaptively fine-tune the prompt generation network, which continuously improves the quality of the generated prompt text labels and indirectly improves the visual effect of the fused images. In addition, to minimise the influence of subjective information in the fused image generation process, a 3D convolution-based fused image generation network is proposed that integrates infrared and visible features through adaptive learning in an additional dimension. Extensive experiments show that the proposed model achieves good visual effects and quantitative metrics on infrared–visible image fusion tasks in military, automated driving and low-light scenarios, as well as good generalisation ability on multi-focus image fusion and medical image fusion tasks.
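The idea of using learned prompt texts as labels in a shared latent space can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not give the exact loss, so the softmax cross-entropy over cosine similarities, the temperature of 0.07, the function names and the toy three-dimensional embeddings are all assumptions made here for illustration. In practice the embeddings would come from the image and text encoders of an image–text model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prompt_label_loss(image_emb, prompt_embs, target_index, temperature=0.07):
    """Hypothetical loss: cross-entropy over a softmax of cosine similarities.

    It is small when image_emb lies closer to prompt_embs[target_index]
    than to the other prompts. The temperature value is an assumption.
    """
    logits = [cosine_similarity(image_emb, p) / temperature for p in prompt_embs]
    # Numerically stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[target_index]


# Toy stand-ins for learned prompt embeddings (assumed values, not real data).
prompts = [
    [1.0, 0.0, 0.0],  # "infrared" prompt
    [0.0, 1.0, 0.0],  # "visible" prompt
    [0.6, 0.6, 0.5],  # "fused" prompt
]
FUSED = 2

balanced_output = [0.5, 0.5, 0.5]   # embedding close to the fused prompt
infrared_like_output = [1.0, 0.1, 0.0]  # embedding dominated by one modality

loss_balanced = prompt_label_loss(balanced_output, prompts, FUSED)
loss_infrared_like = prompt_label_loss(infrared_like_output, prompts, FUSED)
```

In this toy setup the output that sits near the fused prompt receives a lower loss than the output that resembles a single modality, which is the behaviour a prompt-as-label constraint is meant to encourage.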

Original language: English
Article number: 103042
Journal: Displays
Volume: 89
DOI
Publication status: Published - Sep 2025
