AttriDiffuser: Adversarially enhanced diffusion model for text-to-facial attribute image synthesis

  • Wenfeng Song
  • Zhongyong Ye
  • Meng Sun
  • Xia Hou*
  • Shuai Li
  • Aimin Hao
  • *Corresponding author for this work
  • Beijing Information Science & Technology University
  • Zhongguancun Laboratory

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In the progressive domain of computer vision, generating high-fidelity facial images from textual descriptions with precision remains a complex challenge. While existing diffusion models have demonstrated capabilities in text-to-image synthesis, they often struggle with capturing intricate details from complex, multi-attribute textual descriptions, leading to entity or attribute loss and inaccurate combinations. We propose AttriDiffuser, a novel model designed to ensure that each entity and attribute in textual descriptions is distinctly and accurately represented in the synthesized images. AttriDiffuser utilizes a text-driven attribute diffusion adversarial model, enhancing the correspondence between textual attributes and image features. It incorporates an attribute-gating cross-attention mechanism seamlessly into the adversarial learning enhanced diffusion model. AttriDiffuser advances traditional diffusion models by integrating a face diversity discriminator, which augments adversarial training and promotes the generation of diverse yet precise facial images in alignment with complex textual descriptions. Our empirical evaluation, conducted on the renowned Multimodal VoxCeleb and CelebA-HQ datasets, and benchmarked against other state-of-the-art models, demonstrates AttriDiffuser's superior efficacy. The results indicate its unparalleled capability to synthesize high-quality facial images with rigorous adherence to complex, multi-faceted textual descriptions, marking a significant advancement in text-to-facial attribute synthesis. Our code and model will be made publicly available at https://github.com/sunmeng7/AttriDiffuser.
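The abstract names an attribute-gating cross-attention mechanism but does not say how the gate is computed. As a rough illustration of the general idea only, the sketch below shows one common way to gate cross-attention between image features and text tokens. Everything in it is an assumption rather than the authors' implementation: the function name `gated_cross_attention`, the per-token sigmoid gate, the projection matrices `w_q`, `w_k`, `w_v`, `w_gate`, and the residual update are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def gated_cross_attention(image_feats, text_feats, w_q, w_k, w_v, w_gate):
    """Illustrative gated cross-attention (NOT the paper's exact design).

    image_feats: (n_pixels, d) flattened image features (queries)
    text_feats:  (n_tokens, d) text token embeddings (keys and values)
    w_q, w_k, w_v: (d, d) projection matrices
    w_gate:      (d, 1) hypothetical projection producing one gate per token
    """
    q = image_feats @ w_q                      # (n_pixels, d)
    k = text_feats @ w_k                       # (n_tokens, d)
    v = text_feats @ w_v                       # (n_tokens, d)

    # Standard scaled dot-product attention from pixels to text tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_pixels, n_tokens)
    attn = softmax(scores, axis=-1)

    # Assumed gate: a sigmoid score in (0, 1) per text token, so that
    # tokens judged less relevant contribute less to every pixel.
    gate = 1.0 / (1.0 + np.exp(-(text_feats @ w_gate)))  # (n_tokens, 1)
    gated_attn = attn * gate.T                 # broadcast over pixels

    # Residual update of the image features with the gated text context.
    return image_feats + gated_attn @ v, gated_attn


rng = np.random.default_rng(0)
d, n_pixels, n_tokens = 8, 16, 5
image_feats = rng.normal(size=(n_pixels, d))
text_feats = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_gate = rng.normal(size=(d, 1))

out, gated_attn = gated_cross_attention(
    image_feats, text_feats, w_q, w_k, w_v, w_gate
)
print(out.shape, gated_attn.shape)
```

Because each gate lies strictly between 0 and 1, every row of the gated attention map sums to less than 1. A token with a gate near 0 is effectively dropped from the image update, while a token with a gate near 1 keeps its ordinary attention weight.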

Original language: English
Article number: 111447
Journal: Pattern Recognition
Volume: 163
DOI
Publication status: Published - Jul 2025
