AttriDiffuser: Adversarially enhanced diffusion model for text-to-facial attribute image synthesis

  • Wenfeng Song
  • Zhongyong Ye
  • Meng Sun
  • Xia Hou*
  • Shuai Li
  • Aimin Hao

*Corresponding author for this work

  • Beijing Information Science & Technology University
  • Zhongguancun Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

In the progressive domain of computer vision, generating high-fidelity facial images from textual descriptions with precision remains a complex challenge. While existing diffusion models have demonstrated capabilities in text-to-image synthesis, they often struggle with capturing intricate details from complex, multi-attribute textual descriptions, leading to entity or attribute loss and inaccurate combinations. We propose AttriDiffuser, a novel model designed to ensure that each entity and attribute in textual descriptions is distinctly and accurately represented in the synthesized images. AttriDiffuser utilizes a text-driven attribute diffusion adversarial model, enhancing the correspondence between textual attributes and image features. It incorporates an attribute-gating cross-attention mechanism seamlessly into the adversarial learning enhanced diffusion model. AttriDiffuser advances traditional diffusion models by integrating a face diversity discriminator, which augments adversarial training and promotes the generation of diverse yet precise facial images in alignment with complex textual descriptions. Our empirical evaluation, conducted on the renowned Multimodal VoxCeleb and CelebA-HQ datasets, and benchmarked against other state-of-the-art models, demonstrates AttriDiffuser's superior efficacy. The results indicate its unparalleled capability to synthesize high-quality facial images with rigorous adherence to complex, multi-faceted textual descriptions, marking a significant advancement in text-to-facial attribute synthesis. Our code and model will be made publicly available at https://github.com/sunmeng7/AttriDiffuser.

Original language: English
Article number: 111447
Journal: Pattern Recognition
Volume: 163
State: Published - Jul 2025

Keywords

  • Diffusion model
  • Diversity face
  • Facial synthesis
  • Generative adversarial networks
  • Text-to-facial generation
