ARRPNGAN: Text-to-image GAN with attention regularization and region proposal networks

  • Fengnan Quan*
  • Bo Lang
  • Yanxi Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Although text-to-image synthesis has achieved remarkable success in generating high-resolution, photorealistic, and semantically consistent images, it still struggles to generate images with complex backgrounds. In this paper, we address this problem by proposing a novel generative adversarial text-to-image synthesis framework based on attention regularization modules and region proposal networks (ARRPNGAN). ARRPNGAN exploits the strengths of attention models to precisely locate keywords in the text, and uses an RPN to improve the accuracy of locating the subimages of target objects. By leveraging both attention regularization and the RPN, a generative adversarial network (GAN) can capture most of the semantics of the text description while reducing interference from complex background information. The results of extensive experiments on the Caltech-UCSD Birds and MS COCO datasets demonstrate that the proposed ARRPNGAN significantly outperforms other state-of-the-art text-to-image methods, especially in generating photorealistic images with complex backgrounds. Code is available at: https://github.com/quanFN/ARRPNGAN.
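To make the two ingredients of the abstract concrete, the following is a minimal, illustrative sketch (not the paper's actual modules): word-to-region attention weights, an entropy-style regularization term that encourages each word to focus on a few image regions, and an RPN-style step that picks the highest-scoring region proposal. All function names, shapes, and the choice of entropy as the regularizer are assumptions for illustration only.

```python
import numpy as np

def word_region_attention(word_feats, region_feats):
    """Softmax attention from T word embeddings to R image-region features.

    word_feats: (T, D) array; region_feats: (R, D) array.
    Returns a (T, R) matrix whose rows sum to 1.
    """
    scores = word_feats @ region_feats.T            # (T, R) similarity scores
    scores -= scores.max(axis=1, keepdims=True)     # stabilize the softmax
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def attention_regularization(weights):
    """Mean per-word entropy of the attention distribution.

    A low value means each word attends sharply to few regions,
    which is one plausible way to 'precisely locate keywords'.
    """
    eps = 1e-12                                     # avoid log(0)
    return float(-(weights * np.log(weights + eps)).sum(axis=1).mean())

def top_proposal(objectness):
    """RPN-style selection: index of the highest-scoring region proposal."""
    return int(np.argmax(objectness))
```

For example, with 5 word embeddings and 16 region features of dimension 8, `word_region_attention` yields a (5, 16) row-stochastic matrix; a generator could then weight region features by these attentions while adding `attention_regularization` to its loss.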

Original language: English
Article number: 116728
Journal: Signal Processing: Image Communication
Volume: 106
DOIs
State: Published - Aug 2022

Keywords

  • Attention model
  • Generative adversarial network
  • Region proposal network
  • Text-to-image synthesis
