ARRPNGAN: Text-to-image GAN with attention regularization and region proposal networks

Abstract
Although text-to-image synthesis has achieved remarkable success in generating high-resolution, photorealistic, and semantically consistent images, it still struggles to generate images with complex backgrounds. In this paper, we address this problem by proposing a novel generative adversarial text-to-image synthesis framework based on attention regularization modules and region proposal networks (ARRPNGAN). ARRPNGAN precisely locates the keywords in the text by exploiting the advantages of attention models, and improves the accuracy of locating the sub-image of target objects with the help of an RPN. Leveraging both attention regularization and the RPN, a generative adversarial network (GAN) can capture most of the semantics of the text description and reduce the interference of complex background information. The results of extensive experiments on the Caltech-UCSD Birds and MS COCO datasets demonstrate that the proposed ARRPNGAN significantly outperforms other state-of-the-art text-to-image methods, especially in generating photorealistic images with complex backgrounds. Code is available at: https://github.com/quanFN/ARRPNGAN.
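As an illustration of the kind of attention component the abstract alludes to (attending image regions to caption words), here is a minimal sketch in the spirit of word-to-region attention modules used in text-to-image GANs. This is not the authors' ARRPNGAN implementation; the function name, dimensions, and the assumption that word embeddings and region features share a common feature dimension are all hypothetical.

```python
# Minimal word-to-region attention sketch (NOT the ARRPNGAN code;
# names and dimensions are assumptions for illustration only).
import numpy as np

def word_region_attention(words, regions):
    """Attend each image region to the words of the caption.

    words:   (T, D) word embeddings
    regions: (N, D) image-region features (e.g. from RPN proposals)
    Returns (N, D) word-context vectors, one per region, and the
    (N, T) attention map that a regularization term could sharpen.
    """
    scores = regions @ words.T                        # (N, T) similarities
    scores -= scores.max(axis=1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)           # softmax over words
    context = attn @ words                            # (N, D) context vectors
    return context, attn

# Toy usage: 5 words, 3 regions, feature dimension 16.
rng = np.random.default_rng(0)
ctx, attn = word_region_attention(rng.normal(size=(5, 16)),
                                  rng.normal(size=(3, 16)))
print(ctx.shape, attn.shape)  # (3, 16) (3, 5)
```

A regularizer on `attn` (e.g. penalizing diffuse attention maps) is one plausible way to encourage each region to focus on its most relevant keywords, which matches the paper's stated goal of reducing interference from complex backgrounds.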
| Original language | English |
|---|---|
| Article number | 116728 |
| Journal | Signal Processing: Image Communication |
| Volume | 106 |
| DOIs | |
| State | Published - Aug 2022 |
Keywords
- Attention model
- Generative adversarial network
- Region proposal network
- Text-to-image synthesis