Language-Based Image Manipulation Built on Language-Guided Ranking

  • Fuxiang Wu
  • Liu Liu
  • Fusheng Hao
  • Fengxiang He
  • Jun Cheng*
  • *Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Text-based image manipulation is a popular subject with many applications. However, it is a challenging task because no ground-truth edited dataset exists and textual descriptions are abstract and ambiguous. To address these difficulties, we propose a manipulation framework consisting of proposal attentional GANs, a language-related semantic mask, and a language-guided ranker. Specifically, we construct an editing proposal generator that generates suitable edited proposals with and without semantic conditions, and that supports reorganizing its sub-generators to output as many proposals covering different aspects as possible. To distinguish text-relevant from text-irrelevant regions, we introduce a language-related semantic mask based on the source image and the target caption. We then use a language-guided ranker to retrieve the best edited result from the edited proposals, using the multi-modal similarity and the language-related semantic mask. Extensive experiments on widely used datasets demonstrate that our model can manipulate images interactively and effectively improves editing quality.
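
The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything below is an assumption made for illustration: the score is taken to be the cosine similarity between a caption embedding and a proposal embedding, minus a penalty for pixel changes outside the masked (text-relevant) region. The names `rank_proposals`, `off_mask_change`, and `penalty` are hypothetical, and images are represented as flat lists of pixel intensities.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def off_mask_change(source, proposal, mask):
    """Mean absolute pixel change over text-irrelevant pixels (mask value 0)."""
    diffs = [abs(p - s) for s, p, m in zip(source, proposal, mask) if m == 0]
    return sum(diffs) / len(diffs) if diffs else 0.0


def rank_proposals(text_emb, source, proposals, proposal_embs, mask, penalty=1.0):
    """Return (index of best proposal, list of scores).

    Assumed score: text-image similarity minus a penalty for edits that
    fall outside the language-related mask. This is an illustrative
    choice, not the scoring function from the paper.
    """
    scores = [
        cosine(text_emb, emb) - penalty * off_mask_change(source, img, mask)
        for img, emb in zip(proposals, proposal_embs)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy data: both proposals match the caption equally well, but the second
# one also alters the text-irrelevant pixels (mask == 0).
source = [0.2, 0.2, 0.8, 0.8]
mask = [1, 1, 0, 0]
proposals = [[0.9, 0.9, 0.8, 0.8], [0.9, 0.9, 0.1, 0.1]]
text_emb = [1.0, 0.0]
proposal_embs = [[0.9, 0.1], [0.9, 0.1]]

best, scores = rank_proposals(text_emb, source, proposals, proposal_embs, mask)
```

In this toy case the first proposal is selected because it changes only the text-relevant region. In a real system the embeddings would come from a learned text-image model and the mask from the source image and target caption, as the abstract describes.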

Original language: English
Pages (from-to): 6219-6231
Number of pages: 13
Journal: IEEE Transactions on Multimedia
Volume: 25
State: Published - 2023
Externally published: Yes

Keywords

  • Text-based image manipulation
  • language-guided ranker
  • semantic mask
