TY - GEN
T1 - Language-Guided Global Image Editing via Cross-Modal Cyclic Mechanism
AU - Jiang, Wentao
AU - Xu, Ning
AU - Wang, Jiayun
AU - Gao, Chen
AU - Shi, Jing
AU - Lin, Zhe
AU - Liu, Si
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Editing an image automatically via a linguistic request can significantly save laborious manual work and is friendly to photography novices. In this paper, we focus on the task of language-guided global image editing. Existing works suffer from the imbalanced and insufficient data distribution of real-world datasets and thus fail to understand language requests well. To handle this issue, we propose to create a cycle with our image generator by creating a novel model called the Editing Description Network (EDNet), which predicts an editing embedding given a pair of images. Given the cycle, we propose several free augmentation strategies to help our model understand various editing requests given the imbalanced dataset. In addition, two other novel ideas are proposed: an Image-Request Attention (IRA) module, which allows our method to edit an image spatially adaptively when the image requires different editing degrees at different regions, as well as a new evaluation metric for this task, which is more semantic and reasonable than conventional pixel losses (e.g. L1). Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method over existing approaches.
AB - Editing an image automatically via a linguistic request can significantly save laborious manual work and is friendly to photography novices. In this paper, we focus on the task of language-guided global image editing. Existing works suffer from the imbalanced and insufficient data distribution of real-world datasets and thus fail to understand language requests well. To handle this issue, we propose to create a cycle with our image generator by creating a novel model called the Editing Description Network (EDNet), which predicts an editing embedding given a pair of images. Given the cycle, we propose several free augmentation strategies to help our model understand various editing requests given the imbalanced dataset. In addition, two other novel ideas are proposed: an Image-Request Attention (IRA) module, which allows our method to edit an image spatially adaptively when the image requires different editing degrees at different regions, as well as a new evaluation metric for this task, which is more semantic and reasonable than conventional pixel losses (e.g. L1). Extensive experiments on two benchmark datasets demonstrate the effectiveness of our method over existing approaches.
UR - https://www.scopus.com/pages/publications/85127737461
U2 - 10.1109/ICCV48922.2021.00212
DO - 10.1109/ICCV48922.2021.00212
M3 - Conference contribution
AN - SCOPUS:85127737461
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2095
EP - 2104
BT - Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Y2 - 11 October 2021 through 17 October 2021
ER -