TY - JOUR
T1 - GAN-Based virtual-to-real image translation for urban scene semantic segmentation
AU - Guo, Xi
AU - Wang, Zhicheng
AU - Yang, Qin
AU - Lv, Weifeng
AU - Liu, Xianglong
AU - Wu, Qiong
AU - Huang, Jian
N1 - Publisher Copyright: © 2019 Elsevier B.V.
PY - 2020/6/21
Y1 - 2020/6/21
AB - Semantic image segmentation requires large amounts of pixel-wise labeled training data. Creating such data generally requires labor-intensive manual annotation. Thus, extracting training data from video games is a practical idea, and pixel-wise annotation can be automated from video games with near-perfect accuracy. However, experiments show that models trained using raw video-game data cannot be directly applied to real-world scenes because of the domain shift problem. In this paper, we propose a domain-adaptive network based on CycleGAN that translates scenes from a virtual domain to a real domain in both the pixel and feature spaces. Our contributions are threefold: 1) we propose a dynamic perceptual network to improve the quality of the generated images in the feature space, making the translated images more conducive to semantic segmentation; 2) we introduce a novel weighted self-regularization loss to prevent semantic changes in translated images; and 3) we design a discrimination mechanism to coordinate multiple subnetworks and improve the overall training efficiency. We devise a series of metrics to evaluate the quality of translated images in our experiments on the public GTA-V (a video game dataset, i.e., the virtual domain) and Cityscapes (a real-world dataset, i.e., the real domain) datasets and achieve notably improved results, demonstrating the efficacy of the proposed model.
KW - Deep convolutional neural networks
KW - Domain adaptation
KW - Generative adversarial networks
KW - Semantic segmentation
KW - Virtual-to-real image translation
UR - https://www.scopus.com/pages/publications/85068137487
U2 - 10.1016/j.neucom.2019.01.115
DO - 10.1016/j.neucom.2019.01.115
M3 - Article
AN - SCOPUS:85068137487
SN - 0925-2312
VL - 394
SP - 127
EP - 135
JO - Neurocomputing
JF - Neurocomputing
ER -