TY - GEN
T1 - LaDiffGAN
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
AU - Liu, Xuhui
AU - Zeng, Bohan
AU - Gao, Sicheng
AU - Li, Shanglin
AU - Feng, Yutang
AU - Li, Hong
AU - Liu, Boyu
AU - Liu, Jianzhuang
AU - Zhang, Baochang
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Diffusion models have recently become increasingly popular in a number of computer vision tasks, but they fail to achieve satisfactory results for unsupervised image-to-image translation, since they require massive training data and rely heavily on extra guidance. In this scenario, GANs can alleviate these issues of diffusion models, albeit with suboptimal quality. In this paper, we leverage the advantages of both GANs and diffusion models by training GANs with diffusion supervision in latent spaces (LaDiffGAN) to solve the unsupervised image-to-image translation task. Firstly, to improve style transfer quality, we encode the data in specific latent spaces with the styles of the target and source domains. Secondly, we introduce the diffusion process with different amounts of Gaussian noise to enhance the modeling capability of GANs on the complex data distribution. We accordingly design a latent diffusion GAN loss to align the latent features between generated and training images. Lastly, we introduce a heterogeneous conditional denoising loss that incorporates image-level supervision to further improve the quality of the generated results. Our LaDiffGAN significantly alleviates the drawbacks associated with diffusion models, such as data leakage, high inference cost, and high dependence on large training data sets. Extensive experiments show that LaDiffGAN outperforms previous GAN models and delivers performance comparable to or even better than that of diffusion models.
AB - Diffusion models have recently become increasingly popular in a number of computer vision tasks, but they fail to achieve satisfactory results for unsupervised image-to-image translation, since they require massive training data and rely heavily on extra guidance. In this scenario, GANs can alleviate these issues of diffusion models, albeit with suboptimal quality. In this paper, we leverage the advantages of both GANs and diffusion models by training GANs with diffusion supervision in latent spaces (LaDiffGAN) to solve the unsupervised image-to-image translation task. Firstly, to improve style transfer quality, we encode the data in specific latent spaces with the styles of the target and source domains. Secondly, we introduce the diffusion process with different amounts of Gaussian noise to enhance the modeling capability of GANs on the complex data distribution. We accordingly design a latent diffusion GAN loss to align the latent features between generated and training images. Lastly, we introduce a heterogeneous conditional denoising loss that incorporates image-level supervision to further improve the quality of the generated results. Our LaDiffGAN significantly alleviates the drawbacks associated with diffusion models, such as data leakage, high inference cost, and high dependence on large training data sets. Extensive experiments show that LaDiffGAN outperforms previous GAN models and delivers performance comparable to or even better than that of diffusion models.
KW - Diffusion Models
KW - GANs
KW - Image-to-image Translation
KW - Latent Space
UR - https://www.scopus.com/pages/publications/85205965519
U2 - 10.1109/CVPRW63382.2024.00118
DO - 10.1109/CVPRW63382.2024.00118
M3 - Conference contribution
AN - SCOPUS:85205965519
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 1115
EP - 1125
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PB - IEEE Computer Society
Y2 - 16 June 2024 through 22 June 2024
ER -