TY - GEN
T1 - Text-driven Physically Interpretable Face Editing
AU - Yang, Songru
AU - Meng, Yapeng
AU - Shi, Zhenwei
AU - Zou, Zhengxia
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - This paper proposes a novel and physically interpretable method for face editing with arbitrary text prompts. Unlike previous GAN-inversion editing methods that manipulate the latent space, or diffusion methods that perform manipulation as a reverse process, we regard face editing as imposing vector flow fields on face images, where the fields represent the offset of spatial coordinates and color for each pixel. Under this paradigm, we represent the vector flow field in two ways: 1) explicitly, as rasterized tensors, and 2) implicitly, as continuous, smooth, and resolution-agnostic neural fields. The flow vectors are iteratively optimized under the guidance of the pre-trained CLIP model by maximizing the correlation between the edited image and the text prompt. We also propose a learning-based one-shot face editing framework, which is fast and adaptable to any text prompt input. Compared with state-of-the-art (SOTA) text-driven face editing methods, our method can generate physically interpretable face editing results with high identity consistency and image quality.
KW - face editing
KW - physically interpretable
KW - text-driven
UR - https://www.scopus.com/pages/publications/105017592636
U2 - 10.1109/ICMEW68306.2025.11152090
DO - 10.1109/ICMEW68306.2025.11152090
M3 - Conference contribution
AN - SCOPUS:105017592636
T3 - IEEE International Conference on Multimedia and Expo Workshops: Journey to the Center of Machine Imagination, ICMEW 2025 - Proceedings
BT - IEEE International Conference on Multimedia and Expo Workshops
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Multimedia and Expo Workshops, ICMEW 2025
Y2 - 30 June 2025 through 4 July 2025
ER -