TY - JOUR
T1 - Reliable and Balanced Transfer Learning for Generalized Multimodal Face Anti-Spoofing
AU - Lin, Xun
AU - Liu, Ajian
AU - Yu, Zitong
AU - Cai, Rizhao
AU - Wang, Shuai
AU - Yu, Yi
AU - Wan, Jun
AU - Lei, Zhen
AU - Cao, Xiaochun
AU - Kot, Alex
N1 - Publisher Copyright: © 1979-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Face Anti-Spoofing (FAS) is essential for securing face recognition systems against presentation attacks. Recent advances in sensor technology and multimodal learning have enabled the development of multimodal FAS systems. However, existing methods often struggle to generalize to unseen attacks and diverse environments due to two key challenges: (1) Modality unreliability, where sensors such as depth and infrared suffer from severe domain shifts, impairing the reliability of cross-modal fusion; and (2) Modality imbalance, where over-reliance on a dominant modality weakens the model’s robustness against attacks that affect other modalities. To overcome these issues, we propose MMDG++, a multimodal domain-generalized FAS framework built upon the vision-language model CLIP. In MMDG++, we design the Uncertainty-Guided Cross-Adapter++ (U-Adapter++) to filter out unreliable regions within each modality, enabling more reliable multimodal interactions. Additionally, we introduce Rebalanced Modality Gradient Modulation (ReGrad) for adaptive gradient modulation to balance modality convergence. To further enhance generalization, we propose Asymmetric Domain Prompts (ADPs) that leverage CLIP’s language priors to learn generalized decision boundaries across modalities. We also develop a novel multimodal FAS benchmark to evaluate generalizability under various deployment conditions. Extensive experiments on this benchmark show that our method outperforms state-of-the-art FAS methods, demonstrating superior generalization capability.
AB - Face Anti-Spoofing (FAS) is essential for securing face recognition systems against presentation attacks. Recent advances in sensor technology and multimodal learning have enabled the development of multimodal FAS systems. However, existing methods often struggle to generalize to unseen attacks and diverse environments due to two key challenges: (1) Modality unreliability, where sensors such as depth and infrared suffer from severe domain shifts, impairing the reliability of cross-modal fusion; and (2) Modality imbalance, where over-reliance on a dominant modality weakens the model’s robustness against attacks that affect other modalities. To overcome these issues, we propose MMDG++, a multimodal domain-generalized FAS framework built upon the vision-language model CLIP. In MMDG++, we design the Uncertainty-Guided Cross-Adapter++ (U-Adapter++) to filter out unreliable regions within each modality, enabling more reliable multimodal interactions. Additionally, we introduce Rebalanced Modality Gradient Modulation (ReGrad) for adaptive gradient modulation to balance modality convergence. To further enhance generalization, we propose Asymmetric Domain Prompts (ADPs) that leverage CLIP’s language priors to learn generalized decision boundaries across modalities. We also develop a novel multimodal FAS benchmark to evaluate generalizability under various deployment conditions. Extensive experiments on this benchmark show that our method outperforms state-of-the-art FAS methods, demonstrating superior generalization capability.
KW - Face anti-spoofing
KW - modality balancing
KW - multi-modal learning
KW - uncertainty
KW - vision-language model
UR - https://www.scopus.com/pages/publications/105006535575
U2 - 10.1109/TPAMI.2025.3573785
DO - 10.1109/TPAMI.2025.3573785
M3 - Article
C2 - 40418599
AN - SCOPUS:105006535575
SN - 0162-8828
VL - 47
SP - 7608
EP - 7625
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 9
ER -