CoupleFER: Dynamic Cross-Modal Fusion via Prompt Learning for Improved 2D+3D FER

  • Beihang University
  • Shanghai Artificial Intelligence Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

The integration of 2D texture information and 3D geometric data has shown great promise in advancing the accuracy and robustness of 2D+3D facial expression recognition (FER) systems. Traditional methods in this domain often rely on projecting 3D data onto 2D maps, which limits the effective utilization of critical 3D features. To address this, we introduce CoupleFER, a novel cross-modal fusion approach that combines an image-based network with a point cloud-based network. Unlike conventional multi-modal fusion methods, CoupleFER introduces the Cross-Modal Prompt Fusion (CouPle) module, which enables dynamic and interactive fusion between the two branches at every layer. This allows 2D texture information to serve as a guiding prompt, thereby enhancing the performance of the 3D FER branch. To further boost robustness and generalization, we propose a dual-level supervision mechanism that imposes constraints at both the cluster and sample levels during training. Extensive experiments on the widely used BU-3DFE and Bosphorus datasets demonstrate that CoupleFER outperforms state-of-the-art methods in recognition accuracy. Ablation studies validate the importance of each key component of the framework, and robustness tests demonstrate its stability.
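
The abstract does not describe the internals of the CouPle module, so the following is only an illustrative sketch of one generic way that 2D features could act as a prompt for a 3D branch: each point-cloud token attends over the image tokens with scaled dot-product attention, and the result is added back as a residual. The function name `prompt_fuse`, the token shapes, and the choice of cross-attention are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prompt_fuse(point_tokens, image_tokens):
    """Hypothetical fusion step (not the paper's CouPle module).

    Each 3D (point-cloud) token queries the 2D (image) tokens via scaled
    dot-product attention; the attended 2D feature is added to the 3D
    token as a residual "prompt".
    """
    dim = len(point_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in point_tokens:
        # Similarity between this 3D token and every 2D token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in image_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the 2D tokens.
        prompt = [sum(w * tok[d] for w, tok in zip(weights, image_tokens))
                  for d in range(dim)]
        # Residual connection keeps the original 3D information.
        fused.append([q + p for q, p in zip(query, prompt)])
    return fused


# Toy example: two 3D tokens and two identical 2D tokens of dimension 2.
point_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[0.5, 0.5], [0.5, 0.5]]
fused = prompt_fuse(point_tokens, image_tokens)
```

In a layered model, a step of this kind would be applied after each pair of branch layers, which is one plausible reading of fusion "at every layer"; the learned projections that a practical attention layer would include are omitted here for brevity.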

Original language: English
Pages (from-to): 3154-3168
Number of pages: 15
Journal: IEEE Transactions on Affective Computing
Volume: 16
Issue number: 4
State: Published - 2025

Keywords

  • Facial expression recognition
  • cross-modal fusion
  • prompt learning
