Abstract
The integration of 2D texture information and 3D geometric data has shown great promise in advancing the accuracy and robustness of 2D+3D facial expression recognition (FER) systems. Traditional methods in this domain often rely on projecting 3D data onto 2D maps, which limits the effective utilization of critical 3D features. To address this, we introduce CoupleFER, a novel approach that employs a cross-modal fusion strategy combining image-based and point cloud-based networks. Unlike conventional multi-modal fusion methods, CoupleFER introduces the Cross-Modal Prompt Fusion (CouPle) module, enabling dynamic and interactive fusion between the two branches at every layer. This allows 2D texture information to serve as a guiding prompt, thereby enhancing the performance of the 3D FER branch. To further boost robustness and generalization, we propose a dual-level supervision mechanism, which imposes constraints at both the cluster and sample levels during training. Extensive experiments on the widely used BU-3DFE and Bosphorus datasets demonstrate that CoupleFER outperforms state-of-the-art methods, achieving superior recognition accuracy. Ablation studies validate the contribution of each key component, and robustness tests confirm the framework's stability, underscoring its potential to significantly improve 2D+3D FER systems.
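The abstract describes 2D texture features acting as prompts that guide the 3D branch at each layer. The paper's actual CouPle module is not specified here, so the following is only a minimal sketch of the general prompt-fusion idea, assuming a simple design in which pooled 2D tokens are linearly projected and prepended to the 3D point-cloud tokens; all names, shapes, and dimensions below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def prompt_fusion(feat_2d, feat_3d, w_prompt):
    """Hypothetical sketch: project 2D texture tokens into prompt
    tokens and prepend them to the 3D point-cloud token sequence,
    so subsequent 3D layers can attend to the 2D guidance."""
    prompts = feat_2d @ w_prompt                    # (n_prompt, d)
    return np.concatenate([prompts, feat_3d], axis=0)

d = 8                                               # token dimension (illustrative)
feat_2d = rng.standard_normal((4, d))               # 4 pooled 2D texture tokens
feat_3d = rng.standard_normal((16, d))              # 16 point-cloud tokens
w_prompt = rng.standard_normal((d, d))              # learned projection (random here)

fused = prompt_fusion(feat_2d, feat_3d, w_prompt)
print(fused.shape)                                  # (20, 8)
```

In an actual per-layer fusion such as the one the abstract suggests, a projection of this kind would be applied before every 3D block, with fresh prompts derived from the corresponding 2D layer's features.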
| Original language | English |
|---|---|
| Pages (from-to) | 3154-3168 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Affective Computing |
| Volume | 16 |
| Issue number | 4 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Facial expression recognition
- cross-modal fusion
- prompt learning
CoupleFER: Dynamic Cross-Modal Fusion via Prompt Learning for Improved 2D+3D FER