TY - JOUR
T1 - Weighted cross-integrated fusion network based on knowledge distillation for multi-modal personality recognition
AU - Bao, Yongtang
AU - Liu, Xiang
AU - Li, Xiao
AU - Wang, Zhihui
AU - Qi, Yue
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/7
Y1 - 2025/7
N2 - Personality recognition is crucial for deeply understanding social relationships. Although significant advancements have been made in personality recognition research in recent years, challenges still need to be addressed, particularly the heterogeneity in cross-modal information sharing. To address this, we propose a framework based on the Weighted Cross-Integrated Fusion Network (WCIF-Net). This framework comprises five modules and integrates three modalities (visual, audio, and text) to fuse multi-modal features for accurate personality recognition. Our proposed Weighted Frame Allocation Module optimizes the quality of input video frames by strategically allocating weight calculations. We also incorporate knowledge distillation and contrastive learning into the network, effectively resolving the heterogeneity problem in cross-modal information sharing. We evaluate our method on the ChaLearn First Impressions V2 and ELEA datasets, comparing it with several state-of-the-art methods using different architectures. The experimental results confirm the functionality of the individual modules and their combinations as designed. Based on two key evaluation metrics (ACC and PCC), our performance surpasses the state-of-the-art networks based on the three modalities. Furthermore, our work demonstrates the significant role that Transformers can play in understanding mental phenomena, indicating that our method has broad applicability in multi-modal affective computing.
AB - Personality recognition is crucial for deeply understanding social relationships. Although significant advancements have been made in personality recognition research in recent years, challenges still need to be addressed, particularly the heterogeneity in cross-modal information sharing. To address this, we propose a framework based on the Weighted Cross-Integrated Fusion Network (WCIF-Net). This framework comprises five modules and integrates three modalities (visual, audio, and text) to fuse multi-modal features for accurate personality recognition. Our proposed Weighted Frame Allocation Module optimizes the quality of input video frames by strategically allocating weight calculations. We also incorporate knowledge distillation and contrastive learning into the network, effectively resolving the heterogeneity problem in cross-modal information sharing. We evaluate our method on the ChaLearn First Impressions V2 and ELEA datasets, comparing it with several state-of-the-art methods using different architectures. The experimental results confirm the functionality of the individual modules and their combinations as designed. Based on two key evaluation metrics (ACC and PCC), our performance surpasses the state-of-the-art networks based on the three modalities. Furthermore, our work demonstrates the significant role that Transformers can play in understanding mental phenomena, indicating that our method has broad applicability in multi-modal affective computing.
KW - Affective computing
KW - Multi-modal
KW - Personality recognition
KW - Transformer
UR - https://www.scopus.com/pages/publications/105008703768
U2 - 10.1007/s10489-025-06623-x
DO - 10.1007/s10489-025-06623-x
M3 - Article
AN - SCOPUS:105008703768
SN - 0924-669X
VL - 55
JO - Applied Intelligence
JF - Applied Intelligence
IS - 10
M1 - 761
ER -