Trans3DHead: a multi-task transformer for 3D head alignment

  • Jin Jiang
  • Shengcai Liao*
  • Haibo Jin
  • Xiaoyuan Yang*
  • *Corresponding author for this work
  • Beihang University
  • United Arab Emirates University
  • Hong Kong University of Science and Technology

Research output: Contribution to journal › Article › peer-review

Abstract

3D head alignment is an important task in various multimedia applications. Few prior works have focused on information exchange among different vertices or 3DMM parameters during regression. In addition, relying on high-resolution feature maps makes algorithms memory-intensive and inefficient. To address these issues, we first propose a multi-task model equipped with two transformer-based branches, which enhances information exchange among different elements through self-attention and cross-attention mechanisms. To avoid the inefficiency of high-resolution feature maps while improving the accuracy of facial landmark detection, we design a lightweight module named query-aware memory (QAMem), which enhances the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one. With QAMem, our model is efficient because it removes the dependence on high-resolution feature maps while still achieving superior accuracy. To further improve the robustness of the predicted landmarks, we introduce a multi-layer additive residual regression (MARR) module that provides a more stable and reliable reference based on the average face model. Furthermore, we propose a multi-information loss function with an Euler angles loss to supervise the network with more effective information, making the model more robust to atypical head poses. Extensive experiments on two public benchmarks show that our approach achieves state-of-the-art performance. In addition, visualization results and ablation experiments verify the effectiveness of the proposed model.
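
The abstract does not say how QAMem is implemented, so the following is only an illustrative sketch of one possible reading of "separate memory values to each query rather than a shared one": in cross-attention, the keys are computed once from the low-resolution feature map, while each query applies its own value projection to that map. The function name `cross_attention`, the per-query matrices `W_v_per_query`, and the toy dimensions are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, memory, W_k, W_v_per_query):
    """Single-head cross-attention with one value projection per query.

    queries:        Q vectors of length d (e.g. one per landmark).
    memory:         N vectors of length d (flattened low-resolution feature map).
    W_k:            d x d key projection shared by all queries.
    W_v_per_query:  Q matrices of size d x d (assumed form of "separate
                    memory values"); passing the same matrix Q times
                    recovers the usual shared-value attention.
    """
    d = len(queries[0])
    keys = [matvec(W_k, m) for m in memory]  # computed once, shared
    outputs = []
    for q, W_v in zip(queries, W_v_per_query):
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        values = [matvec(W_v, m) for m in memory]  # specific to this query
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]
        )
    return outputs


# Toy usage: two queries attending over a three-token memory.
identity = [[1.0, 0.0], [0.0, 1.0]]
doubled = [[2.0, 0.0], [0.0, 2.0]]
toy_memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_queries = [[0.5, -0.2], [0.1, 0.8]]

shared = cross_attention(toy_queries, toy_memory, identity, [identity, identity])
query_aware = cross_attention(toy_queries, toy_memory, identity, [identity, doubled])
```

In this sketch the attention weights are identical in both calls; only the values read by the second query change. The extra cost is one projection matrix per query, which is consistent with the abstract's description of the module as lightweight but should not be read as the authors' actual design.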

Original language: English
Article number: 180
Journal: International Journal of Machine Learning and Cybernetics
Volume: 17
Issue number: 4
State: Published - Apr 2026

Keywords

  • 3D head alignment
  • Facial landmark detection
  • Vision transformer
