Abstract
3D head alignment is an important task in various multimedia applications. However, few prior works have focused on information exchange among different vertices or among 3DMM parameters during regression. Moreover, relying on high-resolution feature maps makes existing algorithms memory-consuming and inefficient. To address these issues, we first propose a multi-task model equipped with two transformer-based branches, which further enhances the information exchange among different elements through self-attention and cross-attention mechanisms. To overcome the low efficiency of high-resolution feature maps and improve the accuracy of facial landmark detection, we design a lightweight module named query-aware memory (QAMem). QAMem enhances the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a single shared one. With QAMem, our model removes the dependence on high-resolution feature maps, which makes it efficient while still achieving superior accuracy. To further improve the robustness of the predicted landmarks, we introduce a multi-layer additive residual regression (MARR) module that provides a more stable and reliable reference based on the average face model. Furthermore, we propose a multi-information loss function with an Euler Angles Loss that supervises the network with more effective information, making the model more robust to atypical head poses. Extensive experiments on two public benchmarks show that our approach achieves state-of-the-art performance. In addition, visualization results and ablation experiments verify the effectiveness of the proposed model.
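The abstract describes MARR only at a high level: landmark estimates start from an average face model and are refined by additive residuals over multiple layers. The Python sketch below illustrates that generic additive-residual idea under stated assumptions. The function `additive_residual_regression`, the per-layer offset callables, and the toy target are all hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Point = List[float]  # one (x, y, z) landmark
# Hypothetical stand-in for one layer's regression head: it sees the current
# landmark estimate and returns one additive offset per landmark.
OffsetFn = Callable[[List[Point]], List[Point]]


def additive_residual_regression(
    mean_face: Sequence[Point], layer_offsets: Sequence[OffsetFn]
) -> List[List[Point]]:
    """Refine landmarks layer by layer, starting from a mean-face reference.

    Returns the estimate after every layer (so each could be supervised).
    The input `mean_face` is copied and never modified.
    """
    current = [list(p) for p in mean_face]  # initial reference: average face
    history: List[List[Point]] = []
    for predict_offset in layer_offsets:
        offsets = predict_offset(current)
        # Additive residual update: new estimate = old estimate + offset.
        current = [
            [c + d for c, d in zip(pt, off)] for pt, off in zip(current, offsets)
        ]
        history.append([list(p) for p in current])
    return history


# Toy usage: two landmarks, three identical hypothetical "layers".
mean_face = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
target = [[0.2, -0.1, 0.05], [1.1, 0.1, 0.0]]


def make_layer(step: float) -> OffsetFn:
    # Each toy layer moves the estimate a fixed fraction of the way to `target`.
    def predict(current: List[Point]) -> List[Point]:
        return [
            [step * (t - c) for c, t in zip(pt, tp)]
            for pt, tp in zip(current, target)
        ]

    return predict


outputs = additive_residual_regression(mean_face, [make_layer(0.5)] * 3)
```

Starting from a fixed average face means every layer only has to predict a correction, not absolute coordinates. That is one plausible reading of why the abstract calls the reference "more stable and reliable." Whether Trans3DHead supervises every intermediate estimate is not stated in the abstract.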
| Original language | English |
|---|---|
| Article number | 180 |
| Journal | International Journal of Machine Learning and Cybernetics |
| Volume | 17 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2026 |
Keywords
- 3D head alignment
- Facial landmark detection
- Vision transformer
Fingerprint
Dive into the research topics of 'Trans3DHead: a multi-task transformer for 3D head alignment'. Together they form a unique fingerprint.