TY - GEN
T1 - Multi-scale aggregation network for direct face alignment
AU - Li, Peizhao
AU - Zhang, Anran
AU - Yue, Lei
AU - Zhen, Xiantong
AU - Cao, Xianbin
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/3/4
Y1 - 2019/3/4
N2 - Face alignment has been extensively researched in computer vision while remaining a challenging task. Direct face alignment based on convolutional neural networks (CNNs) without relying on cascaded regression has recently emerged and achieved promising performance. In this paper, we propose a multi-scale aggregation network (MAN) for direct face alignment by aggregating features from intermediate layers of a CNN. Specifically, MAN adopts a new convolutional architecture to aggregate features at all scales across different semantic levels, which establishes highly informative facial representations for accurate alignment. Moreover, we introduce an attention mechanism into the network, which drives it to focus on the spatial regions closely related to facial landmarks for further improved performance. Our MAN achieves a general end-to-end learning architecture for multi-scale feature aggregation, which, coupled with the spatial attention mechanism, is well suited for direct face alignment. Extensive experiments conducted on four benchmark datasets, including AFLW, 300W, CelebA and 300VW, show that MAN consistently produces high performance and surpasses several state-of-the-art methods in most cases.
AB - Face alignment has been extensively researched in computer vision while remaining a challenging task. Direct face alignment based on convolutional neural networks (CNNs) without relying on cascaded regression has recently emerged and achieved promising performance. In this paper, we propose a multi-scale aggregation network (MAN) for direct face alignment by aggregating features from intermediate layers of a CNN. Specifically, MAN adopts a new convolutional architecture to aggregate features at all scales across different semantic levels, which establishes highly informative facial representations for accurate alignment. Moreover, we introduce an attention mechanism into the network, which drives it to focus on the spatial regions closely related to facial landmarks for further improved performance. Our MAN achieves a general end-to-end learning architecture for multi-scale feature aggregation, which, coupled with the spatial attention mechanism, is well suited for direct face alignment. Extensive experiments conducted on four benchmark datasets, including AFLW, 300W, CelebA and 300VW, show that MAN consistently produces high performance and surpasses several state-of-the-art methods in most cases.
UR - https://www.scopus.com/pages/publications/85063594140
U2 - 10.1109/WACV.2019.00233
DO - 10.1109/WACV.2019.00233
M3 - Conference contribution
AN - SCOPUS:85063594140
T3 - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
SP - 2156
EP - 2165
BT - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Y2 - 7 January 2019 through 11 January 2019
ER -