TY - GEN
T1 - Weakly-supervised learning of mid-level features for pedestrian attribute recognition and localization
AU - Zhou, Yang
AU - Yu, Kai
AU - Leng, Biao
AU - Zhang, Zhang
AU - Li, Dangwei
AU - Huang, Kaiqi
AU - Feng, Bailan
AU - Yao, Chunfeng
N1 - Publisher Copyright:
© 2017. The copyright of this document resides with its authors.
PY - 2017
Y1 - 2017
N2 - Most existing methods for pedestrian attribute recognition in video surveillance formulate the task as multi-label image classification, while attribute localization is usually disregarded due to low image quality and large variations in camera viewpoint and human pose. In this paper, we propose a weakly-supervised learning approach that performs multi-attribute classification and localization simultaneously, without requiring bounding box annotations of attributes. Firstly, a set of mid-level attribute features is discovered by a multi-scale attribute-aware module that receives the outputs of multiple inception layers in a deep Convolutional Neural Network (CNN), e.g., GoogLeNet, where a Flexible Spatial Pyramid Pooling (FSPP) operation is performed to acquire the activation maps of attribute features. Subsequently, attribute labels are predicted through a fully-connected layer that regresses the response magnitudes in the activation maps onto the image-level attribute annotations. Finally, the locations of pedestrian attributes are inferred by fusing the multiple activation maps, where the fusion weights are estimated as the correlation strengths between attributes and the relevant mid-level features. To validate the proposed approach, extensive experiments are performed on the two currently largest pedestrian attribute datasets, i.e.
AB - Most existing methods for pedestrian attribute recognition in video surveillance formulate the task as multi-label image classification, while attribute localization is usually disregarded due to low image quality and large variations in camera viewpoint and human pose. In this paper, we propose a weakly-supervised learning approach that performs multi-attribute classification and localization simultaneously, without requiring bounding box annotations of attributes. Firstly, a set of mid-level attribute features is discovered by a multi-scale attribute-aware module that receives the outputs of multiple inception layers in a deep Convolutional Neural Network (CNN), e.g., GoogLeNet, where a Flexible Spatial Pyramid Pooling (FSPP) operation is performed to acquire the activation maps of attribute features. Subsequently, attribute labels are predicted through a fully-connected layer that regresses the response magnitudes in the activation maps onto the image-level attribute annotations. Finally, the locations of pedestrian attributes are inferred by fusing the multiple activation maps, where the fusion weights are estimated as the correlation strengths between attributes and the relevant mid-level features. To validate the proposed approach, extensive experiments are performed on the two currently largest pedestrian attribute datasets, i.e.
UR - https://www.scopus.com/pages/publications/85088408410
U2 - 10.5244/c.31.69
DO - 10.5244/c.31.69
M3 - Conference contribution
AN - SCOPUS:85088408410
T3 - British Machine Vision Conference 2017, BMVC 2017
BT - British Machine Vision Conference 2017, BMVC 2017
PB - BMVA Press
T2 - 28th British Machine Vision Conference, BMVC 2017
Y2 - 4 September 2017 through 7 September 2017
ER -