TY - JOUR
T1 - Hierarchical Integration of Rich Features for Video-Based Person Re-Identification
AU - Liu, Zheng
AU - Wang, Yunhong
AU - Li, Annan
N1 - Publisher Copyright:
© 1991-2012 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Person re-identification (ReID) aims to associate the identities of pedestrians captured by cameras in non-overlapping areas. Video-based ReID plays an important role in intelligent video surveillance systems and has attracted growing attention in recent years. In this paper, we propose an end-to-end video-based ReID framework based on the convolutional neural network (CNN) for efficient spatio-temporal modeling and enhanced similarity measurement. Specifically, we build our sequence descriptor through basic mathematical operations on semantic mid-level image features, which avoids time-consuming computations and the loss of spatial correlations. We further hierarchically extract image features from multiple intermediate CNN stages to build multi-level sequence descriptors. For the descriptor at each stage, we design an effective auxiliary pairwise loss that is jointly optimized with a triplet loss. To integrate the hierarchical representations, we propose an intuitive yet effective summation-based similarity integration scheme that matches identities more accurately. Furthermore, we extend our framework with a multi-model ensemble strategy that effectively assembles three popular CNN models to represent walking sequences more comprehensively and improve performance. Extensive experiments on three video-based ReID datasets show that the proposed framework outperforms state-of-the-art methods.
AB - Person re-identification (ReID) aims to associate the identities of pedestrians captured by cameras in non-overlapping areas. Video-based ReID plays an important role in intelligent video surveillance systems and has attracted growing attention in recent years. In this paper, we propose an end-to-end video-based ReID framework based on the convolutional neural network (CNN) for efficient spatio-temporal modeling and enhanced similarity measurement. Specifically, we build our sequence descriptor through basic mathematical operations on semantic mid-level image features, which avoids time-consuming computations and the loss of spatial correlations. We further hierarchically extract image features from multiple intermediate CNN stages to build multi-level sequence descriptors. For the descriptor at each stage, we design an effective auxiliary pairwise loss that is jointly optimized with a triplet loss. To integrate the hierarchical representations, we propose an intuitive yet effective summation-based similarity integration scheme that matches identities more accurately. Furthermore, we extend our framework with a multi-model ensemble strategy that effectively assembles three popular CNN models to represent walking sequences more comprehensively and improve performance. Extensive experiments on three video-based ReID datasets show that the proposed framework outperforms state-of-the-art methods.
KW - Person re-identification
KW - multi-model ensemble
KW - similarity measuring
KW - spatio-temporal aggregation
UR - https://www.scopus.com/pages/publications/85057840338
U2 - 10.1109/TCSVT.2018.2883995
DO - 10.1109/TCSVT.2018.2883995
M3 - Article
AN - SCOPUS:85057840338
SN - 1051-8215
VL - 29
SP - 3646
EP - 3659
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 12
M1 - 8552668
ER -