TY - JOUR
T1 - Weighted triple-sequence loss for video-based person re-identification
AU - Jiang, Ming
AU - Leng, Biao
AU - Song, Guanglu
AU - Meng, Zhijun
N1 - Publisher Copyright: © 2019 Elsevier B.V.
PY - 2020/3/14
Y1 - 2020/3/14
N2 - The person re-identification (re-id) task has attracted a lot of attention because of the excellent performance powered by Convolutional Neural Networks (CNNs). However, video-based person re-id is still challenging and far from solved. On the one hand, a video sequence contains complementary information but also more noise. On the other hand, training for video is still mainly based on classifying a single frame by its person identity, and the video representation is generated by aggregating the per-frame features. As a result, the video feature is not robust enough at the training stage, and the model is easily misled by noise in the frames. To alleviate the difficulty of training video-based re-id, we propose a novel loss named Weighted Triple-Sequence Loss (WTSL) to optimize the video-based feature and reduce the impact of outliers. Furthermore, we design a Spatial Transformed Partial Network (STPN), coordinated with joint optimization of image-level and video-level features, to generate a more robust representation. Extensive experiments show that our algorithm outperforms state-of-the-art results and achieves 82.2%, 95.2%, and 85.9% rank-1 accuracy on three popular video-based benchmarks: iLIDS-VID, PRID2011, and MARS, respectively.
AB - The person re-identification (re-id) task has attracted a lot of attention because of the excellent performance powered by Convolutional Neural Networks (CNNs). However, video-based person re-id is still challenging and far from solved. On the one hand, a video sequence contains complementary information but also more noise. On the other hand, training for video is still mainly based on classifying a single frame by its person identity, and the video representation is generated by aggregating the per-frame features. As a result, the video feature is not robust enough at the training stage, and the model is easily misled by noise in the frames. To alleviate the difficulty of training video-based re-id, we propose a novel loss named Weighted Triple-Sequence Loss (WTSL) to optimize the video-based feature and reduce the impact of outliers. Furthermore, we design a Spatial Transformed Partial Network (STPN), coordinated with joint optimization of image-level and video-level features, to generate a more robust representation. Extensive experiments show that our algorithm outperforms state-of-the-art results and achieves 82.2%, 95.2%, and 85.9% rank-1 accuracy on three popular video-based benchmarks: iLIDS-VID, PRID2011, and MARS, respectively.
KW - Deep learning
KW - Person re-identification
KW - Spatial transformed partial network
KW - Weighted triple-sequence loss
UR - https://www.scopus.com/pages/publications/85076527825
U2 - 10.1016/j.neucom.2019.11.088
DO - 10.1016/j.neucom.2019.11.088
M3 - Article
AN - SCOPUS:85076527825
SN - 0925-2312
VL - 381
SP - 314
EP - 321
JO - Neurocomputing
JF - Neurocomputing
ER -