View-aware feature learning for person re-identification

  • Yang Shan
  • Zhang Yongfei*
  • Pu Yanglin
  • Yang Hangyuan
  • *Corresponding author for this work
  • Beihang University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Objective Person re-identification (ReID) has become an integral component of intelligent surveillance, security, and new retail. In real-world scenarios, however, the same person can look very different when the view changes, which degrades association performance. Existing methods typically strengthen the model's representation and association ability through view representation learning and view-based loss functions that make the model perceive view information. These methods achieve strong results, but two challenges remain. The first challenge is how to retain person representational capability in models that learn view features only implicitly. Existing transformer-based methods convert view labels into feature vectors through a view embedding layer, and such simple labels cannot convey complex posture information. These methods therefore learn view features implicitly: they never explicitly give the model the spatial structure of a person's posture, such as the positions of keypoints and their topological relationships. As a result, the model may not perceive postures and views precisely, which weakens its representational capability for persons. To address this issue, our method embeds keypoint coordinates and models the topological structure between keypoints. Given this structured information, the model can understand person posture more directly and learn it explicitly. The second challenge is how to separate persons with similar appearances and the same view when anchors are pushed away from hard negatives indiscriminately.
Regarding the design of view-based loss functions, many existing methods do not differentiate specific views and learn generic view features, which can strip the model of essential person view information. Other approaches use a triplet loss to reduce feature distances between persons with the same view, increase the distances between clusters of the same identity with opposing views, and bring clusters of adjacent views closer together. However, our analysis of error cases in real scenarios shows that persons with similar appearances and the same view often rank high in retrieval results, which degrades the performance of the ReID system. Moreover, because these methods use a uniform margin to push anchors away from hard negative examples, persons with similar appearances and the same view may still not be clearly separated. To address this issue, we introduce a large margin that pushes apart different identities with similar appearances and the same view. We propose view-aware feature learning (VAFL) for person ReID to address both challenges.

Method First, we propose view feature learning based on person posture (Pos2View). The view of a person is inherently determined by the spatial arrangement of body parts. We therefore integrate the person's posture information into the feature map, which improves the model's ability to discern the person's view. Second, we propose a triplet loss with adaptive view (AdaView), which assigns adaptive margins between examples on the basis of their views and thereby makes the triplet loss view-aware. The original triplet loss updates the model by pulling the anchor and the hard positive example closer and pushing the hard negative example away from the anchor.
AdaView, by contrast, emphasizes pushing persons with the same view and similar appearances far apart in the feature space. These similar-appearance persons are the hard negative examples in the mini-batch, that is, those with the smallest Euclidean distance to the anchor. Because images of the same person with the same view are highly similar visually, we pull them closer in the feature space so that they form sub-clusters of images with the same view; this is expressed by a minimal margin. To make the model sensitive to appearance changes caused by view shifts, we push apart the sub-clusters of images of the same person with different views; this is expressed by a slightly larger margin. Finally, we deliberately increase the feature-space distance between images that have similar appearances and the same view but belong to different identities; this is expressed by a larger margin. Together, these steps define AdaView.

Result We compared the proposed method with a range of established person ReID techniques on multiple public datasets, including Market1501 (Market), DukeMTMC-ReID, MSMT17, and CUHK. We used two primary metrics: Rank-1 (R1), which measures the accuracy of the first retrieved result, and mean average precision (mAP), which assesses overall ranking accuracy. We used the person view annotations available for some datasets and trained a ResNet-based model to predict the views of individuals in MSMT17. We applied various data augmentation strategies and followed the hyperparameter settings of TransReID.
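As an illustration of the two metrics named above, the following minimal sketch computes Rank-1 and mAP from ranked retrieval lists. It is not the authors' evaluation code: the function name `rank1_and_map` and its input layout are assumptions made for this example, and the camera-based gallery filtering used in standard ReID protocols is omitted.

```python
def rank1_and_map(query_ids, ranked_gallery_ids):
    """Compute Rank-1 and mAP from ranked retrieval lists (illustrative only).

    query_ids[i] is the identity of query i.
    ranked_gallery_ids[i] lists gallery identities for query i,
    sorted from most to least similar.
    Queries with no matching gallery identity are skipped.
    """
    rank1_hits = 0
    average_precisions = []
    for qid, ranked in zip(query_ids, ranked_gallery_ids):
        matches_so_far = 0
        precisions = []
        for position, gid in enumerate(ranked, start=1):
            if gid == qid:
                matches_so_far += 1
                # Precision at each position where a correct match occurs.
                precisions.append(matches_so_far / position)
        if not precisions:
            continue  # no ground-truth match in the gallery
        if ranked[0] == qid:
            rank1_hits += 1
        average_precisions.append(sum(precisions) / len(precisions))
    valid = len(average_precisions)
    if valid == 0:
        return 0.0, 0.0
    return rank1_hits / valid, sum(average_precisions) / valid


# Two queries: the first is matched at rank 1, the second only at rank 2.
r1, mean_ap = rank1_and_map([1, 2], [[1, 3, 1], [3, 2, 2]])
print(r1, mean_ap)
```

Rank-1 only looks at the top result, whereas mAP rewards placing every correct match early in the list, which is why the two numbers can diverge.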
In direct comparison with state-of-the-art methods, including classic person ReID techniques and recent advances such as TransReID and UniHCP, the proposed method performed better. On the MSMT17 dataset, it surpassed UniHCP by 1.7% in R1 and 1.3% in mAP. This improvement can be attributed to VAFL, which enhances cluster differentiation and retrieval accuracy. We also ran tests on generalizable person ReID tasks to validate the model's adaptability and stability in diverse scenarios. Compared with representative generalization methods, our approach showed a slight edge, mainly because VAFL refines cluster boundaries and balances intraclass compactness with interclass dispersion. The ablation study showed that removing the VAFL component significantly reduced performance, which highlights its critical role in the method. These results support the robustness of the approach for person ReID and its practical deployment in real-world applications.

Conclusion In this paper, we introduce VAFL, which enhances the model's sensitivity to view and helps distinguish persons with similar appearances and the same view. Experimental results show that the approach performs well across various scenarios, confirming its efficiency and reliability.
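The abstract describes AdaView only in words, so the following is a minimal sketch of one possible reading, not the authors' formulation. It covers only the inter-identity case: a batch-hard triplet loss whose margin is larger when the hard negative shares the anchor's view. The function name, the margin values `BASE_MARGIN` and `SAME_VIEW_MARGIN`, and the view labels are all assumptions; the intra-identity margins mentioned in the abstract are not modeled here.

```python
import math

# Hypothetical margin values; the abstract gives no numbers.
BASE_MARGIN = 0.3       # uniform margin of a plain triplet loss
SAME_VIEW_MARGIN = 0.6  # larger margin when the hard negative shares the anchor's view


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def view_aware_triplet_loss(features, ids, views):
    """Batch-hard triplet loss with a view-dependent margin (illustrative)."""
    losses = []
    n = len(features)
    for a in range(n):
        positives = [j for j in range(n) if j != a and ids[j] == ids[a]]
        negatives = [j for j in range(n) if ids[j] != ids[a]]
        if not positives or not negatives:
            continue
        # Hard positive: farthest sample with the same identity.
        hard_pos = max(positives, key=lambda j: euclidean(features[a], features[j]))
        # Hard negative: closest sample with a different identity.
        hard_neg = min(negatives, key=lambda j: euclidean(features[a], features[j]))
        # Assumed rule: push harder when the look-alike shares the anchor's view.
        margin = SAME_VIEW_MARGIN if views[hard_neg] == views[a] else BASE_MARGIN
        d_ap = euclidean(features[a], features[hard_pos])
        d_an = euclidean(features[a], features[hard_neg])
        losses.append(max(0.0, d_ap - d_an + margin))
    return sum(losses) / len(losses) if losses else 0.0


features = [(0.0, 0.0), (0.0, 0.2), (0.5, 0.0), (0.5, 0.2)]
ids = [0, 0, 1, 1]
# Hard negatives share the anchor's view here, so the larger margin applies.
print(view_aware_triplet_loss(features, ids, ["front", "back", "front", "back"]))
# Hard negatives have a different view here, so the base margin applies.
print(view_aware_triplet_loss(features, ids, ["front", "back", "back", "front"]))
```

With identical features, the loss stays positive only in the first call, which shows how a view-dependent margin keeps pushing same-view look-alikes apart after a uniform margin would already be satisfied.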

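Translated title of the contribution: View-aware feature learning for person re-identification
Original language: Traditional Chinese
Pages (from-to): 188-197
Number of pages: 10
Journal: Journal of Image and Graphics
Volume: 30
Issue number: 1
DOI
Publication status: Published - Jan 2025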

Keywords

  • adaptive view
  • person re-identification
  • person view
  • similar appearances
  • view perception
