
BEVTrack: An End-to-End 3D Multi-Object Tracking Method Based on Hard Example Mining Training

  • Hong Zhang
  • Jiaxu Wan
  • Haibo Chen
  • Jian Zhang
  • Xuliang Li*
  • *Corresponding author for this work
  • Beihang University
  • Hubei Huazhong Changjiang Optoelectronic Technology CO. LTD
  • 32184 Unit

Research output: Contribution to journal › Article › Peer-review

Abstract

Multi-Object Tracking (MOT) has become a crucial component of autonomous driving systems; its goal is to identify, locate, and label all relevant objects in consecutive video frames and point cloud streams. Efficient and accurate multi-object tracking is an increasingly active research topic in computer vision and autonomous driving. Most existing 3D MOT methods adopt a two-stage heuristic approach that relies on detection results and manually tuned parameters to track targets in a scene. In this heuristic paradigm, each object is associated through a carefully tuned Kalman filter, and tracking is split into multiple stages, including matching and re-matching. Every stage therefore needs extensive parameter tuning to track effectively, which makes the overall pipeline cumbersome. These methods also model complex variations poorly and struggle with occlusion. Recently, end-to-end methods such as MUTR have gained ground in 3D multi-object tracking; they establish temporal correlations implicitly and dispense with the explicit heuristic strategies used previously. However, their accuracy is typically lower, falling well short of non-end-to-end heuristic approaches. The main reason is that aggregating and perceiving features in three-dimensional space is harder than in two-dimensional images. The real-time requirements of multi-object tracking restrict the tracker to a few thin Transformer layers, and with so few layers it is difficult to achieve complex three-dimensional feature aggregation, which substantially reduces overall tracking accuracy.
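To make the tuning burden of the heuristic paradigm concrete, here is a minimal Python sketch of a two-stage association step (matching, then re-matching) of the kind described above. It is not taken from any cited tracker: the thresholds `STAGE1_GATE`, `STAGE2_GATE`, and `SCORE_SPLIT`, the `Track` class, and the helper names are all hypothetical, and a plain constant-velocity motion model stands in for the Kalman filter predict step for brevity.

```python
import math
from dataclasses import dataclass

# Hypothetical hand-tuned thresholds: one set of knobs per stage.
STAGE1_GATE = 1.0   # metres, strict gate for confident detections
STAGE2_GATE = 2.5   # metres, looser gate for re-matching leftovers
SCORE_SPLIT = 0.5   # detections scoring at least this go to stage 1


@dataclass
class Track:
    track_id: int
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

    def predict(self, dt):
        # Constant-velocity motion model standing in for a Kalman predict step.
        return self.x + self.vx * dt, self.y + self.vy * dt


def greedy_match(tracks, dets, gate, dt):
    """Greedily pair predicted track centres with detections within `gate`."""
    pairs = []
    for ti, track in enumerate(tracks):
        px, py = track.predict(dt)
        for di, det in enumerate(dets):
            dist = math.hypot(px - det[0], py - det[1])
            if dist <= gate:
                pairs.append((dist, ti, di))
    pairs.sort()  # closest pairs first
    used_t, used_d, matches = set(), set(), []
    for _, ti, di in pairs:
        if ti not in used_t and di not in used_d:
            used_t.add(ti)
            used_d.add(di)
            matches.append((ti, di))
    return matches


def two_stage_associate(tracks, dets, dt=0.5):
    """dets: list of (x, y, score). Returns (track_id, det_index) pairs."""
    high = [i for i, d in enumerate(dets) if d[2] >= SCORE_SPLIT]
    low = [i for i, d in enumerate(dets) if d[2] < SCORE_SPLIT]
    # Stage 1: match confident detections with the strict gate.
    m1 = greedy_match(tracks, [dets[i] for i in high], STAGE1_GATE, dt)
    matched = {ti for ti, _ in m1}
    result = [(tracks[ti].track_id, high[di]) for ti, di in m1]
    # Stage 2: re-match leftover tracks to low-score detections, looser gate.
    rest = [t for i, t in enumerate(tracks) if i not in matched]
    m2 = greedy_match(rest, [dets[i] for i in low], STAGE2_GATE, dt)
    result += [(rest[ti].track_id, low[di]) for ti, di in m2]
    return result


tracks = [Track(1, 0.0, 0.0, vx=2.0), Track(2, 10.0, 0.0)]
dets = [(1.1, 0.0, 0.9), (11.8, 0.0, 0.3)]
print(two_stage_associate(tracks, dets))  # [(1, 0), (2, 1)]
```

Track 2 is recovered only in the re-matching stage, 1.8 m from its prediction; setting `STAGE2_GATE` below 1.8 m would leave it unmatched. That per-stage sensitivity is the tuning cost the end-to-end approach aims to remove.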
In addition, training is often disrupted by large amounts of noisy and difficult samples, which weakens the model's feature extraction throughout tracking. To address these challenges, this paper introduces BEVTrack, a novel end-to-end 3D multi-object tracking framework trained with hard example mining. To handle 3D feature correlation, it designs a three-dimensional tracking query with Bird's Eye View (BEV) position encoding. A BEV cross-attention tracking module connects each tracking query with the corresponding three-dimensional features in the BEV view, yielding precise, refined features. The method implicitly models changes in a trajectory's position and appearance, which streamlines the 3D tracking process. As a result, tracking queries are associated more reliably with the true three-dimensional features, substantially improving tracking accuracy. Because feature correlation is performed on BEV features, which are cheap to compute and less affected by minor changes in target position, a lightweight network suffices for fast and efficient tracking. To combat data noise, this paper presents simulated noise training via hard example mining: more challenging detections and false targets are introduced during training so that the model learns to filter out corrupting noise and to cope with the interference encountered in real-world scenarios.
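The abstract does not give BEVTrack's exact formulation, so the following pure-Python sketch only illustrates the general mechanisms under stated assumptions. It shows (1) a sinusoidal encoding of a tracking query's BEV (x, y) reference point, (2) single-head scaled dot-product cross-attention from one tracking query to BEV grid cells, and (3) a simple noise-simulation step that jitters and drops ground-truth centres and adds false targets near real objects as hard negatives. The 51.2 m BEV range, the encoding layout, the single head, and all function names are hypothetical; step (3) covers only injecting hard samples, not loss-based example selection.

```python
import math
import random


def sinusoidal_encoding(value, dim):
    """Encode a scalar in [0, 1] as `dim` interleaved sin/cos features."""
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        angle = value * 2 * math.pi * freq
        enc.extend([math.sin(angle), math.cos(angle)])
    return enc


def bev_position_encoding(x, y, dim, bev_range=51.2):
    """Assumed BEV encoding: normalise (x, y) in metres from
    [-bev_range, bev_range] to [0, 1] and concatenate per-axis encodings.
    `dim` should be divisible by 4."""
    nx = (x + bev_range) / (2 * bev_range)
    ny = (y + bev_range) / (2 * bev_range)
    return sinusoidal_encoding(nx, dim // 2) + sinusoidal_encoding(ny, dim // 2)


def bev_cross_attention(query_feat, query_xy, bev_feats, bev_xy, dim):
    """Single-head cross-attention from one tracking query to BEV cells.

    Query and keys are content features plus BEV position encodings;
    values are the raw BEV cell features. Returns (output, weights)."""
    q = [a + b for a, b in zip(query_feat, bev_position_encoding(*query_xy, dim))]
    scores = []
    for feat, xy in zip(bev_feats, bev_xy):
        k = [a + b for a, b in zip(feat, bev_position_encoding(*xy, dim))]
        scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim))
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [sum(w * feat[c] for w, feat in zip(weights, bev_feats))
           for c in range(dim)]
    return out, weights


def inject_hard_examples(gt_centres, num_false=2, jitter=0.3, drop_prob=0.1,
                         min_off=1.0, max_off=3.0, seed=0):
    """Assumed noise simulation for training. Returns (x, y, is_real) tuples:
    jittered real centres (some dropped to mimic misses) plus false targets
    placed min_off..max_off metres from a randomly chosen real object."""
    rng = random.Random(seed)
    samples = []
    for x, y in gt_centres:
        if rng.random() < drop_prob:
            continue  # simulate a missed or occluded detection
        samples.append((x + rng.gauss(0, jitter), y + rng.gauss(0, jitter), True))
    for _ in range(num_false if gt_centres else 0):
        # Hard negative: a distractor close to a real object.
        cx, cy = rng.choice(gt_centres)
        r = rng.uniform(min_off, max_off)
        theta = rng.uniform(0, 2 * math.pi)
        samples.append((cx + r * math.cos(theta), cy + r * math.sin(theta), False))
    return samples


cells = [(-10.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
feats = [[0.0] * 8 for _ in cells]
_, w = bev_cross_attention([0.0] * 8, (0.0, 0.0), feats, cells, dim=8)
print(w.index(max(w)))  # 1: the cell at the query's own position
samples = inject_hard_examples([(0.0, 0.0), (5.0, 5.0)], num_false=3, drop_prob=0.0)
print(sum(not s[2] for s in samples))  # 3 false targets
```

With all content features set to zero, the attention is driven purely by the position encodings, so the query attends most strongly to the BEV cell at its own location. Placing false targets near real objects, rather than uniformly across the scene, is one plausible way to make them "hard"; the paper's actual sampling scheme is not stated in the abstract.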
In experiments, in-depth comparative analysis and ablation studies on the nuScenes dataset show that the method achieves the highest tracking accuracy among the compared methods without additional parameter tuning, demonstrating the superiority and efficiency of the proposed approach.

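Translated title of the contribution: BEVTrack: An End-to-end 3D Multi Object Tracking Method Based on Hard Example Mining Training
Original language: Traditional Chinese
Pages (from-to): 152-165
Number of pages: 14
Journal: Journal of Signal Processing
Volume: 40
Issue number: 1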
DOI
Publication status: Published - Jan 2024

Keywords

  • Transformer
  • end to end
  • hard negative mining
  • multi object tracking
