
MedoidsFormer: A Strong 3D Object Detection Backbone by Exploiting Interaction With Adjacent Medoid Tokens

  • Xiaoyu Tian
  • Ming Yang
  • Qian Yu*
  • Junhai Yong
  • Dong Xu
  • *Corresponding author for this work
  • Tsinghua University
  • Beihang University
  • The University of Hong Kong

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In this paper, we propose MedoidsFormer, a novel transformer-based backbone equipped with a self-attention mechanism tailored explicitly to LiDAR-based 3D object detection. Compared with 2D object detection, target objects in 3D object detection occupy a much smaller proportion of the input scene, and their distribution is significantly sparser. Given these observations, we introduce a new self-attention mechanism called Medoids Attention, which focuses on exploiting interactions within surrounding regions; this not only reduces computation and memory costs but also yields discriminative context information. Instead of aggregating tokens from adjacent areas, we present a dynamic, semantic-aware token mining process that uses k-Medoids clustering to directly select representative tokens for attention modeling. In extensive experiments, our proposed method shows consistent improvement over existing 3D object detectors and achieves state-of-the-art performance on the large-scale Waymo Open Dataset. We also conduct comprehensive ablation studies to verify the efficacy of the new self-attention mechanism and provide thorough insights.
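The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the general idea it names: select a few representative tokens with k-Medoids clustering, then let a query attend to those medoids alone. It is not the authors' Medoids Attention. The function names `k_medoids` and `medoid_attention`, the Euclidean dissimilarity, the simple alternating update rule, and the absence of learned query/key/value projections are all assumptions made for illustration.

```python
import math
import random


def k_medoids(tokens, k, iters=10, seed=0):
    """Return sorted indices of k medoid tokens.

    Assumption: alternating (Voronoi-style) k-medoids with Euclidean
    distance; the paper's actual clustering variant is not specified here.
    """
    rng = random.Random(seed)
    medoids = rng.sample(range(len(tokens)), k)
    for _ in range(iters):
        # Assignment step: each token joins its nearest medoid's cluster.
        clusters = {m: [] for m in medoids}
        for i, t in enumerate(tokens):
            if i in clusters:
                # A medoid always stays in its own cluster, so none is empty.
                clusters[i].append(i)
                continue
            nearest = min(medoids, key=lambda m: math.dist(t, tokens[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(members,
                key=lambda c: sum(math.dist(tokens[c], tokens[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def medoid_attention(query, tokens, medoid_idx):
    """Scaled dot-product attention of one query over the medoid tokens only.

    Assumption: keys and values are the raw medoid tokens (no learned
    projections), which keeps the sketch self-contained.
    """
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, tokens[m])) / math.sqrt(d)
        for m in medoid_idx
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    out = [
        sum(w * tokens[m][c] for w, m in zip(weights, medoid_idx))
        for c in range(d)
    ]
    return out, weights


# Toy example: six 2-D tokens, two medoids.
tokens = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
medoids = k_medoids(tokens, k=2)
out, weights = medoid_attention(tokens[0], tokens, medoids)
```

In this sketch the query attends to `k` keys instead of all `n` tokens, which illustrates where a reduction in computation and memory could come from. Because medoids are actual input tokens rather than averaged centroids, the selected keys remain real observations from the scene.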

Original language: English
Pages (from-to): 5844-5854
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 10
Publication status: Published - 1 Oct 2023
