TY - JOUR
T1 - Coordinate-based anchor-free module for object detection
AU - Tang, Zhiyong
AU - Yang, Jianbing
AU - Pei, Zhongcai
AU - Song, Xiao
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/12
Y1 - 2021/12
AB - Despite the impressive performance of some recent state-of-the-art detectors, small-object detection, scale variation, and label ambiguity remain challenging. To tackle these issues, we present a coordinate-based anchor-free (CBAF) module for object detection. It can be used as a branch of a single-shot detector (e.g., RetinaNet or SSD) or can predict the output probabilities and box coordinates directly. The main idea of the CBAF module is to predict an object's category and box adjustments from a part feature and its contextual part features, which are obtained from feature maps divided by spatial coordinates. This is inspired by the fact that humans can infer an entire object by observing part of it together with its surroundings. The CBAF module encodes and decodes boxes in an anchor-free manner on each feature map, at different resolutions, during both training and testing. During training, a proposed spatial coordinate partition layer first divides each feature map into parts of size n × n, and a proposed contextual building layer then fuses each part with its contextual parts. We demonstrate the CBAF module through a concrete implementation. When combined with the anchor-based RetinaNet, the CBAF module improves object detection AP with almost no additional computation. Experimental results on the MS-COCO dataset show that the mAP of the CBAF module is 1.1% higher than that of RetinaNet, and that combining the CBAF module with the anchor-based RetinaNet increases mAP by 2.2%.
KW - Anchor-free
KW - Contextual part features
KW - Object detection
KW - Part feature
UR - https://www.scopus.com/pages/publications/85105126568
U2 - 10.1007/s10489-021-02373-8
DO - 10.1007/s10489-021-02373-8
M3 - Article
AN - SCOPUS:85105126568
SN - 0924-669X
VL - 51
SP - 9066
EP - 9080
JO - Applied Intelligence
JF - Applied Intelligence
IS - 12
ER -