TY - GEN
T1 - Detector-in-Detector
T2 - 2018 Scene Understanding and Modelling Challenge, SUMO 2018, 2018 Learning and Inference Methods for High-Performance Imaging, LIMHPI 2018, 2018 Attention/Intention Understanding, AIU 2018, 2018 Museum Exhibit Identification Challenge for Domain Adaptation and Few-Shot Learning, 2018 RGB-D—Sensing and Understanding via Combined Color and Depth, 2018 Dense 3D Reconstruction for Dynamic Scenes, 2018 AI Aesthetics in Art and Media, AIAM 2018, 3rd International Workshop on Robust Reading, IWRR 2018, 2018 Artificial Intelligence for Retinal Image Analysis, AIRIA 2018, 2018 Combining Vision and Language, 1st International Workshop on Advanced Machine Vision for Real-Life and Industrially Relevant Applications, AMV 2018
AU - Li, Xiaojie
AU - Yang, Lu
AU - Song, Qing
AU - Zhou, Fuqiang
N1 - Publisher Copyright: © 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Vision-based person, hand or face detection approaches have achieved incredible success in recent years with the development of deep convolutional neural networks (CNNs). In this paper, we take the inherent correlation between the body and body parts into account and propose a new framework to boost the detection performance of multi-level objects. In particular, we adopt a region-based object detection structure with two carefully designed detectors that separately attend to the human body and body parts in a coarse-to-fine manner, which we call the Detector-in-Detector network (DID-Net). The first detector is designed to detect the human body, hand and face. The second detector, based on the body detection results of the first detector, mainly focuses on detecting small hands and faces inside each body. The framework is trained end-to-end by optimizing a multi-task loss. Due to the lack of human body, face and hand detection datasets, we have collected and labeled a new large dataset named Human-Parts with 14,962 images and 106,879 annotations. Experiments show that our method can achieve excellent performance on Human-Parts.
KW - Convolutional neural network
KW - Detector in Detector
KW - Human parts
UR - https://www.scopus.com/pages/publications/85067358163
U2 - 10.1007/978-3-030-20890-5_15
DO - 10.1007/978-3-030-20890-5_15
M3 - Conference contribution
AN - SCOPUS:85067358163
SN - 9783030208899
T3 - Lecture Notes in Computer Science
SP - 228
EP - 240
BT - Computer Vision – ACCV 2018 - 14th Asian Conference on Computer Vision, Revised Selected Papers
A2 - Jawahar, C.V.
A2 - Li, Hongdong
A2 - Mori, Greg
A2 - Schindler, Konrad
PB - Springer Verlag
Y2 - 2 December 2018 through 6 December 2018
ER -