TY - GEN
T1 - Towards Fair and Comprehensive Comparisons for Image-Based 3D Object Detection
AU - Ma, Xinzhu
AU - Wang, Yongtao
AU - Zhang, Yinmin
AU - Xia, Zhiyi
AU - Meng, Yuan
AU - Wang, Zhihui
AU - Li, Haojie
AU - Ouyang, Wanli
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. In particular, different from other highly mature tasks, e.g., 2D object detection, the community of image-based 3D object detection is still evolving, where methods often adopt different training recipes and tricks resulting in unfair evaluations and comparisons. What is worse, these tricks may overwhelm their proposed designs in performance, even leading to wrong conclusions. To address this issue, we build a module-designed code-base and formulate unified training standards for the community. Furthermore, we also design an error diagnosis toolbox to measure the detailed characterization of detection models. Using these tools, we analyze current methods in-depth under varying settings and provide discussions for some open questions, e.g., discrepancies in conclusions on KITTI-3D and nuScenes datasets, which have led to different dominant methods for these datasets. We hope that this work will facilitate future research in image-based 3D object detection. Our codes will be released at https://github.com/OpenGVLab/3dodi.
AB - In this work, we build a modular-designed codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. In particular, different from other highly mature tasks, e.g., 2D object detection, the community of image-based 3D object detection is still evolving, where methods often adopt different training recipes and tricks resulting in unfair evaluations and comparisons. What is worse, these tricks may overwhelm their proposed designs in performance, even leading to wrong conclusions. To address this issue, we build a module-designed code-base and formulate unified training standards for the community. Furthermore, we also design an error diagnosis toolbox to measure the detailed characterization of detection models. Using these tools, we analyze current methods in-depth under varying settings and provide discussions for some open questions, e.g., discrepancies in conclusions on KITTI-3D and nuScenes datasets, which have led to different dominant methods for these datasets. We hope that this work will facilitate future research in image-based 3D object detection. Our codes will be released at https://github.com/OpenGVLab/3dodi.
UR - https://www.scopus.com/pages/publications/85188251942
U2 - 10.1109/ICCV51070.2023.00591
DO - 10.1109/ICCV51070.2023.00591
M3 - 会议稿件
AN - SCOPUS:85188251942
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 6402
EP - 6412
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Y2 - 2 October 2023 through 6 October 2023
ER -