TY - JOUR
T1 - Multi-level feature fusion and joint refinement for simultaneous object pose estimation and camera localization
AU - Wang, Junyi
AU - Qi, Yue
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/6
Y1 - 2024/6
N2 - Object pose estimation and camera localization are critical in various applications. However, achieving algorithm universality, meaning category-level pose estimation and scene-independent camera localization, is challenging for both techniques. Although the two tasks are closely related through spatial geometry constraints, they require distinct feature extraction. This paper presents a unified RGB-D based framework that simultaneously performs category-level object pose estimation and scene-independent camera localization. The framework consists of a pose estimation branch called SLO-ObjNet, a localization branch called SLO-LocNet, a pose confidence calculation process, and an object-level optimization. First, we obtain initial camera and object results from SLO-LocNet and SLO-ObjNet. In these two networks, we design three-level feature fusion modules and a loss function to share features between the two tasks. Next, a confidence calculation process determines the accuracy of the obtained object poses. An object-level Bundle Adjustment (BA) optimization algorithm is then used to further improve precision. The BA algorithm relates feature points, objects, and cameras using camera-point, camera-object, and object-point metrics. To evaluate the approach, experiments are conducted on the localization and pose estimation datasets REAL275, CAMERA25, LineMOD, YCB-Video, 7 Scenes, ScanNet and TUM RGB-D. The results show that this approach outperforms existing methods in both estimation and localization accuracy. In addition, SLO-LocNet and SLO-ObjNet are trained on ScanNet and tested on the 7 Scenes and TUM RGB-D datasets to demonstrate their universality. Finally, we highlight the positive effects of the fusion modules, loss function, confidence process and BA on overall performance.
KW - Category-level object pose estimation
KW - Feature fusion
KW - Geometry constraint
KW - Object-level BA
KW - Scene-independent camera localization
UR - https://www.scopus.com/pages/publications/85188142848
U2 - 10.1016/j.neunet.2024.106238
DO - 10.1016/j.neunet.2024.106238
M3 - Article
C2 - 38508048
AN - SCOPUS:85188142848
SN - 0893-6080
VL - 174
JO - Neural Networks
JF - Neural Networks
M1 - 106238
ER -