TY - JOUR
T1 - Visual camera relocalization using both hand-crafted and learned features
AU - Wang, Junyi
AU - Qi, Yue
N1 - Publisher Copyright: © 2023 Elsevier Ltd
PY - 2024/1
Y1 - 2024/1
AB - Camera localization is essential in AR, MR, and robotics. Depending on the task, existing pipelines predict the camera pose with either hand-crafted or learning-based features, and each approach has its own strengths and weaknesses. However, few current frameworks combine both kinds of features. In this study, a novel relocalization pipeline for RGB or RGB-D input is proposed. It consists of a coarse stage with learned features, a refinement stage with hand-crafted features, and a process that measures the confidence of both stages to improve localization robustness. Instead of directly regressing the camera pose, the coarse stage obtains an initial pose by registering a known source point cloud to a predicted, weighted target point cloud. To this end, we design a deep network called PGNet that constructs the weighted target point cloud from the image and previous poses. Moreover, to handle dynamic surroundings, we add a segmentation branch that classifies each point as fixed or dynamic, improving the perception of dynamic content. Correspondingly, a segmentation-extended Chamfer Distance is added to optimize PGNet. For pose refinement, a feature space is built by extracting and matching hand-crafted features on the training set. Starting from the coarse pose, the accurate pose is obtained by applying the Kabsch or Perspective-n-Point (PnP) algorithm to point correspondences, which are built by searching this space and matching Oriented FAST and Rotated BRIEF (ORB) features. Furthermore, an additional process defines coarse and refinement metrics to make performance more stable. Finally, experiments are conducted on both static and dynamic scenes. The results demonstrate state-of-the-art performance compared with existing methods on 7 Scenes, INDOOR-6, Cambridge Landmarks and TUM RGB-D. They also demonstrate the positive effects of the pose learning part, the dynamic branch, confidence regression, and the hand-crafted-feature-based refinement.
KW - Dynamic environment
KW - Hand-crafted feature refinement
KW - Visual relocalization
KW - Weighted point cloud generation
UR - https://www.scopus.com/pages/publications/85171758999
U2 - 10.1016/j.patcog.2023.109914
DO - 10.1016/j.patcog.2023.109914
M3 - Article
AN - SCOPUS:85171758999
SN - 0031-3203
VL - 145
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 109914
ER -
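The abstract states that the coarse stage registers a known source point cloud to a predicted, weighted target point cloud, and that refinement may apply the Kabsch algorithm to point correspondences. The block below is a minimal sketch of weighted Kabsch (weighted rigid Procrustes) alignment in Python with NumPy, under two assumptions: correspondences are already one-to-one, and the per-point weights act as nonnegative confidences in a weighted least-squares fit. The function name `weighted_kabsch` is hypothetical, and this is not the authors' PGNet registration code; the paper may use its weights differently.

```python
import numpy as np


def weighted_kabsch(source, target, weights=None):
    """Hypothetical helper: estimate a rigid transform (R, t).

    Minimizes sum_i w_i * ||R @ source_i + t - target_i||^2, assuming
    source[i] corresponds to target[i] and all weights are nonnegative.
    source, target: (N, 3) arrays; weights: optional (N,) array.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    if weights is None:
        weights = np.ones(len(source))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weighted means are convex combinations

    # Weighted centroids of both point sets.
    mu_s = w @ source
    mu_t = w @ target

    # Weighted cross-covariance H = sum_i w_i (s_i - mu_s)(t_i - mu_t)^T.
    src_c = source - mu_s
    tgt_c = target - mu_t
    H = (w[:, None] * src_c).T @ tgt_c

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t


if __name__ == "__main__":
    # Usage: recover a known rotation about z plus a translation.
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(50, 3))
    a = np.pi / 5
    R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                       [np.sin(a), np.cos(a), 0.0],
                       [0.0, 0.0, 1.0]])
    t_true = np.array([0.5, -1.0, 2.0])
    moved = pts @ R_true.T + t_true
    R_est, t_est = weighted_kabsch(pts, moved, rng.uniform(0.1, 1.0, size=50))
    print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```

A point with zero weight drops out of both the centroids and the cross-covariance, so a low-confidence or dynamic point has no influence on the estimate. The determinant correction prevents the SVD from returning a reflection instead of a rotation.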
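The abstract states that PGNet is optimized with a segmentation-extended Chamfer Distance but does not define the segmentation term, so the sketch below covers only the standard symmetric Chamfer Distance between two point clouds. It assumes squared Euclidean distances averaged in each direction, which is one common convention; other formulations use unsquared distances or sums. The function name `chamfer_distance` is hypothetical and not taken from the paper.

```python
import numpy as np


def chamfer_distance(a, b):
    """Hypothetical helper: symmetric Chamfer Distance between point sets.

    a: (N, 3) array, b: (M, 3) array. Uses squared Euclidean distances and
    averages the nearest-neighbour term in each direction. The paper's
    segmentation extension is not reproduced here.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dense pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Each point in a to its nearest point in b, and each point in b to a.
    a_to_b = d2.min(axis=1).mean()
    b_to_a = d2.min(axis=0).mean()
    return float(a_to_b + b_to_a)


if __name__ == "__main__":
    # Usage: a predicted cloud with one stray point versus a reference cloud.
    pred = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    ref = [[0.0, 0.0, 0.0]]
    print(chamfer_distance(pred, ref))
```

The dense distance matrix needs O(N*M) memory, so larger clouds are usually handled with a k-d tree or a batched GPU nearest-neighbour search instead.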