TY - GEN
T1 - Scene-independent Localization by Learning Residual Coordinate Map with Cascaded Localizers
AU - Wang, Junyi
AU - Qi, Yue
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Visual localization plays an essential role in a variety of fields. Indirect learning-based methods achieve excellent performance, but they require training in the target scene before localization. To achieve deep scene-independent localization, we first propose a representation called the residual coordinate map between a pair of images. Based on this representation, we put forward a network called SILocNet, which outputs the proposed residual coordinate map. The network consists of feature extraction, multi-level feature fusion and a transformer-based coordinate decoder. Moreover, to handle dynamic scenes, we introduce an additional segmentation branch that distinguishes fixed and dynamic parts to improve the network's perception. With SILocNet in place, a cascaded localizer design is presented to reduce the accumulated error, and a simple mathematical analysis of the cascaded localizers is provided. To evaluate our algorithm, we conduct experiments on the static 7 Scenes and ScanNet datasets and the dynamic TUM RGB-D dataset. In particular, we train the network on ScanNet and test it on 7 Scenes and TUM RGB-D to demonstrate its generalization performance. All experiments demonstrate superior performance to existing methods. Additionally, the effects of the cascaded localizer design, feature fusion, the transformer-based coordinate decoder and the segmentation loss are discussed.
KW - Scene-independent localization
KW - cascaded localizer
KW - dynamic scene
KW - residual coordinate map
UR - https://www.scopus.com/pages/publications/85180363385
U2 - 10.1109/ISMAR59233.2023.00022
DO - 10.1109/ISMAR59233.2023.00022
M3 - Conference contribution
AN - SCOPUS:85180363385
T3 - Proceedings - 2023 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2023
SP - 79
EP - 88
BT - Proceedings - 2023 IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2023
A2 - Bruder, Gerd
A2 - Olivier, Anne-Helene
A2 - Cunningham, Andrew
A2 - Peng, Evan Yifan
A2 - Grubert, Jens
A2 - Williams, Ian
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Symposium on Mixed and Augmented Reality, ISMAR 2023
Y2 - 16 October 2023 through 20 October 2023
ER -