TY - JOUR
T1 - Semantic-integrated multi-model fitting for real-time VSLAM in highly dynamic environments
AU - Zhang, Tiantian
AU - Li, Ni
AU - Gong, Guanghong
AU - Lin, Xin
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
PY - 2026/1
Y1 - 2026/1
AB - The paper explores the challenges of visual simultaneous localization and mapping (VSLAM) in highly dynamic environments, a capability crucial for applications such as autonomous driving and service robots. We propose semantic-integrated multi-model fitting (SMMF)-SLAMMOT, a tightly coupled VSLAM and moving object tracking (MOT) method, capable of simultaneously estimating the full SE(3) motions of a stereo camera and the surrounding moving rigid objects, without relying on geometric priors. The SMMF-SLAMMOT framework begins with a two-level dynamic data association technique, which leverages object embedding descriptors from a detector to enhance matching robustness in crowded scenes. Subsequently, a semantic-integrated multi-model fitting method is proposed to achieve more accurate and robust multiple motion segmentation and estimation. Furthermore, we devise a spatial–temporal reprojection factor to enhance the accuracy and efficiency of the 4D mapping. Evaluations on the OMD and KITTI Tracking datasets, along with a self-collected dataset from the CARLA simulator, demonstrate the superiority of SMMF-SLAMMOT in terms of accuracy of self-localization and moving object tracking, as well as real-time performance. Specifically, on the KITTI Tracking dataset, compared to state-of-the-art systems, our method achieves a median 14% improvement in camera pose estimation accuracy and a median 43% enhancement in sparse feature-based object motion estimation accuracy, while achieving a twofold faster tracking frequency at 24 frames per second. The source code and the datasets are available at https://github.com/zhangtiantians/SMMF_SLAMMOT. This work not only advances the VSLAM field but also provides practical solutions for real-world applications in dynamic scenes.
KW - Dynamic scenes
KW - Moving object tracking
KW - Multi-model fitting
KW - Visual SLAM
UR - https://www.scopus.com/pages/publications/105024757943
U2 - 10.1007/s00371-025-04235-7
DO - 10.1007/s00371-025-04235-7
M3 - Article
AN - SCOPUS:105024757943
SN - 0178-2789
VL - 42
JO - Visual Computer
JF - Visual Computer
IS - 1
M1 - 56
ER -