Abstract
This paper addresses visual simultaneous localization and mapping (VSLAM) in highly dynamic environments, a capability crucial for applications such as autonomous driving and service robots. We propose semantic-integrated multi-model fitting (SMMF)-SLAMMOT, a tightly coupled VSLAM and moving object tracking (MOT) method that simultaneously estimates the full SE(3) motions of a stereo camera and the surrounding moving rigid objects without relying on geometric priors. The SMMF-SLAMMOT framework begins with a two-level dynamic data association technique, which leverages object embedding descriptors from a detector to make matching more robust in crowded scenes. A semantic-integrated multi-model fitting method then segments and estimates multiple motions more accurately and robustly. Furthermore, we devise a spatial–temporal reprojection factor to improve the accuracy and efficiency of the 4D mapping. Evaluations on the OMD and KITTI Tracking datasets, along with a self-collected dataset from the CARLA simulator, demonstrate the superiority of SMMF-SLAMMOT in self-localization accuracy, moving object tracking accuracy, and real-time performance. Specifically, on the KITTI Tracking dataset, compared to state-of-the-art systems, our method achieves a median 14% improvement in camera pose estimation accuracy and a median 43% improvement in sparse feature-based object motion estimation accuracy, while running at twice the tracking frequency, 24 frames per second. The source code and the datasets are available at https://github.com/zhangtiantians/SMMF_SLAMMOT. This work not only advances the VSLAM field but also provides practical solutions for real-world applications in dynamic scenes.
| Original language | English |
|---|---|
| Article number | 56 |
| Journal | Visual Computer |
| Volume | 42 |
| Issue number | 1 |
| State | Published - Jan 2026 |
Keywords
- Dynamic scenes
- Moving object tracking
- Multi-model fitting
- Visual SLAM
Article title: Semantic-integrated multi-model fitting for real-time VSLAM in highly dynamic environments