TY - JOUR
T1 - Joint depth estimation and multi-model SLAM for robust perception in structure-degraded environments
AU - Wang, Weipeng
AU - Ji, Wenxuan
AU - Xiao, Jin
AU - Hu, Xiaoguang
AU - Jia, Zichong
AU - Shi, Jiaqi
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2026.
PY - 2026/4
Y1 - 2026/4
N2 - Robust spatial perception is essential for SLAM in robotics and autonomous systems, but existing pipelines often fail in structure-deficient scenes when they rely on a single modality or decouple depth estimation from SLAM. We present a joint depth-enhanced, multi-model SLAM system tailored for such scenarios, with three core contributions. First, we propose a multi-model depth fusion framework (MDFF) that fuses visual, LiDAR, inertial, and learned depth cues. Second, we design a dense scan-to-map module (DSM) within the LiDAR–Inertial Subsystem (LIS) that eliminates handcrafted features. Third, we develop a depth-aware backend optimization (DBO) that jointly refines poses, landmarks, and scale using multi-view and single-view depth constraints. The system targets high-throughput computing, with embarrassingly parallel per-point residuals and GPU-ready depth inference. Experiments show that DSM reduces LiDAR-inertial processing time relative to LVI-SAM, while the full pipeline runs in real time (21.5 FPS LiDAR, 28.6 FPS camera) and delivers higher localization accuracy than representative baselines.
AB - Robust spatial perception is essential for SLAM in robotics and autonomous systems, but existing pipelines often fail in structure-deficient scenes when they rely on a single modality or decouple depth estimation from SLAM. We present a joint depth-enhanced, multi-model SLAM system tailored for such scenarios, with three core contributions. First, we propose a multi-model depth fusion framework (MDFF) that fuses visual, LiDAR, inertial, and learned depth cues. Second, we design a dense scan-to-map module (DSM) within the LiDAR–Inertial Subsystem (LIS) that eliminates handcrafted features. Third, we develop a depth-aware backend optimization (DBO) that jointly refines poses, landmarks, and scale using multi-view and single-view depth constraints. The system targets high-throughput computing, with embarrassingly parallel per-point residuals and GPU-ready depth inference. Experiments show that DSM reduces LiDAR-inertial processing time relative to LVI-SAM, while the full pipeline runs in real time (21.5 FPS LiDAR, 28.6 FPS camera) and delivers higher localization accuracy than representative baselines.
KW - Depth estimation
KW - Multi-model
KW - SLAM
KW - Structure-degraded environments
UR - https://www.scopus.com/pages/publications/105035047147
U2 - 10.1007/s11227-026-08461-1
DO - 10.1007/s11227-026-08461-1
M3 - Article
AN - SCOPUS:105035047147
SN - 0920-8542
VL - 82
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 5
M1 - 306
ER -