TY - GEN
T1 - Depth State Space Model for Light Field Depth Estimation via Text-Similar Representation
AU - Sun, Zexin
AU - Wang, Tun
AU - Yang, Da
AU - Cui, Zhenglong
AU - Chen, Rongshan
AU - Li, Ying
AU - Su, Guanqun
AU - Sheng, Hao
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Light field (LF) technology captures both spatial and angular information of the real world, enabling accurate depth estimation. Cost volume-based methods mostly treat LF depth estimation as a shift-matching process, which fails to efficiently establish relationships among different viewpoints. The State Space Model (SSM) has shown strong capabilities in long-sequence modeling, providing a powerful mechanism for capturing associations among viewpoints. In this paper, we observe that LF depth estimation can be viewed as a state-transition process, and we propose a text-similar representation based on the distribution of pixel values across different viewpoints, which is able to detect occluded and discontinuous regions. Furthermore, to extract latent depth features, we formulate a Depth State Space Model (DSSM) that leverages the state-transition mechanism of SSM to capture spatial, angular, and structural characteristics in complex regions. Based on the proposed DSSM, we develop DSS-Net for depth estimation. Experiments demonstrate that our approach achieves state-of-the-art performance, with significant improvements in occluded and discontinuous regions, highlighting its effectiveness in addressing the complexities of LF depth estimation.
AB - Light field (LF) technology captures both spatial and angular information of the real world, enabling accurate depth estimation. Cost volume-based methods mostly treat LF depth estimation as a shift-matching process, which fails to efficiently establish relationships among different viewpoints. The State Space Model (SSM) has shown strong capabilities in long-sequence modeling, providing a powerful mechanism for capturing associations among viewpoints. In this paper, we observe that LF depth estimation can be viewed as a state-transition process, and we propose a text-similar representation based on the distribution of pixel values across different viewpoints, which is able to detect occluded and discontinuous regions. Furthermore, to extract latent depth features, we formulate a Depth State Space Model (DSSM) that leverages the state-transition mechanism of SSM to capture spatial, angular, and structural characteristics in complex regions. Based on the proposed DSSM, we develop DSS-Net for depth estimation. Experiments demonstrate that our approach achieves state-of-the-art performance, with significant improvements in occluded and discontinuous regions, highlighting its effectiveness in addressing the complexities of LF depth estimation.
KW - Cost volume
KW - Depth estimation
KW - Depth state space model
KW - Light field
KW - Occlusion-aware
KW - Text-similar representation
UR - https://www.scopus.com/pages/publications/105022976789
U2 - 10.1007/978-981-95-3001-4_25
DO - 10.1007/978-981-95-3001-4_25
M3 - Conference contribution
AN - SCOPUS:105022976789
SN - 9789819530007
T3 - Lecture Notes in Computer Science
SP - 339
EP - 351
BT - Knowledge Science, Engineering and Management - 18th International Conference, KSEM 2025, Proceedings
A2 - Zhu, Tianqing
A2 - Zhou, Wanlei
A2 - Zhu, Congcong
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on Knowledge Science, Engineering and Management, KSEM 2025
Y2 - 4 August 2025 through 7 August 2025
ER -