TY - JOUR
T1 - End-to-End Semantic Segmentation Utilizing Multi-Scale Baseline Light Field
AU - Cong, Ruixuan
AU - Sheng, Hao
AU - Yang, Dazhi
AU - Yang, Da
AU - Chen, Rongshan
AU - Wang, Sizhe
AU - Cui, Zhenglong
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2024
Y1 - 2024
AB - Semantic segmentation based on 4D light field (LF) images exhibits superior performance by exploiting rich spatial and angular information. However, current methods only focus on narrow-baseline cases, ignoring the feasibility and capability of large disparity scene for segmentation. Motivated by this, we propose a novel network called LF-IENet++ suitable for both narrow-baseline LF and wide-baseline LF in this paper, which fully mines complementary information across views via implicit feature integration and explicit feature propagation. In order to concentrate on inconsistent context between view images during feature integration, we shield small disparity regions manifested as repeat content to avoid redundant attention. Besides, a two-stage operation consisting of the image-level warping and feature-level warping is introduced to mitigate the propagation distortion. Since both feature integration and feature propagation require exact guidance from prior disparity, we design a semantic-aware disparity estimator that leverages semantic cues to optimize disparity generation while ensuring that our network can perform semantic segmentation in an end-to-end solution. To validate the effectiveness of the proposed method, we present the first multi-scale baseline dataset for LF semantic segmentation. Compared to state-of-the-art methods, our LF-IENet++ achieves outstanding performance and shows high robustness under different disparity situations. Besides, our method obtains higher accuracy on wide-baseline cases, demonstrating the significance of introducing large disparity LF for semantic segmentation.
KW - Light field
KW - multi-scale baseline dataset
KW - semantic segmentation
KW - semantic-aware disparity estimator
UR - https://www.scopus.com/pages/publications/85186079670
U2 - 10.1109/TCSVT.2024.3367370
DO - 10.1109/TCSVT.2024.3367370
M3 - Article
AN - SCOPUS:85186079670
SN - 1051-8215
VL - 34
SP - 5790
EP - 5804
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 7
ER -