TY - JOUR
T1 - Out-of-Distribution Semantic Segmentation with Disentangled and Calibrated Representation
AU - Wan, Maoxian
AU - Li, Kaige
AU - Geng, Qichuan
AU - Su, Binyi
AU - Zhou, Zhong
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2026
Y1 - 2026
N2 - Out-of-distribution (OoD) semantic segmentation aims to recognize pixels of classes undefined in the training dataset. Existing methods mostly focus on training the model to fit real OoD data samples to identify OoD pixels, which requires extra data collection and annotation efforts. By contrast, synthesizing OoD data with training data provides a more resource-efficient alternative. However, synthetic data generated from controlled settings lacks diversity, causing the model to suffer from overfitting. To this end, we propose a disentangled representation learning (DRL) method to guide the model to disentangle semantic-related and semantic-unrelated features from synthetic OoD data. DRL encourages the model to utilize the former to identify semantic categories, rather than overfitting to such semantic-unrelated features as synthetic artificiality. Specifically, DRL first incorporates two disentanglers to extract the semantic-related and -unrelated features and then applies a shuffle and reconstruction mechanism to regularize the disentangled features. Furthermore, to facilitate disentangling, we propose a pixel-wise feature similarity calibration (PSC) module, which utilizes more accurate ID-OoD similarity to calibrate inaccurate ID-OoD similarity learned exclusively from ID data. Thus, PSC delivers accurate and stable pixel-wise features for effective disentangling. Extensive experiments illustrate that the proposed method exhibits strong generalization ability. It attains 74.04% AuPRC and 20.82% FPR on Road Anomaly, 69.85% AuPRC and 5.78% FPR on Fishyscapes LostAndFound Validation Set, using SegFormer with the MiT-B5 backbone. Source code is available at https://github.com/WanMotion/DisentangledOoDSeg.
AB - Out-of-distribution (OoD) semantic segmentation aims to recognize pixels of classes undefined in the training dataset. Existing methods mostly focus on training the model to fit real OoD data samples to identify OoD pixels, which requires extra data collection and annotation efforts. By contrast, synthesizing OoD data with training data provides a more resource-efficient alternative. However, synthetic data generated from controlled settings lacks diversity, causing the model to suffer from overfitting. To this end, we propose a disentangled representation learning (DRL) method to guide the model to disentangle semantic-related and semantic-unrelated features from synthetic OoD data. DRL encourages the model to utilize the former to identify semantic categories, rather than overfitting to such semantic-unrelated features as synthetic artificiality. Specifically, DRL first incorporates two disentanglers to extract the semantic-related and -unrelated features and then applies a shuffle and reconstruction mechanism to regularize the disentangled features. Furthermore, to facilitate disentangling, we propose a pixel-wise feature similarity calibration (PSC) module, which utilizes more accurate ID-OoD similarity to calibrate inaccurate ID-OoD similarity learned exclusively from ID data. Thus, PSC delivers accurate and stable pixel-wise features for effective disentangling. Extensive experiments illustrate that the proposed method exhibits strong generalization ability. It attains 74.04% AuPRC and 20.82% FPR on Road Anomaly, 69.85% AuPRC and 5.78% FPR on Fishyscapes LostAndFound Validation Set, using SegFormer with the MiT-B5 backbone. Source code is available at https://github.com/WanMotion/DisentangledOoDSeg.
KW - Out-of-distribution
KW - disentangled representations
KW - semantic segmentation
UR - https://www.scopus.com/pages/publications/105012745138
U2 - 10.1109/TCSVT.2025.3597071
DO - 10.1109/TCSVT.2025.3597071
M3 - Article
AN - SCOPUS:105012745138
SN - 1051-8215
VL - 36
SP - 971
EP - 985
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 1
ER -