TY - JOUR
T1 - Investigating Synthetic-to-Real Transfer Robustness for Stereo Matching and Optical Flow Estimation
AU - Zhang, Jiawei
AU - Li, Jiahe
AU - Huang, Lei
AU - Luo, Haonan
AU - Yu, Xiaohan
AU - Gu, Lin
AU - Zheng, Jin
AU - Bai, Xiao
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - With advancements in robust stereo matching and optical flow estimation networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, their robustness can be seriously degraded when fine-tuning them in real-world scenarios. This paper investigates fine-tuning stereo matching and optical flow estimation networks without compromising their robustness to unseen domains. Specifically, we divide the pixels into consistent and inconsistent regions by comparing Ground Truth (GT) with Pseudo Label (PL) and demonstrate that the imbalance learning of consistent and inconsistent regions in GT causes robustness degradation. Based on our analysis, we propose the DKT framework, which utilizes PL to balance the learning of different regions in GT. The core idea is to utilize an exponential moving average (EMA) teacher to measure what the student network has learned and dynamically adjust the learning regions. We further propose the DKT++ framework, which improves target-domain performances and network robustness by applying slow-fast update teachers to generate more accurate PL, introducing the unlabeled data and synthetic data. We integrate our frameworks with state-of-the-art networks and evaluate their effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the robustness of stereo matching and optical flow networks during fine-tuning.
AB - With advancements in robust stereo matching and optical flow estimation networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, their robustness can be seriously degraded when fine-tuning them in real-world scenarios. This paper investigates fine-tuning stereo matching and optical flow estimation networks without compromising their robustness to unseen domains. Specifically, we divide the pixels into consistent and inconsistent regions by comparing Ground Truth (GT) with Pseudo Label (PL) and demonstrate that the imbalance learning of consistent and inconsistent regions in GT causes robustness degradation. Based on our analysis, we propose the DKT framework, which utilizes PL to balance the learning of different regions in GT. The core idea is to utilize an exponential moving average (EMA) teacher to measure what the student network has learned and dynamically adjust the learning regions. We further propose the DKT++ framework, which improves target-domain performances and network robustness by applying slow-fast update teachers to generate more accurate PL, introducing the unlabeled data and synthetic data. We integrate our frameworks with state-of-the-art networks and evaluate their effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the robustness of stereo matching and optical flow networks during fine-tuning.
KW - Stereo matching
KW - knowledge distillation
KW - network robustness
KW - optical flow estimation
UR - https://www.scopus.com/pages/publications/105010031796
U2 - 10.1109/TPAMI.2025.3584847
DO - 10.1109/TPAMI.2025.3584847
M3 - 文章
C2 - 40591472
AN - SCOPUS:105010031796
SN - 0162-8828
VL - 47
SP - 9113
EP - 9129
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 10
ER -