TY - JOUR
T1 - Learning Continuous Spatiotemporal Implicit Neural Fields for Unsupervised Video Denoising
AU - Hu, Xiaowan
AU - Liu, Henan
AU - Zheng, Ce
AU - Li, Xinyang
AU - Xu, Mai
N1 - Publisher Copyright:
© 1979-2012 IEEE.
PY - 2026
Y1 - 2026
AB - Video denoising is fundamental to low-level vision and real-world imaging, yet existing self-supervised methods remain fragile under severe noise and complex motion. Most approaches still rely on spatially and temporally discrete grid-based representations: blind-spot networks enforce J-invariance by masking center pixels with a limited receptive field, while recurrent models build temporal dependencies on discretized frame sequences and noise-sensitive optical flow, leading to error accumulation and motion artifacts. We address this model bottleneck by reformulating self-supervised video denoising as learning a continuous spatiotemporal implicit field. Building on coordinate-based implicit neural representations, we propose a unified video denoising model with a spatiotemporal implicit neural field (SINF). In the spatial domain, a blind-spot implicit spatial field maps coordinates directly to pixel-level representations, enabling globally informed texture recovery beyond receptive-field limits. In the temporal domain, an implicit temporal embedding with periodic activations encodes motion continuously over time, while a time-aware spatial graph module refines cross-frame alignment. Together, these components let SINF remodel discretized video signals into a continuous spatiotemporal intensity field, enabling more robust pixel-wise associations than coarse optical flow. Extensive experiments on synthetic and real noisy video benchmarks demonstrate that SINF achieves state-of-the-art performance.
KW - implicit neural representation
KW - self-supervised learning
KW - spatiotemporal modeling
KW - video denoising
UR - https://www.scopus.com/pages/publications/105034836093
U2 - 10.1109/TPAMI.2026.3680159
DO - 10.1109/TPAMI.2026.3680159
M3 - Article
AN - SCOPUS:105034836093
SN - 0162-8828
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
ER -