TY - JOUR
T1 - Diffusion Self-Distillation for Remote Sensing Scene Classification
AU - Hu, Yutao
AU - Zhang, Lei
AU - Luo, Xiaoyan
AU - Cao, Xianbin
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Remote sensing scene classification, a fundamental task in remote sensing image analysis, has progressed rapidly thanks to the powerful capabilities of convolutional neural networks (CNNs). Precise classification relies heavily on the feature extraction capacity of the network. However, because of the large variation and severe distortion within the images, extracting robust feature representations is necessary but challenging. Self-distillation can enhance the shallow layers by providing stronger gradients and more accurate supervision from deeper layers, thereby promoting the extraction of spatially detailed features. Nonetheless, because shallow layers have limited capacity to learn truly valuable knowledge, shallow-layer features can be viewed as a noisy version of deep-layer features that contains more disruptive factors, which significantly impedes the effectiveness of self-distillation. To address this issue, in this article we establish the diffusion self-distillation network (DSDNet), which incorporates a conditional diffusion denoising model into the self-distillation framework. Specifically, DSDNet filters noise from shallow features through the diffusion denoising process, enabling more accurate distillation between the refined student features and the teacher features. Extensive experiments on four challenging remote sensing datasets demonstrate that the proposed DSDNet achieves significant performance improvements over various backbone networks with negligible increases in parameters, delivering state-of-the-art classification performance.
KW - Diffusion
KW - knowledge transfer
KW - remote sensing scene classification
KW - self-distillation
UR - https://www.scopus.com/pages/publications/105005176279
U2 - 10.1109/TGRS.2025.3569616
DO - 10.1109/TGRS.2025.3569616
M3 - Article
AN - SCOPUS:105005176279
SN - 0196-2892
VL - 63
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5626315
ER -