TY - JOUR
T1 - Learn by Oneself
T2 - Exploiting Weight-Sharing Potential in Knowledge Distillation Guided Ensemble Network
AU - Zhao, Qi
AU - Lyu, Shuchang
AU - Chen, Lijiang
AU - Liu, Binghao
AU - Xu, Ting Bing
AU - Cheng, Guangliang
AU - Feng, Wenquan
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/11/1
Y1 - 2023/11/1
AB - Recent convolutional neural networks (CNNs) have become increasingly compact, and elegant structural design has greatly improved their performance. Knowledge distillation has improved CNN performance further. However, existing knowledge-distillation-guided methods either rely on large, high-quality teacher models pretrained offline or impose a heavy online training burden. To address these problems, we propose EKD-FWSNet, a feature-sharing and weight-sharing ensemble network (a training framework) guided by knowledge distillation, which strengthens the representation ability of baseline models at lower training computation and memory cost. Specifically, to remove the dependence on an offline pretrained teacher model, we design an end-to-end online training scheme to optimize EKD-FWSNet. To reduce the online training burden, we introduce only one auxiliary classmate branch to construct multiple forward branches, which are then integrated into an ensemble teacher that guides the baseline model. Compared with previous online ensemble training frameworks, EKD-FWSNet provides diverse output predictions without adding more auxiliary classmate branches. To maximize the optimization power of EKD-FWSNet, we exploit the representation potential of weight-sharing blocks and design an efficient knowledge distillation mechanism. Extensive comparison experiments and visualization analysis on benchmark datasets (CIFAR-10/100, tiny-ImageNet, CUB-200 and ImageNet) show that the self-learned EKD-FWSNet boosts the performance of baseline models by a large margin and clearly outperforms previous related methods. Further analysis also demonstrates the interpretability of EKD-FWSNet. Our code is available at https://github.com/cv516Buaa/EKD-FWSNet.
KW - Knowledge distillation
KW - ensemble learning
KW - high-efficiency network
KW - weight-sharing blocks
UR - https://www.scopus.com/pages/publications/85153516874
U2 - 10.1109/TCSVT.2023.3267115
DO - 10.1109/TCSVT.2023.3267115
M3 - Article
AN - SCOPUS:85153516874
SN - 1051-8215
VL - 33
SP - 6661
EP - 6678
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 11
ER -