TY - JOUR
T1 - A Hardware-Friendly Joint Denoising and Demosaicing System Based on Efficient FPGA Implementation
AU - Wang, Jiqing
AU - Wang, Xiang
AU - Shen, Yu
N1 - Publisher Copyright:
© 2025 by the authors.
PY - 2026/1
Y1 - 2026/1
N2 - This paper designs a hardware-implementable joint denoising and demosaicing acceleration system. Firstly, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme can reduce the redundancy of filters and feature maps, thereby reducing memory accesses, and achieve excellent visual effects with a smaller model complexity. In addition, multi-scale extraction can expand the receptive field while reducing model parameters. Then, we apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks bring different memory access and complex computing methods; thus, we introduce a unified and flexibly configurable hardware acceleration processing platform and implement it on the Xilinx Zynq UltraScale + FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are approximately improved by 2.36dB and 0.0806, respectively, and the computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multi-parallelism and can adapt to the different edge-embedded scenarios. Overall, the image processing task solution proposed in this paper has positive advantages in the joint denoising and demosaicing system.
AB - This paper designs a hardware-implementable joint denoising and demosaicing acceleration system. Firstly, a lightweight network architecture with multi-scale feature extraction based on partial convolution is proposed at the algorithm level. The partial convolution scheme can reduce the redundancy of filters and feature maps, thereby reducing memory accesses, and achieve excellent visual effects with a smaller model complexity. In addition, multi-scale extraction can expand the receptive field while reducing model parameters. Then, we apply separable convolution and partial convolution to reduce the parameters of the model. Compared with the standard convolutional solution, the parameters and MACs are reduced by 83.38% and 77.71%, respectively. Moreover, different networks bring different memory access and complex computing methods; thus, we introduce a unified and flexibly configurable hardware acceleration processing platform and implement it on the Xilinx Zynq UltraScale + FPGA board. Finally, compared with the state-of-the-art neural network solution on the Kodak24 set, the peak signal-to-noise ratio and the structural similarity index measure are approximately improved by 2.36dB and 0.0806, respectively, and the computing efficiency is improved by 2.09×. Furthermore, the hardware architecture supports multi-parallelism and can adapt to the different edge-embedded scenarios. Overall, the image processing task solution proposed in this paper has positive advantages in the joint denoising and demosaicing system.
KW - algorithm-hardware co-optimized
KW - demosaicing and denoising
KW - partial convolution
KW - reconfigurable architecture
KW - unified acceleration computing platform
UR - https://www.scopus.com/pages/publications/105028778756
U2 - 10.3390/mi17010044
DO - 10.3390/mi17010044
M3 - 文章
AN - SCOPUS:105028778756
SN - 2072-666X
VL - 17
JO - Micromachines
JF - Micromachines
IS - 1
M1 - 44
ER -