TY - JOUR
T1 - CIM2PQ
T2 - An Arraywise and Hardware-Friendly Mixed Precision Quantization Method for Analog Computing-In-Memory
AU - Sun, Sifan
AU - Bai, Jinyu
AU - Shi, Zhaoyu
AU - Zhao, Weisheng
AU - Kang, Wang
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Computing-in-memory (CIM) architecture is a promising convolutional neural network (CNN) accelerator known for its highly efficient matrix-vector multiplications (MVMs). However, due to the low-precision computation and limited size of CIM memory arrays, it is necessary to decompose huge MVMs into smaller subsets. Conventional NN quantization methods overlook the characteristics of CIM hardware, resulting in diminished system performance and efficiency. This article proposes a mixed precision quantization (MPQ) method based on an evolutionary algorithm for CIM-based accelerators, called CIM2PQ, which considers the hardware characteristics of CIM and can automatically generate quantization strategies for NN models to improve the efficiency of CIM systems. First, inspired by the CIM computing paradigm, an arraywise quantization granularity is introduced in the MPQ search space, which can jointly quantize the inputs, weights, and partial sums. Second, a production procedure containing fine-grained crossover and progressive adaptive mutation is proposed, which can efficiently explore the search space and speed up the search process. Third, we propose a fast and efficient strategy evaluation method to obtain the performance of a quantization strategy on the CIM platform, significantly reducing evaluation time without requiring fine-tuning. Finally, to protect CIM-friendly strategies with lower bit-widths but worse algorithm performance, we propose a strategy selection method based on multiobjective optimization, named qNSGA-III. The effectiveness of the proposed method is demonstrated through experimental results on various NNs and datasets. For ResNet-18, the hardware efficiency and accuracy can be improved to 117% with 7.05%, 113% with 3.37%, and 119% with 5.78% on CIFAR-10, CIFAR-100, and ImageNet, respectively, compared to the baseline MPQ method.
AB - Computing-in-memory (CIM) architecture is a promising convolutional neural network (CNN) accelerator known for its highly efficient matrix-vector multiplications (MVMs). However, due to the low-precision computation and limited size of CIM memory arrays, it is necessary to decompose huge MVMs into smaller subsets. Conventional NN quantization methods overlook the characteristics of CIM hardware, resulting in diminished system performance and efficiency. This article proposes a mixed precision quantization (MPQ) method based on an evolutionary algorithm for CIM-based accelerators, called CIM2PQ, which considers the hardware characteristics of CIM and can automatically generate quantization strategies for NN models to improve the efficiency of CIM systems. First, inspired by the CIM computing paradigm, an arraywise quantization granularity is introduced in the MPQ search space, which can jointly quantize the inputs, weights, and partial sums. Second, a production procedure containing fine-grained crossover and progressive adaptive mutation is proposed, which can efficiently explore the search space and speed up the search process. Third, we propose a fast and efficient strategy evaluation method to obtain the performance of a quantization strategy on the CIM platform, significantly reducing evaluation time without requiring fine-tuning. Finally, to protect CIM-friendly strategies with lower bit-widths but worse algorithm performance, we propose a strategy selection method based on multiobjective optimization, named qNSGA-III. The effectiveness of the proposed method is demonstrated through experimental results on various NNs and datasets. For ResNet-18, the hardware efficiency and accuracy can be improved to 117% with 7.05%, 113% with 3.37%, and 119% with 5.78% on CIFAR-10, CIFAR-100, and ImageNet, respectively, compared to the baseline MPQ method.
KW - Computing-in-memory (CIM)
KW - mixed precision quantization (MPQ)
KW - neural network
KW - post-training quantization (PTQ)
UR - https://www.scopus.com/pages/publications/85183986611
U2 - 10.1109/TCAD.2024.3358609
DO - 10.1109/TCAD.2024.3358609
M3 - Article
AN - SCOPUS:85183986611
SN - 0278-0070
VL - 43
SP - 2084
EP - 2097
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 7
ER -