TY - GEN
T1 - Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity
AU - Duan, Cenlin
AU - Yang, Jianlei
AU - Wang, Yiou
AU - Wang, Yikun
AU - Qi, Yingjie
AU - He, Xiaolin
AU - Yan, Bonan
AU - Wang, Xueyan
AU - Jia, Xiaotao
AU - Zhao, Weisheng
N1 - Publisher Copyright:
© 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/11/7
Y1 - 2024/11/7
N2 - Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. Yet, traditional digital SRAM-PIM architecture, limited by rigid crossbar architecture, struggles to effectively exploit this unstructured sparsity. To address this challenge, we propose Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro that includes dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further refines performance and efficiency by capitalizing on block-wise input sparsity. Results show that our proposed co-design framework achieves a remarkable speedup of up to 7.69× and energy savings of 83.43%.
AB - Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. Yet, traditional digital SRAM-PIM architecture, limited by rigid crossbar architecture, struggles to effectively exploit this unstructured sparsity. To address this challenge, we propose Dyadic Block PIM (DB-PIM), a groundbreaking algorithm-architecture co-design framework. First, we propose an algorithm coupled with a distinctive sparsity pattern, termed a dyadic block (DB), that preserves the random distribution of non-zero bits to maintain accuracy while restricting the number of these bits in each weight to improve regularity. Architecturally, we develop a custom PIM macro that includes dyadic block multiplication units (DBMUs) and Canonical Signed Digit (CSD)-based adder trees, specifically tailored for Multiply-Accumulate (MAC) operations. An input pre-processing unit (IPU) further refines performance and efficiency by capitalizing on block-wise input sparsity. Results show that our proposed co-design framework achieves a remarkable speedup of up to 7.69× and energy savings of 83.43%.
KW - PIM
KW - SRAM
KW - algorithm/architecture co-design
KW - bit-level sparsity
UR - https://www.scopus.com/pages/publications/85211088215
U2 - 10.1145/3649329.3655948
DO - 10.1145/3649329.3655948
M3 - 会议稿件
AN - SCOPUS:85211088215
T3 - Proceedings - Design Automation Conference
BT - Proceedings of the 61st ACM/IEEE Design Automation Conference, DAC 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st ACM/IEEE Design Automation Conference, DAC 2024
Y2 - 23 June 2024 through 27 June 2024
ER -