TY - GEN
T1 - Compression Format and Systolic Array Structure Co-design for Accelerating Sparse Matrix Multiplication in DNNs
AU - Cao, Yongxiang
AU - Jiang, Jixiang
AU - Zhao, Guocheng
AU - Wang, Wei
AU - Jiang, Hongxu
AU - Song, Yanfei
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - The systolic array (SA) architecture is widely used in accelerator and AI chip design because it accelerates matrix multiplication effectively. However, traditional sparse matrix compression formats are mismatched with the SA architecture, so SAs perform poorly on sparse matrices. To solve this problem, we co-design the sparse compression format and the SA architecture. This work proposes the Vector Group Compressed Coordinate (VGCC) data stream compression format and designs a sparse SA architecture based on hardware encoding and decoding. VGCC can handle data streams of any size without increasing the index bit width. The VGCC compressed size is significantly smaller than that of general sparse compression formats such as COO and CSR. For a sparse matrix with 90% sparsity, it saves 67.13% of storage space compared to the state-of-the-art ECOO data stream compression format and achieves an average speedup of 1.54x. We also propose a VGCC-friendly SA architecture. Our SA engine reduces logic consumption by 51.6% compared to Sparse TPU while achieving a 3.23x speedup. To mitigate high no-load rates in systolic arrays, we introduce a Data Transmission Priority (DTP) data matching algorithm. At sparsity levels from 50% to 90%, the algorithm improves calculation speed by 1.124x to 1.991x. We implemented a sparse DNN accelerator on a Zynq UltraScale+ MPSoC ZCU102 FPGA, using the proposed VGCC format to compress the input data. Compared to the S2 Engine, the accelerator achieved a 3.37x speedup in matrix multiplication at 75% sparsity. The accelerator achieved an average throughput of 1079.33 GOP/s during DNN inference.
KW - Co-design
KW - Sparse matrix compression format
KW - Sparse matrix multiplication
KW - Systolic array architecture
UR - https://www.scopus.com/pages/publications/85219166157
U2 - 10.1007/978-981-96-1545-2_8
DO - 10.1007/978-981-96-1545-2_8
M3 - Conference contribution
AN - SCOPUS:85219166157
SN - 9789819615445
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 114
EP - 133
BT - Algorithms and Architectures for Parallel Processing - 24th International Conference, ICA3PP 2024, Proceedings
A2 - Zhu, Tianqing
A2 - Li, Jin
A2 - Castiglione, Aniello
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2024
Y2 - 29 October 2024 through 31 October 2024
ER -