TY - GEN
T1 - Jigsaw
T2 - 53rd International Conference on Parallel Processing, ICPP 2024
AU - Zhang, Kaige
AU - Liu, Xiaoyan
AU - Yang, Hailong
AU - Feng, Tianyu
AU - Yang, Xinyu
AU - Liu, Yi
AU - Luan, Zhongzhi
AU - Qian, Depei
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/8/12
Y1 - 2024/8/12
AB - As deep learning models continue to grow larger, model pruning is employed to reduce memory footprint and computational complexity, which generates a large number of sparse matrix-matrix multiplications (SpMM) with unstructured sparsity (e.g., vector sparsity). However, leveraging GPUs, especially the newly integrated sparse tensor cores (SpTC), to accelerate SpMM is challenging because of this unstructured sparsity. Existing works fail to fully exploit the SpTC on GPUs because it is difficult to satisfy its stringent requirement for restricted sparsity (e.g., 2:4 sparsity). In this paper, we propose Jigsaw, a novel method that utilizes SpTC to accelerate SpMM with vector sparsity. Specifically, we propose a multi-granularity sparsity reorder method that transforms the sparse data to satisfy the sparse pattern supported on SpTC. In addition, we propose a reorder-aware storage format for the transformed sparse data to better adapt to the parallelism of SpTC. Moreover, we propose corresponding optimizations to better exploit the SpTC and further accelerate SpMM. Experimental results demonstrate that Jigsaw outperforms state-of-the-art SpMM implementations and achieves promising speedup over cuBLAS.
KW - deep learning optimization
KW - sparse matrix reordering
KW - sparse matrix-matrix multiplication
KW - sparse tensor core
UR - https://www.scopus.com/pages/publications/85202453199
U2 - 10.1145/3673038.3673108
DO - 10.1145/3673038.3673108
M3 - Conference contribution
AN - SCOPUS:85202453199
T3 - ACM International Conference Proceeding Series
SP - 1124
EP - 1134
BT - 53rd International Conference on Parallel Processing, ICPP 2024 - Main Conference Proceedings
PB - Association for Computing Machinery
Y2 - 12 August 2024 through 15 August 2024
ER -