TY - GEN
T1 - DRStencil
T2 - 23rd IEEE International Conference on High Performance Computing and Communications, 7th IEEE International Conference on Data Science and Systems, 19th IEEE International Conference on Smart City and 7th IEEE International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC-DSS-SmartCity-DependSys 2021
AU - You, Xin
AU - Yang, Hailong
AU - Jiang, Zhonghui
AU - Luan, Zhongzhi
AU - Qian, Depei
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2022
Y1 - 2022
N2 - Stencil computation is widely adopted in scientific applications as one of the most significant computation patterns. Although there are various optimizations proposed to accelerate the stencil computation, the low-order stencil still suffers from limited performance on GPU due to its low computation intensity. In this paper, we propose the fusion-partition optimization techniques to accelerate the low-order stencil computation and implement an effective code generation framework DRStencil to automatically generate optimized stencil codes with fusion-partition applied. Specifically, we adopt a four-stage optimization workflow such as time-fusion, partition, forward and backward computation. We also propose an auto-tuning method to determine the optimal parameter settings of the generated stencil codes. We evaluate DRStencil with representative low-order stencils on Nvidia P100, V100, and A100 GPUs. Our evaluation results achieve 1.46×, 1.59×, and 1.10× speedup on average for widely used low-order stencils compared to the state-of-the-art implementations on P100, V100, and A100 GPUs, respectively.
AB - Stencil computation is widely adopted in scientific applications as one of the most significant computation patterns. Although there are various optimizations proposed to accelerate the stencil computation, the low-order stencil still suffers from limited performance on GPU due to its low computation intensity. In this paper, we propose the fusion-partition optimization techniques to accelerate the low-order stencil computation and implement an effective code generation framework DRStencil to automatically generate optimized stencil codes with fusion-partition applied. Specifically, we adopt a four-stage optimization workflow such as time-fusion, partition, forward and backward computation. We also propose an auto-tuning method to determine the optimal parameter settings of the generated stencil codes. We evaluate DRStencil with representative low-order stencils on Nvidia P100, V100, and A100 GPUs. Our evaluation results achieve 1.46×, 1.59×, and 1.10× speedup on average for widely used low-order stencils compared to the state-of-the-art implementations on P100, V100, and A100 GPUs, respectively.
KW - GPU
KW - Low-order Stencil
KW - Performance Optimization
KW - Semi-Stencil
KW - Stencil Computation
KW - Time Fusion
UR - https://www.scopus.com/pages/publications/85132402900
U2 - 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00036
DO - 10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00036
M3 - Conference contribution
AN - SCOPUS:85132402900
T3 - 2021 IEEE 23rd International Conference on High Performance Computing and Communications, 7th International Conference on Data Science and Systems, 19th International Conference on Smart City and 7th International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC-DSS-SmartCity-DependSys 2021
SP - 63
EP - 70
BT - 2021 IEEE 23rd International Conference on High Performance Computing and Communications, 7th International Conference on Data Science and Systems, 19th International Conference on Smart City and 7th International Conference on Dependability in Sensor, Cloud and Big Data Systems and Applications, HPCC-DSS-SmartCity-DependSys 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 December 2021 through 22 December 2021
ER -