Abstract
Stencil computation is an indispensable pattern in scientific applications and a long-standing optimization target in high performance computing (HPC). The Sunway processor used in the Sunway TaihuLight supercomputer has demonstrated its performance potential with a unique heterogeneous many-core architecture. Although many optimization methods have been proposed, the memory-bound nature of stencil computation and the limited bandwidth of the Sunway processor make it challenging to run stencil computation efficiently on this processor. To better use the computation capability of the Sunway processor, we propose a combined tiling optimization for stencil computation tailored to its architectural features. In addition, we implement double buffering, vectorization, and register communication to further accelerate stencil computation on the Sunway processor. We evaluate our method on six stencil benchmarks with different orders and shapes (and thus different memory access patterns and computation intensities). The experimental results show that our implementation achieves an average speedup of 1.97× over the state-of-the-art stencil implementation on Sunway.
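For readers unfamiliar with the terms, the following is a minimal sketch of a stencil sweep and of spatial tiling in general. It is not the combined tiling scheme of the paper, whose details are not given in the abstract. The 1D 3-point averaging stencil, the function names `stencil_reference` and `stencil_tiled`, the `tile` parameter, and the local buffer standing in for fast on-chip memory are all assumptions made for illustration.

```python
def stencil_reference(u, steps):
    """Naive 3-point averaging stencil; the two boundary cells stay fixed."""
    cur = list(u)
    for _ in range(steps):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def stencil_tiled(u, steps, tile=4):
    """Same stencil, with each sweep processed tile by tile.

    Each tile is copied into a small local buffer together with a one-cell
    halo on both sides (hypothetical stand-in for a copy into fast memory).
    """
    n = len(u)
    cur = list(u)
    for _ in range(steps):
        nxt = list(cur)
        for start in range(1, n - 1, tile):
            end = min(start + tile, n - 1)      # exclusive end of interior cells
            local = cur[start - 1:end + 1]      # tile plus left and right halo
            for j in range(1, len(local) - 1):
                nxt[start + j - 1] = (local[j - 1] + local[j] + local[j + 1]) / 3.0
        cur = nxt
    return cur


if __name__ == "__main__":
    grid = [0.0] * 10
    grid[5] = 9.0
    print(stencil_tiled(grid, steps=1, tile=4))
```

Both functions perform the same arithmetic in the same order, so the tiled version should reproduce the naive result; only the order and granularity of memory accesses differ, which is the aspect that tiling targets on bandwidth-limited hardware.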
| Original language | English |
|---|---|
| Pages (from-to) | 322-333 |
| Number of pages | 12 |
| Journal | CCF Transactions on High Performance Computing |
| Volume | 5 |
| Issue number | 3 |
| State | Published - Sep 2023 |
Keywords
- Combined tiling
- Performance optimization
- Stencil computation
- Sunway processor
Fingerprint
Dive into the research topics of 'Adapting combined tiling to stencil optimizations on sunway processor'. Together they form a unique fingerprint.