Abstract
To improve the performance of sparse Cholesky factorization, existing research groups adjacent columns of the sparse matrix that share the same nonzero pattern into supernodes for parallelization. However, because sparse matrices have widely varying structures, the amount of computation in the resulting supernodes varies significantly, which makes them hard to optimize with dense matrix kernels. Efficiently mapping sparse Cholesky factorization to emerging architectures such as the Sunway many-core processor therefore remains an active research direction. In this article, we propose swCholesky, a highly optimized implementation of sparse Cholesky factorization on the Sunway processor. Specifically, we design three kernel task queues and a dense matrix library that dynamically adapt to the kernel characteristics and the architecture features. In addition, we propose an auto-tuning mechanism that searches for the optimal settings of the important parameters in swCholesky. Our experiments show that swCholesky achieves better performance than state-of-the-art implementations.
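The supernode idea can be illustrated with a minimal sketch. It assumes the column nonzero patterns of the lower-triangular factor L are already known (e.g. from symbolic factorization), and it uses the basic contiguous-column criterion: column j+1 joins column j's supernode when both share the same pattern below the diagonal block. The function names `find_supernodes` and `block_shapes` are hypothetical; this is not swCholesky's actual partitioning code, and it omits refinements such as the elimination-tree condition for fundamental supernodes or relaxed amalgamation.

```python
from typing import List, Set, Tuple


def find_supernodes(col_patterns: List[Set[int]]) -> List[List[int]]:
    """Group adjacent columns of a lower-triangular factor L into supernodes.

    col_patterns[j] is the set of row indices i >= j with L[i, j] != 0
    (the diagonal index j itself is included). Column j joins the supernode
    of column j - 1 when col_patterns[j - 1] == {j - 1} | col_patterns[j],
    i.e. the two columns have identical structure below the diagonal.
    """
    if not col_patterns:
        return []
    supernodes = [[0]]
    for j in range(1, len(col_patterns)):
        prev = j - 1
        if col_patterns[prev] == {prev} | col_patterns[j]:
            supernodes[-1].append(j)  # same pattern: extend current supernode
        else:
            supernodes.append([j])    # pattern changes: start a new supernode
    return supernodes


def block_shapes(col_patterns: List[Set[int]],
                 supernodes: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (rows, cols) of the dense block each supernode is stored as.

    rows is the number of nonzero rows in the supernode's first column,
    cols is the number of columns in the supernode.
    """
    return [(len(col_patterns[sn[0]]), len(sn)) for sn in supernodes]


if __name__ == "__main__":
    # Column patterns of a small 5x5 Cholesky factor L.
    patterns = [{0, 1, 3}, {1, 3}, {2, 3, 4}, {3, 4}, {4}]
    sns = find_supernodes(patterns)
    print(sns)                         # [[0, 1], [2, 3, 4]]
    print(block_shapes(patterns, sns)) # [(3, 2), (3, 3)]
```

Even in this tiny example the two supernodes yield dense blocks of different shapes; in real matrices these shapes vary far more widely, which is the source of the irregular dense-kernel workload described above.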
| Original language | English |
|---|---|
| Article number | 8903486 |
| Pages (from-to) | 1636-1650 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Parallel and Distributed Systems |
| Volume | 31 |
| Issue number | 7 |
| DOIs | |
| State | Published - 1 Jul 2020 |
Keywords
- Sparse Cholesky factorization
- Sunway architecture
- performance optimization
Fingerprint
Dive into the research topics of 'Accelerating Sparse Cholesky Factorization on Sunway Manycore Architecture'. Together they form a unique fingerprint.