TY - GEN
T1 - Software transactional memory for GPU architectures
AU - Xu, Yunlong
AU - Wang, Rui
AU - Goswami, Nilanjan
AU - Li, Tao
AU - Gao, Lan
AU - Qian, Depei
PY - 2014
Y1 - 2014
N2 - Modern GPUs have shown promising results in accelerating computation intensive and numerical workloads with limited dynamic data sharing. However, many real-world applications manifest ample amount of data sharing among concurrently executing threads. Often data sharing requires mutual exclusion mechanism to ensure data integrity in multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, lock-based synchronization requires significant programming efforts to achieve functional correctness. The massive multithreading and SIMT execution paradigm of GPUs further extend the challenges of GPU locks. To make applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges include ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. To this end, we propose (1) a hierarchical validation technique and (2) an encounter-time lock-sorting mechanism to deal with the two challenges, respectively. We build our GPU-STM prototype based on the commercially available GPU platform and runtime. Our real system based evaluation shows that GPU-STM outperforms coarse-grain locks on GPUs by up to 20x.
AB - Modern GPUs have shown promising results in accelerating computation intensive and numerical workloads with limited dynamic data sharing. However, many real-world applications manifest ample amount of data sharing among concurrently executing threads. Often data sharing requires mutual exclusion mechanism to ensure data integrity in multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, lock-based synchronization requires significant programming efforts to achieve functional correctness. The massive multithreading and SIMT execution paradigm of GPUs further extend the challenges of GPU locks. To make applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges include ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. To this end, we propose (1) a hierarchical validation technique and (2) an encounter-time lock-sorting mechanism to deal with the two challenges, respectively. We build our GPU-STM prototype based on the commercially available GPU platform and runtime. Our real system based evaluation shows that GPU-STM outperforms coarse-grain locks on GPUs by up to 20x.
KW - General-purpose GPU computing
KW - Parallel programming
KW - Software transactional memory
UR - https://www.scopus.com/pages/publications/84900622009
U2 - 10.1145/2544137.2544139
DO - 10.1145/2544137.2544139
M3 - Conference contribution
AN - SCOPUS:84900622009
SN - 9781450326704
T3 - Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
SP - 1
EP - 10
BT - Proceedings of the 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
PB - Association for Computing Machinery
T2 - 12th ACM/IEEE International Symposium on Code Generation and Optimization, CGO 2014
Y2 - 15 February 2014 through 19 February 2014
ER -