TY - GEN
T1 - Towards a general and efficient linked-list hash table on GPUs
AU - Gao, Lan
AU - Xu, Yunlong
AU - Xu, Chongyang
AU - Wang, Rui
AU - Yang, Hailong
AU - Luan, Zhongzhi
AU - Qian, Depei
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - Hash table is a fundamental indexing data structure, and extensively used in many applications. Recently, several hash table implementations have been proposed on GPUs. However, they have two pitfalls. Firstly, their synchronization approaches can only ensure the atomicity among individual operations. This makes them can hardly be used in some real-world applications (e.g., online transaction processing, OLTP), which require groups of operations to be executed atomically. Secondly, they use either a GPU thread or a warp of GPU threads to process an operation at a time (named thread-and warp-level execution respectively). The former one incurs more branch divergences, while the latter one requires all threads within a warp to be active. To this end, we (1) propose a readers-writer lock based, bucket-level synchronization to realize the atomicity among both individual and groups of operations, and (2) combine thread-and warp-level executions to optimize system performance. Moreover, we optimize the data structure of linked-list with respect to GPU architectures. Putting them together, we realize a general and efficient linked-list hash table on GPUs, named GEL-HASH. Compared with existing GPU hash table implementations, GEL-HASH can easily support more real-world applications. Moreover, we compare GEL-HASH with the state-of-the-art hash table implementation on GPUs (slab hash) using applications that are well supported by the existing implementations. The results show that, for small data scales (numbers of key-value pairs stored in a hash table) GEL-HASH outperforms slab hash by up to 3.05X, while for large data scales GEL-HASH exhibits less than 20% performance slowdown.
AB - Hash table is a fundamental indexing data structure, and extensively used in many applications. Recently, several hash table implementations have been proposed on GPUs. However, they have two pitfalls. Firstly, their synchronization approaches can only ensure the atomicity among individual operations. This makes them can hardly be used in some real-world applications (e.g., online transaction processing, OLTP), which require groups of operations to be executed atomically. Secondly, they use either a GPU thread or a warp of GPU threads to process an operation at a time (named thread-and warp-level execution respectively). The former one incurs more branch divergences, while the latter one requires all threads within a warp to be active. To this end, we (1) propose a readers-writer lock based, bucket-level synchronization to realize the atomicity among both individual and groups of operations, and (2) combine thread-and warp-level executions to optimize system performance. Moreover, we optimize the data structure of linked-list with respect to GPU architectures. Putting them together, we realize a general and efficient linked-list hash table on GPUs, named GEL-HASH. Compared with existing GPU hash table implementations, GEL-HASH can easily support more real-world applications. Moreover, we compare GEL-HASH with the state-of-the-art hash table implementation on GPUs (slab hash) using applications that are well supported by the existing implementations. The results show that, for small data scales (numbers of key-value pairs stored in a hash table) GEL-HASH outperforms slab hash by up to 3.05X, while for large data scales GEL-HASH exhibits less than 20% performance slowdown.
KW - Accelerator architectures
KW - Data structures
KW - Hash tables
KW - Parallel programming
KW - Synchronization
UR - https://www.scopus.com/pages/publications/85073544610
U2 - 10.1109/HPCC/SmartCity/DSS.2019.00201
DO - 10.1109/HPCC/SmartCity/DSS.2019.00201
M3 - 会议稿件
AN - SCOPUS:85073544610
T3 - Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019
SP - 1452
EP - 1460
BT - Proceedings - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019
A2 - Xiao, Zheng
A2 - Yang, Laurence T.
A2 - Balaji, Pavan
A2 - Li, Tao
A2 - Li, Keqin
A2 - Zomaya, Albert
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019
Y2 - 10 August 2019 through 12 August 2019
ER -