TY - JOUR
T1 - HCIndex
T2 - a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems
AU - Wang, Xinyang
AU - Sun, Yu
AU - Sun, Qiao
AU - Lin, Weiwei
AU - Wang, James Z.
AU - Li, Wei
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/6
Y1 - 2023/6
N2 - With the rapid development of the Internet of Things and cloud computing, HBase has become a good choice for massive data storage, and is efficient in reading and writing data. However, HBase is not supportive for multi-dimensional query of non-rowkey data, unconducive to data analysis and processing. To address this issue, we first analyze the constitution principle and deficiency of secondary index and clustering index, and select clustering index as the basis of optimization. Then, we choose the Hilbert curve in the space filling curve as the linearization technology, design the pre-partition algorithm and subspace partition algorithm, and realize the Hilbert-curve-based clustering index (HCIndex) which supports multi-dimensional point query and range query. Finally, the performance of HCIndex is verified by comparison experiments with HBase Scan, HiBase and CCIndex. The experimental results show that the query efficiency of HCIndex has been greatly improved at the expense of very limited storage space, which is necessary for storing index data and only 1.7 times the size of the original data table of HBase. Compared with HBase scan, the query efficiency of HCIndex’s multi-dimensional point query and range query has been increased to more than 4 times and more than 2 times, respectively. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries of massive data in cloud storage systems.
AB - With the rapid development of the Internet of Things and cloud computing, HBase has become a good choice for massive data storage, and is efficient in reading and writing data. However, HBase is not supportive for multi-dimensional query of non-rowkey data, unconducive to data analysis and processing. To address this issue, we first analyze the constitution principle and deficiency of secondary index and clustering index, and select clustering index as the basis of optimization. Then, we choose the Hilbert curve in the space filling curve as the linearization technology, design the pre-partition algorithm and subspace partition algorithm, and realize the Hilbert-curve-based clustering index (HCIndex) which supports multi-dimensional point query and range query. Finally, the performance of HCIndex is verified by comparison experiments with HBase Scan, HiBase and CCIndex. The experimental results show that the query efficiency of HCIndex has been greatly improved at the expense of very limited storage space, which is necessary for storing index data and only 1.7 times the size of the original data table of HBase. Compared with HBase scan, the query efficiency of HCIndex’s multi-dimensional point query and range query has been increased to more than 4 times and more than 2 times, respectively. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries of massive data in cloud storage systems.
KW - Big data
KW - Clustering index
KW - HBase
KW - Multi-dimensional query
KW - Space filling curve
UR - https://www.scopus.com/pages/publications/85138282715
U2 - 10.1007/s10586-022-03723-y
DO - 10.1007/s10586-022-03723-y
M3 - 文章
AN - SCOPUS:85138282715
SN - 1386-7857
VL - 26
SP - 2011
EP - 2025
JO - Cluster Computing
JF - Cluster Computing
IS - 3
ER -