HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems

  • Xinyang Wang
  • Yu Sun
  • Qiao Sun*
  • Weiwei Lin*
  • James Z. Wang
  • Wei Li
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of the Internet of Things and cloud computing, HBase has become a good choice for massive data storage because it reads and writes data efficiently. However, HBase does not support multi-dimensional queries on non-rowkey data, which hinders data analysis and processing. To address this issue, we first analyze the construction principles and shortcomings of secondary indexes and clustering indexes, and select the clustering index as the basis for optimization. Then, we adopt the Hilbert curve, a space-filling curve, as the linearization technique, design a pre-partition algorithm and a subspace partition algorithm, and implement the Hilbert-curve-based clustering index (HCIndex), which supports multi-dimensional point queries and range queries. Finally, we evaluate HCIndex in comparison experiments against HBase Scan, HiBase and CCIndex. The experimental results show that HCIndex greatly improves query efficiency at the cost of a limited amount of storage space, which is needed to hold the index data and amounts to only 1.7 times the size of the original HBase data table. Compared with HBase Scan, HCIndex is more than 4 times as efficient for multi-dimensional point queries and more than 2 times as efficient for range queries. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries over massive data in cloud storage systems.
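
The abstract does not spell out how HCIndex encodes points, so the following is only a minimal sketch of the general idea of Hilbert-curve linearization, using the textbook two-dimensional mapping from a grid cell (x, y) to its distance d along the curve. It is not the authors' pre-partition or subspace partition algorithm. The helper names `hilbert_index` and `query_ranges` are hypothetical, the grid side `n` is assumed to be a power of two, and the rectangle decomposition is a deliberately naive brute-force enumeration chosen for clarity rather than efficiency.

```python
from typing import List, Tuple


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Reorient a quadrant so the sub-curve inside it has the canonical shape."""
    if ry == 0:
        if rx == 1:
            # Flip both coordinates (a 180-degree rotation of the n x n grid).
            x = n - 1 - x
            y = n - 1 - y
        # Transpose: swap x and y.
        x, y = y, x
    return x, y


def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert distance d."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level holds s*s cells; (3*rx) ^ ry gives its visit order 0..3.
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def query_ranges(n: int, x0: int, x1: int, y0: int, y1: int) -> List[Tuple[int, int]]:
    """Naively cover the inclusive rectangle [x0, x1] x [y0, y1] with contiguous Hilbert intervals."""
    ds = sorted(
        hilbert_index(n, x, y)
        for x in range(x0, x1 + 1)
        for y in range(y0, y1 + 1)
    )
    ranges: List[Tuple[int, int]] = []
    for d in ds:
        if ranges and d == ranges[-1][1] + 1:
            # Extend the current interval when the next distance is consecutive.
            ranges[-1] = (ranges[-1][0], d)
        else:
            ranges.append((d, d))
    return ranges
```

For example, on a 4 x 4 grid `query_ranges(4, 0, 1, 0, 1)` returns `[(0, 3)]`, while `query_ranges(4, 1, 2, 0, 1)` returns `[(1, 2), (13, 14)]`: a small rectangle usually maps to a few contiguous runs of the curve. In a key-value store such as HBase, one plausible (assumed, not source-confirmed) use is to prefix rowkeys with the Hilbert value so that each interval becomes one bounded scan between a start and a stop rowkey.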

Original language: English
Pages (from-to): 2011-2025
Number of pages: 15
Journal: Cluster Computing
Volume: 26
Issue number: 3
State: Published - Jun 2023
Externally published: Yes

Keywords

  • Big data
  • Clustering index
  • HBase
  • Multi-dimensional query
  • Space filling curve
