TY - JOUR
T1 - A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures
AU - Wei, Bing
AU - Xiao, Limin
AU - Wei, Wei
AU - Song, Yao
AU - Yan, Baicheng
AU - Huo, Zhisheng
N1 - Publisher Copyright:
© 2020, Springer-Verlag London Ltd., part of Springer Nature.
PY - 2023/4
Y1 - 2023/4
N2 - How to efficiently process big data at a low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in the fields including business, scientific research, and public administration. Unfortunately, seismic data processing has not achieved the same level of devolvement in the field of oil exploration. While many storage architectures, such as network-attached storage (NAS) and storage area network (SAN), have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, it cannot provide a high bandwidth at a low cost when it comes to data-intensive computing scenarios due to the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to store data. It delivers high aggregate performance to user applications while running on inexpensive commodity hardware. In order to overcome the shortcomings of NASStore, we first present HDFSStore that is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve the system performance. The experiment results show that SeisStore reduces the storage cost than NASStore by up to 23.20% and improves the access bandwidth than NASStore and HDFSStore by up to 478.84% and 16.99%, respectively.
AB - How to efficiently process big data at a low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in the fields including business, scientific research, and public administration. Unfortunately, seismic data processing has not achieved the same level of devolvement in the field of oil exploration. While many storage architectures, such as network-attached storage (NAS) and storage area network (SAN), have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, it cannot provide a high bandwidth at a low cost when it comes to data-intensive computing scenarios due to the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to store data. It delivers high aggregate performance to user applications while running on inexpensive commodity hardware. In order to overcome the shortcomings of NASStore, we first present HDFSStore that is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve the system performance. The experiment results show that SeisStore reduces the storage cost than NASStore by up to 23.20% and improves the access bandwidth than NASStore and HDFSStore by up to 478.84% and 16.99%, respectively.
KW - Hadoop Distributed File System
KW - High-bandwidth and low-cost data processing
KW - Hybrid data store
KW - Network-attached storage
KW - Seismic data processing
UR - https://www.scopus.com/pages/publications/85083363237
U2 - 10.1007/s00779-020-01383-6
DO - 10.1007/s00779-020-01383-6
M3 - 文章
AN - SCOPUS:85083363237
SN - 1617-4909
VL - 27
SP - 159
EP - 176
JO - Personal and Ubiquitous Computing
JF - Personal and Ubiquitous Computing
IS - 2
ER -