TY - GEN
T1 - Matrix-Query
T2 - 2013 5th International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2013
AU - Liu, Qiao
AU - Ji, Ping
AU - Zuo, Yuan
PY - 2013
Y1 - 2013
AB - With the development of distributed computation and the rapid growth of data, scientific research increasingly requires the support of a high-efficiency relational data processing framework. Based on the characteristics of scientific data, such as bulk inserts and infrequent changes, this paper proposes a streaming processing model called Matrix-Query, together with a matching data storage architecture for relational queries. By transforming the original relational schema into entities and key-value indexing, the data storage solution provides more localized operations and data positioning. Compared to the traditional Map-Reduce model, Matrix-Query isolates the influence between subtasks to ensure execution in a streaming and parallel manner, and reduces the negative impact of writing intermediate files. We also optimize the data structure and subtask management to improve the performance of Matrix-Query. The experimental results demonstrate the performance advantage of Matrix-Query over two well-known data processing systems, Hive and HadoopDB, which are built on top of the Map-Reduce model.
KW - Distributed computation
KW - Relational query processing model
KW - SQL
UR - https://www.scopus.com/pages/publications/84893209616
U2 - 10.1109/CyberC.2013.36
DO - 10.1109/CyberC.2013.36
M3 - Conference contribution
AN - SCOPUS:84893209616
SN - 9780769551067
T3 - Proceedings - 2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2013
SP - 179
EP - 185
BT - Proceedings - 2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2013
PB - IEEE Computer Society
Y2 - 10 October 2013 through 12 October 2013
ER -