TY - GEN
T1 - EAGRE
T2 - 29th International Conference on Data Engineering, ICDE 2013
AU - Zhang, Xiaofei
AU - Chen, Lei
AU - Tong, Yongxin
AU - Wang, Min
PY - 2013
Y1 - 2013
N2 - To benefit from the Cloud platform's unlimited resources, managing and evaluating huge volumes of RDF data in a scalable manner has attracted intensive research effort recently. Progress has been made on evaluating SPARQL queries with either high-level declarative programming languages, like Pig [1], or a sequence of carefully designed MapReduce jobs, both of which tend to answer the query with multiple join operations. However, due to the simplicity of Cloud storage and the coarse organization of RDF data in existing solutions, multiple join operations easily incur significant I/O and network traffic, which can severely degrade system performance. In this work, we first propose EAGRE, an Entity-Aware Graph compREssion technique that forms a new representation of RDF data on Cloud platforms, based on which we propose an I/O-efficient strategy to evaluate SPARQL queries as quickly as possible, especially queries with specified solution sequence modifiers, e.g., PROJECTION and ORDER BY. We implement a prototype system and conduct extensive experiments over both real and synthetic datasets on an in-house cluster. The experimental results show that our solution can achieve over an order of magnitude of time savings for SPARQL query evaluation compared to state-of-the-art MapReduce-based solutions.
UR - https://www.scopus.com/pages/publications/84881346785
U2 - 10.1109/ICDE.2013.6544856
DO - 10.1109/ICDE.2013.6544856
M3 - Conference contribution
AN - SCOPUS:84881346785
SN - 9781467349086
T3 - Proceedings - International Conference on Data Engineering
SP - 565
EP - 576
BT - ICDE 2013 - 29th International Conference on Data Engineering
Y2 - 8 April 2013 through 11 April 2013
ER -