An XML data placement strategy for distributed XML storage and parallel query

  • Jing Zhang*
  • Bo Lang
  • Yawei Duan
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Because large volumes of XML documents are now generated across many application domains, efficient XML management has become an important problem. Distributed XML storage and parallel query based on MapReduce can be an effective solution. Since the XML data placement strategy is a key factor in parallel system performance, this paper presents the Query Workload Estimation based XML Placement strategy (QWEXP) for efficient distributed XML storage and parallel query. To balance query workload, QWEXP partitions XML according to a query workload estimate that is computed from the XML structure alone, without knowledge of user queries, because in common application scenarios user queries are not known in advance. The partitioned XML segments are sized around an XML storage unit W0 to support the scalability of the parallel XML database. Finally, the segments are distributed evenly across the processing nodes to ensure workload balance during parallel query execution. Experimental results show that QWEXP greatly improves the speedup and scaleup properties of the parallel XML system.
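
The three steps named in the abstract (estimate workload from structure, cut segments of roughly W0, spread segments evenly over nodes) can be illustrated with a minimal sketch. This is not the QWEXP algorithm from the paper, whose details the abstract does not give. It assumes a hypothetical workload estimate (the element count of a subtree), a simple greedy packing of the root's child subtrees, and a heaviest-first assignment to the least-loaded node. All function names and the parameter `w0` are illustrative.

```python
import heapq
import xml.etree.ElementTree as ET


def estimate_workload(elem):
    """Hypothetical estimate: number of elements in the subtree (assumption)."""
    return sum(1 for _ in elem.iter())


def partition(root, w0):
    """Greedily pack the root's child subtrees into segments of weight near w0."""
    segments, current, weight = [], [], 0
    for child in root:
        cost = estimate_workload(child)
        # Close the current segment if adding this subtree would exceed w0.
        if current and weight + cost > w0:
            segments.append((weight, current))
            current, weight = [], 0
        current.append(child)
        weight += cost
    if current:
        segments.append((weight, current))
    return segments


def place(segments, num_nodes):
    """Assign segments, heaviest first, to the currently least-loaded node."""
    heap = [(0, n) for n in range(num_nodes)]
    loads = [0] * num_nodes
    assignment = [[] for _ in range(num_nodes)]
    order = sorted(range(len(segments)), key=lambda i: -segments[i][0])
    for idx in order:
        load, node = heapq.heappop(heap)
        assignment[node].append(idx)
        loads[node] = load + segments[idx][0]
        heapq.heappush(heap, (loads[node], node))
    return assignment, loads


# Usage: eight identical <book> subtrees, three elements each.
doc = ET.fromstring("<lib>" + "<book><title/><author/></book>" * 8 + "</lib>")
segments = partition(doc, w0=6)
assignment, loads = place(segments, num_nodes=2)
print([w for w, _ in segments], loads)
```

Because the estimate depends only on document structure, the placement can be computed before any query is seen, which matches the scenario the abstract describes. A real estimate would presumably weight structure more carefully than a plain element count.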

Original language: English
Title of host publication: Proceedings - 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
Pages: 433-439
Number of pages: 7
State: Published - 2011
Event: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011 - Gwangju, Korea, Republic of
Duration: 20 Oct 2011 to 22 Oct 2011

Publication series

Name: Parallel and Distributed Computing, Applications and Technologies, PDCAT Proceedings

Conference

Conference: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2011
Country/Territory: Korea, Republic of
City: Gwangju
Period: 20/10/11 to 22/10/11

Keywords

  • Distributed XML storage
  • Parallel XML query
  • XML data placement
