A practical performance model for Hadoop MapReduce

  • Xuelian Lin*
  • Zide Meng
  • Chuan Xu
  • Meng Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition for implementing cost-based scheduling strategies and for translating Hive-like query jobs into sets of low-cost MapReduce jobs. However, the multiple processing steps in a MapReduce task, the complex relationships among these steps, and the difficulty of measuring a task's computational complexity greatly challenge the development and application of a precise performance model. In this paper, we define the concept of relative computational complexity of a MapReduce task to estimate the complexity of a task, and illustrate how to measure it. We then analyze the detailed composition of MapReduce tasks and the relationships among them, decompose the major cost items, and present a vector-style cost model with equations to calculate each cost item. Moreover, we provide equations to estimate task execution time based on the cost vectors. Experiments on several Hadoop clusters confirm the effectiveness of the proposed performance model.

Original language: English
Title of host publication: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Publisher: IEEE Computer Society
Pages: 231-239
Number of pages: 9
ISBN (Print): 9780768548449
DOIs
State: Published - 2012
Event: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012 - Beijing, China
Duration: 24 Sep 2012 – 28 Sep 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012

Conference

Conference: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Country/Territory: China
City: Beijing
Period: 24/09/12 – 28/09/12

Keywords

  • Hadoop
  • MapReduce
  • Performance model
