
A Practical Performance Model for Hadoop MapReduce

  • Xuelian Lin*
  • Zide Meng
  • Chuan Xu
  • Meng Wang
  • *Corresponding author for this work
  • Beihang University
  • Baidu Inc

Research output: Chapter in book/report/conference proceeding; conference contribution; peer-reviewed

Abstract

An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition for implementing cost-based scheduling strategies or for translating Hive-like query jobs into sets of low-cost MapReduce jobs. However, the multiple processing steps in a MapReduce task, the complexity of the relationships among these steps, and the difficulty of measuring the computational complexity of a MapReduce task greatly challenge the development and application of a precise performance model. In this paper, we define the concept of the relative computational complexity of a MapReduce task to estimate the task's complexity, and illustrate how to measure it. We then analyze the detailed composition of MapReduce tasks and the relationships among them, decompose the major cost items, and present a vector-style cost model with equations to calculate each cost item. Moreover, we provide equations to estimate task execution time based on the cost vectors. Experiments on several Hadoop clusters confirm the effectiveness of the proposed performance model.
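The abstract does not reproduce the paper's equations, so the following is only a minimal sketch of what a vector-style cost model can look like, not the authors' formulation. The cost items (`disk_read_bytes`, `disk_write_bytes`, `network_bytes`, `cpu_units`), the `ClusterRates` fields, and the purely additive time estimate (no overlap between I/O and computation) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostVector:
    """Hypothetical cost items for one task (not the paper's decomposition)."""
    disk_read_bytes: float
    disk_write_bytes: float
    network_bytes: float
    cpu_units: float  # assumed abstract unit standing in for relative complexity


@dataclass(frozen=True)
class ClusterRates:
    """Hypothetical measured throughput of one node, per cost item."""
    disk_read_bps: float
    disk_write_bps: float
    network_bps: float
    cpu_units_per_sec: float


def estimate_task_seconds(cost: CostVector, rates: ClusterRates) -> float:
    """Sum each cost item divided by its rate.

    Assumption: the steps run strictly one after another, so the
    per-item times simply add up. A real model may need to account
    for overlap between steps.
    """
    return (
        cost.disk_read_bytes / rates.disk_read_bps
        + cost.disk_write_bytes / rates.disk_write_bps
        + cost.network_bytes / rates.network_bps
        + cost.cpu_units / rates.cpu_units_per_sec
    )


# Illustrative numbers only; they are not taken from the paper.
example_cost = CostVector(
    disk_read_bytes=128e6,
    disk_write_bytes=64e6,
    network_bytes=32e6,
    cpu_units=10.0,
)
example_rates = ClusterRates(
    disk_read_bps=128e6,
    disk_write_bps=64e6,
    network_bps=32e6,
    cpu_units_per_sec=5.0,
)
print(estimate_task_seconds(example_cost, example_rates))  # 1 + 1 + 1 + 2 = 5.0
```

Keeping the cost items in one vector and the cluster characteristics in another means the same task description can be re-evaluated against a different cluster by swapping only the rates.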

Original language: English
Title of host publication: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Publisher: IEEE Computer Society
Pages: 231-239
Number of pages: 9
ISBN (print): 9780768548449
DOI
Publication status: Published - 2012
Event: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012 - Beijing, China
Duration: 24 Sep 2012 to 28 Sep 2012

Publication series

Name: Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012

Conference

Conference: 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Country/Territory: China
City: Beijing
Period: 24/09/12 to 28/09/12
