TY - GEN
T1 - A practical performance model for Hadoop MapReduce
AU - Lin, Xuelian
AU - Meng, Zide
AU - Xu, Chuan
AU - Wang, Meng
PY - 2012
Y1 - 2012
N2 - An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition for implementing cost-based scheduling strategies or for translating Hive-like query jobs into sets of low-cost MapReduce jobs. However, the multiple processing steps in a MapReduce task, the complexity of the relationships among these steps, and the difficulty of measuring the computational complexity of a MapReduce task greatly challenge the development and application of a precise performance model. In this paper, we define the concept of relative computational complexity of a MapReduce task to estimate the complexity of a task, and illustrate how to measure it. Then, we analyze the detailed composition of MapReduce tasks and the relationships among them, decompose the major cost items, and present a vector-style cost model with equations to calculate each cost item. Moreover, we provide equations to estimate task execution time based on cost vectors. The experiment on several Hadoop clusters confirms the effectiveness of our proposed performance model.
AB - An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition for implementing cost-based scheduling strategies or for translating Hive-like query jobs into sets of low-cost MapReduce jobs. However, the multiple processing steps in a MapReduce task, the complexity of the relationships among these steps, and the difficulty of measuring the computational complexity of a MapReduce task greatly challenge the development and application of a precise performance model. In this paper, we define the concept of relative computational complexity of a MapReduce task to estimate the complexity of a task, and illustrate how to measure it. Then, we analyze the detailed composition of MapReduce tasks and the relationships among them, decompose the major cost items, and present a vector-style cost model with equations to calculate each cost item. Moreover, we provide equations to estimate task execution time based on cost vectors. The experiment on several Hadoop clusters confirms the effectiveness of our proposed performance model.
KW - Hadoop
KW - MapReduce
KW - Performance model
UR - https://www.scopus.com/pages/publications/84872518138
U2 - 10.1109/ClusterW.2012.24
DO - 10.1109/ClusterW.2012.24
M3 - Conference contribution
AN - SCOPUS:84872518138
SN - 9780769548449
T3 - Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
SP - 231
EP - 239
BT - Proceedings - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
PB - IEEE Computer Society
T2 - 2012 IEEE International Conference on Cluster Computing Workshops, Cluster Workshops 2012
Y2 - 24 September 2012 through 28 September 2012
ER -