
Data Mining Based Root-Cause Analysis of Performance Bottleneck for Big Data Workload

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Straggler tasks are commonly considered the major bottleneck in parallel data processing. Previous work mainly focuses on coarse-grained straggler detection and optimization, such as speculative scheduling. However, fine-grained root-cause analysis of straggler tasks is rarely considered. In addition, existing work depends on empirical analysis alone, which offers little useful guidance for performance optimization. In this paper, we propose a new methodology for fine-grained straggler root-cause analysis using machine learning. We collect raw metrics from the Spark event log and a hardware sampling tool, and refine them into high-level metrics for model learning. We then perform root-cause analysis of stragglers using a CART decision tree. A customized pruning method is also applied to improve analysis accuracy. From the analysis, we derive several new findings beyond the well-known causes of stragglers. Our work provides a new perspective on identifying and understanding inefficiency in parallel data processing programs by applying machine learning techniques to fine-grained root-cause analysis of straggler tasks.
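
The abstract does not give the paper's straggler definition, metric names, or pruning rule, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes a task counts as a straggler when its duration exceeds 1.5 times the median, uses two hypothetical high-level metrics (`gc_time_ratio`, `shuffle_read_mb`) with made-up values, and finds a single CART-style split by minimizing weighted Gini impurity. A full CART tree would apply this split recursively and then prune.

```python
from statistics import median

def label_stragglers(durations, factor=1.5):
    """Flag tasks slower than factor x the median duration (assumed threshold)."""
    cutoff = factor * median(durations)
    return [d > cutoff for d in durations]

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (feature, threshold, weighted_gini) for the best single binary split."""
    best = None
    n = len(rows)
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical per-task data; the metric names and values are illustrative only.
durations = [10, 11, 9, 10, 12, 30, 28, 10]
rows = [
    {"gc_time_ratio": 0.05, "shuffle_read_mb": 100},
    {"gc_time_ratio": 0.04, "shuffle_read_mb": 120},
    {"gc_time_ratio": 0.06, "shuffle_read_mb": 90},
    {"gc_time_ratio": 0.05, "shuffle_read_mb": 110},
    {"gc_time_ratio": 0.07, "shuffle_read_mb": 300},
    {"gc_time_ratio": 0.40, "shuffle_read_mb": 105},
    {"gc_time_ratio": 0.35, "shuffle_read_mb": 310},
    {"gc_time_ratio": 0.05, "shuffle_read_mb": 95},
]

labels = label_stragglers(durations)
feature, threshold, score = best_split(rows, labels)
print(feature, round(threshold, 2), score)
```

In this toy data the split on `gc_time_ratio` separates the two slow tasks cleanly, which is the kind of readable rule a decision tree offers for root-cause analysis: the path from the root to a straggler leaf names the metrics and thresholds associated with slow tasks.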

Original language: English
Title of host publication: Proceedings - 2017 IEEE 19th Intl Conference on High Performance Computing and Communications, HPCC 2017, 2017 IEEE 15th Intl Conference on Smart City, SmartCity 2017 and 2017 IEEE 3rd Intl Conference on Data Science and Systems, DSS 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 254-261
Number of pages: 8
ISBN (Electronic): 9781538625880
State: Published - 2 Jul 2017
Event: 19th IEEE Intl Conference on High Performance Computing and Communications, 15th IEEE Intl Conference on Smart City, and 3rd IEEE Intl Conference on Data Science and Systems, HPCC/SmartCity/DSS 2017 - Bangkok, Thailand
Duration: 18 Dec 2017 to 20 Dec 2017

Publication series

Name: Proceedings - 2017 IEEE 19th Intl Conference on High Performance Computing and Communications, HPCC 2017, 2017 IEEE 15th Intl Conference on Smart City, SmartCity 2017 and 2017 IEEE 3rd Intl Conference on Data Science and Systems, DSS 2017
Volume: 2018-January

Conference

Conference: 19th IEEE Intl Conference on High Performance Computing and Communications, 15th IEEE Intl Conference on Smart City, and 3rd IEEE Intl Conference on Data Science and Systems, HPCC/SmartCity/DSS 2017
Country/Territory: Thailand
City: Bangkok
Period: 18/12/17 to 20/12/17
