
GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Performance variance is one of the nasty pitfalls of large-scale heterogeneous systems, which can lead to unexpected and unpredictable performance degradation for parallel programs. Such performance issues typically arise from various random hardware and software faults, making it exceedingly difficult to pinpoint the exact causes of performance variance in specific instances. In this paper, we propose GVARP, a performance variance detection tool for large-scale heterogeneous systems. GVARP employs static analysis to identify the performance-critical parameters of kernel functions. Additionally, GVARP segments the program execution with external library calls and asynchronous kernel operations. Then GVARP constructs a state transfer graph and estimates the workload of each program segment to identify and cluster instances of similar workloads, facilitating the detection of performance variance. Our evaluation results demonstrate that GVARP effectively detects performance variance at a large scale with acceptable overhead and provides intuitive insights to locate the sources of performance variance.
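
The clustering-and-comparison idea in the abstract can be pictured with a minimal Python sketch. This is not GVARP's implementation: the function name `detect_variance`, the `(segment, workload, duration)` tuple layout, grouping on an exact workload match, and the 1.2x-median threshold are all assumptions chosen for illustration only.

```python
from collections import defaultdict
from statistics import median


def detect_variance(instances, threshold=1.2):
    """Flag segment instances that run slower than peers with the same workload.

    instances: iterable of (segment, workload, duration) tuples, where
    `segment` names a program segment, `workload` is a hashable workload
    estimate, and `duration` is the measured time. All three are
    hypothetical fields for this sketch.
    """
    # Cluster instances that share a segment and an identical workload estimate.
    groups = defaultdict(list)
    for index, (segment, workload, duration) in enumerate(instances):
        groups[(segment, workload)].append((index, duration))

    flagged = []
    for key, members in groups.items():
        if len(members) < 2:
            continue  # nothing to compare a lone instance against
        baseline = median(duration for _, duration in members)
        for index, duration in members:
            # An instance well above its cluster's median is a variance candidate.
            if duration > threshold * baseline:
                flagged.append((index, key, duration / baseline))
    return flagged
```

Instances that share a segment and workload are expected to take similar time, so an outlier within a cluster points to the machine or the runtime rather than to the program's input.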

Original language: English
Title of host publication: Proceedings of SC 2024
Subtitle of host publication: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society
ISBN (Electronic): 9798350352917
DOIs
State: Published - 17 Nov 2024
Event: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024 - Atlanta, United States
Duration: 17 Nov 2024 – 22 Nov 2024

Publication series

Name: International Conference for High Performance Computing, Networking, Storage and Analysis, SC
ISSN (Print): 2167-4329
ISSN (Electronic): 2167-4337

Conference

Conference: 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
Country/Territory: United States
City: Atlanta
Period: 17/11/24 – 22/11/24

Keywords

  • Large-Scale Heterogeneous System
  • Performance Analysis
  • Performance Variance
