TY - GEN
T1 - GVARP
T2 - 2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024
AU - You, Xin
AU - Xuan, Zhibo
AU - Yang, Hailong
AU - Luan, Zhongzhi
AU - Liu, Yi
AU - Qian, Depei
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024/11/17
Y1 - 2024/11/17
N2 - Performance variance is one of the nasty pitfalls of large-scale heterogeneous systems, which can lead to unexpected and unpredictable performance degradation for parallel programs. Such performance issues typically arise from various random hardware and software faults, making it exceedingly difficult to pinpoint the exact causes of performance variance in specific instances. In this paper, we propose GVARP, a performance variance detection tool for large-scale heterogeneous systems. GVARP employs static analysis to identify the performance-critical parameters of kernel functions. Additionally, GVARP segments the program execution with external library calls and asynchronous kernel operations. Then GVARP constructs a state transfer graph and estimates the workload of each program segment to identify and cluster instances of similar workloads, facilitating the detection of performance variance. Our evaluation results demonstrate that GVARP effectively detects performance variance at a large scale with acceptable overhead and provides intuitive insights to locate the sources of performance variance.
AB - Performance variance is one of the nasty pitfalls of large-scale heterogeneous systems, which can lead to unexpected and unpredictable performance degradation for parallel programs. Such performance issues typically arise from various random hardware and software faults, making it exceedingly difficult to pinpoint the exact causes of performance variance in specific instances. In this paper, we propose GVARP, a performance variance detection tool for large-scale heterogeneous systems. GVARP employs static analysis to identify the performance-critical parameters of kernel functions. Additionally, GVARP segments the program execution with external library calls and asynchronous kernel operations. Then GVARP constructs a state transfer graph and estimates the workload of each program segment to identify and cluster instances of similar workloads, facilitating the detection of performance variance. Our evaluation results demonstrate that GVARP effectively detects performance variance at a large scale with acceptable overhead and provides intuitive insights to locate the sources of performance variance.
KW - Large-Scale Heterogeneous System
KW - Performance Analysis
KW - Performance Variance
UR - https://www.scopus.com/pages/publications/85214973375
U2 - 10.1109/SC41406.2024.00063
DO - 10.1109/SC41406.2024.00063
M3 - Conference contribution
AN - SCOPUS:85214973375
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - Proceedings of SC 2024
PB - IEEE Computer Society
Y2 - 17 November 2024 through 22 November 2024
ER -