TY - GEN
T1 - Identifying Potential Anomalous Operations in Graph Neural Network Training
AU - Xuan, Zhibo
AU - Yang, Hailong
AU - You, Xin
AU - Luan, Zhongzhi
AU - Liu, Yi
AU - Qian, Depei
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Graph Neural Networks (GNNs) have demonstrated transformative potential across domains, driving the development of specialized frameworks like Deep Graph Library (DGL) and PyTorch Geometric (PyG) that employ emerging techniques to overcome computational bottlenecks in large-scale graph learning. However, due to the inherent sparsity of GNN models and the complexity of heterogeneous computing systems, optimizing GNN performance remains a significant challenge. Existing profiling tools, such as Nsight Systems, primarily focus on visualizing resource utilization over time, helping users identify inefficient execution patterns. While this approach provides insights into hardware-level performance, it lacks higher-level, code-centric analysis, making it difficult for developers to pinpoint and resolve performance bottlenecks in GNN training. To address these limitations, we propose GNNProf, an automated performance analysis tool designed to detect and diagnose potential inefficiencies in GNN training. GNNProf collects and restructures CPU function-level performance data into an analyzable format, and applies machine learning techniques, including unsupervised learning, to identify potential performance anomalies. By automatically recognizing inefficient functions and highlighting performance-critical regions, GNNProf enables developers to gain deeper insights into the execution behavior of GNN training. Additionally, it provides intuitive visualizations that facilitate performance debugging and optimization, ultimately improving training efficiency on heterogeneous systems.
AB - Graph Neural Networks (GNNs) have demonstrated transformative potential across domains, driving the development of specialized frameworks like Deep Graph Library (DGL) and PyTorch Geometric (PyG) that employ emerging techniques to overcome computational bottlenecks in large-scale graph learning. However, due to the inherent sparsity of GNN models and the complexity of heterogeneous computing systems, optimizing GNN performance remains a significant challenge. Existing profiling tools, such as Nsight Systems, primarily focus on visualizing resource utilization over time, helping users identify inefficient execution patterns. While this approach provides insights into hardware-level performance, it lacks higher-level, code-centric analysis, making it difficult for developers to pinpoint and resolve performance bottlenecks in GNN training. To address these limitations, we propose GNNProf, an automated performance analysis tool designed to detect and diagnose potential inefficiencies in GNN training. GNNProf collects and restructures CPU function-level performance data into an analyzable format, and applies machine learning techniques, including unsupervised learning, to identify potential performance anomalies. By automatically recognizing inefficient functions and highlighting performance-critical regions, GNNProf enables developers to gain deeper insights into the execution behavior of GNN training. Additionally, it provides intuitive visualizations that facilitate performance debugging and optimization, ultimately improving training efficiency on heterogeneous systems.
KW - Graph Neural Networks
KW - Machine Learning
KW - Performance Analysis
KW - Profiling
UR - https://www.scopus.com/pages/publications/105022216370
U2 - 10.1007/978-981-95-1021-4_27
DO - 10.1007/978-981-95-1021-4_27
M3 - Conference contribution
AN - SCOPUS:105022216370
SN - 9789819510207
T3 - Lecture Notes in Computer Science
SP - 367
EP - 376
BT - Advanced Parallel Processing Technologies - 16th International Symposium, APPT 2025, Proceedings
A2 - Li, Chao
A2 - Qian, Xuehai
A2 - Gizopoulos, Dimitris
A2 - Grot, Boris
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th International Symposium on Advanced Parallel Processing Technologies, APPT 2025
Y2 - 13 July 2025 through 16 July 2025
ER -