Identifying Potential Anomalous Operations in Graph Neural Network Training

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Graph Neural Networks (GNNs) have demonstrated transformative potential across domains, driving the development of specialized frameworks like Deep Graph Library (DGL) and PyTorch Geometric (PyG) that employ emerging techniques to overcome computational bottlenecks in large-scale graph learning. However, due to the inherent sparsity of GNN models and the complexity of heterogeneous computing systems, optimizing GNN performance remains a significant challenge. Existing profiling tools, such as Nsight Systems, primarily focus on visualizing resource utilization over time, helping users identify inefficient execution patterns. While this approach provides insights into hardware-level performance, it lacks higher-level, code-centric analysis, making it difficult for developers to pinpoint and resolve performance bottlenecks in GNN training. To address these limitations, we propose GNNProf, an automated performance analysis tool designed to detect and diagnose potential inefficiencies in GNN training. GNNProf collects and restructures CPU function-level performance data into an analyzable format, and applies machine learning and unsupervised learning techniques to identify potential performance anomalies. By automatically recognizing inefficient functions and highlighting performance-critical regions, GNNProf enables developers to gain deeper insights into the execution behavior of GNN training. Additionally, it provides intuitive visualizations that facilitate performance debugging and optimization, ultimately improving training efficiency on heterogeneous systems.
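
The abstract does not say which learning techniques GNNProf applies, so the following is only an illustrative sketch of the general idea of flagging anomalous function-level timings without labels. It swaps in a simple statistical stand-in, the modified z-score based on the median absolute deviation (MAD), and is not GNNProf's method. The function names (`sample_neighbors`, `aggregate`), the timing values, and the 3.5 threshold are all hypothetical assumptions.

```python
from statistics import median


def modified_z_scores(values):
    """Robust z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; this sketch chooses to flag nothing in that case.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(timings, threshold=3.5):
    """Return (function, epoch, score) triples whose duration is unusually high."""
    flagged = []
    for name, durations in timings.items():
        for epoch, score in enumerate(modified_z_scores(durations)):
            if score > threshold:
                flagged.append((name, epoch, round(score, 2)))
    return flagged


# Hypothetical per-epoch CPU times in milliseconds; not data from the paper.
timings = {
    "sample_neighbors": [12.1, 11.8, 12.3, 12.0, 48.7, 11.9],
    "aggregate": [30.2, 29.8, 30.5, 30.1, 30.0, 29.9],
}
anomalies = flag_anomalies(timings)
print(anomalies)
```

With these made-up numbers, only the fifth epoch (index 4) of `sample_neighbors` is flagged. A median-based score is used here because a single slow epoch would inflate a mean and standard deviation and could mask itself.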

Original language: English
Title of host publication: Advanced Parallel Processing Technologies - 16th International Symposium, APPT 2025, Proceedings
Editors: Chao Li, Xuehai Qian, Dimitris Gizopoulos, Boris Grot
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 367-376
Number of pages: 10
ISBN (Print): 9789819510207
State: Published - 2026
Event: 16th International Symposium on Advanced Parallel Processing Technologies, APPT 2025 - Athens, Greece
Duration: 13 Jul 2025 to 16 Jul 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16062 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Symposium on Advanced Parallel Processing Technologies, APPT 2025
Country/Territory: Greece
City: Athens
Period: 13/07/25 to 16/07/25

Keywords

  • Graph Neural Networks
  • Machine Learning
  • Performance Analysis
  • Profiling
