TY - GEN
T1 - Towards Affordable, Adaptive and Automatic GNN Training on CPU-GPU Heterogeneous Platforms
AU - Qiao, Tong
AU - Zhou, Ao
AU - Qi, Yingjie
AU - Wang, Yiou
AU - Wan, Han
AU - Yang, Jianlei
AU - Hu, Chunming
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Graph Neural Networks (GNNs) have been widely adopted due to their strong performance. However, GNN training often relies on expensive, high-performance computing platforms, limiting accessibility for many tasks. Profiling of representative GNN workloads indicates that substantial efficiency gains are possible on resource-constrained devices by fully exploiting available resources. This paper introduces A3GNN, a framework for Affordable, Adaptive, and Automatic GNN training on heterogeneous CPU-GPU platforms. It improves resource usage through locality-aware sampling and fine-grained parallelism scheduling. Moreover, it leverages reinforcement learning to explore the design space and achieve Pareto-optimal trade-offs among throughput, memory footprint, and accuracy. Experiments show that A3GNN can bridge the performance gap, allowing seven NVIDIA 2080Ti GPUs to outperform two A100 GPUs by up to 1.8× in throughput with minimal accuracy loss.
AB - Graph Neural Networks (GNNs) have been widely adopted due to their strong performance. However, GNN training often relies on expensive, high-performance computing platforms, limiting accessibility for many tasks. Profiling of representative GNN workloads indicates that substantial efficiency gains are possible on resource-constrained devices by fully exploiting available resources. This paper introduces A3GNN, a framework for Affordable, Adaptive, and Automatic GNN training on heterogeneous CPU-GPU platforms. It improves resource usage through locality-aware sampling and fine-grained parallelism scheduling. Moreover, it leverages reinforcement learning to explore the design space and achieve Pareto-optimal trade-offs among throughput, memory footprint, and accuracy. Experiments show that A3GNN can bridge the performance gap, allowing seven NVIDIA 2080Ti GPUs to outperform two A100 GPUs by up to 1.8× in throughput with minimal accuracy loss.
KW - graph neural networks
KW - Multi-GPUs
KW - parallelism optimization
KW - training optimization
UR - https://www.scopus.com/pages/publications/105032531937
U2 - 10.1109/ICCD65941.2025.00019
DO - 10.1109/ICCD65941.2025.00019
M3 - Conference contribution
AN - SCOPUS:105032531937
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 87
EP - 94
BT - Proceedings - 2025 IEEE 43rd International Conference on Computer Design, ICCD 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd International Conference on Computer Design, ICCD 2025
Y2 - 10 November 2025 through 12 November 2025
ER -