SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Technical Papers Archive

WholeGraph: A Fast Graph Neural Network Training Framework with Multi-GPU Distributed Shared Memory Architecture


Authors: Dongxu Yang, Junhong Liu, Jiaxing Qi, and Junjie Lai (NVIDIA Corporation)

Abstract: Graph neural networks (GNNs) are widely used to process graph-structured datasets, encoding graph data into low-dimensional vectors. In this paper, we present WholeGraph, a fast GNN training framework based on a multi-GPU distributed shared memory architecture. WholeGraph partitions the graph and its corresponding node and edge features across multiple GPUs, eliminating the communication bottleneck between CPU and GPUs during training; communication between different GPUs is implemented with GPUDirect Peer-to-Peer (P2P) memory access. Furthermore, WholeGraph provides several optimized computing operators. Our evaluations show that on large-scale graphs WholeGraph outperforms state-of-the-art GNN frameworks such as Deep Graph Library (DGL) and PyTorch Geometric (PyG), with speedups of up to 57.32x and 242.98x over DGL and PyG, respectively, on a single multi-GPU machine. GPU utilization remains above 95% throughout GNN training.
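
To make the distributed shared memory idea concrete, the following is a minimal sketch (not WholeGraph's actual API) of how a node-feature table might be sharded across GPUs and read remotely via GPUDirect P2P: each GPU holds one shard, peer access is enabled between all device pairs, and a gather kernel launched on one GPU dereferences pointers that live on other GPUs. The partitioning scheme, names, and sizes below are illustrative assumptions.

// Illustrative sketch only, not WholeGraph's API: shard a node-feature table
// across GPUs and gather rows with GPUDirect Peer-to-Peer (P2P) reads.
// Assumes a 64-bit system with unified virtual addressing and P2P-capable GPUs.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err = (call);                                            \
    if (err != cudaSuccess) {                                            \
      fprintf(stderr, "CUDA error %s at line %d\n",                      \
              cudaGetErrorString(err), __LINE__);                        \
      return 1;                                                          \
    }                                                                    \
  } while (0)

// shard_ptrs[g] points to GPU g's shard; shard g stores rows
// [g*rows_per_shard, (g+1)*rows_per_shard). With peer access enabled, a
// kernel running on one GPU can directly read rows stored on another GPU.
__global__ void gather_features(float* const* shard_ptrs, int rows_per_shard,
                                int feat_dim, const long* node_ids,
                                int num_ids, float* out) {
  int i = blockIdx.x;  // one block per requested node
  if (i >= num_ids) return;
  long nid = node_ids[i];
  const float* row =
      shard_ptrs[nid / rows_per_shard] + (nid % rows_per_shard) * feat_dim;
  for (int j = threadIdx.x; j < feat_dim; j += blockDim.x)
    out[i * feat_dim + j] = row[j];  // may be a remote (P2P) load
}

int main() {
  int num_gpus = 0;
  CHECK(cudaGetDeviceCount(&num_gpus));

  const int rows_per_shard = 1024, feat_dim = 128;
  std::vector<float*> shards(num_gpus);

  // 1. Allocate one shard per GPU and enable peer access between all pairs.
  for (int g = 0; g < num_gpus; ++g) {
    CHECK(cudaSetDevice(g));
    CHECK(cudaMalloc(&shards[g], sizeof(float) * rows_per_shard * feat_dim));
    CHECK(cudaMemset(shards[g], 0, sizeof(float) * rows_per_shard * feat_dim));
    for (int p = 0; p < num_gpus; ++p)
      if (p != g) cudaDeviceEnablePeerAccess(p, 0);  // error ignored if already enabled
  }

  // 2. From GPU 0, gather a few rows whose shards may live on other GPUs.
  CHECK(cudaSetDevice(0));
  const int num_ids = 4;
  long h_ids[num_ids];
  for (int k = 0; k < num_ids; ++k)  // spread requests across shards
    h_ids[k] = (long)(k % num_gpus) * rows_per_shard + k;

  float** d_shards; long* d_ids; float* d_out;
  CHECK(cudaMalloc(&d_shards, sizeof(float*) * num_gpus));
  CHECK(cudaMemcpy(d_shards, shards.data(), sizeof(float*) * num_gpus,
                   cudaMemcpyHostToDevice));
  CHECK(cudaMalloc(&d_ids, sizeof(h_ids)));
  CHECK(cudaMemcpy(d_ids, h_ids, sizeof(h_ids), cudaMemcpyHostToDevice));
  CHECK(cudaMalloc(&d_out, sizeof(float) * num_ids * feat_dim));

  gather_features<<<num_ids, 128>>>(d_shards, rows_per_shard, feat_dim,
                                    d_ids, num_ids, d_out);
  CHECK(cudaDeviceSynchronize());
  printf("Gathered %d feature rows from %d GPU shard(s)\n", num_ids, num_gpus);
  return 0;
}

Keeping the whole feature table resident in pooled GPU memory this way is what lets the training loop avoid staging features through host memory, which is the CPU-GPU bottleneck the abstract refers to.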

