Scalable Irregular Parallelism with GPUs: Getting CPUs Out of the Way

SC22 Proceedings

Authors: Yuxin Chen (University of California, Davis); Benjamin Brock (University of California, Berkeley); Serban Porumbescu (University of California, Davis); Aydin Buluc (Lawrence Berkeley National Laboratory (LBNL)); Katherine Yelick (University of California, Berkeley); and John Owens (University of California, Davis)

Abstract: We present Atos, a dynamic scheduling framework for multi-node-GPU systems that supports PGAS-style lightweight one-sided memory operations within and between nodes.

Atos's lightweight GPU-to-GPU communication enables latency hiding and can smooth interconnect usage for bisection-limited problems. These benefits are significant for dynamic, irregular applications, which often involve fine-grained communication at unpredictable times and without predetermined patterns. Our work suggests three principles for high performance: (1) do not involve the CPU in the communication control path; (2) allow GPU communication within kernels, addressing memory consistency directly rather than relying on synchronization with the CPU; and (3) perform dynamic communication aggregation when interconnects have limited bandwidth. By lowering the overhead of communication and allowing it within GPU kernels, we support large, high-utilization GPU kernels with more frequent communication. We evaluate Atos on two irregular problems: breadth-first search (BFS) and PageRank. Atos outperforms the state-of-the-art graph libraries Gunrock, Groute, and Galois in both single-node-multi-GPU and multi-node-GPU settings.
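
To make principle (3) concrete, the following is a minimal CPU-side sketch of communication aggregation, not Atos's actual mechanism or API. It assumes a simple fixed-size threshold policy: fine-grained messages are buffered per destination and sent as one transfer once a batch fills. The class `AggregatingSender`, its `send_fn` callback, and the `batch_size` parameter are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class AggregatingSender:
    """Hypothetical per-destination aggregation buffer (not the Atos API)."""

    def __init__(self, send_fn, batch_size):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.send_fn = send_fn        # called once per batched transfer
        self.batch_size = batch_size  # flush threshold, in messages
        self.buffers = defaultdict(list)

    def put(self, dest, item):
        """Queue one fine-grained message; transfer only when a batch is full."""
        buf = self.buffers[dest]
        buf.append(item)
        if len(buf) >= self.batch_size:
            self.flush(dest)

    def flush(self, dest):
        """Send whatever is pending for one destination as a single transfer."""
        buf = self.buffers[dest]
        if buf:
            self.send_fn(dest, list(buf))
            buf.clear()

    def flush_all(self):
        """Drain every buffer, for example at the end of an iteration."""
        for dest in list(self.buffers):
            self.flush(dest)


# Record each simulated transfer instead of touching a real interconnect.
transfers = []
sender = AggregatingSender(
    lambda dest, batch: transfers.append((dest, batch)), batch_size=4
)
for vertex in range(10):
    sender.put(vertex % 2, vertex)  # 10 small updates for 2 remote peers
sender.flush_all()
```

In this toy run, ten single-item updates collapse into four transfers (two full batches and two partial ones flushed at the end). The batch size sets the trade-off: larger batches use a bandwidth-limited link more efficiently, while smaller ones deliver each update sooner.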
