Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness

Zeyuan Tan*, Xiulong Yuan*, Congjie He*, Man-Kit Sit, Guo Li, Xiaoze Liu, Baole Ai, Kai Zeng, Peter Pietzuch, Luo Mai

April 2023


Abstract

Systems for serving inference requests on graph neural networks (GNNs) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling, and aggregating many features incurs high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, which limits throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver’s key idea is to exploit workload metrics to predict the irregular computation of GNN requests and to govern the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35× lower latency and 8× higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).
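
The sampling decision in point (1) can be illustrated with a small sketch. This is not Quiver's implementation or its exact definition of the probabilistic sampled graph size. It is a simplified stand-in that assumes uniform neighbour sampling with a fixed fanout per hop and counts repeated nodes more than once. The toy graph, the function names `expected_sampled_size` and `choose_device`, and the `gpu_threshold` parameter are all hypothetical; a real threshold would have to be calibrated by measurement.

```python
from functools import lru_cache

# Hypothetical toy graph as an adjacency list (node -> neighbours).
GRAPH = {
    0: [1, 2, 3],
    1: [0],
    2: [0, 3],
    3: [0, 2],
    4: [],
}


def expected_sampled_size(graph, seed, fanouts):
    """Estimate how many nodes (counted with repeats) a multi-hop
    neighbour sampler visits from `seed`, given one fanout per hop.

    Assumption: neighbours are sampled uniformly, so the expected
    subtree below a sampled neighbour is the mean over all neighbours.
    """

    @lru_cache(maxsize=None)
    def visit(node, hop):
        neighbours = graph[node]
        # Stop at the last hop or at a node with no neighbours.
        if hop == len(fanouts) or not neighbours:
            return 1.0
        picked = min(len(neighbours), fanouts[hop])
        mean_below = sum(visit(n, hop + 1) for n in neighbours) / len(neighbours)
        return 1.0 + picked * mean_below

    return visit(seed, 0)


def choose_device(graph, seeds, fanouts, gpu_threshold):
    """Send a request to the GPU only if the estimated sampled size
    is large enough to offer useful parallelism (threshold is assumed)."""
    total = sum(expected_sampled_size(graph, s, fanouts) for s in seeds)
    return "gpu" if total >= gpu_threshold else "cpu"


if __name__ == "__main__":
    for seed in GRAPH:
        size = expected_sampled_size(GRAPH, seed, (2, 2))
        print(seed, round(size, 3), choose_device(GRAPH, [seed], (2, 2), 5.0))
```

In this sketch, an isolated seed such as node 4 yields an estimate of 1 and stays on the CPU, while a well-connected seed such as node 0 exceeds the assumed threshold and is routed to the GPU.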

Type

Conference paper

Publication

In arXiv

Machine Learning Systems

Luo Mai

Assistant Professor

My research interests include operating systems, distributed systems, machine learning systems and data management.

Zeyuan Tan

MScR Student

Xiulong Yuan

Collaborator

Congjie He

PhD Student

Man-Kit Sit

PhD Student

Luo Mai

Assistant Professor
