GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang*, Man-Kit Sit*, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

May 2023

Abstract

This paper introduces GEAR, a distributed, GPU-centric experience replay system designed for scalable reinforcement learning (RL) with large sequence models such as transformers. With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR improves memory efficiency by letting the memory resources on GPU servers (both host memory and device memory) manage trajectory data. It also uses decentralized GPU devices to accelerate various trajectory selection strategies, circumventing computational bottlenecks. To improve communication efficiency, GEAR provides GPU kernels that collect trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand. Cluster experiments show that GEAR can achieve performance up to 6× higher than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.

Type

Conference paper

Publication

In International Conference on Machine Learning

Machine Learning Systems

Luo Mai

Assistant Professor

My research interests include computer systems, machine learning systems and data management.

Man-Kit Sit

PhD Student

Congjie He

PhD Student
