GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Abstract

Large Reinforcement Learning (RL) models require experience replay systems that can efficiently store and select from massive numbers of trajectories. Existing experience replay systems cannot meet this requirement: they incur memory, computation and communication bottlenecks when used to train large RL models. To address this, this paper presents GEAR, a novel distributed GPU-centric experience replay system. GEAR stores trajectory shards on the GPU servers that train large RL models, improving the memory efficiency of trajectory storage. GEAR further enables distributed GPU servers to accelerate various trajectory selection strategies, avoiding computation bottlenecks. GEAR also provides GPU kernels that collect trajectories using zero-copy access to host memory and remote direct memory access (RDMA) over InfiniBand, optimising the communication efficiency of trajectory collection. Cluster experiments show that GEAR achieves up to 6x the performance of Reverb, a state-of-the-art system, when training large RL models.
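The sketch below illustrates the general idea described in the abstract, not GEAR's actual API: a trajectory shard is kept in pinned host memory, the selection strategy runs on the GPU, and only the chosen trajectories are collected onto the device with asynchronous, pinned-memory transfers. It is a minimal example assuming PyTorch and a CUDA device; all names (e.g. `buffer`, `staging`) are illustrative.

```python
import torch

capacity, traj_len, obs_dim, batch = 4096, 128, 64, 32

# Trajectory shard stored in pinned (page-locked) host memory.
buffer = torch.empty(capacity, traj_len, obs_dim, pin_memory=True)
priorities = torch.rand(capacity, device="cuda")  # e.g. TD-error magnitudes

# Selection strategy runs on the GPU: sample indices proportional to
# priority without moving the whole buffer onto the device.
probs = priorities / priorities.sum()
idx = torch.multinomial(probs, batch, replacement=False)

# Collect only the selected trajectories into a pinned staging area, then
# issue an asynchronous host-to-device copy (overlaps with compute).
staging = torch.empty(batch, traj_len, obs_dim, pin_memory=True)
torch.index_select(buffer, 0, idx.cpu(), out=staging)
selected = staging.to("cuda", non_blocking=True)
print(selected.shape)  # torch.Size([32, 128, 64])
```

In the paper's setting, the buffer would be sharded across distributed GPU servers and remote shards would be reached over RDMA rather than a local pinned copy; the sketch only shows the single-node, host-memory case.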

Publication
In International Conference on Machine Learning
Man-Kit Sit
PhD Student
Congjie He
PhD Student
Luo Mai
Assistant Professor

