Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Abstract

Deep Learning Recommender Systems (DLRSs) need to update models at low latency so that new users and content can be served in a timely manner. Existing DLRSs, however, fail to do so. They train and validate models offline and broadcast models to inference clusters from a single source. This results in high model update latency (e.g., dozens of minutes), which often adversely affects Service-Level Objectives (SLOs). We describe Ekko, a large-scale DLRS with low-latency model updates. Our key idea is to allow model updates to be immediately disseminated from the training cluster to all inference clusters, thus minimising model update latency. To achieve this, Ekko realises efficient peer-to-peer model update dissemination that exploits the sparsity and temporal locality of DLRS model updates, improving the throughput and latency of model updates. Ekko further provides SLO protection mechanisms, including a model update scheduler that can prioritise, over busy networks, the sending of model updates that affect SLOs, and an inference model state manager that monitors the SLOs of inference models and rolls back the models if SLO-detrimental biased updates are detected. Evaluation results show that Ekko is orders of magnitude faster than state-of-the-art DLRS systems. Ekko has been deployed in our production environment for more than one year, serves over a billion users daily, and reduces the model update latency compared to state-of-the-art systems from dozens of minutes to 2.4 seconds.
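To make the scheduling idea concrete, here is a minimal, hypothetical sketch (not Ekko's actual implementation) of a priority-based update scheduler: under network pressure, updates with higher SLO impact are transmitted first. The `significance` score and shard names are illustrative assumptions, e.g. significance could be derived from the magnitude of the parameter change.

```python
import heapq

class UpdateScheduler:
    """Illustrative sketch: send high-significance model updates first.

    Assumes each pending update is a (shard_id, significance) pair,
    where significance approximates the update's impact on SLOs.
    """

    def __init__(self):
        self._heap = []      # max-heap emulated by negating significance
        self._counter = 0    # insertion counter to break ties stably

    def enqueue(self, shard_id, significance):
        heapq.heappush(self._heap, (-significance, self._counter, shard_id))
        self._counter += 1

    def next_update(self):
        """Return the most significant pending shard, or None if idle."""
        if not self._heap:
            return None
        _, _, shard_id = heapq.heappop(self._heap)
        return shard_id

# Usage: the scheduler drains updates in order of significance.
sched = UpdateScheduler()
sched.enqueue("embedding_shard_17", significance=0.9)
sched.enqueue("dense_layer_2", significance=0.1)
sched.enqueue("embedding_shard_3", significance=0.5)
print(sched.next_update())  # embedding_shard_17 is sent first
```

A real dissemination layer would additionally batch updates per peer and recompute priorities as network conditions change; this sketch only shows the ordering policy.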

Publication
In USENIX Symposium on Operating Systems Design and Implementation (OSDI)
Yao Fu
PhD Student
Man-Kit Sit
PhD Student
Luo Mai
Assistant Professor
