Publications

(2024). Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections. In SOSP.

PDF

(2024). ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models. In USENIX OSDI.

PDF Code

(2023). TorchOpt: An Efficient Library for Differentiable Optimization. In JMLR.

PDF Code

(2023). GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models. In ICML.

PDF Code

(2022). Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update. In USENIX OSDI.

PDF

(2022). A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning. In NeurIPS.

PDF

(2022). MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment. In ECCV.

PDF Code

(2021). Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo. In USENIX NSDI.

PDF

(2021). Fast and Flexible Human Pose Estimation with HyperPose. In ACM Multimedia (Open-source Software Competition).

PDF Code

(2021). Efficient Reinforcement Learning Development with RLzoo. In ACM Multimedia (Open-source Software Competition).

PDF Code

(2020). KungFu: Making Training in Distributed Machine Learning Adaptive. In USENIX OSDI.

PDF Code

(2020). Spotnik: Designing Distributed Machine Learning for Transient Cloud Resources. In USENIX HotCloud.

PDF

(2019). CrossBow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers. In VLDB.

PDF Code

(2019). Taming Hyper-parameters in Deep Learning Systems. In ACM SIGOPS Operating Systems Review (Invited Paper).

PDF

(2018). Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems. In VLDB.

PDF

(2017). Emu: Rapid Prototyping of Networking Services. In USENIX ATC.

PDF

(2017). TensorLayer: A Versatile Library for Efficient Deep Learning Development. In ACM Multimedia (Best Open-source Software Award).

PDF Code

(2016). Flick: Developing and Running Application-specific Network Services. In USENIX ATC.

PDF

(2016). Towards a Network Marketplace in a Cloud. In USENIX HotCloud.

PDF

(2015). Optimizing Network Performance in Distributed Machine Learning. In USENIX HotCloud.

PDF

(2014). NetAgg: Using Middleboxes for Application-specific On-path Aggregation in Data Centres. In ACM CoNEXT (Best Paper Finalist).

PDF

(2011). Load Balanced Rendezvous Data Collection in Wireless Sensor Networks. In IEEE MASS (Best Paper Finalist).

PDF