2026

BatchGen: An Architecture for Scalable and Efficient Batch Inference

Tairan Xu, Leyang Xue, Zhan Lu, Jinfu Deng, Hongyang Xiao, Yinsicheng Jiang, Congjie He, Matej Sandor, Le Xu, Luo Mai

conference OSDI

ContextPilot: Fast Long-Context Inference via Context Reuse

Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai

conference MLSys

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache

Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang

conference HPCA

2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai

conference NeurIPS

WaferLLM: Large Language Model Inference at Wafer Scale

Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai

conference OSDI

2024

Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections

Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai, Peter Pietzuch

conference SOSP

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Chuanhao Sun, Zhihang Yuan, Kai Xu, Luo Mai, N Siddharth, Shuo Chen, Mahesh K Marina

conference ICML

ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

conference OSDI

MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving

Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina

preprint arXiv

2023

TorchOpt: An Efficient Library for Differentiable Optimization

Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang

journal JMLR

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

conference ICML

2022

Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update

Chijun Sima, Yao Fu, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, Luo Mai

conference USENIX OSDI

A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning

Bo Liu, Xidong Feng, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang

conference NeurIPS

MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment

Jie Ren, Wenteng Liang, Ran Yan, Luo Mai, Shiwen Liu, Xiao Liu

conference ECCV

2021

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, Rahul Potharaju

conference USENIX NSDI

Fast and Flexible Human Pose Estimation with HyperPose

Yixiao Guo, Jiawei Liu, Guo Li, Luo Mai, Hao Dong

conference ACM Multimedia (Open-source Software Competition)

Efficient Reinforcement Learning Development with RLzoo

Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong

conference ACM Multimedia (Open-source Software Competition)

2020

KungFu: Making Training in Distributed Machine Learning Adaptive

Luo Mai, Guo Li, Marcel Wagenländer, Konstantinos Fertakis, Andrei-Octavian Brabete, Peter Pietzuch

conference USENIX OSDI

Spotnik: Designing Distributed Machine Learning for Transient Cloud Resources

Marcel Wagenländer, Luo Mai, Guo Li, Peter Pietzuch

conference USENIX HotCloud

2019

Taming Hyper-parameters in Deep Learning Systems

Luo Mai, Alexandros Koliousis, Guo Li, Andrei-Octavian Brabete, Peter Pietzuch

journal ACM SIGOPS Operating Systems Review (Invited Paper)

CrossBow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers

Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter Pietzuch

conference VLDB

2018

Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems

Luo Mai, Kai Zeng, Rahul Potharaju, Le Xu, Shivaram Venkataraman, Paolo Costa, Terry Kim, Saravanan Muthukrishnan, Vamsi Kuppa, Sudheer Dhulipalla, Sriram Rao

conference VLDB

2017

Emu: Rapid Prototyping of Networking Services

Nik Sultana, Salvator Galea, David Greaves, Marcin Wojcik, Jonny Shipton, Richard Clegg, Luo Mai, Pietro Bressana, Robert Soule, Richard Mortier, Paolo Costa, Peter Pietzuch, Jon Crowcroft, Andrew W Moore, Noa Zilberman

conference USENIX ATC

TensorLayer: A Versatile Library for Efficient Deep Learning Development

Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo

conference ACM Multimedia (Best Open-source Software Award)

2016

Flick: Developing and Running Application-specific Network Services

Abdul Alim, Richard G. Clegg, Luo Mai, Lukas Rupprecht, Eric Seckler, Paolo Costa, Peter Pietzuch, Alexander L. Wolf, Nik Sultana, Jon Crowcroft, Anil Madhavapeddy, Andrew Moore, Richard Mortier, Luis Oviedo, Masoud Koleini, Derek McAuley, Matteo Migliavacca

conference USENIX ATC

Towards a Network Marketplace in a Cloud

Da Yu, Luo Mai, Somaya Arianfar, Rodrigo Fonseca, Orran Krieger, David Oran

conference USENIX HotCloud

2015

Optimizing Network Performance in Distributed Machine Learning

Luo Mai, Chuntao Hong, Paolo Costa

conference USENIX HotCloud

2014

NetAgg: Using Middleboxes for Application-specific On-path Aggregation in Data Centres

Luo Mai, Lukas Rupprecht, Abdul Alim, Paolo Costa, Matteo Migliavacca, Peter Pietzuch, Alexander L. Wolf

conference ACM CoNEXT (Best Paper Finalist)

2013

Exploiting Time-malleability in Cloud-based Batch Processing Systems

Luo Mai, Evangelia Kalyvianaki, Paolo Costa

conference ACM LADIS

2011

Load Balanced Rendezvous Data Collection in Wireless Sensor Networks

Luo Mai, Longfei Shangguan, Chao Lang, Junzhao Du, Zhenjiang Li, Mo Li

conference IEEE MASS (Best Paper Finalist)