Publications
30 publications across machine learning systems, AI compute, efficient LLM inference/training, systems software, and hardware-aware co-design.
2026
BatchGen: An Architecture for Scalable and Efficient Batch Inference
Tairan Xu, Leyang Xue, Zhan Lu, Jinfu Deng, Hongyang Xiao, Yinsicheng Jiang, Congjie He, Matej Sandor, Le Xu, Luo Mai
ContextPilot: Fast Long-Context Inference via Context Reuse
Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache
Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang
2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai
WaferLLM: Large Language Model Inference at Wafer Scale
Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai
2024
Tenplex: Dynamic Parallelism for Deep Learning using Parallelizable Tensor Collections
Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai, Peter Pietzuch
Learning high-frequency functions made easy with sinusoidal positional encoding
Chuanhao Sun, Zhihang Yuan, Kai Xu, Luo Mai, N Siddharth, Shuo Chen, Mahesh K Marina
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
MoE-Infinity: Activation-Aware Expert Offloading for Efficient MoE Serving
Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
2023
TorchOpt: An Efficient Library for Differentiable Optimization
Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
2022
Ekko: A Large-Scale Deep Learning Recommender System with Low-Latency Model Update
Chijun Sima, Yao Fu, Man-Kit Sit, Liyi Guo, Xuri Gong, Feng Lin, Junyu Wu, Yongsheng Li, Haidong Rong, Pierre-Louis Aublin, Luo Mai
A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
Bo Liu, Xidong Feng, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang
MegBA: A GPU-Based Distributed Library for Large-Scale Bundle Adjustment
Jie Ren, Wenteng Liang, Ran Yan, Luo Mai, Shiwen Liu, Xiao Liu
2021
Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo
Le Xu, Shivaram Venkataraman, Indranil Gupta, Luo Mai, Rahul Potharaju
Fast and Flexible Human Pose Estimation with HyperPose
Yixiao Guo, Jiawei Liu, Guo Li, Luo Mai, Hao Dong
Efficient Reinforcement Learning Development with RLzoo
Zihan Ding, Tianyang Yu, Yanhua Huang, Hongming Zhang, Guo Li, Quancheng Guo, Luo Mai, Hao Dong
2020
KungFu: Making Training in Distributed Machine Learning Adaptive
Luo Mai, Guo Li, Marcel Wagenländer, Konstantinos Fertakis, Andrei-Octavian Brabete, Peter Pietzuch
Spotnik: Designing Distributed Machine Learning for Transient Cloud Resources
Marcel Wagenländer, Luo Mai, Guo Li, Peter Pietzuch
2019
Taming Hyper-parameters in Deep Learning Systems
Luo Mai, Alexandros Koliousis, Guo Li, Andrei-Octavian Brabete, Peter Pietzuch
CrossBow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers
Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter Pietzuch
2018
Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems
Luo Mai, Kai Zeng, Rahul Potharaju, Le Xu, Shivaram Venkataraman, Paolo Costa, Terry Kim, Saravanan Muthukrishnan, Vamsi Kuppa, Sudheer Dhulipalla, Sriram Rao
2017
Emu: Rapid Prototyping of Networking Services
Nik Sultana, Salvator Galea, David Greaves, Marcin Wojcik, Jonny Shipton, Richard Clegg, Luo Mai, Pietro Bressana, Robert Soule, Richard Mortier, Paolo Costa, Peter Pietzuch, Jon Crowcroft, Andrew W Moore, Noa Zilberman
TensorLayer: A Versatile Library for Efficient Deep Learning Development
Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo
2016
Flick: Developing and Running Application-specific Network Services
Abdul Alim, Richard G. Clegg, Luo Mai, Lukas Rupprecht, Eric Seckler, Paolo Costa, Peter Pietzuch, Alexander L. Wolf, Nik Sultana, Jon Crowcroft, Anil Madhavapeddy, Andrew Moore, Richard Mortier, Luis Oviedo, Masoud Koleini, Derek McAuley, Matteo Migliavacca
Towards a Network Marketplace in a Cloud
Da Yu, Luo Mai, Somaya Arianfar, Rodrigo Fonseca, Orran Krieger, David Oran
2015
Optimizing Network Performance in Distributed Machine Learning
Luo Mai, Chuntao Hong, Paolo Costa
2014
NetAgg: Using Middleboxes for Application-specific On-path Aggregation in Data Centres
Luo Mai, Lukas Rupprecht, Abdul Alim, Paolo Costa, Matteo Migliavacca, Peter Pietzuch, Alexander L. Wolf
2013
Exploiting Time-malleability in Cloud-based Batch Processing Systems
Luo Mai, Evangelia Kalyvianaki, Paolo Costa
2011
Load Balanced Rendezvous Data Collection in Wireless Sensor Networks
Luo Mai, Longfei Shangguan, Chao Lang, Junzhao Du, Zhenjiang Li, Mo Li