Luo Mai
BatchGen: An Architecture for Scalable and Efficient Batch Inference
Tairan Xu*, Leyang Xue*, Zhan Lu*, Jinfu Deng, Hongyang Xiao, Yinsicheng Jiang, Congjie He, Matej Sandor, Le Xu, Luo Mai
July 2026
Abstract
TBD
Type
Conference paper
Publication
In USENIX Symposium on Operating Systems Design and Implementation (OSDI 2026)
Machine Learning Systems
Tairan Xu
PhD Student
Leyang Xue
PhD Student (primary supervisor: Mahesh Marina)
Zhan Lu
PhD Student
Yinsicheng Jiang
PhD Student
Congjie He
PhD Student
Matej Sandor
PhD Student
Le Xu
Research Fellow
Luo Mai
Associate Professor
My research interests include computer systems, machine learning systems and data management.
Related
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
WaferLLM: Large Language Model Inference at Wafer Scale
GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
ContextPilot: Fast Long-Context Inference via Context Reuse
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache