[03/26, Paper] BatchGen, the first batch-native inference engine, accepted to OSDI 2026.
Excited to share that our paper BatchGen has been accepted to OSDI 2026! 🎉
BatchGen is the first batch-native inference engine designed to accommodate extremely large MoE-style models, including DeepSeek-R1, GLM 5.1, and MiniMax 2.5/2.7, while optimizing for batch-first time-to-completion rather than latency-first metrics. Compared with today's state-of-the-art industry solutions, it improves system throughput by 3–4× on large GPU clusters and by up to 31× on memory-constrained GPUs.