[03/26, Paper] BatchGen, the first batch-native inference engine, accepted to OSDI 2026.
Excited to share that our paper BatchGen has been accepted to OSDI 2026! 🎉
BatchGen is the first batch-native inference engine designed to accommodate extremely large MoE-style models, including DeepSeek-R1, GLM 5.1, and MiniMax 2.5/2.7, while optimizing for batch-first time-to-completion rather than latency-first metrics. Compared with today's state-of-the-art industry solutions, it improves system throughput by 3–4× on large GPU clusters and by up to 31× on memory-constrained GPUs.