[09/25, Paper] MoE-CAP accepted to NeurIPS 2025 (Datasets and Benchmarks Track).

Excited to share that MoE-CAP has been accepted to NeurIPS 2025 (Datasets and Benchmarks Track)! 🎉

The Problem: Sparse Mixture-of-Experts (MoE) models are the go-to architecture for scaling LLMs efficiently, but deploying them is hard. Their sparse, input-dependent expert activation leads to heterogeneous compute and memory demands, making three-way trade-offs between Cost, Accuracy, and Performance (CAP) inevitable. Yet no existing benchmark captures these trade-offs well.

Our Contribution: MoE-CAP is the first benchmark purpose-built for MoE systems. Our key finding: achieving an optimal balance across all three CAP dimensions is fundamentally difficult—MoE systems typically optimize two at the expense of the third. We call this the MoE-CAP trade-off.

What we introduce:

  • 📊 CAP Radar Diagram — a new visualization tool to intuitively map Cost-Accuracy-Performance trade-offs
  • S-MBU (Sparse Memory Bandwidth Utilization) — a sparsity-aware memory metric
  • S-MFU (Sparse Model FLOPS Utilization) — a sparsity-aware compute metric

These metrics enable accurate, apples-to-apples benchmarking of MoE systems across diverse hardware and deployment scenarios.
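To see why sparsity-aware metrics matter, here is a minimal illustrative sketch (not the paper's reference implementation) contrasting a dense FLOPS-utilization estimate with a sparsity-aware one for a single MoE feed-forward layer. All dimensions, helper names, and the measured/peak numbers below are hypothetical assumptions for illustration.

```python
# Sketch: dense vs. sparsity-aware FLOPS accounting for one MoE FFN layer.
# Dense accounting counts every expert's weights; sparsity-aware accounting
# counts only the top-k experts each token actually visits.

def dense_moe_ffn_flops(d_model: int, d_ff: int, num_experts: int, tokens: int) -> int:
    """FLOPs as if every expert ran on every token (dense assumption)."""
    per_expert = 2 * tokens * (d_model * d_ff + d_ff * d_model)  # up + down proj
    return per_expert * num_experts

def sparse_moe_ffn_flops(d_model: int, d_ff: int, top_k: int, tokens: int) -> int:
    """FLOPs actually executed: each token only visits its top-k experts."""
    per_token = 2 * (d_model * d_ff + d_ff * d_model) * top_k
    return per_token * tokens

# Hypothetical Mixtral-style layer: 8 experts, top-2 routing.
d_model, d_ff, num_experts, top_k, tokens = 4096, 14336, 8, 2, 1024
dense_flops = dense_moe_ffn_flops(d_model, d_ff, num_experts, tokens)
sparse_flops = sparse_moe_ffn_flops(d_model, d_ff, top_k, tokens)

step_time_s = 0.01       # hypothetical measured step time
peak_flops_per_s = 1e15  # hypothetical hardware peak

mfu_dense = (dense_flops / step_time_s) / peak_flops_per_s
s_mfu = (sparse_flops / step_time_s) / peak_flops_per_s
# Dense accounting inflates utilization by num_experts / top_k (4x here),
# which is why a sparsity-aware metric is needed for fair MoE comparisons.
```

The same idea motivates S-MBU on the memory side: only the weights of activated experts need to traverse memory bandwidth per token, so dense accounting similarly overstates bandwidth utilization.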

Paper: https://arxiv.org/abs/2412.07067

Looking forward to presenting at NeurIPS 2025!

Luo Mai