[09/25, Paper] MoE-CAP accepted to NeurIPS 2025 (Datasets and Benchmarks Track).
Excited to share that MoE-CAP has been accepted to NeurIPS 2025 (Datasets and Benchmarks Track)! 🎉
The Problem: Sparse Mixture-of-Experts (MoE) models are the go-to architecture for scaling LLMs efficiently, but deploying them is hard. Serving them spans heterogeneous compute and memory resources, making three-way trade-offs between Cost, Accuracy, and Performance (CAP) inevitable. Yet no existing benchmark captures these trade-offs well.
Our Contribution: MoE-CAP is the first benchmark purpose-built for MoE systems. Our key finding: achieving an optimal balance across all three CAP dimensions is fundamentally difficult—MoE systems typically optimize two at the expense of the third. We call this the MoE-CAP trade-off.
What we introduce:
- 📊 CAP Radar Diagram — a new visualization tool to intuitively map Cost-Accuracy-Performance trade-offs
- S-MBU (Sparse Memory Bandwidth Utilization) — a sparsity-aware memory metric
- S-MFU (Sparse Model FLOPS Utilization) — a sparsity-aware compute metric
These metrics enable accurate, apples-to-apples benchmarking of MoE systems across diverse hardware and deployment scenarios.
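To make the intuition behind the sparsity-aware metrics concrete, here is a minimal sketch of how utilization numbers like S-MFU and S-MBU can be computed. The formulas follow the standard MFU/MBU definitions (achieved FLOPs or bytes per second divided by hardware peak), restricted to the parameters actually activated per token; the exact per-token FLOP accounting, the function names, and the Mixtral-like config numbers below are illustrative assumptions, not the paper's precise definitions.

```python
# Sketch of sparsity-aware utilization metrics for an MoE decoder.
# All constants and the ~2 FLOPs/active-param rule of thumb are
# illustrative assumptions, not the paper's exact accounting.

def s_mfu(tokens_per_sec: float, active_params: float, peak_flops: float) -> float:
    """Sparse Model FLOPS Utilization: count only FLOPs for the experts
    actually activated per token (~2 FLOPs per active parameter)."""
    achieved_flops = 2.0 * active_params * tokens_per_sec
    return achieved_flops / peak_flops

def s_mbu(tokens_per_sec: float, active_bytes: float, peak_bandwidth: float) -> float:
    """Sparse Memory Bandwidth Utilization: count only the bytes actually
    read per decoded token (activated expert weights, routing, KV cache)."""
    achieved_bandwidth = active_bytes * tokens_per_sec
    return achieved_bandwidth / peak_bandwidth

# Hypothetical Mixtral-like setup: ~13B parameters active per token
# (top-2 of 8 experts) in FP16, on a GPU with 312 TFLOP/s dense peak
# and 2.0 TB/s memory bandwidth, decoding at 50 tokens/s.
active_params = 13e9
active_bytes = active_params * 2   # FP16 = 2 bytes per parameter
tokens_per_sec = 50.0

print(f"S-MFU: {s_mfu(tokens_per_sec, active_params, 312e12):.2%}")
print(f"S-MBU: {s_mbu(tokens_per_sec, active_bytes, 2.0e12):.2%}")
```

The key point: a dense MFU/MBU computed over the model's *total* parameter count (e.g. 47B for this hypothetical config) would overstate the work done per token by roughly 3.6x, which is why sparsity-unaware metrics mislead when comparing MoE serving systems.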
Paper: https://arxiv.org/abs/2412.07067
Looking forward to presenting at NeurIPS 2025!