MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Abstract

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy, and Performance (CAP), making trade-offs inevitable. Existing benchmarks often fail to capture these trade-offs accurately, complicating practical deployment decisions. To address this, we introduce MoE-CAP, a benchmark specifically designed for MoE systems. Our analysis reveals that achieving an optimal balance across CAP is difficult with current hardware; MoE systems typically optimize two of the three dimensions at the expense of the third, a dynamic we term the MoE-CAP trade-off. To visualize this, we propose the CAP Radar Diagram. We further introduce sparsity-aware performance metrics, Sparse Memory Bandwidth Utilization (S-MBU) and Sparse Model FLOPS Utilization (S-MFU), to enable accurate performance benchmarking of MoE systems across diverse hardware platforms and deployment scenarios.
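To make the intuition behind sparsity-aware metrics concrete, the sketch below contrasts a dense utilization estimate with a sparse one that counts only activated parameters. The accounting, parameter names, and all numeric figures here are illustrative assumptions for a Mixtral-style 2-of-8 MoE; the paper's exact S-MBU and S-MFU definitions may differ in detail.

```python
# Hedged sketch: sparsity-aware utilization metrics for an MoE decoder.
# All names and figures below are illustrative assumptions, not the
# paper's definitions.

def s_mbu(active_param_bytes, kv_cache_bytes, token_latency_s, peak_bw_bytes_s):
    """Sparse Memory Bandwidth Utilization: only bytes for *activated*
    weights (dense layers + routed top-k experts) count as traffic."""
    achieved_bw = (active_param_bytes + kv_cache_bytes) / token_latency_s
    return achieved_bw / peak_bw_bytes_s

def s_mfu(active_flops_per_token, tokens_per_s, peak_flops_s):
    """Sparse Model FLOPS Utilization: FLOPs of activated experts only."""
    return (active_flops_per_token * tokens_per_s) / peak_flops_s

# Assumed example: 46.7B total params, 12.9B activated per token (2-of-8
# experts plus shared layers), fp16 weights, hypothetical H100-class GPU.
total_params = 46.7e9
active_params = 12.9e9
bytes_per_param = 2            # fp16
kv_bytes = 0.5e9               # assumed KV-cache read per token
latency = 0.03                 # 30 ms per decoded token (assumed)
peak_bw = 3.35e12              # ~3.35 TB/s HBM (assumed)

# A dense metric charges all expert weights, overstating memory traffic:
dense_mbu = (total_params * bytes_per_param + kv_bytes) / latency / peak_bw
sparse_mbu = s_mbu(active_params * bytes_per_param, kv_bytes, latency, peak_bw)
print(f"dense MBU:  {dense_mbu:.2f}")
print(f"sparse MBU: {sparse_mbu:.2f}")
```

The gap between the two numbers is the point: a dense metric can report near-saturated (or even impossible, >1) bandwidth utilization for an MoE system, while the sparsity-aware variant reflects the bytes actually moved.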

Publication
In *Annual Conference on Neural Information Processing Systems (NeurIPS'25)*
Yao Fu
PhD Student
Yeqi Huang
PhD Student
Zhan Lu
PhD Student
Leyang Xue
PhD Student (Primary supervisor Mahesh Marina)
Congjie He
PhD Student
Man-Kit Sit
PhD Student
Dayou Du
PhD Student (Primary supervisor Jianyi Cheng)
Tairan Xu
PhD Student
Luo Mai
Associate Professor

