[02/26, Paper] ContextPilot accepted to MLSys 2026.

Excited to share that our paper ContextPilot has been accepted to MLSys 2026! 🎉

The Challenge: As LLMs handle increasingly long contexts (think RAG, agent memory, multi-turn conversations), prefill latency becomes the bottleneck. Existing acceleration methods force a trade-off: they either preserve reasoning quality or improve efficiency, but not both.

Our Solution: ContextPilot introduces context reuse as a new mechanism for faster long-context inference. The system intelligently identifies and reuses overlapping context blocks across different LLM interactions, maximizing KV-cache efficiency without sacrificing quality.
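To make the idea concrete, here is a minimal toy sketch of block-level KV-cache reuse, the general mechanism the paragraph above describes. All names (`BlockKVCache`, `BLOCK_SIZE`) are illustrative and not from the ContextPilot codebase; a real system must also account for the positional dependence of KV values, typically by restricting reuse to matching prefixes or recomputing attention at block boundaries.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block; illustrative only, real systems use larger blocks

class BlockKVCache:
    """Toy block-level KV cache: reuse blocks already seen in earlier requests.

    Simplification: we key blocks by content alone. Real KV values depend on
    all preceding tokens, so production systems limit reuse to shared prefixes
    or patch up attention at block boundaries.
    """

    def __init__(self):
        self.cache = {}   # block hash -> cached payload (stands in for KV tensors)
        self.hits = 0     # blocks reused, skipping prefill compute
        self.misses = 0   # blocks that would trigger a fresh prefill pass

    def prefill(self, tokens):
        """Split the context into fixed-size blocks; recompute only unseen ones."""
        kv = []
        for i in range(0, len(tokens), BLOCK_SIZE):
            block = tuple(tokens[i:i + BLOCK_SIZE])
            key = hashlib.sha256(repr(block).encode()).hexdigest()
            if key in self.cache:
                self.hits += 1            # reuse: skip attention computation
            else:
                self.misses += 1          # miss: run prefill for this block
                self.cache[key] = block
            kv.append(self.cache[key])
        return kv

cache = BlockKVCache()
doc = list(range(16))            # a shared context (e.g. a RAG document): 4 blocks
cache.prefill(doc)               # first request: 4 misses, full prefill
cache.prefill(doc + [99, 100])   # second request reuses all 4 blocks, 1 new miss
print(cache.hits, cache.misses)
```

When many requests share the same retrieved documents or conversation history, most blocks hit the cache and only the novel tail of each context pays the prefill cost, which is where the speedup comes from.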

The Results:

  • Up to 3× faster prefill compared to state-of-the-art methods
  • Quality preserved (or even improved!) at longer context lengths
  • Modular architecture that integrates with existing inference engines

Open-sourced and ready to use: https://github.com/EfficientContext/ContextPilot

Looking forward to presenting this work at MLSys 2026!

Luo Mai