ServerlessLLM in OSDI 2024.

ServerlessLLM is a new system that enables cost-effective serverless inference for LLMs. Its key contribution is to effectively realize “Checkpoint Locality”. It achieves this through a combination of a new LLM checkpoint format paired with a fast loader for multi-tier storage, an efficient algorithm for live migration of LLM inference, and a new locality-friendly GPU serverless architecture.
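
To give a flavour of what “checkpoint locality” means in practice, below is a minimal, illustrative Python sketch of locality-aware request routing: a scheduler estimates a server's cold-start time from where the checkpoint currently sits (host DRAM, local SSD, or remote storage) and routes to the cheapest option. All names (`StorageTier`, `Server`, `pick_server`) and the bandwidth numbers are hypothetical assumptions for illustration, not the ServerlessLLM implementation.

```python
# Toy locality-aware scheduler sketch (not the ServerlessLLM code).
from dataclasses import dataclass
from enum import Enum


class StorageTier(Enum):
    """Where a model checkpoint currently resides on a candidate server."""
    HOST_DRAM = "dram"   # fastest to load into GPU memory
    LOCAL_SSD = "ssd"    # slower, but still local
    REMOTE = "remote"    # must be fetched over the network first


# Assumed effective load bandwidths in GB/s (placeholder values).
TIER_BANDWIDTH_GBPS = {
    StorageTier.HOST_DRAM: 20.0,
    StorageTier.LOCAL_SSD: 4.0,
    StorageTier.REMOTE: 1.0,
}


@dataclass
class Server:
    name: str
    tier: StorageTier           # best tier holding the requested checkpoint
    queue_delay_s: float = 0.0  # time until a GPU frees up on this server


def estimated_startup_s(server: Server, checkpoint_gb: float) -> float:
    """Estimate cold-start time: wait for a GPU, then load the checkpoint."""
    load_s = checkpoint_gb / TIER_BANDWIDTH_GBPS[server.tier]
    return server.queue_delay_s + load_s


def pick_server(servers: list[Server], checkpoint_gb: float) -> Server:
    """Route the request to the server with the lowest estimated startup time."""
    return min(servers, key=lambda s: estimated_startup_s(s, checkpoint_gb))


if __name__ == "__main__":
    candidates = [
        Server("gpu-a", StorageTier.REMOTE),
        Server("gpu-b", StorageTier.LOCAL_SSD, queue_delay_s=1.0),
        Server("gpu-c", StorageTier.HOST_DRAM, queue_delay_s=6.0),
    ]
    best = pick_server(candidates, checkpoint_gb=14.0)  # e.g. a 7B model in fp16
    print(f"Route request to {best.name}")
```

In the real system, the paper also describes live migration of in-flight inference, which lets the scheduler move a running request so that checkpoint locality can be exploited without waiting for long generations to finish; the sketch above ignores that dimension.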

The ServerlessLLM paper has been accepted to OSDI'24, a top systems conference, and we are in the process of open-sourcing it. Stay tuned.
