[07/24, Paper] ServerlessLLM, the first serverless LLM system, accepted to OSDI 2024.

ServerlessLLM is a new system that enables cost-effective serverless inference for LLMs by turning GPU servers into a scalable, high-performance “checkpoint storage layer”. It achieves this through a loading-optimized LLM checkpoint format, a multi-tier checkpoint loading subsystem, an efficient live migration algorithm for LLM inference, and a locality-friendly GPU serverless architecture.
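For intuition, here is a minimal Python sketch of the multi-tier loading idea: serve a checkpoint from the fastest tier that holds it (DRAM cache, then local SSD, then remote storage), filling the faster tiers along the way. The names (`CheckpointCache`, `load_checkpoint`, `fetch_remote`) are illustrative assumptions, not the ServerlessLLM API.

```python
import os
import shutil
import tempfile


class CheckpointCache:
    """Keeps recently used checkpoint bytes in DRAM, keyed by model name."""

    def __init__(self):
        self._dram: dict[str, bytes] = {}

    def get(self, model: str):
        return self._dram.get(model)

    def put(self, model: str, blob: bytes) -> None:
        self._dram[model] = blob


def load_checkpoint(model: str, cache: CheckpointCache,
                    ssd_dir: str, fetch_remote) -> bytes:
    """Return checkpoint bytes, populating faster tiers as a side effect."""
    # Tier 1: DRAM cache (fastest).
    blob = cache.get(model)
    if blob is not None:
        return blob

    # Tier 2: local SSD.
    ssd_path = os.path.join(ssd_dir, f"{model}.ckpt")
    if os.path.exists(ssd_path):
        with open(ssd_path, "rb") as f:
            blob = f.read()
    else:
        # Tier 3: remote checkpoint store (slowest); persist to SSD for next time.
        blob = fetch_remote(model)
        with open(ssd_path, "wb") as f:
            f.write(blob)

    cache.put(model, blob)
    return blob


if __name__ == "__main__":
    ssd = tempfile.mkdtemp()
    cache = CheckpointCache()
    fake_remote = lambda m: b"\x00" * 1024  # stand-in for a remote download
    blob = load_checkpoint("opt-6.7b", cache, ssd, fake_remote)
    print(f"loaded {len(blob)} bytes")
    shutil.rmtree(ssd)
```

In the real system the interesting work is in how each tier is read (the chunk-based checkpoint format and pipelined loading onto the GPU), but the tiered lookup above captures the basic control flow.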

The ServerlessLLM paper has been accepted at OSDI ’24, a top systems conference, and we are preparing to open-source the project. Stay tuned.

Luo Mai
Assistant Professor

My research interests include computer systems, machine learning systems and data management.