[07/24, Paper] ServerlessLLM, the first serverless LLM system, accepted to OSDI 2024.
ServerlessLLM is a new system that enables cost-effective serverless inference for LLMs by building a scalable, high-performance "checkpoint storage layer" on GPU servers. It achieves this through a novel LLM checkpoint format, a multi-tier checkpoint loading subsystem, an efficient live-migration algorithm for inference, and a locality-friendly GPU serverless architecture.
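To make the multi-tier loading idea concrete, here is a minimal, purely illustrative Python sketch (the class and method names are hypothetical, not ServerlessLLM's API): a checkpoint is looked up in host DRAM first, then on local SSD, then in remote storage, and is promoted to the faster tiers as it is fetched so later requests for the same model load quickly.

```python
# Illustrative sketch of multi-tier checkpoint loading (hypothetical names,
# not the ServerlessLLM API): DRAM cache -> local SSD -> remote store.
import os
import shutil


class TieredCheckpointStore:
    def __init__(self, ssd_dir: str, remote_dir: str):
        self.dram_cache: dict[str, bytes] = {}  # fastest tier: host memory
        self.ssd_dir = ssd_dir                  # middle tier: local NVMe/SSD
        self.remote_dir = remote_dir            # slowest tier: remote storage

    def load(self, model_id: str) -> bytes:
        # Tier 1: checkpoint already resident in host DRAM.
        if model_id in self.dram_cache:
            return self.dram_cache[model_id]

        # Tier 2: local SSD; promote into DRAM for future requests.
        ssd_path = os.path.join(self.ssd_dir, model_id)
        if os.path.exists(ssd_path):
            with open(ssd_path, "rb") as f:
                data = f.read()
            self.dram_cache[model_id] = data
            return data

        # Tier 3: remote store; stage onto local SSD, then promote into DRAM.
        remote_path = os.path.join(self.remote_dir, model_id)
        shutil.copyfile(remote_path, ssd_path)
        with open(ssd_path, "rb") as f:
            data = f.read()
        self.dram_cache[model_id] = data
        return data
```

In the real system the bytes would be streamed into GPU memory rather than returned to Python, but the sketch captures the cache-and-promote behavior across storage tiers.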
The ServerlessLLM paper has been accepted to OSDI '24, a top systems conference, and we are preparing to open-source the project. Stay tuned.