[10/24, Paper] Tenplex, the first elastic LLM training system, accepted to SOSP 2024.
Tenplex is a new system that enables elastic training of LLMs that use multi-dimensional parallelism, such as combined data, tensor, and pipeline parallelism. The Tenplex paper has been accepted for presentation at the 30th Symposium on Operating Systems Principles (SOSP’24). We are currently preparing to open-source the project. Stay tuned.