March 20, 2026
BatchGen accepted to OSDI 2026
BatchGen targets throughput-first inference for very large MoE-style models.
Associate Professor (Reader), Large-Scale Machine Learning Systems Group
University of Edinburgh
I am an Associate Professor, equivalent to Reader in the UK system, in the School of Informatics at the University of Edinburgh, where I lead the Large-Scale Machine Learning Systems Group. I also co-lead the UK EPSRC Centre for Doctoral Training in Machine Learning Systems and a UK ARIA project on scaling AI compute.
My research focuses on machine learning systems across the full stack, from models and data to runtimes, systems software, and emerging AI hardware. A long-term goal of my group is to rethink the machine learning systems stack so that future AI infrastructure can achieve up to a 1000X improvement in efficiency, scalability, and reliability. Our work combines research publications with reusable open-source systems and libraries, which have collectively received over 20,000 GitHub stars. I also co-edited the open-source textbook Machine Learning Systems: Design and Implementation.
Before joining Edinburgh, I was a Research Associate at Imperial College London and a Visiting Researcher at Microsoft Research. I received my PhD under the supervision of Paolo Costa and Alexander L. Wolf, supported by a Google Fellowship in Cloud Computing.
Updates
March 20, 2026
BatchGen targets throughput-first inference for very large MoE-style models.
February 20, 2026
ContextPilot speeds long-context inference through context reuse.
November 25, 2025
BitDecoding accelerates low-bit KV-cache inference by unlocking Tensor Cores.
September 25, 2025
MoE-CAP appears in the Dataset and Benchmark Track for mixture-of-experts evaluation.
August 25, 2025
New funding to build systems that support AI-driven mathematical discovery.
People
Current researchers, PhD students, and alumni building efficient machine learning systems.
5 staff · 13 PhDs · 9 alumni
Research
Recent work on ML systems, AI compute, efficient inference, and large-scale learning.
Tairan Xu, Leyang Xue, Zhan Lu, Jinfu Deng, Hongyang Xiao, Yinsicheng Jiang, Congjie He, Matej Sandor, Le Xu, Luo Mai
Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai
Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang
Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai
Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai
Marcel Wagenländer, Guo Li, Bo Zhao, Luo Mai, Peter Pietzuch
Chuanhao Sun, Zhihang Yuan, Kai Xu, Luo Mai, N Siddharth, Shuo Chen, Mahesh K Marina
Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang
Academic activity
Selected recognitions, courses, and community roles supporting machine learning systems research.
AI for Math Fund award · Renaissance Philanthropy
ARIA project: Benchmarking AI Evolution · ARIA
ARIA project: Scaling AI Compute · ARIA
Microsoft Research Asia StarTrack Scholar Award · Microsoft Research Asia
INFR11269 · 2025 · Course designer and organiser
INFR11088 · 2024 · Course designer and organiser
INFR11088 · 2023 · Course designer and organiser
General Co-Chair · EuroSys 2026
General Co-Chair · International Workshop on Efficient Generative AI
Program committee member · ISCA
Program committee member · ASPLOS
Open-source impact
Open-source systems and software artifacts that turn research ideas into reusable infrastructure.
Software
Deep learning and reinforcement learning library with reusable layers, models, and training utilities.
Real-time pose-estimation framework with high-level APIs and optimized CPU/GPU execution.
Software
Checkpoint-aware serverless LLM serving for fast, cost-efficient custom model deployment.
Reinforcement-learning model zoo with ready-to-run algorithms, environments, and training utilities.
PyTorch library for differentiable optimization, meta-learning, and implicit or zero-order gradients.
Distributed GPU bundle-adjustment library for large-scale 3D reconstruction workloads.
Low-latency GPU graph-learning runtime for scaling PyG workloads across machines.
Adaptive distributed training runtime with monitoring and control APIs for large GPU clusters.