Luo Mai

Associate Professor (Reader), Large-Scale Machine Learning Systems Group

University of Edinburgh

I am an Associate Professor, equivalent to Reader in the UK system, in the School of Informatics at the University of Edinburgh, where I lead the Large-Scale Machine Learning Systems Group. I also co-lead the UK EPSRC Centre for Doctoral Training in Machine Learning Systems and a UK ARIA project on scaling AI compute.

My research focuses on machine learning systems across the full stack, from models and data to runtimes, systems software, and emerging AI hardware. A long-term goal of my group is to rethink the machine learning systems stack so that future AI infrastructure can achieve up to a 1000X improvement in efficiency, scalability, and reliability. Our work combines research publications with reusable open-source systems and libraries, which have collectively received over 20,000 GitHub stars. I also co-edited the open-source textbook Machine Learning Systems: Design and Implementation.

Before joining Edinburgh, I was a postdoctoral Research Associate at Imperial College London, where I worked with Peter Pietzuch, and a Visiting Researcher at Microsoft Research. I received my PhD under the supervision of Paolo Costa and Alexander L. Wolf, supported by a Google Fellowship in Cloud Computing.

Updates

Latest News

July 5, 2026

Two papers accepted to SOSP 2026

Wavel and MeshRT were accepted to the ACM Symposium on Operating Systems Principles.

June 27, 2026

Ryze published at ACL 2026 System Demonstrations

Ryze synthesizes evidence-enriched biomedical VLM training data from papers.

March 20, 2026

BatchGen accepted to OSDI 2026

BatchGen targets throughput-first inference for very large MoE-style models.

February 20, 2026

ContextPilot accepted to MLSys 2026

ContextPilot speeds long-context inference through context reuse.

November 25, 2025

BitDecoding accepted to HPCA 2026

BitDecoding accelerates low-bit KV-cache inference by unlocking Tensor Cores.

People

Large-Scale Machine Learning Systems Group

Current researchers, PhD students, and alumni building efficient machine learning systems.

5 staff · 13 PhDs · 9 alumni

Research

Recent Publications

Recent work on ML systems, AI compute, efficient inference, and large-scale learning.

01

Wavel: A Fast and Efficient Compilation System for Wafer-Scale Accelerators

Yeqi Huang, Congjie He, Haocheng Xiao, Yanwei Ye, Yi-Chieh Wang, Boyao Song, Yangshen Deng, Ziming Miao, Lingxiao Ma, Fan Yang, Luo Mai

SOSP 2026
02

MeshRT: Compile-Time Governed Wafer-Scale Runtime for Low-Latency High-Throughput Inference

Congjie He, Le Xu, Zhan Lu, Yeqi Huang, Haocheng Xiao, Cheng Deng, Lingxiao Ma, Ziming Miao, Fan Yang, Luo Mai

SOSP 2026

Academic activity

Awards, Teaching & Service

Selected recognitions, courses, and community roles supporting machine learning systems research.

Awards & Grants

2025

AI for Math Fund award

Renaissance Philanthropy

2025

ARIA project: Benchmarking AI Evolution

ARIA

2024

ARIA project: Scaling AI Compute

ARIA

2024

Microsoft Research Asia StarTrack Scholar Award

Microsoft Research Asia

Teaching

Machine Learning Systems

INFR11269 · 2025 · Course designer and organiser

Extreme Computing

INFR11088 · 2024 · Course designer and organiser

Extreme Computing

INFR11088 · 2023 · Course designer and organiser

Professional Service

2026

General Co-Chair

EuroSys 2026

2024

General Co-Chair

International Workshop on Efficient Generative AI

2026

Program committee member

ISCA

2026

Program committee member

ASPLOS

Open-source impact

Research Software & Projects

Open-source systems and software artifacts that turn research ideas into reusable infrastructure.

8 projects · 11.7K stars

Deep learning and reinforcement learning library with reusable layers, models, and training utilities.

machine-learning-systems

Real-time pose-estimation framework with high-level APIs and optimized CPU/GPU execution.

machine-learning-systems

Checkpoint-aware serverless LLM serving for fast, cost-efficient custom model deployment.

machine-learning-systems

RLzoo · Software · 641 stars

Reinforcement-learning model zoo with ready-to-run algorithms, environments, and training utilities.

machine-learning-systems

PyTorch library for differentiable optimization, meta-learning, and implicit or zero-order gradients.

machine-learning-systems

MegBA · Software · 491 stars

Distributed GPU bundle-adjustment library for large-scale 3D reconstruction workloads.

machine-learning-systems

Low-latency GPU graph-learning runtime for scaling PyG workloads across machines.

machine-learning-systems

Adaptive distributed training runtime with monitoring and control APIs for large GPU clusters.

machine-learning-systems