Luo Mai

Associate Professor (Reader), Large-Scale Machine Learning Systems Group

University of Edinburgh

I am an Associate Professor, equivalent to Reader in the UK system, in the School of Informatics at the University of Edinburgh, where I lead the Large-Scale Machine Learning Systems Group. I also co-lead the UK EPSRC Centre for Doctoral Training in Machine Learning Systems and a UK ARIA project on scaling AI compute.

My research focuses on machine learning systems across the full stack, from models and data to runtimes, systems software, and emerging AI hardware. A long-term goal of my group is to rethink the machine learning systems stack so that future AI infrastructure can achieve up to a 1000X improvement in efficiency, scalability, and reliability. Our work combines research publications with reusable open-source systems and libraries, which have collectively received over 20,000 GitHub stars. I also co-edited the open-source textbook Machine Learning Systems: Design and Implementation.

Before joining Edinburgh, I was a Research Associate at Imperial College London and a Visiting Researcher at Microsoft Research. I received my PhD under the supervision of Paolo Costa and Alexander L. Wolf, supported by a Google Fellowship in Cloud Computing.

Updates

Latest News

March 20, 2026

BatchGen accepted to OSDI 2026

BatchGen targets throughput-first inference for very large MoE-style models.

February 20, 2026

ContextPilot accepted to MLSys 2026

ContextPilot speeds long-context inference through context reuse.

November 25, 2025

BitDecoding accepted to HPCA 2026

BitDecoding accelerates low-bit KV-cache inference by unlocking Tensor Cores.

September 25, 2025

MoE-CAP accepted to NeurIPS 2025

MoE-CAP appears in the Dataset and Benchmark Track for mixture-of-experts evaluation.

August 25, 2025

Award for AI4Math systems research

New funding to build systems that support AI-driven mathematical discovery.

People

Large-Scale Machine Learning Systems Group

Current researchers, PhD students, and alumni building efficient machine learning systems.

5 staff · 13 PhDs · 9 alumni

Research

Recent Publications

Recent work on ML systems, AI compute, efficient inference, and large-scale learning.

Academic activity

Awards, Teaching & Service

Selected recognitions, courses, and community roles supporting machine learning systems research.

Awards & Grants

2025

AI for Math Fund award

Renaissance Philanthropy

2025

ARIA project: Benchmarking AI Evolution

ARIA

2024

ARIA project: Scaling AI Compute

ARIA

2024

Microsoft Research Asia StarTrack Scholar Award

Microsoft Research Asia

Teaching

Machine Learning Systems

INFR11269 · 2025 · Course designer and organiser

Extreme Computing

INFR11088 · 2024 · Course designer and organiser

Extreme Computing

INFR11088 · 2023 · Course designer and organiser

Professional Service

2026

General Co-Chair

EuroSys 2026

2024

General Co-Chair

International Workshop on Efficient Generative AI

2026

Program committee member

ISCA

2026

Program committee member

ASPLOS

Open-source impact

Research Software & Projects

Open-source systems and software artifacts that turn research ideas into reusable infrastructure.

8 projects · 11.7K stars

Deep learning and reinforcement learning library with reusable layers, models, and training utilities.

machine-learning-systems

Real-time pose-estimation framework with high-level APIs and optimized CPU/GPU execution.

machine-learning-systems

Checkpoint-aware serverless LLM serving for fast, cost-efficient custom model deployment.

machine-learning-systems

RLzoo · Software · 641 stars

Reinforcement-learning model zoo with ready-to-run algorithms, environments, and training utilities.

machine-learning-systems

PyTorch library for differentiable optimization, meta-learning, and implicit or zero-order gradients.

machine-learning-systems

MegBA · Software · 491 stars

Distributed GPU bundle-adjustment library for large-scale 3D reconstruction workloads.

machine-learning-systems

Low-latency GPU graph-learning runtime for scaling PyG workloads across machines.

machine-learning-systems

Adaptive distributed training runtime with monitoring and control APIs for large GPU clusters.

machine-learning-systems