
An AI stack: from scaling AI workloads to evaluating LLMs
Strachey Lectures · Oxford University
February 26, 2026 · 55m 58s
Show Notes
Hilary Term 2026 Strachey Lecture with Professor Ion Stoica, "An AI stack: from scaling AI workloads to evaluating LLMs"
Large language models (LLMs) have taken the world by storm, enabling new applications, intensifying GPU shortages, and raising
concerns about the accuracy of their outputs. In this talk, I will present several projects I have worked on to address these
challenges. Specifically, I will focus on Ray, a distributed framework for scaling AI workloads; vLLM and SGLang, two
high-throughput inference engines for LLMs; and LMArena, a platform for accurate LLM benchmarking. I will conclude with key
lessons learned and outline directions for future research.