
Deep Dive into Inference Optimization for LLMs with Philip Kiely
November 5, 20241h 4m
Audio is streamed directly from the publisher (traffic.megaphone.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.
Show Notes
Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI workloads.
We go deep on Inference Optimization. We cover choosing a model, discuss the hype around Compound AI, choosing an Inference Engine, Optimization Techniques like Quantization and Speculative Decoding all the way down to your GPU choice.