GPU Scaling: The "Go Wide or Go Tall" Dilemma
Season 2 · Episode 346

Should you use a fleet of cheap GPUs or one powerhouse? Learn the math behind serverless GPU costs, cold starts, and batching efficiency.

My Weird Prompts · Daniel Rosehill

January 29, 2026 · 25m 19s

Show Notes

In this episode, Herman and Corn dive deep into the engineering trade-offs of serverless GPU workloads. Using a real-world text-to-speech example on the Modal platform, they explore whether it’s better to scale horizontally with many small workers or vertically with a single high-end GPU like the H100. They break down the hidden costs of cold starts, the importance of memory bandwidth over raw compute, and how to find the "sweet spot" on the cost-efficiency curve to get the most bang for your buck.