Season 1 · Episode 110

Building the Ultimate Local AI Inference Server

Learn how to build a high-performance local AI server for agentic coding, from dual-GPU PC builds to the power of Mac's unified memory.

My Weird Prompts · Daniel Rosehill

December 27, 2025 · 21m 13s

Show Notes

Are you struggling to run the latest AI models on your aging hardware? In this deep dive, Herman and Corn break down the technical requirements for building a dedicated local inference server in late 2025. They move beyond simple chatbots to discuss "agentic" code generation—systems that can autonomously debug and test projects—and why these sophisticated tools demand massive amounts of VRAM. From the technical hurdles of the KV cache to a step-by-step shopping list for a dual-RTX 3090 PC build, this episode provides a comprehensive hardware roadmap for developers. They also weigh the pros and cons of Apple’s unified memory architecture versus the raw power of DIY Linux builds, exploring how quantization can help you squeeze more performance out of your budget. If you value privacy and need the speed of local execution, this is the hardware guide you've been waiting for.
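To make the VRAM discussion concrete, here is a rough back-of-the-envelope sketch of how model weights, quantization, and the KV cache add up against a dual-RTX 3090 budget (2 × 24 GB). The model shape below (32B parameters, 64 layers, 8 KV heads, head dimension 128, 32K context) is a hypothetical example chosen for illustration, not a configuration taken from the episode, and the estimate ignores activations, framework overhead, and fragmentation, so real usage will be higher.

```python
GIB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights at a given quantization level."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes a 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def fits_in_vram(total_bytes: float, vram_bytes: float) -> bool:
    """True if the estimate fits within the available VRAM."""
    return total_bytes <= vram_bytes

# Dual RTX 3090: 24 GB per card.
VRAM_BUDGET = 2 * 24 * GIB

# Hypothetical model shape (an assumption for illustration only).
N_PARAMS = 32e9
N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT = 64, 8, 128, 32_768

if __name__ == "__main__":
    kv = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT)
    for bits in (16, 8, 4):
        total = weight_bytes(N_PARAMS, bits) + kv
        verdict = "fits" if fits_in_vram(total, VRAM_BUDGET) else "does not fit"
        print(f"{bits}-bit weights + KV cache: {total / GIB:.1f} GiB ({verdict})")
```

Under these assumptions the same model that overflows 48 GB at 16-bit precision fits comfortably once the weights are quantized to 4 bits, while the KV cache cost stays fixed by the context length.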
