
Season 2 · Episode 1839
AI's Data Kitchen: From Hoovering to Fine-Tuning
We go behind the curtain of the AI data pipeline, revealing the messy, multi-billion-dollar war over data curation.
My Weird Prompts · Daniel Rosehill
March 31, 2026 · 27m 32s
Show Notes
Everyone talks about the magic of AI, but the real war is over data. This episode pulls back the curtain on the messy, multi-billion-dollar process of finding, cleaning, and filtering the information that trains large language models. We explore why the era of simply "hoovering" the internet is over, how deduplication and quality filtering work, and why the "well of high-quality data" might be running dry.