
Season 1 · Episode 121
Decoding RLHF: Why Your AI is So Annoyingly Nice
Ever wonder why AI is so polite? Herman and Corn dive into the mechanics of RLHF and how "niceness" gets baked into modern language models.
My Weird Prompts · Daniel Rosehill
December 29, 2025 · 26m 33s
Show Notes
Why does every AI sound like a corporate assistant? In this episode of My Weird Prompts, Herman and Corn break down the "three-stage rocket" of AI training: raw pre-training, then Supervised Fine-Tuning, then the complex world of Reinforcement Learning from Human Feedback (RLHF). They explore how Reward Models and human preference ranking create the "annoying niceness" we see today, the hidden risks of AI sycophancy, and why models often become "yes-men" to their users. From the "alignment tax" to the rise of RLAIF (Reinforcement Learning from AI Feedback) and Direct Preference Optimization (DPO), the brothers pull back the curtain on how developers bake specific personalities into code. Whether you're curious about the "Representation Tax" or want to know how to train a cynical 1940s noir detective AI, this episode offers a technical yet accessible look at the secret sauce that makes modern AI feel, for better or worse, so human-like.