Season 2 · Episode 1112

Inside the Neural Cathedral: Cracking the AI Black Box

Peek inside the "black box" of AI to discover how models use high-dimensional geometry and superposition to organize complex human concepts.

My Weird Prompts · Daniel Rosehill

March 11, 2026 · 25m 53s

Show Notes

For years, the inner workings of large language models have been treated as a mysterious "black box" where inputs turn into outputs through a process that looks more like magic than math. This episode dives into the cutting-edge field of mechanistic interpretability, exploring how researchers are finally reverse-engineering the "neural cathedrals" of AI to map the specific circuits that drive model behavior. From the strange geometry of high-dimensional superposition to "Golden Gate Claude," a demonstration built on features found with sparse autoencoders, we explore how these models represent millions of concepts using far fewer neurons. By understanding these emergent digital blueprints, we move one step closer to ensuring that the alien intelligences we are building remain safe, transparent, and aligned with human values.
