Season 2 · Episode 1831

The 79% AI Coder: Genius or Just Memorization?

AI models now score 79% on coding benchmarks, but a 40-point drop on harder tests reveals the truth.

March 31, 2026 · 23m 31s

Show Notes

The latest SWE-bench results show AI coding agents hitting 79% accuracy, nearly matching human engineers. But is this real progress or just sophisticated memorization? We explore the hidden role of agent scaffolds, the shocking cost differences between models, and why harder benchmarks reveal a 40-point performance drop.
