The 79% AI Coder: Genius or Just Memorization?
Season 2 · Episode 1831

AI models now score 79% on coding benchmarks, but a 40-point drop on harder tests reveals the truth.

My Weird Prompts · Daniel Rosehill

March 31, 2026 · 23m 31s

Show Notes

The latest SWE-bench results show AI coding agents hitting 79% accuracy, nearly matching human engineers. But is this real progress or just sophisticated memorization? We explore the hidden role of agent scaffolds, the shocking cost differences between models, and why harder benchmarks reveal a 40-point performance drop.