LLM Benchmarks Are Full of Noise: Statistical Rigor in AI Evals

April 25, 2026 · 28m 3s

An episode of My Weird Prompts