
Show Notes
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand
Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe