(Voiceover) Building on evaluation quicksand

Interconnects

October 16, 2024 (16m 36s)

Show Notes

Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters

00:00 Building on evaluation quicksand

01:26 The causes of closed evaluation silos

06:35 The challenge facing open evaluation tools

10:47 Frontiers in evaluation

11:32 New types of synthetic data contamination

13:57 Building harder evaluations

Figures

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.interconnects.ai/subscribe