[Week 2] "Learning from human preferences" (Blog Post) by Dario Amodei, Paul Christiano & Alex Ray

TYPE III AUDIO (All episodes) · TYPE III AUDIO

April 28, 2023 · 6m 33s

Show Notes

---
client: agi_sf
project_id: core_readings
feed_id: agi_sf__alignment
narrator: pw
qa: mds
qa_time: 0h15m
---
One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.
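As a rough illustration of learning from "which of two behaviors is better" feedback, the sketch below fits one scalar reward per behavior from pairwise comparisons using a Bradley-Terry style logistic model. This is a simplification for intuition only: the function names, the toy comparison data, and the table of per-behavior rewards are assumptions made here, and the linked article describes the actual method.

```python
import math


def preference_probability(reward_a, reward_b):
    """Modelled probability that a human prefers behavior A over behavior B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def fit_rewards(num_behaviors, comparisons, learning_rate=0.1, epochs=500):
    """Fit one scalar reward per behavior by gradient ascent on the
    log-likelihood of the observed comparisons.

    comparisons is a list of (preferred_index, rejected_index) pairs.
    """
    rewards = [0.0] * num_behaviors
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            p = preference_probability(rewards[preferred], rewards[rejected])
            # d/dr log(p) is (1 - p) for the preferred item, -(1 - p) for the other.
            rewards[preferred] += learning_rate * (1.0 - p)
            rewards[rejected] -= learning_rate * (1.0 - p)
    return rewards


# Hypothetical feedback: behavior 2 beats 1 and 0, behavior 1 beats 0.
comparisons = [(2, 1), (1, 0), (2, 0), (2, 1)]
rewards = fit_rewards(3, comparisons)
ranking = sorted(range(3), key=lambda i: rewards[i], reverse=True)
print(ranking)
```

The learned rewards only matter up to their ordering and differences here; an agent would then be trained to prefer behaviors with higher inferred reward.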

Original article:
https://openai.com/research/learning-from-human-preferences

Authors:
Dario Amodei, Paul Christiano, Alex Ray

---
This article is featured on the AGI Safety Fundamentals: Alignment course curriculum.

Narrated by TYPE III AUDIO on behalf of BlueDot Impact.

Share feedback on this narration.
