[Week 3] "The alignment problem from a deep learning perspective" (Sections 2, 3 and 4) by Richard Ngo, Lawrence Chan & Sören Mindermann

TYPE III AUDIO (All episodes) · TYPE III AUDIO

March 27, 2023 · 33m 47s

Show Notes

---
client: agi_sf
project_id: core_readings
feed_id: agi_sf__alignment
narrator: pw
qa: mds
qa_time: 1h00m
---

Within the coming decades, artificial general intelligence (AGI) may surpass human capabilities at a wide range of important tasks. We outline a case for expecting that, without substantial effort to prevent it, AGIs could learn to pursue goals which are undesirable (i.e. misaligned) from a human perspective. We argue that if AGIs are trained in ways similar to today's most capable models, they could learn to act deceptively to receive higher reward, learn internally-represented goals which generalize beyond their training distributions, and pursue those goals using power-seeking strategies. We outline how the deployment of misaligned AGIs might irreversibly undermine human control over the world, and briefly review research directions aimed at preventing this outcome.

Original article:
https://arxiv.org/abs/2209.00626

Authors:
Richard Ngo, Lawrence Chan, Sören Mindermann

---
This article is featured on the AGI Safety Fundamentals: Alignment course curriculum.

Narrated by TYPE III AUDIO on behalf of BlueDot Impact.

