AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight.

AI Safety Newsletter · Center for AI Safety

August 1, 202315m 44s

Audio is streamed directly from the publisher (dl.type3.audio) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

Automatically Circumventing LLM GuardrailsLarge language models (LLMs) can generate hazardous information, such as step-by-step instructions on how to create a pandemic pathogen. To combat the risk of malicious use, companies typically build safety guardrails intended to prevent LLMs from misbehaving. But these safety controls are almost useless against a new attack developed by researchers at Carnegie Mellon University and the Center for AI Safety. By studying the vulnerabilities in open source models such as Meta’s LLaMA 2, the researchers can automatically generate a nearly unlimited supply of “adversarial suffixes,” which are words and characters that cause any model’s safety controls to fail. This discovery calls into question the fundamental limits of safety [...] ---Outline:(00:12) Automatically Circumventing LLM Guardrails(05:40) AI Labs Announce the Frontier Model Forum(07:54) Senate Hearing on AI Oversight(14:42) Links --- First published: August 1st, 2023 Source: <a href="https://newsletter.safe.ai/p/ai-safety-newsletter-17?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Source+URL+in+episode+description&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">https://newsletter.safe.ai/p/ai-safety-newsletter-17</a> --- Want more? Check out our <a href="https://newsletter.mlsafety.org/?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Episode+description+footer" target="_blank" rel="noreferrer">ML Safety Newsletter</a> for technical safety research. Narrated by <a href="https://type3.audio/?utm_source=TYPE_III_AUDIO&utm_medium=Podcast&utm_content=Narrated+by+TYPE+III+AUDIO&utm_term=center_for_ai_safety&utm_campaign=ai_narration" rel="noopener noreferrer" target="_blank">TYPE III AUDIO</a>.

← All episodes of AI Safety Newsletter