Episode 46

Policy Puppetry: How a Single Prompt Can Trick ChatGPT, Gemini & More Into Revealing Secrets

April 28, 202512m 44s

Audio is streamed directly from the publisher (media.transistor.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Original episode page

Show Notes

Recent research by HiddenLayer has uncovered a shocking new AI vulnerability—dubbed the "Policy Puppetry Attack"—that can bypass safety guardrails in all major LLMs, including ChatGPT, Gemini, Claude, and more.

In this episode, we dive deep into:
🔓 How a single, cleverly crafted prompt can trick AI into generating harmful content—from bomb-making guides to uranium enrichment.
💻 The scary simplicity of system prompt extraction—how researchers (and hackers) can force AI to reveal its hidden instructions.
🛡️ Why this flaw is "systemic" and nearly impossible to patch, exposing a fundamental weakness in how AI models are trained.
⚖️ The ethical dilemma: Should AI be censored? Or is the real danger in what it can do, not just what it says?
🔮 What this means for the future of AI security—and whether regulation can keep up with rapidly evolving threats.

We’ll also explore slopsquatting, a new AI cyberattack where fake software libraries hallucinated by chatbots can lead users to malware.

Is AI safety a lost cause? Or can developers outsmart the hackers? Tune in for a gripping discussion on the dark side of large language models.

Topics

Policy PuppetryHacking AIAI technologyAI newsAI AgentsChatGPTGoogle Gemini

← All episodes of Daily Security Review

Policy Puppetry: How a Single Prompt Can Trick ChatGPT, Gemini &amp; More Into Revealing Secrets

Show Notes

Topics

Policy Puppetry: How a Single Prompt Can Trick ChatGPT, Gemini & More Into Revealing Secrets