
Season 2 · Episode 1762
Testing AI Truthfulness: Beyond Vibes
Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.
My Weird Prompts · Daniel Rosehill
March 29, 2026 · 24m 56s
Show Notes
Is your AI making up facts? As enterprise adoption of LLMs surges, "vibes-based" testing is causing real-world failures. We dive into the formal science of AI evaluation, moving beyond ad hoc prompts toward statistically significant results. Learn how benchmarks like TruthfulQA, adversarial prompting, and calibration metrics actually measure whether a model is resilient to hallucinations.