Season 2 · Episode 1762

Testing AI Truthfulness: Beyond Vibes

Stop trusting confident AI. We explore the formal science of testing LLMs for hallucinations and knowledge cutoffs.

March 29, 2026 · 24m 56s


Show Notes

Is your AI making up facts? As LLM adoption surges in the enterprise, "vibes-based" testing is causing real-world failures. We dive into the formal science of AI evaluation, moving beyond ad hoc prompts to statistically significant results. Learn how benchmarks like TruthfulQA, adversarial prompting, and calibration metrics measure whether a model is resilient to hallucinations.

← All episodes of My Weird Prompts