
Season 1 · Episode 54
Tokenizing Everything: How Omnimodal AI Handles Any Input
Omnimodal AI: How do models process images, audio, video, and text all at once? Discover the engineering behind AI that accepts anything.
My Weird Prompts · Daniel Rosehill
December 11, 2025 · 32m 58s
Show Notes
How do AI models process images, audio, video, and text all at once? Herman and Corn dive deep into the technical complexity of multimodal tokenization, exploring how modern omnimodal models compress vastly different data types into a unified format that a single neural network can understand. From vision encoders to spectrograms to temporal compression, discover the engineering behind the AI systems that can accept anything and output anything.