Can AI Map Your House Just by Looking Around?
Season 1 · Episode 132

Discover how spatial-temporal tokenization and 3D world modeling are revolutionizing real-time video-to-video AI interaction.

My Weird Prompts · Daniel Rosehill

January 2, 2026 · 22m 6s

Show Notes

In this episode of My Weird Prompts, hosts Herman and Corn dive into the landscape of video-based multimodal AI in 2026. They explore how the industry moved beyond simple frame-sampling to spatial-temporal tokenization, which lets models treat time as a dimension alongside height and width. The discussion covers the technical hurdles of real-time video-to-video interaction, including Simultaneous Localization and Mapping (SLAM) for floor plan generation and speculative decoding to reduce latency. By examining the integration of Neural Radiance Fields (NeRFs) and native multimodality, Herman and Corn reveal how AI is finally crossing the uncanny valley to create digital avatars that are indistinguishable from reality.