Season 2 · Episode 210

Predictive Motion: How Transformers Are Learning to Walk

Explore how the same transformer architecture behind chatbots is now enabling robots to navigate the physical world using action tokens.

My Weird Prompts · Daniel Rosehill

January 9, 2026 · 23m 4s



Show Notes

In this deep dive, Herman and Corn explore the convergence of large language models and robotics, marking a transition from digital logic to physical embodiment. They break down the mechanics of Vision-Language-Action (VLA) models, explaining how the transformer architecture is being repurposed to predict motor commands just as it predicts words. By treating physical movements as "action tokens," researchers are bridging the gap between abstract reasoning and real-world coordination. The discussion covers the "reality gap" between behavior learned in simulation and performance on physical hardware, the role of high-fidelity simulators like NVIDIA Isaac Sim, and the need for low-latency edge computing in the next generation of humanoid robots. From a robot arm grasping a cup to a humanoid navigating a kitchen, the duo asks whether true intelligence can only be achieved once AI finally has a body to call its own.
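
For readers who want a concrete picture of what an "action token" could look like, here is a minimal sketch. It is not taken from the episode: it assumes one common approach, uniform binning, in which each continuous motor command is clipped to a known range and mapped to one of a fixed number of integer bins that a transformer can predict like words. The function names, the 256-bin vocabulary, and the [-1, 1] command range are all illustrative assumptions.

```python
import numpy as np

NUM_BINS = 256  # assumed number of discrete tokens per action dimension


def actions_to_tokens(action, low, high, num_bins=NUM_BINS):
    """Map continuous commands in [low, high] to integer tokens in [0, num_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    scaled = (action - low) / (high - low)          # normalise to [0, 1]
    tokens = (scaled * num_bins).astype(int)        # pick a bin index
    return np.minimum(tokens, num_bins - 1)         # the value `high` falls in the last bin


def tokens_to_actions(tokens, low, high, num_bins=NUM_BINS):
    """Map tokens back to the centre of their bin (a lossy inverse)."""
    tokens = np.asarray(tokens, dtype=float)
    return low + (tokens + 0.5) / num_bins * (high - low)


# Hypothetical 3-dimensional command, e.g. normalised joint velocities.
command = np.array([-1.0, 0.0, 1.0])
tokens = actions_to_tokens(command, low=-1.0, high=1.0)
recovered = tokens_to_actions(tokens, low=-1.0, high=1.0)
```

The round trip is lossy: each recovered value sits at the centre of its bin, so it can differ from the original command by up to half a bin width. More bins give finer control at the cost of a larger vocabulary for the model to predict.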
