Season 2013 · Episode 7

Podcast with Marc Toussaint on planning as inference and graphical models

How collaboration arises and why it fails · Dr. Paul F.M.J. Verschure / Prof. Dr. Tony Prescott / Dr. Anna Mura

March 15, 2026 · 52m 18s

Show Notes

What if planning is not about computing value functions but about performing probabilistic inference? Marc Toussaint shows how recasting optimal control as message passing opens new computational pathways for robotics and decision-making.

Subscribe for more from the Convergent Science Network podcast series.

Marc Toussaint presents a theoretical framework that reformulates planning and optimal control as probabilistic inference in graphical models. Rather than iterating backward through Bellman equations to compute value functions, his approach computes both forward and backward messages whose product yields a posterior distribution over actions. This shift in perspective is not merely notational: it leads to genuinely different approximation algorithms, particularly for complex problems like partially observable Markov decision processes and factored state spaces where traditional value function methods struggle.
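To make the forward and backward messages concrete, here is a minimal sketch on a toy problem. It is not Toussaint's formulation as discussed in the episode: the five-state chain, the uniform action prior, the fixed horizon, and the "be in the goal state at the final step" task event are all assumptions chosen for illustration, and the names chain_mdp and plan_by_inference are hypothetical. The forward messages track where the agent is likely to be under the prior, the backward messages track how likely the goal still is from each state, their normalized product is the posterior over states, and weighting the action prior by the backward message gives a posterior over actions in each state.

```python
import numpy as np


def chain_mdp(n_states=5, success=0.9):
    """Toy chain: action 0 moves left, action 1 moves right; moves fail with prob 1 - success."""
    P = np.zeros((2, n_states, n_states))  # P[a, s, s_next]
    for s in range(n_states):
        for a, step in enumerate((-1, +1)):
            target = min(max(s + step, 0), n_states - 1)
            P[a, s, target] += success
            P[a, s, s] += 1.0 - success
    return P


def plan_by_inference(P, prior, start, goal, T):
    """Condition on 'state at time T is goal' and infer states and actions.

    Returns state_post[t, s] and act_post[t, a, s] for t = 0..T and t = 0..T-1.
    """
    n_actions, n_states, _ = P.shape
    # State transition matrix with the action marginalized out under the prior.
    P_prior = np.einsum("a,ast->st", prior, P)

    # Forward messages: alpha[t, s] = P(s_t = s) under the prior policy.
    alpha = np.zeros((T + 1, n_states))
    alpha[0, start] = 1.0
    for t in range(T):
        alpha[t + 1] = alpha[t] @ P_prior

    # Backward messages: beta[t, s] = P(s_T = goal | s_t = s) under the prior policy.
    beta = np.zeros((T + 1, n_states))
    beta[T, goal] = 1.0
    for t in range(T - 1, -1, -1):
        beta[t] = P_prior @ beta[t + 1]

    # Posterior over states: normalized product of forward and backward messages.
    state_post = alpha * beta
    state_post /= state_post.sum(axis=1, keepdims=True)

    # Posterior over actions in state s at time t:
    # prior(a) * sum_s' P(s' | s, a) * beta[t + 1, s'], normalized over a.
    q = np.einsum("ast,kt->kas", P, beta[1:])
    act_post = prior[None, :, None] * q
    norm = act_post.sum(axis=1, keepdims=True)
    # States from which the goal is no longer reachable keep an all-zero row.
    act_post = np.divide(act_post, norm, out=np.zeros_like(act_post), where=norm > 0)
    return state_post, act_post


P = chain_mdp()
state_post, act_post = plan_by_inference(
    P, prior=np.array([0.5, 0.5]), start=0, goal=4, T=6
)
print("P(left), P(right) at t=0 in the start state:", act_post[0, :, 0])
```

In this sketch no value function is ever computed; the preference for moving right falls out of conditioning on the goal. Richer task likelihoods, discounting, and approximate message passing are outside what this toy covers.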

The conversation traces the intellectual lineage from Kalman's duality between control and filtering through Bert Kappen's work on path integrals to Toussaint's own generalization that operates over joint state-control processes without restrictive assumptions about dynamics or cost structure. A key theoretical achievement is demonstrating that many existing reinforcement learning algorithms emerge as special cases of this unified formulation, providing both theoretical elegance and inherited empirical validation.

Toussaint derives a model-free reinforcement learning algorithm from this framework where the policy is represented as a Boltzmann distribution. Analysis of its fixed-point properties reveals a surprising result: for non-optimal actions, the Boltzmann energy diverges to negative infinity, making them vanishingly improbable, while for optimal actions, it converges exactly to the optimal value function. The framework handles goal conflicts through the natural machinery of probabilistic inference, where inconsistent evidence simply reduces likelihood and the system finds probabilistic compromises.
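The limiting behavior described above can be illustrated with a small sketch of a Boltzmann policy. This is only the policy representation, not the learning algorithm from the episode; the function name boltzmann_policy and the example energies are assumptions. As the energy of a non-optimal action falls further below that of the optimal action, its probability shrinks toward zero.

```python
import numpy as np


def boltzmann_policy(energies):
    """Return action probabilities proportional to exp(energy)."""
    energies = np.asarray(energies, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(energies - np.max(energies))
    return weights / weights.sum()


optimal_energy = 2.0  # hypothetical energy of the optimal action
for gap in (1.0, 10.0, 100.0):
    probs = boltzmann_policy([optimal_energy, optimal_energy - gap])
    print(f"gap={gap:6.1f}  P(optimal)={probs[0]:.6f}  P(non-optimal)={probs[1]:.3e}")
```

In the limit where the non-optimal energy is negative infinity, the policy in this sketch becomes deterministic on the optimal action.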

The episode also explores Toussaint's robotics applications, where model-based approaches using stochastic relational rules enable robots to generalize from minimal experience. Active exploration strategies that maximize information gain prove essential in the exponentially large state spaces created by relational representations of multi-object environments, allowing a robot that has observed balls rolling to intelligently seek out non-ball-shaped objects to test next.
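A rough sketch of exploration by expected information gain follows. It does not use the stochastic relational rules from the episode; instead it assumes a much simpler stand-in model in which each object category has an unknown probability of rolling when pushed, represented by a discrete belief over a grid of candidate values. The categories, the observation counts, and all function names are hypothetical. After many observations of balls rolling, one more push of a ball is expected to teach little, so the criterion points to the untested category.

```python
import numpy as np

# Candidate values for "probability that this kind of object rolls when pushed".
THETAS = np.linspace(0.05, 0.95, 19)


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def update(belief, rolled):
    """Bayes update of the belief over THETAS after one push."""
    likelihood = THETAS if rolled else 1.0 - THETAS
    posterior = belief * likelihood
    return posterior / posterior.sum()


def expected_information_gain(belief):
    """Expected reduction in belief entropy from one more push."""
    p_roll = np.sum(belief * THETAS)  # predictive probability of rolling
    expected_posterior_entropy = (
        p_roll * entropy(update(belief, True))
        + (1.0 - p_roll) * entropy(update(belief, False))
    )
    return entropy(belief) - expected_posterior_entropy


uniform = np.full(len(THETAS), 1.0 / len(THETAS))

# Assumed history: the robot has pushed balls 20 times and they always rolled.
ball_belief = uniform
for _ in range(20):
    ball_belief = update(ball_belief, rolled=True)

beliefs = {"ball": ball_belief, "box": uniform}  # boxes have never been tested
gains = {name: expected_information_gain(b) for name, b in beliefs.items()}
next_object = max(gains, key=gains.get)
print(gains, "->", next_object)
```

The design choice here is to score each candidate experiment by how much it is expected to sharpen the model, rather than by how rewarding it is expected to be.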