Scaling Properties of Diffusion Models for Perceptual Tasks
Episode 73

Daily Paper Cast

November 14, 2024 | 25m 9s

Show Notes

🤗 Paper Upvotes: 7 | cs.CV, cs.AI

Authors:
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran, Jitendra Malik

Title:
Scaling Properties of Diffusion Models for Perceptual Tasks

Arxiv:
http://arxiv.org/abs/2411.08034v2

Abstract:
In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm for not only generation but also visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve competitive performance to state-of-the-art methods using significantly less data and compute. To access our code and models, see https://scaling-diffusion-perception.github.io .