PLAY PODCASTS
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning
Episode 985

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

Daily Paper Cast

July 19, 202520m 25s

Audio is streamed directly from the publisher (media.transistor.fm) as published in their RSS feed. Play Podcasts does not host this file. Rights-holders can request removal through the copyright & takedown page.

Show Notes

🤗 Upvotes: 36 | cs.CV

Authors:
Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Jiangmiao Pang, Chunhua Shen, Tong He

Title:
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning

Arxiv:
http://arxiv.org/abs/2507.13347v1

Abstract:
We introduce $\pi^3$, a feed-forward neural network that offers a novel approach to visual geometry reconstruction, breaking the reliance on a conventional fixed reference view. Previous methods often anchor their reconstructions to a designated viewpoint, an inductive bias that can lead to instability and failures if the reference is suboptimal. In contrast, $\pi^3$ employs a fully permutation-equivariant architecture to predict affine-invariant camera poses and scale-invariant local point maps without any reference frames. This design makes our model inherently robust to input ordering and highly scalable. These advantages enable our simple and bias-free approach to achieve state-of-the-art performance on a wide range of tasks, including camera pose estimation, monocular/video depth estimation, and dense point map reconstruction. Code and models are publicly available.