Daily Paper Cast

1,918 episodes

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

May 22, 2026 · 19 min

Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation

May 22, 2026 · 23 min

Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos

May 22, 2026 · 25 min

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

May 22, 2026 · 24 min

When Vision Speaks for Sound

May 21, 2026 · 23 min

Active Learners as Efficient PRP Rerankers

May 21, 2026 · 23 min

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

May 21, 2026 · 22 min

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

May 21, 2026 · 23 min

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

May 21, 2026 · 24 min

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

May 21, 2026 · 24 min

Process Rewards with Learned Reliability

May 21, 2026 · 23 min

EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

May 21, 2026 · 27 min

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

May 21, 2026 · 23 min

Harnessing LLM Agents with Skill Programs

May 21, 2026 · 22 min

Code as Agent Harness

May 20, 2026 · 25 min

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

May 20, 2026 · 22 min

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

May 20, 2026 · 22 min

Lance: Unified Multimodal Modeling by Multi-Task Synergy

May 20, 2026 · 23 min

AI for Auto-Research: Roadmap & User Guide

May 20, 2026 · 22 min

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

May 20, 2026 · 23 min

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

May 20, 2026 · 23 min

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

May 19, 2026 · 23 min

PhysBrain 1.0 Technical Report

May 19, 2026 · 25 min

MMSkills: Towards Multimodal Skills for General Visual Agents

May 19, 2026 · 22 min

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo

May 19, 2026 · 23 min

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding

May 19, 2026 · 21 min

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation

May 19, 2026 · 24 min

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

May 19, 2026 · 21 min

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR

May 19, 2026 · 21 min

Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

May 16, 2026 · 22 min

Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation

May 16, 2026 · 23 min

Self-Distilled Agentic Reinforcement Learning

May 16, 2026 · 24 min

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

May 16, 2026 · 26 min

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

May 16, 2026 · 22 min

MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory

May 16, 2026 · 22 min

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

May 16, 2026 · 23 min

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

May 16, 2026 · 21 min

STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?

May 16, 2026 · 23 min

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

May 16, 2026 · 24 min

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

May 15, 2026 · 23 min

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

May 15, 2026 · 24 min

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

May 15, 2026 · 23 min

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

May 15, 2026 · 23 min

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

May 15, 2026 · 25 min

Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling

May 15, 2026 · 24 min

Qwen-Image-VAE-2.0 Technical Report

May 15, 2026 · 24 min

TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking

May 15, 2026 · 23 min

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

May 15, 2026 · 23 min

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

May 15, 2026 · 24 min

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

May 14, 2026 · 24 min