
Daily Paper Cast
1,918 episodes
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining
May 22, 2026 · 19 min
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
May 22, 2026 · 23 min
Enhancing Train-Free Infinite-Frame Generation for Consistent Long Videos
May 22, 2026 · 25 min
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools
May 22, 2026 · 24 min
When Vision Speaks for Sound
May 21, 2026 · 23 min
Active Learners as Efficient PRP Rerankers
May 21, 2026 · 23 min
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
May 21, 2026 · 22 min
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration
May 21, 2026 · 23 min
OpenComputer: Verifiable Software Worlds for Computer-Use Agents
May 21, 2026 · 24 min
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
May 21, 2026 · 24 min
Process Rewards with Learned Reliability
May 21, 2026 · 23 min
EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
May 21, 2026 · 27 min
CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition
May 21, 2026 · 23 min
Harnessing LLM Agents with Skill Programs
May 21, 2026 · 22 min
Code as Agent Harness
May 20, 2026 · 25 min
SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution
May 20, 2026 · 22 min
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
May 20, 2026 · 22 min
Lance: Unified Multimodal Modeling by Multi-Task Synergy
May 20, 2026 · 23 min
AI for Auto-Research: Roadmap & User Guide
May 20, 2026 · 22 min
CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?
May 20, 2026 · 23 min
KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration
May 20, 2026 · 23 min
CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
May 19, 2026 · 23 min
PhysBrain 1.0 Technical Report
May 19, 2026 · 25 min
MMSkills: Towards Multimodal Skills for General Visual Agents
May 19, 2026 · 22 min
DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo
May 19, 2026 · 23 min
Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
May 19, 2026 · 21 min
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
May 19, 2026 · 24 min
Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
May 19, 2026 · 21 min
Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR
May 19, 2026 · 21 min
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
May 16, 2026 · 22 min
Causal Forcing++: Scalable Few-Step Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation
May 16, 2026 · 23 min
Self-Distilled Agentic Reinforcement Learning
May 16, 2026 · 24 min
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
May 16, 2026 · 26 min
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer
May 16, 2026 · 22 min
MemEye: A Visual-Centric Evaluation Framework for Multimodal Agent Memory
May 16, 2026 · 22 min
Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning
May 16, 2026 · 23 min
Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems
May 16, 2026 · 21 min
STALE: Can LLM Agents Know When Their Memories Are No Longer Valid?
May 16, 2026 · 23 min
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
May 16, 2026 · 24 min
MinT: Managed Infrastructure for Training and Serving Millions of LLMs
May 15, 2026 · 23 min
MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
May 15, 2026 · 24 min
AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
May 15, 2026 · 23 min
Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
May 15, 2026 · 23 min
EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents
May 15, 2026 · 25 min
Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
May 15, 2026 · 24 min
Qwen-Image-VAE-2.0 Technical Report
May 15, 2026 · 24 min
TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
May 15, 2026 · 23 min
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
May 15, 2026 · 23 min
Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
May 15, 2026 · 24 min
MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents
May 14, 2026 · 24 min