
Daily Paper Cast
1,918 episodes — Page 3 of 39
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
Apr 30, 2026 · 24 min
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
Apr 30, 2026 · 23 min
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
Apr 30, 2026 · 21 min
Meta-CoT: Enhancing Granularity and Generalization in Image Editing
Apr 30, 2026 · 22 min
Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
Apr 30, 2026 · 23 min
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
Apr 29, 2026 · 21 min
From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
Apr 29, 2026 · 25 min
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
Apr 29, 2026 · 22 min
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Apr 29, 2026 · 22 min
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
Apr 29, 2026 · 29 min
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
Apr 29, 2026 · 22 min
SketchVLM: Vision language models can annotate images to explain thoughts and guide users
Apr 29, 2026 · 21 min
Video Analysis and Generation via a Semantic Progress Function
Apr 28, 2026 · 20 min
DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
Apr 28, 2026 · 25 min
LLM Safety From Within: Detecting Harmful Content with Internal Representations
Apr 28, 2026 · 23 min
LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
Apr 25, 2026 · 26 min
WorldMark: A Unified Benchmark Suite for Interactive Video World Models
Apr 25, 2026 · 25 min
UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
Apr 25, 2026 · 26 min
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
Apr 24, 2026 · 25 min
Near-Future Policy Optimization
Apr 24, 2026 · 22 min
DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
Apr 24, 2026 · 24 min
OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis
Apr 24, 2026 · 26 min
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
Apr 24, 2026 · 23 min
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
Apr 23, 2026 · 23 min
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
Apr 23, 2026 · 21 min
AgentSPEX: An Agent SPecification and EXecution Language
Apr 23, 2026 · 22 min
AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
Apr 23, 2026 · 24 min
TEMPO: Scaling Test-time Training for Large Reasoning Models
Apr 23, 2026 · 23 min
Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
Apr 22, 2026 · 20 min
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
Apr 22, 2026 · 26 min
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Apr 22, 2026 · 24 min
OpenGame: Open Agentic Coding for Games
Apr 22, 2026 · 25 min
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
Apr 22, 2026 · 21 min
EasyVideoR1: Easier RL for Video Understanding
Apr 22, 2026 · 27 min
Elucidating the SNR-t Bias of Diffusion Probabilistic Models
Apr 21, 2026 · 22 min
Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
Apr 21, 2026 · 22 min
PersonaVLM: Long-Term Personalized Multimodal LLMs
Apr 21, 2026 · 24 min
Qwen3.5-Omni Technical Report
Apr 21, 2026 · 24 min
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
Apr 18, 2026 · 24 min
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
Apr 18, 2026 · 22 min
DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation
Apr 18, 2026 · 24 min
Seedance 2.0: Advancing Video Generation for World Complexity
Apr 17, 2026 · 27 min
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
Apr 17, 2026 · 26 min
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
Apr 17, 2026 · 24 min
SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
Apr 17, 2026 · 24 min
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation
Apr 17, 2026 · 27 min
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
Apr 17, 2026 · 26 min
From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
Apr 17, 2026 · 23 min
Exploration and Exploitation Errors Are Measurable for Language Model Agents
Apr 17, 2026 · 22 min
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
Apr 16, 2026 · 23 min