Daily Paper Cast

1,918 episodes

Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora

Apr 30, 2026, 24 min

DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios

Apr 30, 2026, 23 min

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Apr 30, 2026, 21 min

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

Apr 30, 2026, 22 min

Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models

Apr 30, 2026, 23 min

World-R1: Reinforcing 3D Constraints for Text-to-Video Generation

Apr 29, 2026, 21 min

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Apr 29, 2026, 25 min

ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning

Apr 29, 2026, 22 min

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Apr 29, 2026, 22 min

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

Apr 29, 2026, 29 min

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

Apr 29, 2026, 22 min

SketchVLM: Vision language models can annotate images to explain thoughts and guide users

Apr 29, 2026, 21 min

Video Analysis and Generation via a Semantic Progress Function

Apr 28, 2026, 20 min

DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction

Apr 28, 2026, 25 min

LLM Safety From Within: Detecting Harmful Content with Internal Representations

Apr 28, 2026, 23 min

LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics

Apr 25, 2026, 26 min

WorldMark: A Unified Benchmark Suite for Interactive Video World Models

Apr 25, 2026, 25 min

UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling

Apr 25, 2026, 26 min

LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model

Apr 24, 2026, 25 min

Near-Future Policy Optimization

Apr 24, 2026, 22 min

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Apr 24, 2026, 24 min

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Apr 24, 2026, 26 min

DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation

Apr 24, 2026, 23 min

Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items

Apr 23, 2026, 23 min

CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation

Apr 23, 2026, 21 min

AgentSPEX: An Agent SPecification and EXecution Language

Apr 23, 2026, 22 min

AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model

Apr 23, 2026, 24 min

TEMPO: Scaling Test-time Training for Large Reasoning Models

Apr 23, 2026, 23 min

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Apr 22, 2026, 20 min

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

Apr 22, 2026, 26 min

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Apr 22, 2026, 24 min

OpenGame: Open Agentic Coding for Games

Apr 22, 2026, 25 min

MultiWorld: Scalable Multi-Agent Multi-View Video World Models

Apr 22, 2026, 21 min

EasyVideoR1: Easier RL for Video Understanding

Apr 22, 2026, 27 min

Elucidating the SNR-t Bias of Diffusion Probabilistic Models

Apr 21, 2026, 22 min

Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

Apr 21, 2026, 22 min

PersonaVLM: Long-Term Personalized Multimodal LLMs

Apr 21, 2026, 24 min

Qwen3.5-Omni Technical Report

Apr 21, 2026, 24 min

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

Apr 18, 2026, 24 min

RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework

Apr 18, 2026, 22 min

DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Apr 18, 2026, 24 min

Seedance 2.0: Advancing Video Generation for World Complexity

Apr 17, 2026, 27 min

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

Apr 17, 2026, 26 min

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Apr 17, 2026, 24 min

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

Apr 17, 2026, 24 min

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

Apr 17, 2026, 27 min

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents

Apr 17, 2026, 26 min

From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space

Apr 17, 2026, 23 min

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Apr 17, 2026, 22 min

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Apr 16, 2026, 23 min