Daily Paper Cast

1,918 episodes

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

May 14, 2026 · 25 min

$δ$-mem: Efficient Online Memory for Large Language Models

May 14, 2026 · 24 min

RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards

May 14, 2026 · 22 min

Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics

May 14, 2026 · 22 min

World Action Models: The Next Frontier in Embodied AI

May 14, 2026 · 24 min

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

May 14, 2026 · 25 min

Efficient Pre-Training with Token Superposition

May 14, 2026 · 24 min

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

May 14, 2026 · 23 min

MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments

May 14, 2026 · 21 min

Qwen-Image-2.0 Technical Report

May 13, 2026 · 23 min

Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

May 13, 2026 · 23 min

CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models

May 13, 2026 · 25 min

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

May 13, 2026 · 23 min

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

May 13, 2026 · 22 min

Model Merging Scaling Laws in Large Language Models

May 13, 2026 · 21 min

SEIF: Self-Evolving Reinforcement Learning for Instruction Following

May 13, 2026 · 21 min

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

May 13, 2026 · 22 min

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

May 13, 2026 · 22 min

Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers

May 12, 2026 · 21 min

Flow-OPD: On-Policy Distillation for Flow Matching Models

May 12, 2026 · 26 min

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

May 12, 2026 · 25 min

Anisotropic Modality Align

May 12, 2026 · 22 min

Beyond Retrieval: A Multitask Benchmark and Model for Code Search

May 12, 2026 · 21 min

MiA-Signature: Approximating Global Activation for Long-Context Understanding

May 9, 2026 · 11 min

When to Trust Imagination: Adaptive Action Execution for World Action Models

May 9, 2026 · 12 min

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

May 9, 2026 · 15 min

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

May 8, 2026 · 22 min

Stream-T1: Test-Time Scaling for Streaming Video Generation

May 8, 2026 · 21 min

RLDX-1 Technical Report

May 8, 2026 · 23 min

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

May 8, 2026 · 25 min

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

May 8, 2026 · 23 min

PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World

May 8, 2026 · 21 min

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

May 7, 2026 · 24 min

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

May 7, 2026 · 21 min

Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL

May 7, 2026 · 21 min

MolmoAct2: Action Reasoning Models for Real-world Deployment

May 6, 2026 · 25 min

From Context to Skills: Can Language Models Learn from Context Skillfully?

May 6, 2026 · 19 min

UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

May 5, 2026 · 22 min

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

May 5, 2026 · 23 min

Heterogeneous Scientific Foundation Model Collaboration

May 2, 2026 · 25 min

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

May 2, 2026 · 20 min

Co-Evolving Policy Distillation

May 2, 2026 · 22 min

ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control

May 2, 2026 · 27 min

Efficient Training on Multiple Consumer GPUs with RoundPipe

May 2, 2026 · 23 min

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

May 1, 2026 · 25 min

Large Language Models Explore by Latent Distilling

May 1, 2026 · 22 min

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

May 1, 2026 · 24 min

ClawGym: A Scalable Framework for Building Effective Claw Agents

May 1, 2026 · 26 min

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

May 1, 2026 · 22 min

Recursive Multi-Agent Systems

Apr 30, 2026 · 25 min