
Daily Paper Cast
1,918 episodes
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
May 14, 2026 · 25 min
$δ$-mem: Efficient Online Memory for Large Language Models
May 14, 2026 · 24 min
RubricEM: Meta-RL with Rubric-guided Policy Decomposition beyond Verifiable Rewards
May 14, 2026 · 22 min
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics
May 14, 2026 · 22 min
World Action Models: The Next Frontier in Embodied AI
May 14, 2026 · 24 min
Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization
May 14, 2026 · 25 min
Efficient Pre-Training with Token Superposition
May 14, 2026 · 24 min
AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward
May 14, 2026 · 23 min
MCP-Cosmos: World Model-Augmented Agents for Complex Task Execution in MCP Environments
May 14, 2026 · 21 min
Qwen-Image-2.0 Technical Report
May 13, 2026 · 23 min
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs
May 13, 2026 · 23 min
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models
May 13, 2026 · 25 min
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy
May 13, 2026 · 23 min
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
May 13, 2026 · 22 min
Model Merging Scaling Laws in Large Language Models
May 13, 2026 · 21 min
SEIF: Self-Evolving Reinforcement Learning for Instruction Following
May 13, 2026 · 21 min
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors
May 13, 2026 · 22 min
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
May 13, 2026 · 22 min
Mean Mode Screaming: Mean-Variance Split Residuals for 1000-Layer Diffusion Transformers
May 12, 2026 · 21 min
Flow-OPD: On-Policy Distillation for Flow Matching Models
May 12, 2026 · 26 min
HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents
May 12, 2026 · 25 min
Anisotropic Modality Align
May 12, 2026 · 22 min
Beyond Retrieval: A Multitask Benchmark and Model for Code Search
May 12, 2026 · 21 min
MiA-Signature: Approximating Global Activation for Long-Context Understanding
May 9, 2026 · 11 min
When to Trust Imagination: Adaptive Action Execution for World Action Models
May 9, 2026 · 12 min
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
May 9, 2026 · 15 min
Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation
May 8, 2026 · 22 min
Stream-T1: Test-Time Scaling for Streaming Video Generation
May 8, 2026 · 21 min
RLDX-1 Technical Report
May 8, 2026 · 23 min
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
May 8, 2026 · 25 min
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
May 8, 2026 · 23 min
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
May 8, 2026 · 21 min
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
May 7, 2026 · 24 min
OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
May 7, 2026 · 21 min
Beyond SFT-to-RL: Pre-alignment via Black-Box On-Policy Distillation for Multimodal RL
May 7, 2026 · 21 min
MolmoAct2: Action Reasoning Models for Real-world Deployment
May 6, 2026 · 25 min
From Context to Skills: Can Language Models Learn from Context Skillfully?
May 6, 2026 · 19 min
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
May 5, 2026 · 22 min
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
May 5, 2026 · 23 min
Heterogeneous Scientific Foundation Model Collaboration
May 2, 2026 · 25 min
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
May 2, 2026 · 20 min
Co-Evolving Policy Distillation
May 2, 2026 · 22 min
ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
May 2, 2026 · 27 min
Efficient Training on Multiple Consumer GPUs with RoundPipe
May 2, 2026 · 23 min
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
May 1, 2026 · 25 min
Large Language Models Explore by Latent Distilling
May 1, 2026 · 22 min
RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
May 1, 2026 · 24 min
ClawGym: A Scalable Framework for Building Effective Claw Agents
May 1, 2026 · 26 min
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
May 1, 2026 · 22 min
Recursive Multi-Agent Systems
Apr 30, 2026 · 25 min