Today's selection: 8 papers from the arXiv subscription feed.

⚡ HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System

⚡ SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

⚡ Reward Design for Physical Reasoning in Vision-Language Models

⚡ POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

⚡ Training-Free Semantic Multi-Object Tracking with Vision-Language Models

⚡ Beyond State Consistency: Behavior Consistency in Text-Based World Models

⚡ One Token per Highly Selective Frame: Towards Extreme Compression for Long Video Understanding

⚡ Exploration and Exploitation Errors Are Measurable for Language Model Agents

Auto-generated on 2026-04-17 · Based on arXiv Daily Digest