Today's selection: 8 papers filtered from the arXiv feed.
⚡ Frozen LLMs as Map-Aware Spatio-Temporal Reasoners for Vehicle Trajectory Prediction

⚡ Grounding Video Reasoning in Physical Signals
⚡ HiCrew: Hierarchical Reasoning for Long-Form Video Understanding via Question-Aware Multi-Agent Collaboration

⚡ Thinking Like a Botanist: Challenging Multimodal Language Models with Intent-Driven Chain-of-Inquiry
⚡ Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs

⚡ Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment

⚡ Do MLLMs Understand Pointing? Benchmarking and Enhancing Referential Reasoning in Egocentric Vision

⚡ Encoder-Free Human Motion Understanding via Structured Motion Descriptions

Auto-generated on 2026-04-24 · Based on arXiv Daily Digest