Today's selection: 8 papers filtered from the arXiv feed.
⚡ Forecasting the Past: Gradient-Based Distribution Shift Detection in Trajectory Prediction

⚡ Don’t Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

⚡ All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding

⚡ Unlocking the Potential of Grounding DINO in Videos: Parameter-Efficient Adaptation for Limited-Data Spatial-Temporal Loc

⚡ GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning

⚡ Why and When Visual Token Pruning Fails: A Study on Relevant Visual Information Shift in MLLMs Decoding

Auto-generated on 2026-04-16 · Based on arXiv Daily Digest

