8 papers selected today from the arXiv feed.

⚡ Boosting Visual Instruction Tuning with Self-Supervised Guidance

⚡ Forecasting the Past: Gradient-Based Distribution Shift Detection in Trajectory Prediction

⚡ Don’t Show Pixels, Show Cues: Unlocking Visual Tool Reasoning in Language Models via Perception Programs

⚡ All in One: A Unified Synthetic Data Pipeline for Multimodal Video Understanding

⚡ Unlocking the Potential of Grounding DINO in Videos: Parameter-Efficient Adaptation for Limited-Data Spatial-Temporal Loc

⚡ SceneCritic: A Symbolic Evaluator for 3D Indoor Scene Synthesis

⚡ GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning

⚡ Why and When Visual Token Pruning Fails: A Study on Relevant Visual Information Shift in MLLMs Decoding

Automatically generated on 2026-04-16 · Based on arXiv Daily Digest