2026-05-27 Paper Reading

Today's arXiv quick read: 10 papers shortlisted from the arXiv subscription feed.

arXiv ID: 2605.26282

⚡ Scaling World-Model RL Through Diffusion Policy Optimization

Uses diffusion policy optimization to unify search and value learning in world models, addressing model bias and error accumulation.

arXiv ID: 2605.26520

⚡ InterSketch: An Interleaved Reasoning Model with Self-correcting Visual Sketch and Stepwise Reward

Interleaved visual-text chain of thought (VT-CoT) plus self-correcting sketches, improving the depth of multi-turn visual reasoning in VLMs.

arXiv ID: 2605.27310

⚡ How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning

Reveals a limitation of VLMs in cross-view spatial reasoning: they "think in language rather than truly think visually."

arXiv ID: 2605.26316

⚡ E3C: Video Generation with 3D Environmental Memory and Ego-Exo Human Pose Control

Controllable first-person video generation that combines 3D point-cloud environmental memory with human pose control.

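arXiv ID: 2605.26500

Uses 3D Gaussian Splatting (3DGS) for semantic mapping in vision-and-language navigation (VLN), with open-set semantic grouping replacing dense feature sampling.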

arXiv ID: 2605.26680

⚡ DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation

Treats adaptive frame-sampling density as a native token and pairs it with SD-GRPO to acquire multi-granularity evidence in a single step.

arXiv ID: 2605.27365

⚡ LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Parallel box decoding replaces sequential coordinate generation, improving grounding throughput and accuracy.

arXiv ID: 2605.27101

⚡ Pop-Up Distractions Reveal Bag-of-Events Behavior in Video Large Language Models

Finds that VideoLLMs exhibit "bag-of-events" behavior: failed entity association across segments and hallucinated interactions.

arXiv ID: 2605.26239

⚡ Sentinel: Embodied Cooperative Spatial Reasoning and Planning

Cooperative spatial reasoning and replanning by decentralized embodied agents in urban scenes.

arXiv ID: 2605.26642

⚡ Adaptation-Free Heterogeneous Collaborative Perception with Unseen Agent Configurations

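Adaptation-free heterogeneous collaborative perception: box-level messages are converted into ego-compatible features, requiring only 120 bytes per frame.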

Auto-generated on 2026-05-27 · Based on arXiv Daily Digest

https://eric-zhang007.github.io/astro-github-pages-site/blog/2026-05-27-paper-reading/

Author: Eric Zhang
Published at: May 27, 2026
Copyright: CC BY-NC-SA 4.0
