# Research Radar · 2026-05-25 (Mon)

_Auto-generated daily arXiv monitoring dashboard_

## 🔬 Fine-Grained CV (5 papers)

### 1. DFSAttn: Dynamic Fine-grained Sparse Attention for Efficient Video Generation

- **arXiv:** `2605.23445` | [Abstract](https://arxiv.org/abs/2605.23445v1) | [PDF](https://arxiv.org/pdf/2605.23445v1)
- **Authors:** Jie Hu, Zixiang Gao, Yutong He, Kun Yuan
- **Published:** 2026-05-22

Diffusion transformers have achieved remarkable success in high-quality video generation, yet their reliance on spatiotemporal 3D full attention incurs prohibitive computational cost due to the quadratic complexity of attention. Block sparse attention is a common approach to mitigate this by focusin
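
To make the quadratic cost in the abstract concrete, the sketch below counts query-key score pairs for full attention and for a generic block-sparse scheme in which each query block attends to a fixed number of key blocks. The token count, block size, and number of kept blocks are hypothetical values chosen for illustration, not figures from the paper, and the fixed-budget selection is an assumption rather than DFSAttn's rule.

```python
def full_attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full attention: every token attends to every token."""
    return num_tokens * num_tokens


def block_sparse_pairs(num_tokens: int, block_size: int, kept_blocks: int) -> int:
    """Pairs scored when each query block attends to only `kept_blocks` key blocks."""
    if num_tokens % block_size != 0:
        raise ValueError("num_tokens must be divisible by block_size")
    num_blocks = num_tokens // block_size
    if kept_blocks > num_blocks:
        raise ValueError("kept_blocks cannot exceed the number of key blocks")
    # Each (query block, key block) tile contributes block_size * block_size pairs.
    return num_blocks * kept_blocks * block_size * block_size


# Hypothetical video latent: 16 frames of 32 x 32 spatial tokens.
NUM_TOKENS = 16 * 32 * 32
BLOCK_SIZE = 128
KEPT_BLOCKS = 16  # hypothetical budget out of NUM_TOKENS // BLOCK_SIZE key blocks

full = full_attention_pairs(NUM_TOKENS)
sparse = block_sparse_pairs(NUM_TOKENS, BLOCK_SIZE, KEPT_BLOCKS)
print(f"full:   {full:,} pairs")
print(f"sparse: {sparse:,} pairs ({sparse / full:.1%} of full)")
```

Under these assumed settings the sparse count grows linearly in the number of tokens for a fixed block budget, whereas the full count grows quadratically.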

---

### 2. Decoupling Spatio-Temporal Adapter for Fine-Grained Badminton Action Localization

- **arXiv:** `2605.23355` | [Abstract](https://arxiv.org/abs/2605.23355v1) | [PDF](https://arxiv.org/pdf/2605.23355v1)
- **Authors:** Tianyu Wang, Junjie Wu, Jingquan Gao, Shishuo Li
- **Published:** 2026-05-22

Temporal Action Localization (TAL) has been extensively studied in generic video understanding, while fine-grained sports scenarios, such as professional badminton, remain underexplored due to their complex and subtle spatio-temporal dynamics. In this paper, we focus on fine-grained TAL in professio

---

### 3. ETCHR: Editing To Clarify and Harness Reasoning

- **arXiv:** `2605.23897` | [Abstract](https://arxiv.org/abs/2605.23897v1) | [PDF](https://arxiv.org/pdf/2605.23897v1)
- **Authors:** Beichen Zhang, Yuhong Liu, Jinsong Li, Yuhang Zang, Jiaqi Wang et al.
- **Published:** 2026-05-22

Multimodal Large Language Models have advanced visual reasoning, yet a purely textual chain of thought remains a bottleneck for questions that require fine-grained focus or view transformations. The "think with images" paradigm narrows this gap, but existing approaches are either constrained by fi

---

### 4. PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

- **arXiv:** `2605.23883` | [Abstract](https://arxiv.org/abs/2605.23883v1) | [PDF](https://arxiv.org/pdf/2605.23883v1)
- **Authors:** Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano
- **Published:** 2026-05-22

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-grained visual understandi

---

### 5. MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry

- **arXiv:** `2605.23840` | [Abstract](https://arxiv.org/abs/2605.23840v1) | [PDF](https://arxiv.org/pdf/2605.23840v1)
- **Authors:** Adam Tlemsani, Yingdian Li, Maxime Giot, Naim Slim, Christopher J. Peters et al.
- **Published:** 2026-05-22

Mueller matrix imaging provides rich, physically meaningful contrast for biomedical tissue analysis, but supervised learning is hindered by scarce dense annotations and strong domain shifts across specimens and acquisition settings. We introduce MuellerPT, a physics guided pre-training approach that

---

## 📊 CTR / Ranking / IR (5 papers)

### 1. TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery

- **arXiv:** `2605.23702` | [Abstract](https://arxiv.org/abs/2605.23702v1) | [PDF](https://arxiv.org/pdf/2605.23702v1)
- **Authors:** Alexandre Salle, Chenglei Niu, Suchismit Mahapatra, Xiaoxiao Chen, Suvash Sedhain et al.
- **Published:** 2026-05-22

Personalized discovery systems often train separate models for item ranking, carousel ranking, and search, even though these tasks expose complementary signals from the same viewer journey: watches shape carousel and item ranking, search queries reveal intent even when they do not lead to a catalog 

---

### 2. Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources

- **arXiv:** `2605.23684` | [Abstract](https://arxiv.org/abs/2605.23684v1) | [PDF](https://arxiv.org/pdf/2605.23684v1)
- **Authors:** Mowafak Allaham, Nicholas Diakopoulos
- **Published:** 2026-05-22

The growing accessibility of Large Language Models via conversational interfaces capable of responding to users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process for users. However, with the pr

---

### 3. Towards Generalizable and Efficient Large-Scale Generative Recommenders

- **arXiv:** `2605.23312` | [Abstract](https://arxiv.org/abs/2605.23312v1) | [PDF](https://arxiv.org/pdf/2605.23312v1)
- **Authors:** Qiuling Xu, Ko-Jen Hsiao, Moumita Bhattacharya
- **Published:** 2026-05-22

Generative recommendation models can model user behavior as sequences of events and provide a shared backbone for multiple recommendation tasks. In production, however, pre-training gains do not automatically translate into downstream application improvements: task headroom, repeated-training cost, 

---

### 4. From Head to Tail: Asymmetric Knowledge Transfer in Long-tail Recommendation with Generative Semantic IDs

- **arXiv:** `2605.23310` | [Abstract](https://arxiv.org/abs/2605.23310v1) | [PDF](https://arxiv.org/pdf/2605.23310v1)
- **Authors:** Chenyi Yan, Ruocong Tang, Xing Fang, Yang Huang, He Guo et al.
- **Published:** 2026-05-22

Long-tail recommendation in real-world e-commerce platforms remains challenging due to severe data imbalance. Existing methods often struggle to combine content-based multimodal features with collaborative signals. Many of these methods also ignore an important asymmetry in knowledge transfer betwee

---

### 5. Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation

- **arXiv:** `2605.23191` | [Abstract](https://arxiv.org/abs/2605.23191v1) | [PDF](https://arxiv.org/pdf/2605.23191v1)
- **Authors:** Guoming Li, Shangyu Zhang, Junwei Pan, Wentao Ning, Jin Chen et al.
- **Published:** 2026-05-22

Scaling recommendation models is a central challenge in recommender systems. Recently, RankMixer has emerged as an effective solution, operating on a unified token representation and alternating between token mixing and per-token feedforward networks (P-FFNs) to achieve scalable performance. However
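
The alternation the abstract describes, token mixing followed by per-token feedforward networks, can be sketched in plain Python. This is a loose illustration under stated assumptions: mixing is modeled as a parameter-free chunk exchange across tokens, each P-FFN is reduced to a single linear map with ReLU, and residual connections and normalization are omitted. The function names, sizes, and random weights are hypothetical and are not taken from RankMixer or from this paper.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def mix_tokens(tokens: List[Vector]) -> List[Vector]:
    """Parameter-free mixing: chunk j of every token is concatenated into new token j."""
    num_tokens = len(tokens)
    dim = len(tokens[0])
    if dim % num_tokens != 0:
        raise ValueError("token dimension must be divisible by the number of tokens")
    chunk = dim // num_tokens
    mixed = []
    for j in range(num_tokens):
        new_token: Vector = []
        for token in tokens:
            new_token.extend(token[j * chunk:(j + 1) * chunk])
        mixed.append(new_token)
    return mixed


def per_token_ffn(tokens: List[Vector], weights: List[Matrix]) -> List[Vector]:
    """Apply a separate linear map plus ReLU to each token; weights[i] belongs to token i."""
    out = []
    for token, matrix in zip(tokens, weights):
        out.append([max(0.0, sum(w * x for w, x in zip(row, token))) for row in matrix])
    return out


# Hypothetical sizes: 4 tokens of dimension 8, with one weight matrix per token.
rng = random.Random(0)
NUM_TOKENS, DIM = 4, 8
tokens = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
weights = [
    [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
    for _ in range(NUM_TOKENS)
]

hidden = per_token_ffn(mix_tokens(tokens), weights)
print(len(hidden), len(hidden[0]))
```

Because the weights are not shared across tokens, the parameter count of the feedforward stage scales with the number of tokens in this sketch.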

---

## 🤖 Agent / Auto Research (5 papers)

### 1. SkillOpt: Executive Strategy for Self-Evolving Agent Skills

- **arXiv:** `2605.23904` | [Abstract](https://arxiv.org/abs/2605.23904v1) | [PDF](https://arxiv.org/pdf/2605.23904v1)
- **Authors:** Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou et al.
- **Published:** 2026-05-22

Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained a

---

### 2. From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills

- **arXiv:** `2605.23899` | [Abstract](https://arxiv.org/abs/2605.23899v1) | [PDF](https://arxiv.org/pdf/2605.23899v1)
- **Authors:** Zisu Huang, Jingwen Xu, Yifan Yang, Ziyang Gong, Qihao Yang et al.
- **Published:** 2026-05-22

Language agents increasingly improve by reusing _skills_, structured procedural artifacts distilled from past experience. In particular, _domain-level_ and _model-generated_ skills are especially promising. They offer fast adaptation within a domain by encoding domain-specific recur

---

### 3. CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces

- **arXiv:** `2605.23887` | [Abstract](https://arxiv.org/abs/2605.23887v1) | [PDF](https://arxiv.org/pdf/2605.23887v1)
- **Authors:** Joydeep Chandra
- **Published:** 2026-05-22

Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. 
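
The Shapley misattribution mentioned in the abstract can be illustrated with a textbook exact Shapley computation. The sketch below is not the CHRONOS method: the three sellers, their weights, and the synergy bonus are hypothetical, and the point is only that values computed before a shift no longer match values computed after it.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all player orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition: FrozenSet[str] = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {p: total / len(orderings) for p, total in totals.items()}


def make_value(
    weights: Dict[str, float], synergy_pair: Tuple[str, str], bonus: float
) -> Callable[[FrozenSet[str]], float]:
    """Hypothetical coalition value: additive weights plus a bonus when both partners join."""
    def value(coalition: FrozenSet[str]) -> float:
        total = sum(weights[p] for p in coalition)
        if set(synergy_pair) <= coalition:
            total += bonus
        return total
    return value


SELLERS = ["A", "B", "C"]
# Hypothetical data values before and after a distribution shift.
value_before = make_value({"A": 6.0, "B": 3.0, "C": 1.0}, ("A", "B"), 2.0)
value_after = make_value({"A": 1.0, "B": 3.0, "C": 6.0}, ("B", "C"), 2.0)

before = shapley_values(SELLERS, value_before)
after = shapley_values(SELLERS, value_after)
for seller in SELLERS:
    print(f"{seller}: before={before[seller]:.1f} after={after[seller]:.1f}")
```

In this toy setup, a pricing rule that keeps using the pre-shift values would overpay seller A and underpay seller C. Exact enumeration is factorial in the number of players, so it only suits small examples like this one.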

---

### 4. Agentic Proving for Program Verification

- **arXiv:** `2605.23772` | [Abstract](https://arxiv.org/abs/2605.23772v1) | [PDF](https://arxiv.org/pdf/2605.23772v1)
- **Authors:** Alessandro Sosso, Akhil Arora, Bas Spitters
- **Published:** 2026-05-22

Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code gen

---

### 5. PhotoFlow: Agentic 3D Virtual Photography Missions

- **arXiv:** `2605.23771` | [Abstract](https://arxiv.org/abs/2605.23771v1) | [PDF](https://arxiv.org/pdf/2605.23771v1)
- **Authors:** Jiarui Guo, Haojia Wei, Yiming Zhang, Yifei Liu, Yuning Gong et al.
- **Published:** 2026-05-22

Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes

---

_Report generated at 2026-05-25 14:00:00 CST_