🧪 Auto-Research Harness Report

auto-research-harness-v1 · 2026-05-23 23:26 CST · 3/5 tasks passed

Tasks Passed: 3/5
Average Score: 89%
Total Tasks: 5
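
The report does not state how the average score is aggregated. A minimal sketch, assuming it is the unweighted mean of each task's earned/possible ratio, reproduces the 89% figure from the five per-task scores listed in this report. The function name `average_score` is hypothetical and not part of the harness.

```python
# Per-task (earned, possible) scores copied from this report.
scores = [(8, 8), (5, 5), (6, 7), (4, 4), (3, 5)]

def average_score(scores):
    """Unweighted mean of per-task ratios (assumed aggregation rule)."""
    return sum(earned / possible for earned, possible in scores) / len(scores)

print(f"{average_score(scores):.0%}")  # prints 89%
```

Under this assumption the two failed tasks (6/7 and 3/5) account for the whole gap to 100%.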
✅ Paper Key-Point Extraction generation
📝 https://arxiv.org/abs/2605.14434
Score: 8/8 · Passed
min_length >= 200 · actual: 1028
max_length <= 2000 · actual: 1028
keyword: CQ-SID
keyword: EG-GRPO
keyword: generative retrieval
keyword: GMV
comparison table present
≥3 markdown sections
✅ Related Work Search retrieval
📝 Generative document retrieval (Generative Information Retrieval)
Score: 5/5 · Passed
contains: DSI
contains: Differentiable Search Index
≥4 papers listed · actual: 15
has links to papers
one-line summaries present
❌ Approach Comparison Table comparison
📝 Generative retrieval vs. dense retrieval for candidate recall in e-commerce search
Score: 6/7 · Failed
≥4 comparison dimensions · actual: 2
dimension: latency (延迟)
dimension: coverage (覆盖)
dimension: precision (精度)
dimension: incremental updates (增量更新)
trade-off analysis present
conclusion/summary row
✅ Limitations Critique critique
📝 https://arxiv.org/abs/2605.14434
Score: 4/4 · Passed
≥3 limitations mentioned · actual: 5
evidence level tags present
specific references to paper
min_length >= 300 · actual: 1443
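❌ Formula/Algorithm Explanation explanation
📝 The Expert Guidance mechanism of EG-GRPO in the paper (expert samples are injected into the group, are excluded from the gradient, and take part only in the mean/std computation)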
Score: 3/5 · Failed
min_length >= 150 · actual: 1042
explains: why GRPO suffers mode collapse
explains: how expert guidance mitigates it
intuition / analogy present
jargon density OK (≤8 terms, got 1)
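
The task prompt above describes Expert Guidance in one line: expert samples join the group, are excluded from the gradient, and affect only the mean/std. The following is a rough sketch of one plausible reading of that description, not the paper's implementation. The function name `group_advantages`, the use of population standard deviation, and the `eps` term are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def group_advantages(policy_rewards, expert_rewards, eps=1e-8):
    """Group-normalized advantages for policy samples only (hypothetical helper).

    Expert rewards shift the group mean/std but receive no advantage of
    their own, so they would contribute nothing to the policy gradient.
    """
    group = list(policy_rewards) + list(expert_rewards)
    mu = mean(group)
    sigma = pstdev(group)  # population std; an assumption
    return [(r - mu) / (sigma + eps) for r in policy_rewards]

# Every policy sample earns the same reward: with no expert samples the
# normalized advantages are all zero, so there is no learning signal.
no_expert = group_advantages([0.2, 0.2, 0.2, 0.2], [])

# One higher-reward expert sample raises the baseline, so the same policy
# samples now receive a nonzero (negative) advantage.
with_expert = group_advantages([0.2, 0.2, 0.2, 0.2], [1.0])

print(no_expert)
print(with_expert)
```

In this toy case the expert sample changes only the normalization statistics; the returned list still has one entry per policy sample, which is how "not counted in the gradient" is modeled here.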