🧪 Auto-Research Harness Report
auto-research-harness-v1 · 2026-05-23 23:26 CST · 3/5 tasks passed
📝 https://arxiv.org/abs/2605.14434
Score: 8/8 · Passed
| ✅ | min_length >= 200 · actual: 1028 |
| ✅ | max_length <= 2000 · actual: 1028 |
| ✅ | keyword: CQ-SID |
| ✅ | keyword: EG-GRPO |
| ✅ | keyword: generative retrieval |
| ✅ | keyword: GMV |
| ✅ | comparison table present |
| ✅ | ≥3 markdown sections |
📝 Generative Document Retrieval (Generative Information Retrieval)
Score: 5/5 · Passed
| ✅ | contains: DSI |
| ✅ | contains: Differentiable Search Index |
| ✅ | ≥4 papers listed · actual: 15 |
| ✅ | has links to papers |
| ✅ | one-line summaries present |
📝 Generative Retrieval vs. Dense Retrieval for E-commerce Search Recall
Score: 6/7 · Failed
| ❌ | ≥4 comparison dimensions · actual: 2 |
| ✅ | dimension: latency |
| ✅ | dimension: coverage |
| ✅ | dimension: precision |
| ✅ | dimension: incremental updates |
| ✅ | trade-off analysis present |
| ✅ | conclusion/summary row |
📝 https://arxiv.org/abs/2605.14434
Score: 4/4 · Passed
| ✅ | ≥3 limitations mentioned · actual: 5 |
| ✅ | evidence level tags present |
| ✅ | specific references to paper |
| ✅ | min_length >= 300 · actual: 1443 |
📝 The Expert Guidance mechanism of EG-GRPO in the paper (expert samples are injected into the group, excluded from the gradient, and used only in the mean/std computation)
Score: 3/5 · Failed
| ✅ | min_length >= 150 · actual: 1042 |
| ❌ | explains: why GRPO suffers mode collapse |
| ❌ | explains: how expert guidance mitigates it |
| ✅ | intuition / analogy present |
| ✅ | jargon density OK (≤8 terms, got 1) |
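
The mechanism named in the last task title (expert samples join the group for the mean/std computation but receive no gradient) can be illustrated with a small arithmetic sketch. This is not taken from the paper: the function name `expert_guided_advantages`, the `eps` term, the toy rewards, and the use of standard GRPO-style standardization `(r - mean) / std` are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def expert_guided_advantages(policy_rewards, expert_rewards, eps=1e-6):
    """Hypothetical helper: standardize policy rewards against group statistics.

    Assumption: the group mean and std are computed over policy samples AND
    expert samples, while advantages are returned only for policy samples,
    so expert samples never contribute a gradient term.
    """
    group = list(policy_rewards) + list(expert_rewards)
    mean = fmean(group)   # baseline over the whole group
    std = pstdev(group)   # population std over the whole group
    # Only policy samples get an advantage; expert samples are statistics-only.
    return [(r - mean) / (std + eps) for r in policy_rewards]


# Toy group (made-up numbers): every policy sample earns the same reward.
policy_rewards = [0.2, 0.2, 0.2, 0.2]
expert_rewards = [1.0]

plain = expert_guided_advantages(policy_rewards, [])
guided = expert_guided_advantages(policy_rewards, expert_rewards)
```

In this toy group, the identical policy rewards give zero advantages when no expert sample is present, so there is no learning signal. Adding one higher-reward expert sample shifts the group mean and makes the std nonzero, so each policy sample receives a nonzero (here negative) advantage. Whether this matches the paper's exact formulation is not established by the report.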