Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published Aug 23, 2025 • 24
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published Jun 3, 2025 • 36