One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL • Paper • 2506.02338 • Published Jun 3, 2025
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents • Paper • 2505.15277 • Published May 21, 2025
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation • Paper • 2410.13232 • Published Oct 17, 2024
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code • Paper • 2409.19715 • Published Sep 29, 2024
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models • Paper • 2404.02575 • Published Apr 3, 2024