Group Sequence Policy Optimization
Paper
• 2507.18071
• Published
• 318
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
Paper
• 2507.15758
• Published
• 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning
Paper
• 2507.15844
• Published
• 17
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
Paper
• 2507.16814
• Published
• 21
RePO: Replay-Enhanced Policy Optimization
Paper
• 2506.09340
• Published
Perception-Aware Policy Optimization for Multimodal Reasoning
Paper
• 2507.06448
• Published
• 48
On-Policy RL with Optimal Reward Baseline
Paper
• 2505.23585
• Published
• 14
EXPO: Stable Reinforcement Learning with Expressive Policies
Paper
• 2507.07986
• Published
Geometric-Mean Policy Optimization
Paper
• 2507.20673
• Published
• 32
Single-stream Policy Optimization
Paper
• 2509.13232
• Published
• 34
MAPO: Mixed Advantage Policy Optimization
Paper
• 2509.18849
• Published
• 27