FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
Paper • 2603.19835
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
Sparse but Critical: A Token-Level Analysis of Distributional Shifts in RLVR Fine-Tuning of LLMs