Accelerating Nash Learning from Human Feedback via Mirror Prox • Paper • 2505.19731 • Published May 26, 2025
On Teacher Hacking in Language Model Distillation • Paper • 2502.02671 • Published Feb 4, 2025