- DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
  Paper • 2508.14460 • Published • 86
- MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
  Paper • 2508.09670 • Published
- URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
  Paper • 2507.17515 • Published • 2
Emmanuel Sugutt
Sugutt
AI & ML interests
Reinforcement learning
Transformer models
Recent Activity
liked a model 3 days ago
LiquidAI/LFM2.5-8B-A1B
updated a model 5 months ago
Sugutt/whisper-kalenjin-small-revised
published a model 6 months ago
Sugutt/whisper-kalenjin-small-revised