amar-bach's Collections

RL-reasoning

updated 14 days ago

  • The Art of Efficient Reasoning: Data, Reward, and Optimization

    Paper • 2602.20945 • Published about 1 month ago • 7

  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

    Paper • 2309.00267 • Published Sep 1, 2023 • 53

  • Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning

    Paper • 2512.04359 • Published Dec 4, 2025

  • How Far Can Unsupervised RLVR Scale LLM Training?

    Paper • 2603.08660 • Published 18 days ago • 57

  • In-Context Reinforcement Learning for Tool Use in Large Language Models

    Paper • 2603.08068 • Published 18 days ago • 41