Self-Improving Agents
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
• Paper: arXiv:2410.22304
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
• Paper: arXiv:2410.19609
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
• Paper: arXiv:2411.00412
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
• Paper: arXiv:2410.02052
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment
• Paper: arXiv:2410.01679
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
• Paper: arXiv:2410.03864
AFlow: Automating Agentic Workflow Generation
• Paper: arXiv:2410.10762
Boundless Socratic Learning with Language Games
• Paper: arXiv:2411.16905
Enabling Scalable Oversight via Self-Evolving Critic
• Paper: arXiv:2501.05727
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains
• Paper: arXiv:2501.05707
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
• Paper: arXiv:2501.11425
Towards General-Purpose Model-Free Reinforcement Learning
• Paper: arXiv:2501.16142
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
• Paper: arXiv:2508.16153
SIM-CoT: Supervised Implicit Chain-of-Thought
• Paper: arXiv:2509.20317
TTRL: Test-Time Reinforcement Learning
• Paper: arXiv:2504.16084
Toward Training Superintelligent Software Agents through Self-Play SWE-RL
• Paper: arXiv:2512.18552