Self-Improving Agents - a alexngai Collection

alexngai 's Collections

Latent Reasoning

Autonomous Research

Memory/Search/Retrieval/RAG

Automated Research

Test-Time Compute/Optimal Scaling

Self-Improving Agents

Codegen Benchmarks

Self-Improving Agents

updated Dec 25, 2025

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

Paper • 2410.22304 • Published Oct 29, 2024 • 18
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

Paper • 2410.19609 • Published Oct 25, 2024 • 18
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation

Paper • 2411.00412 • Published Nov 1, 2024 • 10
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning

Paper • 2410.02052 • Published Oct 2, 2024 • 9
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Paper • 2410.01679 • Published Oct 2, 2024 • 27
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4, 2024 • 12
AFlow: Automating Agentic Workflow Generation

Paper • 2410.10762 • Published Oct 14, 2024 • 1
Boundless Socratic Learning with Language Games

Paper • 2411.16905 • Published Nov 25, 2024 • 2
Enabling Scalable Oversight via Self-Evolving Critic

Paper • 2501.05727 • Published Jan 10, 2025 • 72
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

Paper • 2501.05707 • Published Jan 10, 2025 • 20
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training

Paper • 2501.11425 • Published Jan 20, 2025 • 109
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27, 2025 • 31
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160
SIM-CoT: Supervised Implicit Chain-of-Thought

Paper • 2509.20317 • Published Sep 24, 2025 • 42
TTRL: Test-Time Reinforcement Learning

Paper • 2504.16084 • Published Apr 22, 2025 • 120
Toward Training Superintelligent Software Agents through Self-Play SWE-RL

Paper • 2512.18552 • Published Dec 21, 2025 • 2