Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction Paper • 2512.04987 • Published about 1 month ago • 76
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping Paper • 2510.18927 • Published Oct 21, 2025 • 83
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14, 2025 • 89
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7, 2025 • 39
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache Paper • 2506.11886 • Published Jun 13, 2025 • 20
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published Jun 17, 2025 • 44
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models Paper • 2505.00551 • Published May 1, 2025 • 36
CritiQ: Mining Data Quality Criteria from Human Preferences Paper • 2502.19279 • Published Feb 26, 2025 • 10