Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs Paper • 2507.07996 • Published Jul 10, 2025
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published Jun 24, 2025
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22, 2024