floyed shen

floyed

AI & ML interests

None yet

Recent Activity

upvoted a paper 2 days ago

From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation

upvoted a paper 2 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

upvoted a paper 18 days ago

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense

Organizations

commented a paper 2 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220

commented a paper 3 months ago

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220

New activity in Beijing-AISI/panda-bench 12 months ago

Upload benchmarks.zip

#2 opened 12 months ago by