jasonjiang

mikinyaa

jasonjiang8866

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 10 hours ago

DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models

Paper • 2603.26164 • Published 8 days ago • 151

upvoted a paper 7 days ago

Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Paper • 2603.22847 • Published 11 days ago • 25

upvoted a paper 8 days ago

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Paper • 2603.23500 • Published 11 days ago • 35

upvoted 2 papers 11 days ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 12 days ago • 28

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

Paper • 2603.21065 • Published 13 days ago • 77

liked a model 17 days ago

Rakuten/RakutenAI-3.0

Text Generation • 671B • Updated 18 days ago • 11.7k • 70

upvoted a paper 20 days ago

SLA2: Sparse-Linear Attention with Learnable Routing and QAT

Paper • 2602.12675 • Published Feb 13 • 58

upvoted a paper 26 days ago

Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Paper • 2603.04257 • Published about 1 month ago • 19

upvoted 2 papers 27 days ago

AgentVista: Evaluating Multimodal Agents in Ultra-Challenging Realistic Visual Scenarios

Paper • 2602.23166 • Published Feb 26 • 44

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 191

upvoted 8 papers about 1 month ago

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Paper • 2602.12036 • Published Feb 12 • 93

The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies

Paper • 2602.09877 • Published Feb 10 • 197

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 170

DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories

Paper • 2602.10809 • Published Feb 11 • 59

upvoted 2 papers about 2 months ago

CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty

Paper • 2601.22027 • Published Jan 29 • 85

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 349
