Models (24)
rl-rag/rar_cb_bs_16_rollout_8__1__1759453746_checkpoints_step_100 • 333k • Updated • 5
rl-rag/qwen3-8B-sft-mix-v20250921-plus-v20251001-onpolicy-rs-longform_0921 • Text Generation • 8B • Updated • 28
rl-rag/qwen3-8B-sft-mix-v20250921_long_form_only • Text Generation • 8B • Updated • 14
rl-rag/qwen3-8B-sft-mix-v20250921_05 • Text Generation • 8B • Updated • 9
rl-rag/qwen3-8B-sft-mix-v20250921_short_form_only • Text Generation • 8B • Updated • 8
rl-rag/qwen3-8B-sft-mix-v20250921_005 • Text Generation • 8B • Updated • 32
rl-rag/qwen3-8B-sft-mix-v20250921_02 • Text Generation • 8B • Updated • 8
rl-rag/qwen3-8B-sft-mix-v20250921_01 • Text Generation • 8B • Updated • 8
rl-rag/qwen3-8B-sft_0921_no_simple_short_form • Text Generation • 8B • Updated • 10
rl-rag/qwen3-8B-sft_0921_no_search_arena • Text Generation • 8B • Updated • 12

Datasets (44)
rl-rag/1_sample_toy_rag_survey • Viewer • Updated • 8 • 6
Viewer • Updated • 30 • 13
rl-rag/rl-rag-RaR-Medicine-3k-o3-mini-converted • Viewer • Updated • 3k • 12
rl-rag/dpo_lf_sft0921_rubric_citation • Viewer • Updated • 1.32k • 10
rl-rag/sft_rejection_sampled_on_policy_long-_form_sft_0921 • Viewer • Updated • 2.22k • 12
rl-rag/dpo_long_form_gpt5_sft_0921 • Viewer • Updated • 3.37k • 8
rl-rag/sft_0921_onpolicy_rejection_sampled • Viewer • Updated • 1.9k • 11
rl-rag/dpo_gpt5_our_sft_0921 • Preview • Updated • 4
rl-rag/dpo_our_sft_0921_two_iterations • Viewer • Updated • 705 • 8
rl-rag/sft-mix-v20250921_long_form_only_04 • Viewer • Updated • 3.5k • 2