TongZheng1999/iter_1_reinforce_baseline_per_sample_200epoch_strong_init_step_150_ • Updated 11 days ago
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-5 Text Generation • 196k • Updated Nov 20, 2025 • 5
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-4 Text Generation • 196k • Updated Nov 20, 2025 • 2
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-3 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-2 Text Generation • 196k • Updated Nov 20, 2025 • 3
TongZheng1999/PF_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-1 Text Generation • 196k • Updated Nov 20, 2025 • 2
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-5 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-4 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-3 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-2 Text Generation • 196k • Updated Nov 20, 2025
TongZheng1999/FL_Qwen-3-4B-Instruct-star-mixed_direct-OP-final_v2_10-2-5Rounds-iter-1 Text Generation • 196k • Updated Nov 20, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-3 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-2 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_10-2-3Rounds-iter-1 Text Generation • 196k • Updated Nov 19, 2025 • 1
TongZheng1999/FL_Qwen-3-4B-nr-star-mixed_direct-OP-final_v2_1-2-3Rounds-iter-2 Text Generation • 196k • Updated Nov 19, 2025 • 1